# crawler-user-agents

**Repository Path**: mirrors_Olical/crawler-user-agents

## Basic Information

- **Project Name**: crawler-user-agents
- **Description**: Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. pull-request welcome :star:
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-25
- **Last Updated**: 2026-05-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# crawler-user-agents

This repository contains a list of HTTP user-agents used by robots, crawlers, and spiders, as a single JSON file.

## Install

### Direct download 

Download the [`crawler-user-agents.json` file](https://raw.githubusercontent.com/monperrus/crawler-user-agents/master/crawler-user-agents.json) from this repository directly.

### Npm / Yarn
Install using npm or Yarn:

```sh
npm install --save "https://github.com/monperrus/crawler-user-agents.git"
# OR
yarn add "https://github.com/monperrus/crawler-user-agents.git"
```

In Node.js, you can `require` the package to get an array of crawler user agents.

```js
const crawlers = require('crawler-user-agents');
console.log(crawlers);
```

## Usage

Each `pattern` is a regular expression. It should work out of the box with your favorite regex library:

* JavaScript: `if (RegExp(entry.pattern).test(req.headers['user-agent'])) { ... }`
* PHP: add a slash before and after the pattern: `if (preg_match('/'.$entry['pattern'].'/', $_SERVER['HTTP_USER_AGENT'])): ...`
* Python: `if re.search(entry['pattern'], ua): ...`
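To check a user-agent against the whole list rather than a single entry, the one-liners above can be wrapped in a small helper. The following is a minimal sketch, not part of the package: the function names `findCrawler` and `isCrawler` are hypothetical, and the two sample entries stand in for the full list (the `totobot` entry is invented for illustration).

```js
// Sample entries in the same shape as crawler-user-agents.json.
// In a real project you would load the full list instead, for example:
//   const crawlers = require('crawler-user-agents');
const crawlers = [
  { pattern: 'rogerbot', url: 'http://moz.com/help/pro/what-is-rogerbot-' },
  { pattern: 'totobot' }, // hypothetical entry, for illustration only
];

// Compile each pattern once instead of on every request.
const compiled = crawlers.map((entry) => ({
  entry,
  regex: new RegExp(entry.pattern),
}));

// Return the first entry whose pattern matches, or null if none does.
function findCrawler(userAgent) {
  if (typeof userAgent !== 'string' || userAgent === '') {
    return null; // a missing User-Agent header is treated as "no match"
  }
  const match = compiled.find(({ regex }) => regex.test(userAgent));
  return match ? match.entry : null;
}

// Convenience wrapper that only answers yes or no.
function isCrawler(userAgent) {
  return findCrawler(userAgent) !== null;
}

console.log(isCrawler('rogerbot/2.3 example UA')); // true
```

Compiling the patterns up front avoids rebuilding every regular expression per request; whether a missing header should count as a crawler is a policy choice that this sketch answers with "no".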

## Contributing

I do welcome additions contributed as pull requests.

The pull requests should:

* contain a single addition
* specify a distinctive, relevant syntactic fragment (for example "totobot" and not "Mozilla/5 totobot v20131212.alpha1")
* contain the pattern (generic regular expression), the discovery date (year/month/day) and the official URL of the robot
* result in a valid JSON file (don't forget the comma between items)

Example:

    {
      "pattern": "rogerbot",
      "addition_date": "2014/02/28",
      "url": "http://moz.com/help/pro/what-is-rogerbot-",
      "instances" : ["rogerbot/2.3 example UA"]
    }
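
Before opening a pull request, an entry can be sanity-checked locally. The sketch below is an assumption-laden illustration, not the repository's own validation: the function `validateEntry` is hypothetical, and its rules are simply read off the checklist above (a compilable pattern, a `year/month/day` date, a URL), plus a check that each listed instance matches the pattern.

```js
// Hypothetical helper: returns a list of problems found in one entry.
function validateEntry(entry) {
  const errors = [];
  let regex = null;

  // The pattern must be a non-empty string that compiles as a regex.
  if (typeof entry.pattern !== 'string' || entry.pattern === '') {
    errors.push('pattern must be a non-empty string');
  } else {
    try {
      regex = new RegExp(entry.pattern);
    } catch (err) {
      errors.push(`pattern is not a valid regular expression: ${err.message}`);
    }
  }

  // The date format is assumed to be YYYY/MM/DD, as in the example entry.
  if (!/^\d{4}\/\d{2}\/\d{2}$/.test(entry.addition_date || '')) {
    errors.push('addition_date must look like YYYY/MM/DD');
  }

  if (typeof entry.url !== 'string' || entry.url === '') {
    errors.push('url is missing');
  }

  // Every sample user-agent should be matched by the pattern.
  if (regex && Array.isArray(entry.instances)) {
    for (const ua of entry.instances) {
      if (!regex.test(ua)) {
        errors.push(`instance does not match pattern: ${ua}`);
      }
    }
  }

  return errors;
}

const example = {
  pattern: 'rogerbot',
  addition_date: '2014/02/28',
  url: 'http://moz.com/help/pro/what-is-rogerbot-',
  instances: ['rogerbot/2.3 example UA'],
};

console.log(validateEntry(example)); // prints []
```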

## License

The list is under an [MIT License](https://opensource.org/licenses/MIT). The versions prior to Nov 7, 2016 were under a [CC-SA](http://creativecommons.org/licenses/by-sa/3.0/) license.

## Related work

If you are using Ruby, [Voight-Kampff](https://github.com/biola/Voight-Kampff) and [isbot](https://github.com/Hentioe/isbot) provide libraries for accessing this data.

Other systems for spotting robots, crawlers, and spiders that you may want to consider include [isBot](https://github.com/gorangajic/isbot) (Node.js), [Crawler-Detect](https://github.com/JayBizzle/Crawler-Detect) (PHP), [BrowserDetector](https://github.com/mimmi20/BrowserDetector) (PHP), and [browscap](https://github.com/browscap/browscap) (JSON files).