Identifying bots
Arcjet allows you to configure a list of bots to allow or deny. To construct the list, you can specify individual bots and/or use categories to allow or deny all bots in a category.
If you are using TypeScript, these values appear as autocomplete suggestions for the allow or deny options while you write your rules.
Individual bots
The bot list contains a list of known bots that Arcjet can identify.
For example:
detectBot({
  mode: "LIVE", // will block requests. Use "DRY_RUN" to log only
  // Block all bots except specific Google and Bing crawlers, and curl
  allow: [
    "GOOGLE_CRAWLER",
    "GOOGLE_CRAWLER_NEWS",
    "BING_CRAWLER",
    "CURL",
  ],
}),
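To make the difference between an allow list and a deny list concrete, the sketch below models how a single detected bot might be treated under each. It is an illustration only, not the SDK's code: the `BotRule` type and the `isBlocked` helper are hypothetical, and it assumes that an allow list blocks every bot not listed while a deny list blocks only the bots listed.

```typescript
// Hypothetical model of the two rule options; these are not the SDK's types.
type BotRule = { allow: string[] } | { deny: string[] };

// Returns true when a detected bot would be blocked under the rule.
// Assumption: "allow" blocks anything not listed, "deny" blocks only what is listed.
function isBlocked(bot: string, rule: BotRule): boolean {
  if ("allow" in rule) {
    return !rule.allow.includes(bot);
  }
  return rule.deny.includes(bot);
}

const allowRule: BotRule = { allow: ["GOOGLE_CRAWLER", "BING_CRAWLER", "CURL"] };
const denyRule: BotRule = { deny: ["CURL"] };

console.log(isBlocked("CURL", allowRule)); // listed in allow, so not blocked
console.log(isBlocked("CURL", denyRule)); // listed in deny, so blocked
```

Under this model an allow list is the stricter choice, because any bot you did not think of is blocked by default.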
Bot categories
We provide the following categories:
CATEGORY:ACADEMIC: Scrape data for research purposes
CATEGORY:ADVERTISING: Scrape data for advertising and marketing purposes
CATEGORY:AI: Scrape data for AI and LLM purposes
CATEGORY:AMAZON: Scrape data for Amazon products and services
CATEGORY:ARCHIVE: Scrape data for archival purposes
CATEGORY:FEEDFETCHER: Request data for RSS and other feeds
CATEGORY:GOOGLE: Scrape data for Google products and services
CATEGORY:META: Scrape data for Meta/Facebook products and services
CATEGORY:MICROSOFT: Scrape data for Microsoft products and services
CATEGORY:MONITOR: Interact for monitoring purposes
CATEGORY:OPTIMIZER: Interact for optimization purposes
CATEGORY:PREVIEW: Request data for image and URL previews
CATEGORY:PROGRAMMATIC: Interact via programming language libraries
CATEGORY:SEARCH_ENGINE: Index data for search engines
CATEGORY:SLACK: Scrape data for Slack products and services
CATEGORY:SOCIAL: Scrape data for social media products and services
CATEGORY:TOOL: Interact via command line and GUI tools
CATEGORY:UNKNOWN: Undetermined purposes
CATEGORY:VERCEL: Scrape data for Vercel products and services
CATEGORY:YAHOO: Scrape data for Yahoo products and services
The bot list shows every category and the bots that belong to it.
For example:
detectBot({
  mode: "LIVE", // will block requests. Use "DRY_RUN" to log only
  // Block all bots except search engines and curl
  allow: [
    "CATEGORY:SEARCH_ENGINE", // Google, Bing, etc
    "CURL", // You can allow specific bots in addition to categories
  ],
}),
For performance reasons, only the categories you configure are checked. Each detected bot must be compared against each configured category, so the worst-case cost is count(detectedBot) * count(configuredCategories).
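The worst case described above can be pictured with a small sketch. It is not Arcjet's implementation: the `categoryMembers` map, its contents, and the `matchConfigured` helper are hypothetical, chosen only to show why the number of comparisons grows with the number of configured categories rather than with every category that exists.

```typescript
// Hypothetical category membership for illustration; the real mapping is in the bot list.
const categoryMembers: Record<string, ReadonlySet<string>> = {
  "CATEGORY:SEARCH_ENGINE": new Set(["GOOGLE_CRAWLER", "BING_CRAWLER"]),
  "CATEGORY:TOOL": new Set(["CURL"]),
};

interface MatchResult {
  matched: string[]; // detected bots covered by the configured list
  comparisons: number; // bot-to-category checks performed
}

// Compares every detected bot with every configured category (no early exit),
// so the comparison count equals the worst case.
function matchConfigured(detected: string[], configured: string[]): MatchResult {
  const categories = configured.filter((entry) => entry.startsWith("CATEGORY:"));
  const individual = new Set(configured.filter((entry) => !entry.startsWith("CATEGORY:")));
  const matched: string[] = [];
  let comparisons = 0;

  for (const bot of detected) {
    let hit = individual.has(bot);
    for (const category of categories) {
      comparisons += 1;
      if (categoryMembers[category]?.has(bot)) {
        hit = true;
      }
    }
    if (hit) {
      matched.push(bot);
    }
  }
  return { matched, comparisons };
}

const result = matchConfigured(
  ["GOOGLE_CRAWLER", "CURL"],
  ["CATEGORY:SEARCH_ENGINE", "CURL"],
);
console.log(result.matched, result.comparisons);
```

In this sketch, individually named bots such as `CURL` are resolved with a set lookup, so only category entries add to the comparison count.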
We continuously evaluate bots to decide whether any should be reclassified. If we determine that enough bots exist to justify a new category, we’ll consider adding it. Please open an issue on our arcjet/well-known-bots repository if you need a specific category.
Detection
For basic detection on the Free plan, Arcjet uses the User-Agent header to identify specific bots.
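As a rough picture of what User-Agent based identification involves, the sketch below matches a header value against a few patterns. The patterns, the `userAgentPatterns` table, and the `identifyBots` helper are all hypothetical and greatly simplified; the real detection patterns come from the bot list and run inside the SDK.

```typescript
// Hypothetical, simplified patterns for illustration; not the real detection patterns.
const userAgentPatterns: Array<[string, RegExp]> = [
  ["GOOGLE_CRAWLER", /Googlebot\//],
  ["BING_CRAWLER", /bingbot\//i],
  ["CURL", /^curl\//],
];

// Returns the identifiers of every pattern that matches the User-Agent header.
// A missing or empty header yields no identifiers in this sketch.
function identifyBots(userAgent: string | undefined): string[] {
  if (!userAgent) {
    return [];
  }
  return userAgentPatterns
    .filter(([, pattern]) => pattern.test(userAgent))
    .map(([id]) => id);
}

console.log(identifyBots("curl/8.4.0"));
console.log(identifyBots("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
```

Because the client chooses its own User-Agent value, a check of this kind cannot tell a genuine crawler from one that merely copies its header.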
The Arcjet Pro and Enterprise plans provide additional bot verification using IP analysis, which can help if you are under attack from bots pretending to be good bots, e.g. clients pretending to be Google (which you usually want to allow).
Known bots structure
The identifiers on the bot list are generated from a collection of known bots, which includes details of each bot’s owner and any variations.
We welcome contributions to the arcjet/well-known-bots repository, whether you’re adding new bots or updating detection patterns. Once merged, the updates will be included in the next SDK release. Since bot detection is handled within the Arcjet WebAssembly module bundled with the SDK, new patterns must be compiled into the module as part of the release process.
Please read the repository’s README.md for specific instructions on how to contribute to our bot detection.