AI providers bill per token. Without per-user limits, a single user can exhaust your entire monthly budget - through prompt attacks, runaway loops, or just heavy legitimate use.

A token bucket rate limit maps directly onto how AI billing works: you estimate the cost of each request in tokens, deduct it from the user’s bucket, and deny requests when the bucket is empty. The bucket refills over time, giving each user a sustained allowance without sharp rate-limit cliffs.
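To make the deduct-and-refill mechanics concrete, here is a minimal in-memory sketch of a token bucket. It is illustrative only and is not how Arcjet implements the rule: the class and method names are hypothetical, it tracks a single bucket in one process, it assumes tokens refill continuously rather than in discrete steps, and the clock is passed in explicitly so the behavior is easy to follow.

```typescript
// Illustrative in-memory token bucket (single process only).
// Hypothetical names; not Arcjet's implementation.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number; // maximum tokens the bucket can hold
  private readonly refillRate: number; // tokens added per interval
  private readonly intervalMs: number; // length of one refill interval

  constructor(capacity: number, refillRate: number, intervalMs: number, nowMs: number) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.intervalMs = intervalMs;
    this.tokens = capacity; // assumption: a new bucket starts full
    this.lastRefillMs = nowMs;
  }

  // Add tokens for the time elapsed since the last refill, capped at capacity.
  private refill(nowMs: number): void {
    const elapsedMs = nowMs - this.lastRefillMs;
    const added = (elapsedMs / this.intervalMs) * this.refillRate;
    this.tokens = Math.min(this.capacity, this.tokens + added);
    this.lastRefillMs = nowMs;
  }

  // Deduct `cost` tokens if available; otherwise deny and leave the bucket unchanged.
  tryConsume(cost: number, nowMs: number): boolean {
    this.refill(nowMs);
    if (cost > this.tokens) return false;
    this.tokens -= cost;
    return true;
  }

  // Tokens currently available after applying any pending refill.
  remaining(nowMs: number): number {
    this.refill(nowMs);
    return this.tokens;
  }
}

const HOUR_MS = 60 * 60 * 1000;
const bucket = new TokenBucket(5_000, 2_000, HOUR_MS, 0);

const first = bucket.tryConsume(4_000, 0); // allowed: 1,000 tokens left
const second = bucket.tryConsume(1_500, 0); // denied: only 1,000 available
const third = bucket.tryConsume(1_500, HOUR_MS / 2); // allowed: half an hour refilled 1,000
```

The denied request leaves the bucket untouched, so the user can retry later or send a smaller request once the bucket has refilled.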

Alternatively, you can use a fixed window or sliding window limit to enforce a hard cap on spend per user per day, week, or month. See the rate limiting algorithms reference for details on different approaches.
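For comparison, a hard cap per user per window can be sketched as below. This is a rough in-memory illustration with hypothetical names, not Arcjet's implementation: it assumes windows are aligned to multiples of the window length and keeps state in a single process.

```typescript
// Illustrative in-memory fixed-window cap (single process only).
// Hypothetical names; not Arcjet's implementation.
interface WindowUsage {
  windowStart: number; // start time of the window this usage belongs to
  tokens: number; // tokens spent in that window
}

class FixedWindowCap {
  private readonly usage = new Map<string, WindowUsage>();
  private readonly maxTokens: number;
  private readonly windowMs: number;

  constructor(maxTokens: number, windowMs: number) {
    this.maxTokens = maxTokens;
    this.windowMs = windowMs;
  }

  // Allow the request only if it keeps the user's spend within the cap
  // for the window that contains `nowMs`.
  tryConsume(userId: string, cost: number, nowMs: number): boolean {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const entry = this.usage.get(userId);
    // Usage recorded for an earlier window no longer counts.
    const spent =
      entry !== undefined && entry.windowStart === windowStart ? entry.tokens : 0;
    if (spent + cost > this.maxTokens) return false;
    this.usage.set(userId, { windowStart, tokens: spent + cost });
    return true;
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;
const dailyCap = new FixedWindowCap(10_000, DAY_MS);
```

Unlike the token bucket, a user who reaches the cap stays blocked until the next window starts, which is the trade-off for a predictable maximum spend.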

Arcjet handles bucket state across all instances of your application - no Redis or external state store required.

characteristics: ["userId"] - Tracks the bucket per user. Replace "userId" with the characteristic that identifies a unique user in your application (e.g. a session token, API key, or authenticated user ID). Pass the value to aj.protect() as a named argument.

refillRate and interval - Set the sustained allowance. refillRate: 2_000, interval: "1h" gives each user 2,000 tokens per hour. Adjust to match your AI provider’s pricing and your cost targets. These values are hard-coded in this example, but you can also calculate them dynamically based on the user’s subscription level or other factors and pass the calculated values to the rule.

capacity - The maximum tokens a user can accumulate. Setting capacity: 5_000 with refillRate: 2_000 lets users burst up to 5,000 tokens if they haven’t used their allowance recently.
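One way to calculate these values dynamically is a small lookup keyed by subscription level. The sketch below is an assumption for illustration: the plan names, the "pro" numbers, and the helper functions are hypothetical, and only the "free" values mirror the settings described above.

```typescript
// Hypothetical plan names and limits, for illustration only.
type Plan = "free" | "pro";

interface BucketLimits {
  refillRate: number; // tokens added per interval
  interval: string; // refill interval, e.g. "1h"
  capacity: number; // maximum tokens a user can accumulate
}

const LIMITS_BY_PLAN: Record<Plan, BucketLimits> = {
  free: { refillRate: 2_000, interval: "1h", capacity: 5_000 },
  pro: { refillRate: 20_000, interval: "1h", capacity: 50_000 },
};

// Look up the limits to pass to the rule for a given user's plan.
function limitsForPlan(plan: Plan): BucketLimits {
  return LIMITS_BY_PLAN[plan];
}

// Number of refill intervals an empty bucket needs to reach capacity,
// assuming a steady refill and no spending in the meantime.
function intervalsToFill(limits: BucketLimits): number {
  return limits.capacity / limits.refillRate;
}
```

With refillRate: 2_000, interval: "1h", and capacity: 5_000, a fully drained bucket takes 2.5 hours of inactivity to fill back up to the 5,000-token burst allowance.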

The example uses a characters / 4 heuristic (~1 token per 4 characters for common English text). This is a reasonable starting point: it avoids introducing extra dependencies and works well enough for budget enforcement, where a small margin of error is acceptable.
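As a rough sketch, the heuristic can be written as a pair of small helpers. The function names and the message shape are hypothetical, and rounding up is a choice made here so that short prompts never count as zero tokens.

```typescript
// Rough estimate: about 1 token per 4 characters of common English text.
// Rounds up, so any non-empty text costs at least 1 token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical chat message shape, for illustration only.
interface ChatMessage {
  role: string;
  content: string;
}

// Sum the per-message estimates for a whole conversation.
function estimateChatTokens(messages: ChatMessage[]): number {
  return messages.reduce((total, m) => total + estimateTokens(m.content), 0);
}
```

Note that text.length counts UTF-16 code units, so the estimate drifts further for emoji-heavy or non-English text.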

For accurate counts, use a tokenizer: