AI quota control

Arcjet can help you control usage quotas in applications backed by AI language models. The goal is to limit requests from specific users, accounts, or categories (you define the request characteristics) based on an estimate of token consumption. This lets you enforce per-user allowances and keep your costs under control.

This is one of many possible approaches to tracking AI service use, and it is specific to language models. In other scenarios you could apply other rate limiting algorithms per request or based on other conditions.

Rules

We recommend using a token bucket rate limit, configured to match your per-category quota (e.g. tokens per user) with the desired capacity, refill rate, and interval.

import arcjet, { shield, tokenBucket } from "@arcjet/next";

const aj = arcjet({
  key: process.env.ARCJET_KEY!, // Get your site key from https://app.arcjet.com
  characteristics: ["userId"], // track requests by user ID
  rules: [
    shield({
      mode: "LIVE", // will block requests. Use "DRY_RUN" to log only
    }),
    tokenBucket({
      mode: "LIVE", // will block requests. Use "DRY_RUN" to log only
      refillRate: 2_000, // fill the bucket up by 2,000 tokens
      interval: "1h", // every hour
      capacity: 5_000, // up to 5,000 tokens
    }),
  ],
});
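To make the three parameters concrete, here is a minimal local simulation of a token bucket using the same numbers as the rule above. It is only a sketch for reasoning about quotas, not Arcjet's implementation: the `Bucket` type and the `createBucket` and `withdraw` helpers are hypothetical names, and it assumes the bucket starts full and refills in whole-interval steps.

```typescript
// Illustration only: a local model of a token bucket, NOT Arcjet's code.
// Assumptions: the bucket starts full and refills once per full interval.
interface Bucket {
  capacity: number; // maximum tokens the bucket can hold
  refillRate: number; // tokens added per interval
  intervalMs: number; // interval length in milliseconds
  tokens: number; // tokens currently available
  lastRefillMs: number; // time of the last applied refill
}

function createBucket(
  capacity: number,
  refillRate: number,
  intervalMs: number,
  nowMs: number,
): Bucket {
  return { capacity, refillRate, intervalMs, tokens: capacity, lastRefillMs: nowMs };
}

// Try to withdraw `requested` tokens at time `nowMs`.
// Returns true if allowed, false if the bucket cannot cover the request.
function withdraw(bucket: Bucket, requested: number, nowMs: number): boolean {
  // Apply one refill for every full interval elapsed since the last refill.
  const intervals = Math.floor((nowMs - bucket.lastRefillMs) / bucket.intervalMs);
  if (intervals > 0) {
    bucket.tokens = Math.min(
      bucket.capacity,
      bucket.tokens + intervals * bucket.refillRate,
    );
    bucket.lastRefillMs += intervals * bucket.intervalMs;
  }
  if (requested > bucket.tokens) {
    return false; // a denied request leaves the balance untouched
  }
  bucket.tokens -= requested;
  return true;
}

const HOUR_MS = 60 * 60 * 1000;
const bucket = createBucket(5_000, 2_000, HOUR_MS, 0);

const first = withdraw(bucket, 4_500, 0); // allowed, 500 tokens left
const second = withdraw(bucket, 1_000, 0); // denied, only 500 tokens available
const third = withdraw(bucket, 1_000, HOUR_MS); // allowed after a 2,000 token refill

console.log({ first, second, third, remaining: bucket.tokens });
```

In this model a user who drains the bucket regains 2,000 tokens per hour, and the balance never exceeds the 5,000 token capacity, so capacity sets the burst size while the refill rate sets sustained usage.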

Checking the quota

We retrieve the characteristic (e.g. userId) and the user-provided prompt, then use them to withdraw the estimated number of tokens from the bucket.

Once the bucket no longer holds enough tokens to cover a request, Arcjet issues a DENY decision.

export async function POST(req: Request) {
  // This is where you would do a session lookup and get the user ID.
  const userId = "totoro";
  // The user-generated prompt fed to the language model.
  const { prompt } = await req.json();
  // Estimate the number of tokens required to process the prompt.
  // You can use estimators for the different services:
  // OpenAI: https://github.com/hmarr/openai-chat-tokens
  // Replicate: https://github.com/belladoreai/llama-tokenizer-js
  // Or add your own estimate:
  // const estimate = (): number => yourEstimate;
  const estimate = promptTokensEstimate({
    prompt,
  });
  // Withdraw tokens from the token bucket
  const decision = await aj.protect(req, { requested: estimate, userId });
  console.log("Arcjet decision", decision.conclusion);
  if (decision.reason.isRateLimit()) {
    console.log("Tokens remaining", decision.reason.remaining);
  }
  // If the request is denied, return early
  if (decision.isDenied()) {
    if (decision.reason.isRateLimit()) {
      // Quota exceeded
      return new Response("Quota exceeded", { status: 429 });
    } else {
      // 403
      return new Response("Forbidden", { status: 403 });
    }
  }
  // If the request is allowed, continue to use your language model
  const response = ...
  return ...
}
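If you choose to supply your own estimate rather than a tokenizer library, a rough heuristic can be enough to start with. The sketch below assumes about four characters per token, a common rule of thumb for English text that varies by model and language. `estimateTokens` and its `expectedCompletionTokens` parameter are hypothetical names, not part of Arcjet or any tokenizer package.

```typescript
// Hypothetical helper: a rough token estimate for a prompt.
// Assumption: ~4 characters per token, which is only an approximation.
const CHARS_PER_TOKEN = 4;

function estimateTokens(prompt: string, expectedCompletionTokens = 0): number {
  // Round up so that short prompts are never counted as free.
  const promptTokens = Math.ceil(prompt.length / CHARS_PER_TOKEN);
  // Optionally reserve tokens for the model's reply as well.
  return promptTokens + expectedCompletionTokens;
}

const sample = estimateTokens("Tell me about forest spirits.", 200);
console.log("Estimated tokens", sample);
```

The result can be passed as the `requested` value in the `aj.protect` call shown in the handler. Overestimating slightly is the safer direction if the quota exists to cap cost.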