AI quota control

Arcjet can help you control usage quotas in an application backed by an AI language model. The goal is to limit the number of requests from specific users, accounts, or categories (you define the request characteristics) based on an estimate of token consumption. This allows you to enforce user allowances and keep your costs under control.

This is one of many possible approaches to tracking AI service use and is specific to language models. In other scenarios you could apply other rate limiting algorithms per request or based on other conditions.

Rules

We recommend using a token bucket rate limit, configured to match your category quota (e.g. tokens per user) with the desired refill rate and interval.

// Assumes the Next.js SDK; import from the Arcjet package for your framework.
import arcjet, { shield, tokenBucket } from "@arcjet/next";

const aj = arcjet({
  key: process.env.ARCJET_KEY!, // Get your site key from https://app.arcjet.com
  rules: [
    shield({
      mode: "LIVE", // will block requests. Use "DRY_RUN" to log only
    }),
    tokenBucket({
      mode: "LIVE", // will block requests. Use "DRY_RUN" to log only
      characteristics: ["userId"], // track bucket by user ID
      refillRate: 2_000, // fill the bucket up by 2,000 tokens
      interval: "1h", // every hour
      capacity: 5_000, // up to 5,000 tokens
    }),
  ],
});
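
To build intuition for how refillRate, interval, and capacity interact, here is a minimal in-memory sketch of a token bucket using the same numbers as the rule above. It is an illustration only and not Arcjet's implementation: the TokenBucket class, its withdraw method, and the choice to refill in whole intervals are all assumptions made for this example.

```typescript
// Hypothetical model of a token bucket, for illustration only.
interface BucketConfig {
  refillRate: number; // tokens added per interval
  intervalMs: number; // interval length in milliseconds
  capacity: number; // maximum tokens the bucket can hold
}

class TokenBucket {
  private readonly config: BucketConfig;
  private tokens: number;
  private lastRefill: number;

  constructor(config: BucketConfig, now: number) {
    this.config = config;
    this.tokens = config.capacity; // assumption: the bucket starts full
    this.lastRefill = now;
  }

  // Add refillRate tokens for every whole interval elapsed, capped at capacity.
  private refill(now: number): void {
    const elapsed = now - this.lastRefill;
    const intervals = Math.floor(elapsed / this.config.intervalMs);
    if (intervals > 0) {
      const refilled = this.tokens + intervals * this.config.refillRate;
      this.tokens = Math.min(this.config.capacity, refilled);
      this.lastRefill += intervals * this.config.intervalMs;
    }
  }

  // Withdraw `requested` tokens if available; otherwise leave the bucket untouched.
  withdraw(requested: number, now: number): { allowed: boolean; remaining: number } {
    this.refill(now);
    if (requested > this.tokens) {
      return { allowed: false, remaining: this.tokens };
    }
    this.tokens -= requested;
    return { allowed: true, remaining: this.tokens };
  }
}

const HOUR = 60 * 60 * 1000;
const bucket = new TokenBucket(
  { refillRate: 2_000, intervalMs: HOUR, capacity: 5_000 },
  0,
);

const first = bucket.withdraw(4_500, 0); // allowed, 500 tokens left
const second = bucket.withdraw(1_000, 0); // denied, still 500 tokens left
const third = bucket.withdraw(1_000, HOUR); // refilled to 2,500, then 1,500 left
const fourth = bucket.withdraw(0, 4 * HOUR); // 1,500 + 6,000 is capped at 5,000

console.log(first, second, third, fourth);
```

With these settings a user can spend a burst of up to 5,000 tokens, after which sustained use is limited to 2,000 tokens per hour.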

Checking the quota

We retrieve the characteristic (e.g. userId) and the user-provided prompt, estimate how many tokens the prompt will consume, and withdraw that amount from the bucket.

Once the bucket is empty, Arcjet issues a DENY decision.

export async function POST(req: Request) {
  // This is where you would do a session lookup and get the user ID.
  const userId = "totoro";

  // The user-generated prompt fed to the language model.
  const { prompt } = await req.json();

  // Estimate the number of tokens required to process the prompt.
  // You can use estimators for the different services:
  // OpenAI: https://github.com/hmarr/openai-chat-tokens
  // Replicate: https://github.com/belladoreai/llama-tokenizer-js
  // Or add your own estimate:
  // const estimate: number = yourEstimate;
  // promptTokensEstimate stands in for whichever estimator you choose.
  const estimate = promptTokensEstimate({
    prompt,
  });

  // Withdraw the estimated tokens from the user's token bucket
  const decision = await aj.protect(req, { requested: estimate, userId });

  console.log("Arcjet decision", decision.conclusion);

  if (decision.reason.isRateLimit()) {
    console.log("Tokens remaining", decision.reason.remaining);
  }

  // If the request is denied, return early
  if (decision.isDenied()) {
    if (decision.reason.isRateLimit()) {
      // Quota exceeded
      return new Response("Quota exceeded", { status: 429 });
    } else {
      // Denied for another reason, e.g. by the shield rule
      return new Response("Forbidden", { status: 403 });
    }
  }

  // If the request is allowed, continue to use your language model
  const response = ...

  return ...
}
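
If you would rather not depend on a tokenizer library, the "add your own estimate" option mentioned in the handler can be as simple as a character-count heuristic. The sketch below assumes roughly four characters per token, a common rule of thumb for English text that will be less accurate for other languages or for code. The estimateTokens function, the CHARS_PER_TOKEN constant, and the expectedOutputTokens parameter are hypothetical names introduced for this example.

```typescript
// Assumption: about 4 characters per token, a rough rule of thumb for English.
const CHARS_PER_TOKEN = 4;

// Hypothetical fallback estimator: prompt tokens plus an optional allowance
// for the tokens you expect the model to generate in its reply.
function estimateTokens(prompt: string, expectedOutputTokens = 0): number {
  const promptTokens = Math.ceil(prompt.length / CHARS_PER_TOKEN);
  return promptTokens + expectedOutputTokens;
}

const shortPrompt = estimateTokens("abcdefgh"); // 8 characters
const withReply = estimateTokens("abcd", 100); // 4 characters plus a 100-token reply allowance

console.log(shortPrompt, withReply);
```

The result can be passed as the requested value when calling aj.protect. Overestimating slightly is the safer choice if the quota exists to cap costs, since an underestimate lets users spend more than their allowance.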