
Protect your OpenAI application with Arcjet

If you are building an AI application using OpenAI, you will want to protect it from abuse. Arcjet rate limiting and bot protection can help you manage your OpenAI token budget.

What is Arcjet? Arcjet helps developers protect their apps in just a few lines of code. Bot detection. Rate limiting. Email validation. Attack protection. Data redaction. A developer-first approach to security.

Example use case

  • You have a chat interface that uses OpenAI to generate responses.
  • You want to prevent automated bots from accessing your application.
  • You want to implement a rate limit for each user logged in to your application.
  • The rate limit should be based on OpenAI tokens, since that is how you are billed for your usage of the OpenAI API.

How it works

  • Arcjet rate limits support custom characteristics to identify the client and apply the limit. We provide the user ID to identify the logged-in user. This works with any authentication system you have in place, such as Clerk (a sketch follows this list).
  • Define a rate limit of 2,000 tokens per hour with a maximum of 5,000 tokens in the bucket. This allows for a reasonable conversation length without consuming too many tokens (see the worked example after this list).
  • Also apply a bot rule to block clients we are sure are automated.
  • Use the openai-chat-tokens package to count the number of tokens in each chat API request.
  • Pass the token estimate to the Arcjet protect call to deduct the tokens from the user’s rate limit.
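
For example, a user's bucket starts full with 5,000 tokens. A request estimated at 1,200 tokens drains it to 3,800. After an hour the bucket refills by 2,000 tokens, capped at the 5,000 capacity. A request that asks for more tokens than the bucket currently holds is denied until the bucket refills.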

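The full example below hard codes the user ID, but in a real application you would look it up from the session. Here is a minimal sketch of that step, assuming you use Clerk with the Next.js App Router (Clerk's auth() helper from @clerk/nextjs/server; adapt it to whatever auth system you use):

// A minimal sketch, assuming Clerk is installed and configured
import { auth } from "@clerk/nextjs/server";

export async function POST(req: Request) {
  // auth() is async in recent versions of Clerk
  const { userId } = await auth();
  if (!userId) {
    return new Response("Unauthorized", { status: 401 });
  }
  // ...then pass userId to aj.protect as in the full example below
}
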
The example below shows the API route for a Next.js application with a gpt-4-turbo AI chatbot. See the full example Next.js implementation on GitHub.

/app/api/chat/route.ts
// Adapted from https://sdk.vercel.ai/docs/getting-started/nextjs-app-router
import { openai } from "@ai-sdk/openai";
import arcjet, { detectBot, shield, tokenBucket } from "@arcjet/next";
import { streamText } from "ai";
import { promptTokensEstimate } from "openai-chat-tokens";

const aj = arcjet({
  // Get your site key from https://app.arcjet.com
  // and set it as an environment variable rather than hard coding.
  // See: https://nextjs.org/docs/app/building-your-application/configuring/environment-variables
  key: process.env.ARCJET_KEY!,
  characteristics: ["userId"], // track requests by user ID
  rules: [
    shield({
      mode: "LIVE", // will block requests. Use "DRY_RUN" to log only
    }),
    detectBot({
      mode: "LIVE", // will block requests. Use "DRY_RUN" to log only
      allow: [], // block all detected bots
    }),
    tokenBucket({
      mode: "LIVE", // will block requests. Use "DRY_RUN" to log only
      refillRate: 2_000, // refill the bucket with 2,000 tokens
      interval: "1h", // every hour
      capacity: 5_000, // up to a maximum of 5,000 tokens
    }),
  ],
});
// Allow streaming responses up to 30 seconds
export const maxDuration = 30;
// Edge runtime allows for streaming responses
export const runtime = "edge";

export async function POST(req: Request) {
  // The userId is hard coded for the example, but this is where you would do a
  // session lookup and get the user ID.
  const userId = "totoro";

  const { messages } = await req.json();

  // Estimate the number of tokens required to process the request
  const estimate = promptTokensEstimate({
    messages,
  });
  console.log("Token estimate", estimate);

  // Withdraw the estimated tokens from the user's token bucket
  const decision = await aj.protect(req, { requested: estimate, userId });
  console.log("Arcjet decision", decision.conclusion);
  if (decision.reason.isRateLimit()) {
    console.log("Tokens remaining", decision.reason.remaining);
  }

  // If the request is denied, return an error status
  if (decision.isDenied()) {
    if (decision.reason.isRateLimit()) {
      return new Response("Too Many Requests", {
        status: 429,
      });
    } else {
      return new Response("Forbidden", {
        status: 403,
      });
    }
  }

  // If the request is allowed, continue to use OpenAI
  const result = await streamText({
    model: openai("gpt-4-turbo"),
    messages,
  });
  return result.toDataStreamResponse();
}
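
To see the limit in action, you can call the route repeatedly until the bucket is empty. A quick test, assuming the dev server runs on localhost:3000 (run it as an ES module script so top-level await works):

// Hypothetical quick test against the local dev server
const res = await fetch("http://localhost:3000/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ messages: [{ role: "user", content: "Hello!" }] }),
});
// 200 while tokens remain; 429 once the bucket is drained
console.log(res.status);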

Discussion