
Protect your OpenAI application with Arcjet

If you are building an AI application using OpenAI, you will want to protect it from abuse. Arcjet rate limiting and bot protection can help manage your OpenAI developer token budget.

What is Arcjet? Arcjet helps developers protect their apps in just a few lines of code. Implement rate limiting, bot protection, email validation, and defense against common attacks.

Example use case

  • You have a chat interface that uses OpenAI to generate responses.
  • You want to prevent automated bots from accessing your application.
  • You want to implement a rate limit for each user logged in to your application.
  • The rate limit should be based on OpenAI tokens, since that is how you are billed for your usage of the OpenAI API.

How it works

  • Arcjet rate limits support custom characteristics for identifying the client and applying the limit. We provide the user ID to track each user individually. This works with any authentication system you have in place, such as Clerk.
  • Define a rate limit of 2,000 tokens per hour with a maximum of 5,000 tokens in the bucket. This allows for a reasonable conversation length without consuming too many tokens.
  • Also apply a bot rule to block clients we are sure are automated.
  • Use the openai-chat-tokens package to count the number of tokens in each chat API request.
  • Pass the token estimate to the Arcjet protect call to deduct the tokens from the user’s rate limit.
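The token bucket behavior described above can be sketched as a small standalone simulation. This is illustrative only (Arcjet tracks bucket state server-side for you); the `Bucket` type and `take` function here are hypothetical helpers, not part of the Arcjet SDK.

```typescript
// Minimal sketch of per-user token bucket arithmetic with the values
// used in this guide: 2,000 tokens refilled per hour, 5,000 capacity.
type Bucket = { tokens: number; lastRefill: number };

const CAPACITY = 5_000; // max tokens the bucket can hold
const REFILL_RATE = 2_000; // tokens added per interval
const INTERVAL_MS = 60 * 60 * 1000; // one hour

function take(bucket: Bucket, requested: number, now: number): boolean {
  // Refill in proportion to the time elapsed since the last refill
  const elapsed = now - bucket.lastRefill;
  const refill = (elapsed / INTERVAL_MS) * REFILL_RATE;
  bucket.tokens = Math.min(CAPACITY, bucket.tokens + refill);
  bucket.lastRefill = now;

  if (bucket.tokens < requested) return false; // deny: over the limit
  bucket.tokens -= requested; // deduct the estimate, as the protect call does
  return true;
}

const bucket: Bucket = { tokens: CAPACITY, lastRefill: 0 };
console.log(take(bucket, 4_800, 0)); // true: a full bucket covers 4,800 tokens
console.log(take(bucket, 500, 0)); // false: only 200 tokens remain
console.log(take(bucket, 500, INTERVAL_MS / 2)); // true: a half-hour refill adds 1,000
```

Because refills are continuous rather than resetting on a fixed window, a user who exhausts the bucket regains capacity gradually, which suits long-running chat conversations.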

The example below shows the API route for a Next.js application with a gpt-3.5-turbo AI chatbot. See the full example Next.js implementation on GitHub.

/app/api/chat/route.ts
import arcjet, { detectBot, tokenBucket } from "@arcjet/next";
import { OpenAIStream, StreamingTextResponse } from "ai";
import OpenAI from "openai";
import { promptTokensEstimate } from "openai-chat-tokens";

const aj = arcjet({
  key: process.env.ARCJET_KEY!, // Get your site key from https://app.arcjet.com
  characteristics: ["userId"], // track requests by user ID
  rules: [
    tokenBucket({
      mode: "LIVE", // will block requests. Use "DRY_RUN" to log only
      refillRate: 2_000, // fill the bucket up by 2,000 tokens
      interval: "1h", // every hour
      capacity: 5_000, // up to 5,000 tokens
    }),
    detectBot({
      mode: "LIVE",
      allow: [], // "allow none" will block all detected bots
    }),
  ],
});

// OpenAI client
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY ?? "OPENAI_KEY_MISSING",
});

// Edge runtime allows for streaming responses
export const runtime = "edge";

export async function POST(req: Request) {
  // This userId is hard coded for the example, but this is where you would do a
  // session lookup and get the user ID.
  const userId = "totoro";

  const { messages } = await req.json();

  // Estimate the number of tokens required to process the request
  const estimate = promptTokensEstimate({
    messages,
  });
  console.log("Token estimate", estimate);

  // Withdraw tokens from the token bucket
  const decision = await aj.protect(req, { requested: estimate, userId });
  console.log("Arcjet decision", decision.conclusion);

  if (decision.reason.isRateLimit()) {
    console.log("Requests remaining", decision.reason.remaining);
  }

  // If the request is denied, return a 429
  if (decision.isDenied()) {
    if (decision.reason.isRateLimit()) {
      return new Response("Too Many Requests", {
        status: 429,
      });
    } else {
      // Bots will see this response
      return new Response("Forbidden", {
        status: 403,
      });
    }
  }

  // If the request is allowed, continue to use OpenAI
  // Ask OpenAI for a streaming chat completion given the prompt
  const response = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    stream: true,
    messages,
  });

  // Convert the response into a friendly text-stream
  const stream = OpenAIStream(response);
  // Respond with the stream
  return new StreamingTextResponse(stream);
}

Discussion