
Prompt injection attacks trick AI models into ignoring their instructions. Users paste in jailbreaks such as “DAN” prompts, role-play escapes, or instruction overrides designed to bypass your system prompt, extract restricted information, or cause your AI to behave in unintended ways.

Arcjet prompt injection detection scores each incoming message for injection patterns inside your application, before the message reaches the AI provider. Detected attacks are blocked before the AI call is made, protecting both your application’s behavior and your AI budget.

A production chat endpoint needs more than one guardrail. Some requests contain hostile instructions designed to override your system prompt. Others may be legitimate user requests that still contain sensitive data you do not want entering model context. And like any other public route, AI endpoints still need protection from common web attacks.

Combining Arcjet rules gives you layered enforcement before the model runs:

  • Shield blocks common web attacks against the endpoint
  • Prompt injection detection catches hostile instructions before inference
  • Sensitive information detection prevents PII from entering model context

The following example uses the Vercel AI SDK:

import { openai } from "@ai-sdk/openai";
import arcjet, {
  detectPromptInjection,
  sensitiveInfo,
  shield,
} from "@arcjet/next";
import type { UIMessage } from "ai";
import { convertToModelMessages, isTextUIPart, streamText } from "ai";

const aj = arcjet({
  key: process.env.ARCJET_KEY!, // Get your site key from https://app.arcjet.com
  rules: [
    // Shield protects against common web attacks e.g. SQL injection
    shield({ mode: "LIVE" }),
    // Detect prompt injection attacks before they reach your AI model
    detectPromptInjection({
      mode: "LIVE",
      // Confidence threshold, lower is more strict. Default = 0.5
      // threshold: 0.5,
    }),
    // Block sensitive data from entering model context
    sensitiveInfo({
      mode: "LIVE",
      // Block PII types that should never appear in AI prompts.
      // Remove types your app legitimately handles (e.g. EMAIL for a support bot).
      deny: ["CREDIT_CARD_NUMBER", "EMAIL"],
    }),
  ],
});

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  // Check the most recent user message.
  // Pass the full conversation if you want to scan all messages.
  const lastMessage: string = (messages.at(-1)?.parts ?? [])
    .filter(isTextUIPart)
    .map((p) => p.text)
    .join(" ");

  const decision = await aj.protect(req, {
    detectPromptInjectionMessage: lastMessage,
    sensitiveInfoValue: lastMessage,
  });

  if (decision.isDenied()) {
    if (decision.reason.isPromptInjection()) {
      console.warn("Request blocked due to prompt injection");
      return new Response(
        "Prompt injection detected — please rephrase your message",
        { status: 403 },
      );
    }

    if (decision.reason.isSensitiveInfo()) {
      console.warn("Request blocked due to sensitive information");
      return new Response(
        "Sensitive information detected — please remove it from your prompt",
        { status: 400 },
      );
    }

    return new Response("Forbidden", { status: 403 });
  }

  // Arcjet approved — call your AI provider
  const result = await streamText({
    model: openai("gpt-4o"),
    messages: await convertToModelMessages(messages),
  });

  return result.toUIMessageStreamResponse();
}

Keep denied responses generic — do not leak detector details or explain exactly what was flagged. A simple message asking the user to rephrase is the right default.

threshold: 0.5 - the minimum confidence score (between 0 and 1, exclusive) required to block a request. Lower values are more aggressive and catch more attacks but may produce false positives. The default of 0.5 is a balanced starting point; raise it (e.g. 0.8) to reduce false positives, or lower it (e.g. 0.3) for stricter enforcement.
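The direction of the threshold is easy to get backwards, so here is a tiny illustrative sketch of the comparison it controls. This is not Arcjet's implementation (the confidence score is computed on Arcjet's servers); it only shows why a lower threshold is stricter:

```typescript
// Illustrative only: Arcjet scores the message server-side. This sketch
// shows the gate that the threshold setting controls.
function wouldBlock(confidence: number, threshold: number = 0.5): boolean {
  // A request is blocked once the detector's confidence reaches the
  // threshold, so a LOWER threshold blocks more requests (stricter).
  return confidence >= threshold;
}

wouldBlock(0.6); // blocked at the default threshold of 0.5
wouldBlock(0.4, 0.3); // blocked: a strict threshold catches low-confidence hits
wouldBlock(0.4, 0.8); // allowed: a permissive threshold reduces false positives
```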

detectPromptInjectionMessage - the text to score. Pass the user’s most recent message, or the full conversation history if you want to scan all messages.
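To scan the whole conversation rather than only the latest turn, concatenate the text parts of every user message and pass the result as detectPromptInjectionMessage. A minimal sketch; the structural types here only mirror the AI SDK's UIMessage shape for illustration (in a real route, use UIMessage and isTextUIPart from "ai" as in the example above):

```typescript
// Illustrative structural types mirroring the AI SDK's UIMessage shape.
type TextPart = { type: "text"; text: string };
type Part = TextPart | { type: string };
type Message = { role: "user" | "assistant" | "system"; parts: Part[] };

const isText = (p: Part): p is TextPart => p.type === "text";

// Join the text of every user turn so the detector scores the whole
// conversation instead of only the most recent message.
function conversationText(messages: Message[]): string {
  return messages
    .filter((m) => m.role === "user")
    .flatMap((m) => m.parts)
    .filter(isText)
    .map((p) => p.text)
    .join(" ");
}
```

Scanning the full history costs more per check but catches injections that are split across several innocuous-looking turns.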

mode: "DRY_RUN" - logs detections without blocking. Use this to measure the false-positive rate in production before switching to "LIVE".

Prompt injection detection controls what your AI model receives. To also block automated clients and enforce per-user budgets, combine it with AI abuse protection and AI budget control.
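As a sketch of budget control, Arcjet's token bucket rate limit can meter an estimated token cost per user. The characteristic name, refill numbers, and the userId / estimatedTokens values below are illustrative assumptions, not fixed API values:

```typescript
import arcjet, { tokenBucket } from "@arcjet/next";

const budget = arcjet({
  key: process.env.ARCJET_KEY!,
  rules: [
    // Per-user bucket of AI tokens (illustrative numbers; tune to your costs)
    tokenBucket({
      mode: "LIVE",
      characteristics: ["userId"], // track the budget per user
      refillRate: 2_000, // tokens added per interval
      interval: 3600, // refill interval in seconds (1 hour)
      capacity: 10_000, // maximum stored tokens
    }),
  ],
});

export async function POST(req: Request) {
  const userId = "user-123"; // however you identify the caller (illustrative)
  const estimatedTokens = 1_000; // your estimate of this request's cost
  const decision = await budget.protect(req, {
    userId,
    requested: estimatedTokens,
  });
  if (decision.isDenied()) {
    return new Response("Budget exhausted, try again later", { status: 429 });
  }
  // ...call the model as in the earlier example
  return new Response("ok");
}
```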

Prompt injection is one class of AI abuse; automated traffic is another. If you expose a public AI endpoint, attackers can drive up your costs with high-volume requests without ever needing to bypass your system prompt. Bot detection composes cleanly with prompt injection protection:

import arcjet, {
  detectBot,
  detectPromptInjection,
  sensitiveInfo,
  shield,
} from "@arcjet/next";

const aj = arcjet({
  key: process.env.ARCJET_KEY!,
  rules: [
    shield({ mode: "LIVE" }),
    // Block all automated clients; add entries to allow e.g. search engines
    detectBot({ mode: "LIVE", allow: [] }),
    detectPromptInjection({ mode: "LIVE" }),
    sensitiveInfo({
      mode: "LIVE",
      deny: ["CREDIT_CARD_NUMBER", "EMAIL"],
    }),
  ],
});
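A matching route handler can branch on the bot reason alongside the prompt injection and sensitive information checks. A sketch (the message extraction and responses are placeholders; wire them up as in the first example):

```typescript
export async function POST(req: Request) {
  const { messages } = await req.json();
  const lastMessage = ""; // extract the latest user text as shown earlier
  const decision = await aj.protect(req, {
    detectPromptInjectionMessage: lastMessage,
    sensitiveInfoValue: lastMessage,
  });
  if (decision.isDenied()) {
    if (decision.reason.isBot()) {
      // Keep the message generic; do not reveal how the client was classified
      return new Response("Forbidden", { status: 403 });
    }
    // Handle prompt injection / sensitive info denials as shown earlier
    return new Response("Forbidden", { status: 403 });
  }
  // Arcjet approved: call the model as before
  return new Response("ok");
}
```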