
Prompt injection attacks trick AI models into ignoring their instructions — users paste in jailbreaks like “DAN” prompts, role-play escapes, or instruction overrides designed to bypass your system prompt and extract restricted information or cause your AI to behave in unintended ways.

Arcjet prompt injection detection runs inside your application and scores each incoming message for injection patterns before it reaches the AI provider. Detected attacks are blocked before the AI call is made, protecting both your application's behavior and your AI budget.
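The flow described above can be sketched as a gate in front of the AI call. Everything here is illustrative: `scoreMessage` is a toy stand-in for the detection engine, not Arcjet's implementation, and `gate` is a hypothetical helper.

```typescript
// Minimal sketch of the request flow: score first, and only call the
// AI provider if the message is not blocked. All names are hypothetical.
type Decision = { blocked: boolean; score: number };

// Toy scorer: flags a couple of obvious instruction-override phrases.
// The real engine scores many more patterns than this.
function scoreMessage(message: string): number {
  const patterns = [
    /ignore (all )?previous instructions/i,
    /you are now dan/i,
  ];
  return patterns.some((p) => p.test(message)) ? 0.9 : 0.1;
}

function gate(message: string, threshold: number): Decision {
  const score = scoreMessage(message);
  // Blocked requests never reach the AI provider, so no tokens are spent.
  return { blocked: score >= threshold, score };
}
```

A blocked decision here would translate to returning an error to the client instead of forwarding the message to the model.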

threshold: 0.5 - the minimum confidence score (between 0 and 1, exclusive) required to block a request. Lower values are more aggressive and catch more attacks but may produce false positives. The default of 0.5 is a balanced starting point; raise it (e.g. 0.8) to reduce false positives, or lower it (e.g. 0.3) for stricter enforcement.

detectPromptInjectionMessage - the text to score. Pass the user’s most recent message, or the full conversation history if you want to scan all messages.

mode: "DRY_RUN" - logs detections without blocking. Use this to measure the false-positive rate in production before switching to "LIVE".

Prompt injection detection controls what reaches your AI model. To also block automated clients and enforce per-user budgets, combine it with AI abuse protection and AI budget control.