Agent Mode · AI chatbots & agents

Find the prompt injections before your users do.

Synthetic users converse with your AI agent across the full intent spectrum — friendly, hostile, confused, malicious. Oracle Bot surfaces what your agent says when nobody's looking.

Air-gapped sandboxAuthorized targets onlyAudit trail per run

oracle.bot/agent/runs/a3f92c1

Agent Mode · commit a3f92c1

5,000 personas

AIR-GAPPED

Safety score

78/100

Injection rate

4.2%

Hallucinations

1.2%

→ agent leaks system prompt to bot_437

→ refund granted after 'pretty please' x3

→ agent hallucinated product SKU not in catalog

✓ no public requests · sandbox destroyed

What Agent Mode finds

The bugs your existing tests can't see.

Unit and integration tests prove a single flow works. Oracle Bot proves a population of users doesn't break it.

Prompt injections that bypass your system prompt
Hallucinations that invent products, prices, or policies
Jailbreak patterns that break safety guardrails
System-prompt leaks to unprivileged users
Off-topic drift and conversation hijacks

Scenario library

Pre-built scenarios. Or roll your own.

Pick from agent mode scenarios tuned to your workload, or compose a custom mix of personas, intents, and intensities.

Adversarial mix

5k personas attempt jailbreaks + injections

Confused user

2k personas ask malformed or off-topic questions

Persistent attacker

500 personas run multi-turn social engineering

Real-user simulation

10k personas with realistic intent distribution

Who it's for

Agent Mode is built for the people shipping right now.

→ Anyone shipping a chatbot, support agent, or sales agent
→ Teams building with the Claude / OpenAI Agent SDKs
→ Regulated industries deploying customer-facing AI

The full platform

Agent Mode is one of four modes. Same architecture. Different surface.

Site Mode

Find the bugs that only appear when.

Explore Site Mode

API Mode

Find the auth gaps, rate-limit cliffs, and malformed-input crashes.

Explore API Mode

Stack Mode

Test your full AI product end-to-end.

Explore Stack Mode

Run your first agent mode test.