Prompt Injection Defense in Agentic Systems
How we defend Gandalf, Gwaihir and Beorn from payloads hidden in banners, writeups and DNS responses. Instruction hierarchy, Spotlighting, StruQ and our Sentinel.
How we applied Constitutional AI and RLAIF to Gandalf CLI so that our offensive agents reject out-of-scope actions on their own, without relying on manual prompt engineering.