Persistent Memory for Offensive Agents
A pentest does not fit in a context window. A real campaign spans recon, exploitation and post-exploitation over hours; if the agent forgets what it saw during recon by the time it reaches post-exploitation, it repeats work, drops findings and makes worse calls. Persistent memory is what turns an LLM into an operator.
Three kinds of memory, not one
- Episodic: the immutable log of what ran, when, and with what output. The basis for traceability and replay.
- Working: the live campaign state — hosts, credentials, pivot paths, open hypotheses. Compacted and summarized between phases.
- Semantic: reusable knowledge that is not tied to any target (TTPs, writeups), served by Beorn via RAG.
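The split above can be sketched as three separate data shapes. This is a minimal illustration, not Gandalf's actual schema: every type, field and class name here (`EpisodicLog`, `WorkingState`, `SemanticDoc`, `sourceSeq`) is an assumption made for the example.

```typescript
// Hypothetical types: names and fields are illustrative only.

interface EpisodicEntry {
  readonly seq: number;       // position in the log, assigned on append
  readonly timestamp: string; // ISO 8601 time the command ran
  readonly command: string;
  readonly output: string;
}

// Episodic memory: append-only. Entries are frozen and never edited or removed.
class EpisodicLog {
  private readonly entries: EpisodicEntry[] = [];

  append(command: string, output: string, timestamp: string): EpisodicEntry {
    const entry: EpisodicEntry = Object.freeze({
      seq: this.entries.length,
      timestamp,
      command,
      output,
    });
    this.entries.push(entry);
    return entry;
  }

  // Return a copy so callers cannot reorder or truncate the log itself.
  replay(): EpisodicEntry[] {
    return this.entries.slice();
  }
}

// Working memory: mutable campaign state, swapped for a compacted version between phases.
interface WorkingState {
  hosts: string[];
  credentials: { user: string; host: string; sourceSeq: number }[]; // sourceSeq points into the episodic log
  pivotPaths: string[][];
  openHypotheses: string[];
}

// Semantic memory: target-independent knowledge that the campaign reads but does not write.
interface SemanticDoc {
  id: string;
  title: string;
  body: string;
}

// Usage: log a command, then record what it taught us in working memory.
const log = new EpisodicLog();
const first = log.append("nmap -sV 10.0.0.5", "22/tcp open ssh", "2024-01-01T10:00:00Z");
const state: WorkingState = {
  hosts: ["10.0.0.5"],
  credentials: [],
  pivotPaths: [],
  openHypotheses: ["ssh allows password auth"],
};
```

Keeping the three shapes apart means each can have its own lifetime: the log only grows, the working state can be rewritten or dropped, and semantic documents never pick up target data.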
Retrieve with discipline
The classic mistake is dumping all memory into every prompt. We retrieve by relevance and phase: the agent asks for "valid credentials for 10.0.0.0/24" and gets exactly that, with provenance. Every retrieved fact carries its source so the reasoning stays auditable.
memory.query({
  scope: "10.0.0.0/24", kind: "credential",
  phase: "lateral-movement", max_tokens: 800,
})
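One way a call like the one above could be served is sketched below, under several assumptions: facts are tagged with a single IPv4 host and a phase, token cost is estimated crudely at about four characters per token, and plain filtering stands in for real relevance ranking. The `Fact` shape, the `query` function and the `source` format are hypothetical.

```typescript
type Kind = "credential" | "host" | "hypothesis";

interface Fact {
  kind: Kind;
  host: string;   // IPv4 address the fact is about
  phase: string;  // phase in which the fact is useful (assumed tagging scheme)
  text: string;
  source: string; // provenance: which episodic entry produced this fact
}

interface Query {
  scope: string; // IPv4 CIDR block
  kind: Kind;
  phase: string;
  max_tokens: number;
}

function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// True when ip falls inside the IPv4 CIDR block (no input validation in this sketch).
function inCidr(ip: string, cidr: string): boolean {
  const [base = "", bitsText = "32"] = cidr.split("/");
  const size = 2 ** (32 - Number(bitsText)); // number of addresses in the block
  return Math.floor(ipToInt(ip) / size) === Math.floor(ipToInt(base) / size);
}

// Crude estimate; a real tokenizer would replace this.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Return only facts matching kind, phase and scope, stopping at the token budget.
function query(facts: Fact[], q: Query): Fact[] {
  const selected: Fact[] = [];
  let used = 0;
  for (const fact of facts) {
    if (fact.kind !== q.kind || fact.phase !== q.phase) continue;
    if (!inCidr(fact.host, q.scope)) continue;
    const cost = estimateTokens(fact.text);
    if (used + cost > q.max_tokens) break;
    selected.push(fact);
    used += cost;
  }
  return selected;
}

// Usage with made-up facts: only the first one matches scope, kind and phase.
const facts: Fact[] = [
  { kind: "credential", host: "10.0.0.17", phase: "lateral-movement",
    text: "svc_backup password reused on SMB", source: "episodic#42" },
  { kind: "credential", host: "10.0.1.5", phase: "lateral-movement",
    text: "admin password found in share", source: "episodic#57" },
  { kind: "host", host: "10.0.0.20", phase: "lateral-movement",
    text: "file server", source: "episodic#12" },
];
const hits = query(facts, {
  scope: "10.0.0.0/24", kind: "credential",
  phase: "lateral-movement", max_tokens: 800,
});
```

Because every returned `Fact` keeps its `source`, whatever the agent concludes from `hits` can be traced back to the log entry that produced it.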
Scoped memory, not a data leak
Memory is an exfiltration surface if left uncontrolled. Every write passes through Sentinel: nothing outside the approved scope is persisted, and the kill-switch purges working memory instantly. Episodic memory is kept encrypted for forensics; working memory is ephemeral by design.
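The write gate and the purge described above might look roughly like this. It is a sketch under stated assumptions, not Sentinel's interface: the `ScopedMemory` class and its methods are hypothetical, scope is modeled as a list of IPv4 CIDR blocks, and encryption of the episodic log is left out.

```typescript
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// True when ip falls inside the IPv4 CIDR block (no input validation in this sketch).
function inCidr(ip: string, cidr: string): boolean {
  const [base = "", bitsText = "32"] = cidr.split("/");
  const size = 2 ** (32 - Number(bitsText));
  return Math.floor(ipToInt(ip) / size) === Math.floor(ipToInt(base) / size);
}

class ScopedMemory {
  private readonly approvedScopes: string[];
  private working: Record<string, string> = {}; // ephemeral campaign state
  private readonly episodic: string[] = [];     // retained; encryption at rest omitted here

  constructor(approvedScopes: string[]) {
    this.approvedScopes = approvedScopes;
  }

  // Persist a value only if the host is inside an approved scope.
  write(host: string, key: string, value: string): boolean {
    const allowed = this.approvedScopes.some((cidr) => inCidr(host, cidr));
    if (!allowed) {
      return false; // out of scope: nothing is stored, not even a log line
    }
    this.working[`${host}:${key}`] = value;
    this.episodic.push(`write ${host}:${key}`); // record the event, not the secret
    return true;
  }

  // Purge working memory; the episodic log is kept for forensics.
  killSwitch(): void {
    this.working = {};
  }

  workingSize(): number {
    return Object.keys(this.working).length;
  }

  episodicSize(): number {
    return this.episodic.length;
  }
}

// Usage: one in-scope write is stored, one out-of-scope write is refused.
const memory = new ScopedMemory(["10.0.0.0/24"]);
const accepted = memory.write("10.0.0.17", "ssh-user", "svc_backup");
const rejected = memory.write("192.168.1.10", "ssh-user", "root");
const sizeBeforePurge = memory.workingSize();
memory.killSwitch();
```

Putting the scope check inside `write` rather than at read time means out-of-scope data never exists in memory to be leaked later.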
What we ship
In Gandalf, persistent memory is first-class: each agent starts a phase with a reconstructed summary of the previous one, not a blank window. Less repeated work, context-aware decisions, and a full trail of why the agent did what it did.
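As a rough illustration of starting a phase from a reconstructed summary, the sketch below derives one from a phase-tagged log and renders it as the opening context for the next phase. The `LogEntry` and `PhaseSummary` shapes and both function names are hypothetical, and a production summary would likely be written by a model rather than assembled mechanically like this.

```typescript
interface LogEntry {
  phase: string;
  command: string;
  finding: string | null; // null when the command produced nothing worth carrying forward
}

interface PhaseSummary {
  phase: string;
  commandsRun: number;
  findings: string[]; // deduplicated, in order of first appearance
}

// Collapse one phase of the log into a compact summary.
function summarizePhase(entries: LogEntry[], phase: string): PhaseSummary {
  const findings: string[] = [];
  let commandsRun = 0;
  for (const entry of entries) {
    if (entry.phase !== phase) continue;
    commandsRun += 1;
    if (entry.finding !== null && findings.indexOf(entry.finding) === -1) {
      findings.push(entry.finding);
    }
  }
  return { phase, commandsRun, findings };
}

// Render the summary as the text an agent would receive at the start of the next phase.
function openingContext(summary: PhaseSummary, nextPhase: string): string {
  const header = [
    `Entering phase: ${nextPhase}`,
    `Previous phase: ${summary.phase} (${summary.commandsRun} commands run)`,
    "Carried-forward findings:",
  ];
  const bullets = summary.findings.map((finding) => `- ${finding}`);
  return header.concat(bullets).join("\n");
}

// Usage with made-up log entries.
const campaignLog: LogEntry[] = [
  { phase: "recon", command: "nmap -sV 10.0.0.0/24", finding: "10.0.0.5 exposes SSH" },
  { phase: "recon", command: "nmap -sV 10.0.0.5", finding: "10.0.0.5 exposes SSH" },
  { phase: "recon", command: "ping 10.0.0.9", finding: null },
  { phase: "exploitation", command: "ssh-audit 10.0.0.5", finding: "weak key exchange accepted" },
];
const reconSummary = summarizePhase(campaignLog, "recon");
const nextContext = openingContext(reconSummary, "exploitation");
```

The duplicate recon finding appears once in the summary, which is the kind of compaction that keeps the next phase's opening context short.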