Tag

lang-en

ai-ml

Evaluating and Benchmarking Pentest Agents

If you can't measure an offensive agent repeatably, you're doing demos, not engineering. Our harness tracks success rate, cost per flag, and scope adherence.

May 29, 2026 1 min
ai-ml

Persistent Memory for Offensive Agents

A pentest does not fit in a context window. How we give our agents operational memory that survives across phases without leaking scope.

May 29, 2026 1 min
rag

Graph RAG and MITRE ATT&CK

Flat RAG retrieves paragraphs; an operation thinks in relationships. We wired Beorn to an ATT&CK knowledge graph to decide the next move.

May 29, 2026 1 min
ai-ml

ReAct Prompting for Kill Chain Orchestration

Why a single LLM cannot run an entire pentest end-to-end, and how we extended the Thought-Action-Observation loop to coordinate agents in Gandalf CLI.

May 15, 2026 mins
engineering

Seccomp-bpf for Autonomous Agents

How we built minimal seccomp-bpf profiles so that the exploits an LLM runs don't turn into an accidental rm -rf on the host.

May 15, 2026 mins
engineering

Observability in Offensive Operations

Why a red team without traces is indefensible: how we instrument every decision of our agent with OpenTelemetry, eBPF and spans mapped to MITRE ATT&CK.

May 15, 2026 mins
ai-ml

Tactical RAG: From Writeups to Action

Indexing 9,115 HTB writeups isn't building a search engine: it's giving operational memory to an agent in the middle of an exploit. Here's what we learned.

May 15, 2026 mins