Berialabs

ai-ml

Evaluating and Benchmarking Pentest Agents

If you can't measure an offensive agent repeatably, you're doing demos, not engineering. Our harness tracks success rate, cost per flag, and scope adherence.

May 29, 2026 • 1 min

sandboxing

gVisor vs seccomp-bpf: Sandboxing Offensive Tools

Letting an agent run exploits demands serious isolation. When seccomp-bpf is enough, and when we stack gVisor on top.

May 29, 2026 • 1 min

ai-ml

Persistent Memory for Offensive Agents

A pentest does not fit in a context window. How we give our agents operational memory that survives across phases without leaking scope.

May 29, 2026 • 1 min

rag

Graph RAG and MITRE ATT&CK

Flat RAG retrieves paragraphs; an operation thinks in relationships. We wired Beorn to an ATT&CK knowledge graph to decide the next move.

May 29, 2026 • 1 min

ai-ml

ReAct Prompting for Kill Chain Orchestration

Why a single LLM cannot run an entire pentest end-to-end, and how we extended the Thought-Action-Observation loop to coordinate agents in Gandalf CLI.

May 15, 2026

engineering

Seccomp-bpf for Autonomous Agents

How we built minimal seccomp-bpf profiles so that the exploits an LLM runs don't turn into an accidental rm -rf on the host.

May 15, 2026

engineering

Observability in Offensive Operations

Why a red team without traces is indefensible: how we instrument every decision of our agent with OpenTelemetry, eBPF and spans mapped to MITRE ATT&CK.

May 15, 2026

red-team

Secure Tool Calling in Air-Gapped Environments

How a kill-switch, a seccomp-bpf filter, and CIDR rules cut off the silent leak of an LLM agent in a lab with no internet. Lessons from the field.

May 15, 2026

ai-ml

Reinforcement Learning for Exploit Generation

We train a PPO agent to turn crashes into control-flow hijacking. Rewards with eBPF, honest failures, and real code. What we learned along the way.

May 15, 2026

ai-ml

Tactical RAG: From Writeups to Action

Indexing 9115 HTB writeups isn't building a search engine: it's giving operational memory to an agent in the middle of an exploit. Here's what we learned.

May 15, 2026

ai-ml

Multi-Agent Debate for Vulnerability Triage

Why we make three agents (critical, evidential, and technical) debate each finding before closing it, with real metrics and trade-offs.

May 15, 2026

engineering

LLM-Guided Fuzzing: More Coverage, Fewer Silly Crashes

How we combined AFL++ with LLM-generated seeds in Gwaihir CLI to fuzz complex parsers without drowning in initial validation crashes.

May 15, 2026