Probe Coverage

What AgentSpec can and cannot verify for each agent.yaml field — and how to move from declarations to runtime proof.

Evidence Tiers

Every AgentSpec check is labelled with one of three evidence tiers:

| Tier | Badge | What it means |
| --- | --- | --- |
| Declarative | [D] | Manifest analysis only: we read the YAML. No I/O required. |
| Probed | [P] | Health check verified at infrastructure level (TCP/HTTP reachability). Run agentspec health. |
| Behavioral | [B] | Runtime events confirmed actual execution. Requires sdk-langgraph + EventPush. |

Important: The grade (A–F) is based on declarative checks only. [D] violations are real violations you can fix by editing agent.yaml. [P] and [B] labels are informational — they tell you how much evidence backs the finding, not whether the grade changes.
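
To make the split concrete, here is a minimal sketch of how findings could be partitioned by tier so that only [D] findings feed the grade. The `Finding` type and the two non-declarative finding names are hypothetical, chosen for illustration; this is not AgentSpec's internal data model, and the grade formula itself is not shown because it is not described here.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DECLARATIVE = "D"
    PROBED = "P"
    BEHAVIORAL = "B"


@dataclass(frozen=True)
class Finding:
    rule: str   # rule or check identifier, e.g. "SEC-LLM-06"
    tier: Tier  # evidence tier attached to the finding


def grade_relevant(findings):
    """Keep only the findings that can affect the A-F grade ([D] tier)."""
    return [f for f in findings if f.tier is Tier.DECLARATIVE]


findings = [
    Finding("SEC-LLM-06", Tier.DECLARATIVE),
    Finding("redis-unreachable", Tier.PROBED),          # hypothetical name
    Finding("guardrail-not-invoked", Tier.BEHAVIORAL),  # hypothetical name
]

print([f.rule for f in grade_relevant(findings)])
```

The [P] and [B] findings are dropped from the grade input but would still be worth surfacing separately, since they describe how much evidence backs each declaration.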


Field Coverage Matrix

Model

| Field | [D] Audit Rule | [P] Health Check | [B] Behavioral |
| --- | --- | --- | --- |
| spec.model.provider | SEC-LLM-05 (provider declared) | | |
| spec.model.id | MODEL-02 (no latest), SEC-LLM-05 | | |
| spec.model.apiKey | | HTTP GET to provider /v1/models | |
| spec.model.fallback | MODEL-01, MODEL-04 | HTTP GET to fallback provider | |
| spec.model.costControls.maxMonthlyUSD | SEC-LLM-04, MODEL-03 | | ModelCallEvent(token_count) aggregated |

Guardrails

| Field | [D] Audit Rule | [P] Health Check | [B] Behavioral |
| --- | --- | --- | --- |
| spec.guardrails.input[].type | SEC-LLM-01 (prompt-injection/topic-filter) | | GuardrailEvent(invoked=true) |
| spec.guardrails.output[].threshold | SEC-LLM-02 (output guardrail declared) | | GuardrailEvent(score) |
| spec.guardrails (absent) | SEC-LLM-01, SEC-LLM-02 | | |

Memory

| Field | [D] Audit Rule | [P] Health Check | [B] Behavioral |
| --- | --- | --- | --- |
| spec.memory.shortTerm.backend | | TCP connect (Redis/Postgres) | |
| spec.memory.shortTerm.ttlSeconds | MEM-02 (TTL declared) | | MemoryWriteEvent(ttl_seconds) |
| spec.memory.shortTerm.maxTokens | MEM-05 | | |
| spec.memory.longTerm.connectionString | | TCP connect | |
| spec.memory.longTerm.ttlDays | MEM-02 | | |
| spec.memory.hygiene.piiScrubFields | MEM-01, SEC-LLM-06 | | MemoryWriteEvent(pii_scrubbed) |
| spec.memory.hygiene.auditLog | MEM-03 | | |
| spec.memory.vector.namespace | MEM-04 | TCP connect | |

Tools

| Field | [D] Audit Rule | [P] Health Check | [B] Behavioral |
| --- | --- | --- | --- |
| spec.tools[].annotations.readOnlyHint | SEC-LLM-07 | | |
| spec.tools[].annotations.destructiveHint | SEC-LLM-08, SEC-LLM-07 | | ToolCallEvent(name) + user_confirmed |
| spec.tools[].name (registered) | | Agent SDK reports tool:{name} check | |

MCP

| Field | [D] Audit Rule | [P] Health Check | [B] Behavioral |
| --- | --- | --- | --- |
| spec.mcp.servers[].url | | HTTP GET | |
| spec.mcp.servers[].command | | which <cmd> | |
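
The two MCP probes above can be approximated in a few lines of Python. This is a sketch under assumptions, not the probe AgentSpec ships: `probe_mcp_server` is a hypothetical helper, the server is passed as a plain dict with either a `command` or a `url` key, and "reachable" is taken to mean that any HTTP response came back, including an error status.

```python
import shutil
import urllib.error
import urllib.request


def probe_mcp_server(server: dict, timeout: float = 3.0) -> bool:
    """Return True if the declared MCP server looks usable from this host."""
    if "command" in server:
        # Equivalent of `which <cmd>`: is the executable on PATH?
        return shutil.which(server["command"]) is not None
    try:
        # Plain HTTP GET against the declared URL.
        with urllib.request.urlopen(server["url"], timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with a 4xx/5xx status (assumed to count).
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return False


print(probe_mcp_server({"command": "agentspec-no-such-command"}))
```

Treating a 4xx response as reachable is a design choice: it separates "nothing is listening" from "the endpoint rejects unauthenticated GETs", which are different infrastructure problems.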

Evaluation

| Field | [D] Audit Rule | [P] Health Check | [B] Behavioral |
| --- | --- | --- | --- |
| spec.evaluation.datasets[] | EVAL-01 (declared) | existsSync($file:) | agentspec evaluate run |
| spec.evaluation.framework | SEC-LLM-09 | | Framework CLI exit code |
| spec.evaluation.ciGate | SEC-LLM-09, EVAL-02 | | agentspec evaluate exit code |
| spec.evaluation.thresholds.hallucination | EVAL-03 | | agentspec evaluate --framework |

Observability

| Field | [D] Audit Rule | [P] Health Check | [B] Behavioral |
| --- | --- | --- | --- |
| spec.observability.tracing.backend | OBS-01 | | Actual spans emitted (not yet probed) |
| spec.observability.logging.structured | OBS-02 | | |
| spec.observability.logging.redactFields | OBS-03 | | |

Infrastructure

| Field | [D] Audit Rule | [P] Health Check | [B] Behavioral |
| --- | --- | --- | --- |
| spec.requires.envVars[] + $env:* refs | | process.env lookup | |
| $file:* refs (prompts, datasets) | | existsSync | |
| spec.requires.services[] | | TCP connect | |
| spec.subagents[].ref.a2a.url | | HTTP GET | |
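
For readers who want to reproduce the infrastructure probes by hand, the following sketch shows Python analogues of the first three checks (environment lookup, file existence, TCP connect). The function names are hypothetical and the timeout is an arbitrary choice; the real probes are described above in Node terms (process.env, existsSync), so this is an illustration of the idea rather than the actual implementation.

```python
import os
import socket


def env_var_set(name: str) -> bool:
    """Analogue of a process.env lookup: present and non-empty."""
    return bool(os.environ.get(name))


def file_ref_exists(path: str) -> bool:
    """Analogue of existsSync for the target of a $file: reference."""
    return os.path.exists(path)


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Try to open a TCP connection; any socket error counts as unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage with a made-up variable name and path.
print("AGENTSPEC_DEMO_KEY set:", env_var_set("AGENTSPEC_DEMO_KEY"))
print("prompt file exists:", file_ref_exists("prompts/system.md"))
```

A successful TCP connect only shows that something is listening on the port. It says nothing about credentials or protocol, which is why this tier is described as reachability.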

How to Read Audit Output

The agentspec audit command now shows evidence badges next to each violation:

  Violations (4)

  [critical] [D] SEC-LLM-06 — Sensitive data disclosure: PII scrub in memory hygiene
    Long-term memory declared without piiScrubFields — PII may be persisted.
    Path: /spec/memory/hygiene/piiScrubFields
    → Add spec.memory.hygiene.piiScrubFields: [ssn, credit_card, bank_account]

  Evidence Breakdown
    [D] Declarative  18/22  (manifest declarations)
    [P] Probed        N/A   (run `agentspec health <file>` for live infrastructure checks)
    [B] Behavioral    N/A   (no runtime events — deploy with sdk-langgraph + EventPush)
  • [D] violations — fix them by updating agent.yaml. No deployment needed.
  • [P] issues (in /gap endpoint) — fix infrastructure: set env vars, start services.
  • [B] issues (in /gap endpoint) — fix runtime wiring: connect guardrails, tool handlers.
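
If you want to count violations per tier in a script, the text output can be scanned with a regular expression. The sketch below assumes the line layout matches the sample above exactly (severity, then badge, then rule ID); `count_by_tier` is a hypothetical helper, and if the CLI offers a structured output format, that would be a sturdier source than parsing text.

```python
import re
from collections import Counter

# Matches lines such as "[critical] [D] SEC-LLM-06 ..." and captures the parts.
VIOLATION_RE = re.compile(
    r"^\s*\[(?P<severity>\w+)\]\s+\[(?P<tier>[DPB])\]\s+"
    r"(?P<rule>[A-Z]+(?:-[A-Z]+)*-\d+)"
)


def count_by_tier(audit_output: str) -> Counter:
    """Count violation lines per evidence badge."""
    counts = Counter()
    for line in audit_output.splitlines():
        match = VIOLATION_RE.match(line)
        if match:
            counts[match.group("tier")] += 1
    return counts


sample = """\
  Violations (4)

  [critical] [D] SEC-LLM-06 \u2014 Sensitive data disclosure: PII scrub in memory hygiene
    Long-term memory declared without piiScrubFields
  Evidence Breakdown
    [D] Declarative  18/22  (manifest declarations)
"""

print(count_by_tier(sample))
```

The Evidence Breakdown line also starts with a badge, but it is not counted because the pattern requires a severity label before the badge.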

How to Move Up the Evidence Ladder

[D] → [P]: Run Health Checks

```bash
agentspec health agent.yaml
```

This performs TCP/HTTP probes for every declared dependency (model API key, Redis, Postgres, MCP servers, services). Results appear in the /gap endpoint of the sidecar.
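
A quick way to inspect those results from a script is to request the endpoint directly. This sketch assumes the sidecar listens on http://localhost:4001 (the address used in the SDK example in this page), that /gap answers a plain GET, and that the body is JSON; none of that is confirmed here, and the response schema is deliberately left uninterpreted.

```python
import json
import urllib.request


def gap_url(base_url: str = "http://localhost:4001") -> str:
    """Build the /gap URL from the sidecar base URL."""
    return base_url.rstrip("/") + "/gap"


def fetch_gap(base_url: str = "http://localhost:4001", timeout: float = 5.0):
    """GET the /gap endpoint and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(gap_url(base_url), timeout=timeout) as resp:
        return json.load(resp)


# With a sidecar running locally:
#   report = fetch_gap()
#   print(json.dumps(report, indent=2))
print(gap_url())
```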

[P] → [B]: Deploy with EventPush

Install agentspec-langgraph (Python SDK) and wire it into your agent:

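```python
from agentspec_langgraph import SidecarClient, GuardrailMiddleware

# Client that pushes runtime events to the AgentSpec sidecar
sidecar = SidecarClient(base_url="http://localhost:4001")
# Middleware that reports guardrail activity through that client
guardrail = GuardrailMiddleware(sidecar)

# Each LLM call reports GuardrailEvent, ToolCallEvent, MemoryWriteEvent
```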

Once EventPush is active, the sidecar's /gap endpoint will include [B] evidence for:

  • Whether guardrails are actually invoked on every request
  • Whether destructive tools require user confirmation
  • Whether memory TTL is applied correctly
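
The three checks listed above can be sketched as a pure function over collected events. The event dictionaries here are hypothetical: the field names invoked, name, user_confirmed and ttl_seconds are borrowed from the coverage matrix, while type and request_id are assumptions, and none of this is claimed to be the SDK's actual event format.

```python
def behavioral_gaps(events, request_ids, destructive_tools, declared_ttl_seconds):
    """Return human-readable gaps between declared and observed behavior."""
    gaps = []

    # 1. Every request should have at least one invoked guardrail.
    guarded = {
        e["request_id"]
        for e in events
        if e["type"] == "GuardrailEvent" and e.get("invoked")
    }
    for rid in request_ids:
        if rid not in guarded:
            gaps.append(f"guardrail not invoked for request {rid}")

    for e in events:
        # 2. Destructive tools should only run after user confirmation.
        if (
            e["type"] == "ToolCallEvent"
            and e["name"] in destructive_tools
            and not e.get("user_confirmed")
        ):
            gaps.append(f"destructive tool {e['name']} ran without confirmation")
        # 3. Memory writes should carry the TTL declared in the manifest.
        if e["type"] == "MemoryWriteEvent" and e.get("ttl_seconds") != declared_ttl_seconds:
            gaps.append(
                f"memory write used ttl_seconds={e.get('ttl_seconds')}, "
                f"declared {declared_ttl_seconds}"
            )
    return gaps


events = [
    {"type": "GuardrailEvent", "request_id": "r1", "invoked": True},
    {"type": "ToolCallEvent", "request_id": "r2", "name": "delete_record", "user_confirmed": False},
    {"type": "MemoryWriteEvent", "request_id": "r2", "ttl_seconds": 7200},
]
for gap in behavioral_gaps(events, ["r1", "r2"], {"delete_record"}, 3600):
    print(gap)
```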

Evaluate Your Agent with Real Data

```bash
agentspec evaluate agent.yaml \
  --url http://localhost:4000 \
  --dataset golden-qa
```

This sends your declared JSONL dataset to the live agent and scores actual outputs — the only way to get behavioral evidence for evaluation quality.
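
Because the dataset is JSONL, a malformed line is an easy way to break an evaluation run. A small pre-flight check like the one below can catch that before the agent is called. `parse_jsonl` is a hypothetical helper, and the `input` field in the sample is an assumption for illustration, since the dataset schema is not described here.

```python
import json


def parse_jsonl(lines, source="<dataset>"):
    """Parse an iterable of JSONL lines, reporting the first bad line number."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"{source}:{lineno}: invalid JSON ({exc.msg})") from exc
    return records


sample = ['{"input": "What is AgentSpec?"}', "", '{"input": "Reset my password"}']
print(len(parse_jsonl(sample)))
```

To check a file on disk, pass an open file handle, for example `parse_jsonl(open(path, encoding="utf-8"), source=path)`.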


Frequently Asked Questions

Q: Does a [D] violation mean my agent is insecure?

Not necessarily. It means the manifest doesn't declare the protection. If the protection is implemented in code but not in the manifest, the audit will flag it. Keeping the manifest in sync with the implementation is the point of AgentSpec.

Q: Will fixing [D] violations change my grade?

Yes. All audit rules (and therefore all grade calculations) are currently [D] — declarative. Fixing them improves your score.

Q: Can I suppress a violation I can't fix?

Yes. Add a suppression with a reason:

```yaml
spec:
  compliance:
    suppressions:
      - rule: SEC-LLM-10
        reason: "Using $env: by design — no Vault available in this environment"
        approvedBy: "security@example.com"
        expires: "2026-12-31"
```
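
Since each suppression carries an expires date, it can be useful to flag stale entries in review. The sketch below filters a list of suppression entries, written as Python dicts with the same keys as the YAML above, by date. `active_suppressions` is a hypothetical helper, treating the expiry date as inclusive is an assumption, and whether AgentSpec itself enforces expires is not stated here.

```python
from datetime import date


def active_suppressions(suppressions, today=None):
    """Return the suppressions whose expires date has not yet passed."""
    today = today or date.today()
    return [
        s for s in suppressions
        if date.fromisoformat(s["expires"]) >= today  # expiry day still counts
    ]


suppressions = [
    {
        "rule": "SEC-LLM-10",
        "reason": "Using $env: by design",
        "approvedBy": "security@example.com",
        "expires": "2026-12-31",
    }
]

still_active = active_suppressions(suppressions, today=date(2026, 6, 1))
print([s["rule"] for s in still_active])
```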

Q: Are [P] and [B] issues included in the grade?

No. The grade is based on [D] declarative rules only. [P] and [B] issues appear in the sidecar's /gap endpoint and are informational.


Released under the Apache 2.0 License.