
Runtime Introspection

AgentSpec's health check CLI (agentspec health) runs pre-flight checks before you start your agent. Runtime introspection is the complementary half: your running agent continuously self-reports its live health state, and the AgentSpec sidecar reads it.

How It Works

┌─────────────────────────────────────┐
│           Your Agent Process        │
│                                     │
│  AgentSpecReporter                  │
│    └─ runs checks every 30s         │
│    └─ caches HealthReport           │
│    └─ GET /agentspec/health ────────┼──► sidecar probeAgent()
│    └─ GET /agentspec/usage ─────────┼──► sidecar probeUsage()
└─────────────────────────────────────┘         │

                                    ┌────────────────────────┐
                                    │   Sidecar Control Plane │
                                    │   GET /health/ready     │
                                    │   GET /explore          │
                                    │   GET /gap              │
                                    └────────────────────────┘

The SDK (@agentspec/sdk) exports AgentSpecReporter — a class you mount in your agent server. It runs the same health checks as the CLI but from inside the process, where full connectivity is available (real DB connections, real API reachability).

The sidecar (agentspec-sidecar) probes GET /agentspec/health and GET /agentspec/usage on the agent. All diagnostic endpoints reflect live agent state when the SDK is integrated, and fall back to static manifest analysis (or audit ring aggregation for usage) when it isn't.

The /agentspec/health Endpoint

When AgentSpecReporter is mounted, your agent exposes:

GET /agentspec/health

Response (HealthReport):

json
{
  "agentName": "gymcoach",
  "timestamp": "2026-02-28T10:00:00.000Z",
  "status": "healthy",
  "summary": { "passed": 5, "failed": 0, "warnings": 0, "skipped": 0 },
  "checks": [
    { "id": "env:GROQ_API_KEY",   "category": "env",     "status": "pass", "severity": "error" },
    { "id": "model:groq/llama-3.3-70b-versatile", "category": "model", "status": "pass",
      "severity": "error", "latencyMs": 91 },
    { "id": "service:redis",      "category": "service", "status": "pass", "severity": "info",
      "latencyMs": 2 },
    { "id": "service:postgres",   "category": "service", "status": "pass", "severity": "info",
      "latencyMs": 3 },
    { "id": "tool:log-workout",   "category": "tool",    "status": "pass", "severity": "info" }
  ]
}

Status values:

| Status | Meaning |
| --- | --- |
| pass | Check succeeded |
| fail | Check failed; see message and remediation |
| skip | Check could not run (e.g. $env:VAR unset; the env check covers it separately) |
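
As a minimal sketch of consuming this payload, the snippet below parses a report shaped like the example response and recounts the check statuses. The helper name summarize_checks is hypothetical and not part of the SDK; only the field names are taken from the example.

```python
import json
from collections import Counter

# Trimmed copy of the example HealthReport from this section.
REPORT_JSON = """
{
  "agentName": "gymcoach",
  "status": "healthy",
  "summary": { "passed": 5, "failed": 0, "warnings": 0, "skipped": 0 },
  "checks": [
    { "id": "env:GROQ_API_KEY", "category": "env", "status": "pass", "severity": "error" },
    { "id": "model:groq/llama-3.3-70b-versatile", "category": "model", "status": "pass",
      "severity": "error", "latencyMs": 91 },
    { "id": "service:redis", "category": "service", "status": "pass", "severity": "info",
      "latencyMs": 2 },
    { "id": "service:postgres", "category": "service", "status": "pass", "severity": "info",
      "latencyMs": 3 },
    { "id": "tool:log-workout", "category": "tool", "status": "pass", "severity": "info" }
  ]
}
"""


def summarize_checks(report: dict) -> dict:
    """Count checks per status and collect the IDs of failing checks."""
    counts = Counter(check["status"] for check in report["checks"])
    failed_ids = [c["id"] for c in report["checks"] if c["status"] == "fail"]
    return {
        "passed": counts.get("pass", 0),
        "failed": counts.get("fail", 0),
        "skipped": counts.get("skip", 0),
        "failed_ids": failed_ids,
    }


report = json.loads(REPORT_JSON)
result = summarize_checks(report)
print(result)
```

Recounting from the checks array is a cheap way to confirm that the summary object and the individual checks agree.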

Two Modes: manifest-static and agent-sdk

The sidecar operates in one of two modes on every request. The source field in every response tells you which is active:

json
{ "source": "manifest-static" }  // agent has no /agentspec/health endpoint
{ "source": "agent-sdk" }        // live probe succeeded

manifest-static — zero agent changes required

The sidecar reads agent.yaml and infers everything from the declaration alone. No data comes from the running agent process. This works for any agent with no code changes.

What it can detect:

| Check | How |
| --- | --- |
| $env:VAR references | Checks the sidecar container's environment, not the agent's |
| $secret:* references | Whether a secret backend is configured |
| requires.services TCP connectivity | Raw TCP connect to the declared address |
| MCP server TCP reachability | TCP connect to the MCP server host/port |
| Memory backend TCP connectivity | TCP connect to Redis/Postgres address |
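
The TCP-level checks in this table amount to a plain socket connect. The following is a rough sketch of that idea using Python's standard socket module. The helper tcp_reachable is hypothetical, and the sidecar's actual implementation may differ.

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Demo against a throwaway local listener instead of a real Redis/Postgres.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

reachable_while_listening = tcp_reachable("127.0.0.1", open_port)
listener.close()
reachable_after_close = tcp_reachable("127.0.0.1", open_port)

print(reachable_while_listening, reachable_after_close)
```

A successful connect only proves that something is listening on the port. It says nothing about authentication or resource pressure on the service behind it.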

What it cannot detect:

| Blind spot | Why |
| --- | --- |
| Tool handler registration | The sidecar has no visibility into the agent process |
| Model API key correctness | A wrong literal key (e.g. apiKey: sk-invalid) scores the same as a valid one |
| Real service health | TCP connect succeeds even when Redis has no memory left or Postgres rejects auth |
| Agent-side env vars | If your agent container has OPENAI_API_KEY set but the sidecar container does not, the env check still fails |
| Runtime token/memory usage | No behavioral data is available |

Grade ceiling in manifest-static mode:

In manifest-static mode, /gap deducts 20 points (healthcheckable) when the sidecar's probe of GET /health on the agent's upstream URL gets no response, and 10 points (discoverable) when GET /capabilities gets none either. Together, these two violations cap the achievable score at 70 (grade C) for any agent that does not expose those endpoints.

Score ceiling (manifest-static):
  100
  − 20  healthcheckable  (agent has no GET /health)
  − 10  discoverable     (agent has no GET /capabilities)
  ────
   70   → grade C at best

Grade A requires agent-sdk mode, or an agent that genuinely exposes /health and /capabilities endpoints on its upstream port.
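
The ceiling arithmetic can be sketched as a subtraction over the violated properties. The two deduction values come from the breakdown above; the function and dictionary names are illustrative only and do not describe the gap engine's real code.

```python
# Deductions taken from the score ceiling breakdown; the dict itself is illustrative.
STATIC_DEDUCTIONS = {
    "healthcheckable": 20,  # agent has no GET /health
    "discoverable": 10,     # agent has no GET /capabilities
}


def score_after(violations: list[str], deductions: dict[str, int], start: int = 100) -> int:
    """Subtract the deduction for each violated property from the starting score."""
    return start - sum(deductions[name] for name in violations)


ceiling = score_after(["healthcheckable", "discoverable"], STATIC_DEDUCTIONS)
only_health = score_after(["healthcheckable"], STATIC_DEDUCTIONS)
print(ceiling, only_health)
```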


agent-sdk — live data from the running process

When AgentSpecReporter is mounted and GET /agentspec/health returns a valid HealthReport, the sidecar uses that data instead of its own static analysis. All three diagnostic endpoints switch to live mode.
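
The mode decision can be pictured as a small predicate over the probe result. This is a sketch under assumptions: the validity test used here (a status string plus a checks list) is a guess at what "valid HealthReport" means, and both function names are hypothetical.

```python
from typing import Optional


def looks_like_health_report(payload: object) -> bool:
    """Assumed validity test: a dict with a 'status' string and a 'checks' list."""
    return (
        isinstance(payload, dict)
        and isinstance(payload.get("status"), str)
        and isinstance(payload.get("checks"), list)
    )


def pick_source(probe_payload: Optional[dict]) -> str:
    """Return the mode label for a probe result (None means the probe failed)."""
    return "agent-sdk" if looks_like_health_report(probe_payload) else "manifest-static"


live = pick_source({"status": "healthy", "checks": []})
no_endpoint = pick_source(None)
malformed = pick_source({"error": "not found"})
print(live, no_endpoint, malformed)
```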

What it additionally detects:

| Check | How |
| --- | --- |
| Tool handler registration | tool:<name> checks from the agent's own registry |
| Model API key validity | Live HTTP call to the provider API from inside the agent process |
| Service health at protocol level | Driver-level ping (Redis PING, Postgres SELECT 1) with real latency |
| Agent-side env resolution | Env vars read from process.env inside the agent container |
| Missing checks (spec vs reporter mismatch) | Gap engine cross-references spec declarations against reported check IDs |
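
The cross-reference in the last row is essentially a set difference between the check IDs the spec implies and the IDs the reporter returned. The sketch below assumes the category:name ID form seen in the example HealthReport. The spec layout and the plan-week tool are made up for illustration.

```python
def expected_check_ids(spec: dict) -> set[str]:
    """Build the check IDs a spec implies, using the '<category>:<name>' form
    seen in the example HealthReport. The spec layout here is hypothetical."""
    ids = {f"tool:{name}" for name in spec.get("tools", [])}
    ids |= {f"service:{name}" for name in spec.get("services", [])}
    ids |= {f"env:{name}" for name in spec.get("env", [])}
    return ids


def missing_checks(spec: dict, report: dict) -> list[str]:
    """IDs declared by the spec that the live report never mentions."""
    reported = {check["id"] for check in report["checks"]}
    return sorted(expected_check_ids(spec) - reported)


# 'plan-week' is an invented tool that the reporter does not cover.
spec = {"tools": ["log-workout", "plan-week"], "services": ["redis"], "env": ["GROQ_API_KEY"]}
report = {"checks": [
    {"id": "env:GROQ_API_KEY", "status": "pass"},
    {"id": "service:redis", "status": "pass"},
    {"id": "tool:log-workout", "status": "pass"},
]}
missing = missing_checks(spec, report)
print(missing)
```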

Grade F is only reachable in agent-sdk mode — enough failing high/critical checks from the live report can push the score below 45.

Endpoint behaviour by mode

| Endpoint | manifest-static | agent-sdk |
| --- | --- | --- |
| GET /health/ready | Env + service TCP checks from sidecar env | Full HealthReport from agent process |
| GET /explore | Manifest field values only; tool/service status unknown | Enriched with live check results and latency |
| GET /gap | Static violations (healthcheckable, discoverable, auditable, evaluated) | All static violations + live env/model/service/tool failures cross-referenced against spec |

When to integrate the SDK

| Situation | Recommendation |
| --- | --- |
| Agent has no HTTP server (e.g. a script or batch job) | Stay with manifest-static; no integration possible |
| Agent already runs a web server (FastAPI, Express, Fastify…) | Integrate: one route, ~5 lines |
| You need to verify your model API key is actually working | Integrate: manifest-static cannot detect a wrong key |
| You need tool registration status in the gap report | Integrate: invisible in manifest-static |
| You want grade A compliance scores | Integrate: grade C is the ceiling without it |
| You have strict env var separation between containers | Integrate: sidecar checks its own env, not the agent's |

/gap Live Analysis

With the SDK integrated, GET /gap uses the live HealthReport to detect real problems — not hypothetical ones from the manifest alone:

json
{
  "score": 75,
  "source": "agent-sdk",
  "issues": [
    {
      "severity": "high",
      "property": "model.apiKey",
      "description": "Cannot check model endpoint: API key reference not resolved ($env:GROQ_API_KEY)",
      "recommendation": "Ensure the model API key environment variable is set..."
    },
    {
      "severity": "medium",
      "property": "auditable",
      "description": "No guardrails declared in spec",
      "recommendation": "Add spec.guardrails with input/output rules..."
    }
  ]
}

Model check severity mapping:

| Model check result | Gap issue severity | Meaning |
| --- | --- | --- |
| fail | critical | API key set but endpoint unreachable |
| skip | high | API key env var not set at all |
| pass | (no issue) | Endpoint reachable |
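
Expressed as code, the mapping is a lookup from check status to issue severity. The snippet is only a sketch; the constant and function names are not taken from the gap engine.

```python
from typing import Optional

# Mapping copied from the severity table; None means no gap issue is raised.
MODEL_CHECK_TO_GAP_SEVERITY = {"fail": "critical", "skip": "high", "pass": None}


def gap_severity_for_model_check(status: str) -> Optional[str]:
    """Return the gap issue severity for a model check status, or None for a pass."""
    if status not in MODEL_CHECK_TO_GAP_SEVERITY:
        raise ValueError(f"unknown check status: {status!r}")
    return MODEL_CHECK_TO_GAP_SEVERITY[status]


severities = {s: gap_severity_for_model_check(s) for s in ("fail", "skip", "pass")}
print(severities)
```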

Check Categories (Runtime)

These categories only appear in runtime HealthReports (not in CLI pre-flight output):

| Category | Source | What it checks |
| --- | --- | --- |
| tool | AgentSpecReporter | Tool handler is registered in the agent process |
| service | AgentSpecReporter | TCP connectivity for spec.requires.services entries |
| model | AgentSpecReporter | Provider API endpoint reachable (resolves $env: at runtime) |

Token Usage Tracking

AgentSpecReporter includes a built-in UsageLedger that aggregates LLM token counts in-process. No external infrastructure required — token counts flow through the existing heartbeat push.

Recording usage

After each LLM call, record the token counts:

typescript
reporter.usage.record('openai/gpt-4o', promptTokens, completionTokens)

For LangGraph agents, instrument_call_model auto-discovers reporter.usage — no separate ledger= parameter needed:

python
from agentspec_langgraph import instrument_call_model

call_model = instrument_call_model(
    original_call_model,
    reporter=reporter,
    model_id="groq/llama-3.3-70b-versatile",
)

If reporter has a .usage attribute (a UsageLedger), token counts are recorded into it automatically. You can still pass ledger= explicitly to override this — the explicit ledger takes precedence.
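
The precedence rule can be illustrated in a few lines. This is not the code of instrument_call_model, only a sketch of the rule as described; FakeLedger, FakeReporter, and resolve_ledger are hypothetical stand-ins.

```python
class FakeLedger:
    """Stand-in for a UsageLedger; only its identity matters here."""


class FakeReporter:
    """Stand-in reporter that optionally carries a .usage attribute."""

    def __init__(self, usage=None):
        if usage is not None:
            self.usage = usage


def resolve_ledger(reporter, ledger=None):
    """Explicit ledger wins; otherwise fall back to reporter.usage if present."""
    if ledger is not None:
        return ledger
    return getattr(reporter, "usage", None)


auto = FakeLedger()
explicit = FakeLedger()
picked_auto = resolve_ledger(FakeReporter(usage=auto))
picked_explicit = resolve_ledger(FakeReporter(usage=auto), ledger=explicit)
picked_none = resolve_ledger(FakeReporter())
print(picked_auto is auto, picked_explicit is explicit, picked_none is None)
```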

How it flows

LLM response → UsageLedger.record() → heartbeat push → CRD status → VS Code
                                       sidecar GET /usage ─────────→ VS Code

Each heartbeat ships a window snapshot (e.g., last 30s of usage), then resets the counters. The control plane stores each window with the heartbeat row.
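
To make the snapshot-then-reset behaviour concrete, here is a small in-memory ledger written for illustration. WindowedLedger and its method names are assumptions rather than the SDK's UsageLedger API. The output field names follow the agent-sdk usage response documented on this page.

```python
from datetime import datetime, timezone


class WindowedLedger:
    """Hypothetical ledger: accumulate per-model counts, then snapshot and reset."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self._models = {}
        self._window_started_at = datetime.now(timezone.utc)

    def record(self, model_id: str, prompt_tokens: int, completion_tokens: int) -> None:
        entry = self._models.setdefault(
            model_id, {"promptTokens": 0, "completionTokens": 0, "callCount": 0}
        )
        entry["promptTokens"] += prompt_tokens
        entry["completionTokens"] += completion_tokens
        entry["callCount"] += 1

    def snapshot_and_reset(self) -> dict:
        """Return the current window, then clear the counters for the next one."""
        models = [
            {"modelId": mid, **e, "totalTokens": e["promptTokens"] + e["completionTokens"]}
            for mid, e in self._models.items()
        ]
        snapshot = {
            "windowStartedAt": self._window_started_at.isoformat(),
            "models": models,
            "totalTokens": sum(m["totalTokens"] for m in models),
            "totalCalls": sum(m["callCount"] for m in models),
        }
        self._reset()
        return snapshot


ledger = WindowedLedger()
ledger.record("openai/gpt-4o", 100, 50)
ledger.record("openai/gpt-4o", 200, 75)
first_window = ledger.snapshot_and_reset()
second_window = ledger.snapshot_and_reset()  # nothing recorded since the reset
print(first_window["totalTokens"], second_window["totalTokens"])
```

Because each snapshot clears the counters, summing the stored windows gives the total without double counting.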

Querying usage

Sidecar mode (prefers SDK, falls back to audit ring):

GET /usage

The sidecar probes GET /agentspec/usage on the agent first. If the SDK endpoint is available, the response includes the full prompt/completion token split and a source: "agent-sdk" field. If the agent has no SDK integration, the sidecar falls back to aggregating from the audit ring (source: "audit-ring").

Operator mode (stored, from last heartbeat):

GET /api/v1/agents/{name}/usage

Response (source: "agent-sdk"):

json
{
  "source": "agent-sdk",
  "windowStartedAt": "2026-03-31T12:00:00.000Z",
  "models": [
    { "modelId": "openai/gpt-4o", "promptTokens": 800, "completionTokens": 450, "totalTokens": 1250, "callCount": 8 }
  ],
  "totalTokens": 1250,
  "totalCalls": 8
}

Response (source: "audit-ring" — no SDK integration):

json
{
  "source": "audit-ring",
  "models": [
    { "modelId": "openai/gpt-4o", "totalTokens": 1250, "callCount": 8 }
  ],
  "totalTokens": 1250,
  "totalCalls": 8,
  "sampleSize": 8
}
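
The audit-ring fallback can be thought of as a fold over recent audit entries. The sketch below assumes each entry carries a modelId and a totalTokens count, and that sampleSize is the number of entries aggregated. Neither assumption is confirmed by this page, and the real ring format may differ.

```python
def aggregate_audit_ring(entries: list[dict]) -> dict:
    """Fold audit entries into per-model totals. Each entry is assumed to carry
    'modelId' and 'totalTokens'; the real ring format may differ."""
    per_model: dict[str, dict] = {}
    for entry in entries:
        model = per_model.setdefault(
            entry["modelId"],
            {"modelId": entry["modelId"], "totalTokens": 0, "callCount": 0},
        )
        model["totalTokens"] += entry["totalTokens"]
        model["callCount"] += 1
    models = list(per_model.values())
    return {
        "source": "audit-ring",
        "models": models,
        "totalTokens": sum(m["totalTokens"] for m in models),
        "totalCalls": sum(m["callCount"] for m in models),
        "sampleSize": len(entries),  # assumption: one sample per audit entry
    }


# Eight invented entries for a single model.
ring = [{"modelId": "openai/gpt-4o", "totalTokens": 150 + i} for i in range(8)]
usage = aggregate_audit_ring(ring)
print(usage["totalTokens"], usage["totalCalls"], usage["sampleSize"])
```

Only combined totals are tracked here, which matches the documented audit-ring response: it has no prompt/completion split.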

CRD visibility

Token usage appears in the AgentObservation CRD status and in kubectl output:

bash
kubectl get agentobservations
# NAME          PHASE     GRADE  SCORE  TOKENS  CHECKED
# gymcoach      Healthy   A      92     1250    2m ago

VS Code

The agent detail panel shows a Token Usage section with total tokens, call count, and a per-model breakdown table.

Caching and Refresh

AgentSpecReporter caches the last HealthReport to avoid hammering external APIs on every request to /agentspec/health.

| Option | Default | Description |
| --- | --- | --- |
| refreshIntervalMs | 30 000 | Background refresh interval |
| staleAfterMs | 60 000 | Max age before synchronous re-check on next getReport() call |

After stop() is called (e.g. during graceful shutdown), getReport() returns the last cached report without running new checks.
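
The staleness and shutdown rules can be modelled with a small cache and an injectable clock. This is a simplified sketch, not the reporter's implementation: CachedReport is a hypothetical class, the background refresh loop is left out, and "stale" is taken to mean strictly older than staleAfterMs.

```python
import time
from typing import Callable


class CachedReport:
    """Hypothetical cache mirroring the staleAfterMs and stop() rules."""

    def __init__(self, run_checks: Callable[[], dict], stale_after_ms: int = 60_000,
                 clock: Callable[[], float] = time.monotonic):
        self._run_checks = run_checks
        self._stale_after_s = stale_after_ms / 1000
        self._clock = clock
        self._stopped = False
        self._report = run_checks()
        self._fetched_at = clock()

    def stop(self) -> None:
        self._stopped = True

    def get_report(self) -> dict:
        age = self._clock() - self._fetched_at
        if not self._stopped and age > self._stale_after_s:
            # Cache is too old: re-run the checks synchronously.
            self._report = self._run_checks()
            self._fetched_at = self._clock()
        return self._report


now = [0.0]  # fake clock, in seconds
runs = []


def run_checks() -> dict:
    runs.append(now[0])
    return {"status": "healthy", "run": len(runs)}


cache = CachedReport(run_checks, stale_after_ms=60_000, clock=lambda: now[0])
now[0] = 30.0
fresh = cache.get_report()       # within 60 s: served from cache
now[0] = 61.0
refreshed = cache.get_report()   # older than 60 s: checks run again
cache.stop()
now[0] = 500.0
after_stop = cache.get_report()  # stopped: last cached report, no new run
print(fresh["run"], refreshed["run"], after_stop["run"], len(runs))
```

Injecting the clock keeps the example deterministic; a real reporter would rely on wall or monotonic time.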

Next Step

Add Runtime Health to your agent

Released under the Apache 2.0 License.