
# Add Guardrails

Guardrails validate inputs before they reach the LLM and outputs before they reach the user.

## Input guardrails

```yaml
spec:
  guardrails:
    input:
      - type: prompt-injection
        action: reject
      - type: topic-filter
        topics: [violence, self-harm]
        action: reject
      - type: pii-detector
        action: redact
```
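The guardrails above are listed in order. As an illustrative sketch of one plausible evaluation model (hypothetical helper names — this is not a shipped SDK), an input pipeline might apply each check in sequence, carrying redactions forward and aborting on a reject:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Verdict:
    """Result of one guardrail check (illustrative, not a real API)."""
    violated: bool
    redacted_text: Optional[str] = None

def run_input_guardrails(text: str, checks: list) -> str:
    """Apply each configured check in list order.

    'reject' aborts with an error, 'redact' rewrites the text before
    later checks run, 'flag' would log and continue (logging omitted).
    """
    for check in checks:
        verdict = check["fn"](text)
        if not verdict.violated:
            continue
        if check["action"] == "reject":
            raise ValueError(f"input rejected by {check['type']}")
        if check["action"] == "redact" and verdict.redacted_text is not None:
            text = verdict.redacted_text
    return text
```

Because redactions are applied before later checks run, a `pii-detector` placed early can scrub content that a later check would otherwise see.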

## Output guardrails

```yaml
spec:
  guardrails:
    output:
      - type: toxicity-filter
        threshold: 0.7
        action: reject
      - type: hallucination-detector
        action: flag
```

## Guardrail types

| Type | Applied to | Action options |
| --- | --- | --- |
| `prompt-injection` | Input | `reject`, `flag` |
| `topic-filter` | Input | `reject`, `flag` |
| `pii-detector` | Input / Output | `reject`, `redact`, `flag` |
| `toxicity-filter` | Output | `reject`, `flag` |
| `hallucination-detector` | Output | `reject`, `flag` |

## Actions

| Action | Behaviour |
| --- | --- |
| `reject` | Block the message and return an error |
| `redact` | Remove the violating content and continue |
| `flag` | Log a warning and continue |
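The three behaviours can be sketched as a small dispatcher. The function name, exception types, and `[REDACTED]` placeholder below are all hypothetical choices for illustration, not part of the actual implementation:

```python
def apply_action(action: str, message: str, finding: str, log: list) -> str:
    """Illustrative dispatcher for the three guardrail actions."""
    if action == "reject":
        # Block the message and surface an error to the caller.
        raise PermissionError(f"blocked: guardrail matched {finding!r}")
    if action == "redact":
        # Remove the violating content and continue processing.
        return message.replace(finding, "[REDACTED]")
    if action == "flag":
        # Log a warning and pass the message through unchanged.
        log.append(f"warning: guardrail matched {finding!r}")
        return message
    raise ValueError(f"unknown action: {action}")
```

Note that only `reject` interrupts the request; `redact` and `flag` both let processing continue, differing in whether the message itself is modified.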

Released under the Apache 2.0 License.