Threat Model & Safety Posture
This document summarizes SELF’s high-level threat model and the safety posture we expect production deployments to maintain.
For deeper detail, see:
- technical/THREAT-MODEL.md (canonical threat model)
- technical/SELF-SECURITY.md (security architecture summary)
- SECURITY_DOCUMENTATION.md (override prevention / invariants-focused security)
Safety posture (what “good” looks like)
SELF is built for support-critical surfaces where safety drift is a predictable failure mode.
SELF’s posture is:
- conservative by default
- evidence-based (require affirmative signals to relax)
- state-aware (S0–S3)
- auditable (pre/post decisions are loggable)
- integrity-protecting (resist “silent downgrades”)
Threat model summary
SELF assumes:
- adversaries can read your docs and source
- users can attempt to manipulate state detection
- economic pressure will push operators to weaken constraints
- failures will happen at scale, and must degrade safely
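
The last assumption, that failures must degrade safely, can be sketched as a fail-closed wrapper. This is a minimal illustration, not SELF's implementation: `fail_closed`, `safest`, and `on_error` are hypothetical names, and which value counts as the safest fallback is left to the integration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def fail_closed(
    check: Callable[[], T],
    safest: T,
    on_error: Callable[[Exception], None],
) -> T:
    """Run a governance check; on any failure, return the safest value.

    Hypothetical helper: `safest` is whatever the deployment treats as
    its most conservative result (for example, its most restrictive state).
    """
    try:
        return check()
    except Exception as exc:
        # Report the failure so it is auditable, then degrade safely
        # instead of letting the caller proceed unguarded.
        on_error(exc)
        return safest
```

The design choice here is that an error never yields a more permissive result than a successful check could, so outages cannot be used to relax constraints.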
Common threat classes:
- state manipulation (oscillation, ambiguity gaming, “false recovery”)
- doctrine exploitation (edge cases, conflicting interpretations)
- resource exhaustion (DoS by complexity, rate abuse)
- social engineering (users/coordinators pushing for unsafe outputs)
- implementation drift (integrations that skip required steps)
Mitigation posture
SELF is designed so that safety degrades last:
- conservative state detection + sticky states
- explicit policy objects (constraints are data)
- postflight validation + repair before shipping output
- logging hooks for audit and incident response
- “no silent downgrades” mindset (integrity > convenience)
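
One way to picture "conservative state detection + sticky states" is a tracker that escalates immediately but relaxes only after repeated affirmative evidence. The sketch below is illustrative only. It assumes that higher-numbered states are more restrictive, and the names `StickyState`, `calm_turns`, and `required_calm_turns` (with its default of 3) are hypothetical, not part of SELF.

```python
from dataclasses import dataclass
from enum import IntEnum


class State(IntEnum):
    # Assumption: a higher value means a more restrictive state.
    S0 = 0
    S1 = 1
    S2 = 2
    S3 = 3


@dataclass
class StickyState:
    current: State = State.S0
    calm_turns: int = 0           # consecutive detections below `current`
    required_calm_turns: int = 3  # hypothetical threshold for relaxing

    def update(self, detected: State) -> State:
        if detected >= self.current:
            # Escalate (or hold) immediately and discard recovery evidence.
            self.current = detected
            self.calm_turns = 0
        else:
            self.calm_turns += 1
            if self.calm_turns >= self.required_calm_turns:
                # Relax by one level only, then require fresh evidence.
                self.current = State(self.current - 1)
                self.calm_turns = 0
        return self.current
```

Under these assumptions, a user who alternates between S2 and S0 signals never accumulates enough consecutive calm turns, so oscillation and "false recovery" do not lower the state.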
Deployment expectations
To claim SELF-backed safety in production, deployments should:
- call /v1/pre before draft generation
- honor clarifier requirements (don’t proceed when required)
- call /v1/post and ship only validated output
- log governance decisions to durable storage
- implement escalation paths and user disclosures
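
The expectations above can be sketched as a single request flow. This is a hedged outline, not the SELF client API: only the /v1/pre and /v1/post endpoints come from this document. The function `governed_reply`, the injected callables, and every payload field (`clarifier_required`, `clarifier`, `policy`, `validated`, `output`) are assumptions made for illustration.

```python
from typing import Any, Callable, Dict

Json = Dict[str, Any]


def governed_reply(
    message: str,
    call_pre: Callable[[Json], Json],      # wraps the /v1/pre call
    generate: Callable[[str, Json], str],  # your own draft generation
    call_post: Callable[[Json], Json],     # wraps the /v1/post call
    log: Callable[[str, Json], None],      # durable audit sink
    escalate: Callable[[str], str],        # escalation path
) -> str:
    # 1. Preflight before any draft is generated.
    pre = call_pre({"message": message})
    log("pre", pre)

    # 2. Honor clarifier requirements: do not draft when one is required.
    if pre.get("clarifier_required"):  # hypothetical field name
        return pre["clarifier"]

    # 3. Draft under the returned policy, then validate postflight.
    policy = pre.get("policy", {})
    draft = generate(message, policy)
    post = call_post({"message": message, "draft": draft, "policy": policy})
    log("post", post)

    # 4. Ship only validated output; otherwise escalate.
    if not post.get("validated"):  # hypothetical field name
        return escalate(message)
    return post["output"]
```

Passing the endpoint wrappers in as callables keeps the ordering testable without a network and makes a skipped step visible in review, which addresses the "implementation drift" threat class.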
SELF is the governance layer, not the whole product.