Public Docs

Threat Model & Safety Posture

Threat Model & Safety Posture

This document summarizes SELF’s high-level threat model and the safety posture we expect production deployments to maintain.

If you need deep detail, see:

  • technical/THREAT-MODEL.md (canonical threat model)
  • technical/SELF-SECURITY.md (security architecture summary)
  • SECURITY_DOCUMENTATION.md (override prevention / invariants-focused security)

Safety posture (what “good” looks like)

SELF is built for support-critical surfaces where safety drift is a predictable failure mode.

SELF’s posture is:

  • conservative by default
  • evidence-based (require affirmative signals to relax)
  • state-aware (S0–S3)
  • auditable (pre/post decisions are loggable)
  • integrity-protecting (resist “silent downgrades”)

Threat model summary

SELF assumes:

  • adversaries can read your docs and source
  • users can attempt to manipulate state detection
  • economic pressure will try to weaken constraints
  • failures will happen at scale, and must degrade safely

Common threat classes:

  • state manipulation (oscillation, ambiguity gaming, “false recovery”)
  • doctrine exploitation (edge cases, conflicting interpretations)
  • resource exhaustion (DoS by complexity, rate abuse)
  • social engineering (users/coordinators pushing for unsafe outputs)
  • implementation drift (integrations that skip required steps)

Mitigation posture

SELF is designed so that safety degrades last:

  • conservative state detection + sticky states
  • explicit policy objects (constraints are data)
  • postflight validation + repair before shipping output
  • logging hooks for audit and incident response
  • “no silent downgrades” mindset (integrity > convenience)

Deployment expectations

To claim SELF-backed safety in production, deployments should:

  • call /v1/pre before draft generation
  • honor clarifier requirements (don’t proceed when required)
  • call /v1/post and ship only validated output
  • log governance decisions to durable storage
  • implement escalation paths and user disclosures

SELF is the governance layer, not the whole product.