Real-time concept detection and steering for language models.
Catch deception before it speaks.
VOOM > FOOM
LLMs are opaque. We can't see what they're doing internally.
Models can think one thing and write another. Current methods can't detect this.
Safety approaches measure outcomes after the fact, or blunt capability with guardrails.
Make the model's internal state observable and steerable, then build governance on top.
Reads activations through concept lenses. Always on, minimal overhead.
Any open-weights LLM. Its raw activations are what we observe.
Detects divergence between what the model thinks vs. writes.
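To make the observation layer above concrete, here is a minimal sketch of pulling raw hidden states out of an open-weights model with a PyTorch forward hook. The model name, layer index, and hook plumbing are illustrative assumptions, not HatCat's actual API.

```python
# Sketch: capturing hidden states from an open-weights model with a forward hook.
# Model name, layer index, and hook details are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder open-weights model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

captured = {}

def capture(name):
    def hook(module, inputs, output):
        # output[0] is the block's hidden state: (batch, seq_len, hidden)
        captured[name] = output[0].detach()
    return hook

# Watch one mid-stack transformer block (layer choice is an assumption).
layer_idx = 6
model.transformer.h[layer_idx].register_forward_hook(capture(f"layer_{layer_idx}"))

ids = tok("The model is thinking about", return_tensors="pt")
with torch.no_grad():
    model(**ids)

print(captured[f"layer_{layer_idx}"].shape)  # (1, seq_len, 768) for GPT-2 small
```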
Small classifiers learn to recognize concepts in a model's hidden states. We start from WordNet/SUMO ontologies and let the model define each concept in its own words.
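A minimal sketch of one such lens, assuming a single linear probe with a sigmoid over hidden-state vectors; the training data here is synthetic stand-ins, and the WordNet/SUMO-to-examples pipeline is omitted.

```python
# Sketch: one concept lens as a tiny linear probe over hidden states.
# Hidden size, training data, and architecture are simplified assumptions.
import torch
import torch.nn as nn

HIDDEN = 768  # hidden size of the observed model (assumption)

class ConceptLens(nn.Module):
    """Maps one hidden-state vector to a concept intensity in [0, 1]."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.probe = nn.Linear(hidden_size, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.probe(h)).squeeze(-1)

# Toy training loop: positives would be activations captured while the model
# processes on-concept text, negatives from unrelated text. Random stand-ins here.
lens = ConceptLens(HIDDEN)
opt = torch.optim.Adam(lens.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

pos = torch.randn(64, HIDDEN) + 1.0   # stand-in for on-concept activations
neg = torch.randn(64, HIDDEN) - 1.0   # stand-in for off-concept activations
x = torch.cat([pos, neg])
y = torch.cat([torch.ones(64), torch.zeros(64)])

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(lens(x), y)
    loss.backward()
    opt.step()
```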
As the model generates text, every lens watches activations and reports concept intensity. Like an EEG for AI thoughts—8,000+ concepts at <25ms latency.
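A sketch of how per-token scoring can stay fast, assuming the lens weights are stacked into one matrix so every concept is scored with a single matmul per token; sizes, names, and the random weights are illustrative.

```python
# Sketch: scoring every lens on each new token's hidden state.
# Stacking the probes into one matrix scores all concepts in one matmul.
import torch

HIDDEN, N_CONCEPTS = 768, 8000
lens_weights = torch.randn(N_CONCEPTS, HIDDEN)   # stacked probe directions (stand-in)
lens_bias = torch.zeros(N_CONCEPTS)

def concept_intensities(hidden_state: torch.Tensor) -> torch.Tensor:
    """hidden_state: (hidden,) for the latest token -> (n_concepts,) in [0, 1]."""
    return torch.sigmoid(lens_weights @ hidden_state + lens_bias)

# During generation, call this once per token with the captured activation:
h_t = torch.randn(HIDDEN)                 # stand-in for one token's activation
scores = concept_intensities(h_t)
top = torch.topk(scores, k=5)
print(top.values, top.indices)            # the 5 most active concepts this token
```

Because the probes are batched into one weight matrix, latency scales with a single matrix multiply rather than with the number of lenses.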
Compare what the model is thinking (activation lenses) vs. what it's writing (text lenses). When they disagree significantly: deception detected.
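A minimal sketch of the divergence check, assuming both sides are scored on the same concept vocabulary; the threshold and example scores are illustrative, not HatCat's calibration.

```python
# Sketch: flagging think/write divergence. "Activation scores" come from lenses
# over hidden states, "text scores" from the same concepts scored on the emitted
# text; threshold and example values are illustrative assumptions.
import torch

def divergence(activation_scores: torch.Tensor,
               text_scores: torch.Tensor,
               threshold: float = 0.4) -> torch.Tensor:
    """Boolean mask of concepts where internal and written intensity disagree."""
    gap = (activation_scores - text_scores).abs()
    return gap > threshold

act = torch.tensor([0.9, 0.1, 0.7])   # e.g. [deception, weather, harm] internally
txt = torch.tensor([0.1, 0.1, 0.6])   # same concepts scored on the output text
print(divergence(act, txt))           # tensor([ True, False, False ])
```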
The same directions used for detection can suppress or amplify concepts. Manifold-aware steering with falloff through surrounding layers.
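A sketch of steering under simplified assumptions: the concept direction is subtracted from the residual stream via a forward hook, and a Gaussian falloff spreads the edit across neighboring layers. The projection here is a plain linear ablation, not HatCat's manifold-aware method; layer indices and strengths are placeholders.

```python
# Sketch: suppressing a concept by removing its direction from a block's output,
# with a Gaussian falloff over neighboring layers. Simplified linear ablation;
# layer range, width, and strength are illustrative assumptions.
import math
import torch

def steering_hook(direction: torch.Tensor, strength: float):
    d = direction / direction.norm()
    def hook(module, inputs, output):
        hidden = output[0]
        # Remove `strength` times the concept component at every position.
        coeff = (hidden @ d).unsqueeze(-1)
        return (hidden - strength * coeff * d,) + output[1:]
    return hook

def falloff(center: int, layer: int, width: float = 1.5) -> float:
    """Gaussian weight that decays with distance from the target layer."""
    return math.exp(-((layer - center) ** 2) / (2 * width ** 2))

# Register the hook on the target layer and its neighbors with decaying strength:
# for i, block in enumerate(model.transformer.h):
#     if abs(i - target_layer) <= 3:
#         block.register_forward_hook(
#             steering_hook(concept_direction,
#                           base_strength * falloff(target_layer, i)))
```

With strength 1.0 at the center layer this fully ablates the concept component there; the falloff tapers the edit through the surrounding layers instead of applying it to one layer in isolation.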
Monitor 8,000+ concepts across multiple layers of any open-weights model
See concept activations in real-time as text is generated
Suppress dangerous behaviors with sub-token latency
Build auditable, treaty-compliant AI systems
HatCat is open source: CC0 code. Train your own lenses, build your own perspective.