HatCat - The little model under the hat

See what your LLM is thinking.

Real-time concept detection and steering for language models.
Catch deception before it speaks.

VOOM > FOOM


The Problem

Black Box


LLMs are opaque. We can't see what they're doing internally.

Deception


Models can think one thing and write another. Current methods can't detect this.

Hope & Pray


Existing safety approaches measure outcomes after the fact, or blunt capability with guardrails.

The HatCat Approach

Make the model's internal state observable and steerable, then build governance on top.

HAT: Headspace Ambient Transducer

Reads activations through concept lenses. Always on, minimal overhead.

SUBSTRATE: The Model Wearing the Hat

Any open-weights LLM. The raw activations we're observing.

CAT: Conjoined Adversarial Tomograph

Detects divergence between what the model thinks and what it writes.

How It Works

01

Train Concept Lenses

Small classifiers learn to recognize concepts in a model's hidden states. We start from WordNet/SUMO ontologies and let the model define each concept in its own words.
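A concept lens of this kind can be sketched as a tiny probe trained on hidden states. This is a minimal, hypothetical illustration (the `ConceptLens` name, logistic-probe form, and toy data are assumptions, not HatCat's actual API):

```python
import math
import random

class ConceptLens:
    """Hypothetical sketch: a logistic-regression probe that learns to
    recognize one concept in a model's hidden-state vectors."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def score(self, h):
        # Concept intensity in [0, 1] for one hidden-state vector.
        z = sum(wi * hi for wi, hi in zip(self.w, h)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, states, labels, lr=0.5, epochs=200):
        # Plain stochastic gradient descent on the logistic loss.
        for _ in range(epochs):
            for h, y in zip(states, labels):
                err = self.score(h) - y
                self.w = [wi - lr * err * hi for wi, hi in zip(self.w, h)]
                self.b -= lr * err

# Toy "hidden states": the concept lives along the first coordinate.
random.seed(0)
pos = [[1.0 + random.gauss(0, 0.1), random.gauss(0, 0.1)] for _ in range(20)]
neg = [[-1.0 + random.gauss(0, 0.1), random.gauss(0, 0.1)] for _ in range(20)]

lens = ConceptLens(dim=2)
lens.fit(pos + neg, [1] * 20 + [0] * 20)
```

In the real system each lens is trained against states labeled from WordNet/SUMO-seeded definitions; the probe above just shows the shape of the idea.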

02

Monitor in Real-Time

As the model generates text, every lens watches activations and reports concept intensity. Like an EEG for AI thoughts—8,000+ concepts at <25ms latency.
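The monitoring loop itself is simple in outline: every lens reads the same hidden state at every token and reports an intensity. A hedged sketch, with lenses reduced to bare concept directions for illustration:

```python
import math

def monitor(hidden_states, lenses, threshold=0.5):
    """Hypothetical sketch of the monitoring loop: for each generated
    token's hidden state, score every concept lens and yield the ones
    above threshold as (token_index, concept, intensity)."""
    for t, h in enumerate(hidden_states):
        for concept, direction in lenses.items():
            z = sum(d * x for d, x in zip(direction, h))
            intensity = 1.0 / (1.0 + math.exp(-z))  # squash to [0, 1]
            if intensity >= threshold:
                yield t, concept, intensity

# Toy example: two lenses, two token states.
lenses = {"deception": [1.0, 0.0], "honesty": [0.0, 1.0]}
stream = [[2.0, -2.0], [-2.0, 2.0]]
report = list(monitor(stream, lenses))
```

With thousands of lenses this inner loop is batched as one matrix multiply per layer, which is how the per-token cost stays in the millisecond range.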

03

Detect Divergence

Compare what the model is thinking (activation lenses) vs. what it's writing (text lenses). When they disagree significantly: deception detected.
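The comparison reduces to scoring the same concepts through two views and flagging large gaps. A minimal sketch, assuming per-concept scores in [0, 1]; the 0.4 threshold is an illustrative choice, not a HatCat constant:

```python
def divergence(activation_scores, text_scores, threshold=0.4):
    """Hypothetical sketch: flag concepts where the activation lens
    (what the model is thinking) and the text lens (what it wrote)
    disagree by more than `threshold`."""
    flags = {}
    for concept, a in activation_scores.items():
        t = text_scores.get(concept, 0.0)
        if abs(a - t) > threshold:
            flags[concept] = a - t  # positive: thought but not written
    return flags

# Toy example: "harm" is active internally but absent from the text.
flags = divergence({"harm": 0.9, "weather": 0.5},
                   {"harm": 0.1, "weather": 0.45})
```

Here `flags` contains only `"harm"`: strongly present in the activations, nearly absent from the output text, which is exactly the signature the CAT looks for.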

04

Steer if Needed

The same directions used for detection can suppress or amplify concepts. Manifold-aware steering with falloff through surrounding layers.
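Steering with layer falloff can be sketched as subtracting a scaled concept direction from the hidden state, with the scale tapering geometrically away from a target layer. Function names and the falloff rate are illustrative assumptions:

```python
def steer(hidden_by_layer, direction, target_layer,
          strength=1.0, falloff=0.5):
    """Hypothetical sketch: suppress a concept by subtracting its
    direction from each layer's hidden state, at full strength on the
    target layer and geometrically weaker on surrounding layers."""
    steered = []
    for layer, h in enumerate(hidden_by_layer):
        scale = strength * falloff ** abs(layer - target_layer)
        steered.append([x - scale * d for x, d in zip(h, direction)])
    return steered

# Toy example: 3 layers, concept direction along the first coordinate,
# steering centered on layer 1.
states = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
out = steer(states, direction=[1.0, 0.0], target_layer=1)
```

Amplification is the same operation with a negative strength; the falloff keeps the intervention from jarring a single layer out of distribution.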

Capabilities

Detect

Monitor 8,000+ concepts across multiple layers of any open-weights model

Visualize

See concept activations in real-time as text is generated

Steer

Suppress dangerous behaviors with sub-token latency

Govern

Build auditable, treaty-compliant AI systems

~8,000 Concepts
<1GB VRAM
<25ms Latency
94% Suppression

Ready to look under the hat?

HatCat is open source. CC0 code, train your own lenses, build your own perspective.