Real-time concept detection and steering for language models.
Catch deception before it speaks.
VOOM > FOOM
LLMs are opaque. We can't see what they're doing internally.
Models can think one thing and write another. Current methods can't detect this.
Safety approaches measure outcomes after the fact, or blunt capability with guardrails.
Make the model's internal state observable and steerable, then build governance on top.
Reads activations through concept lenses. Always on, minimal overhead.
Any open-weights LLM. Its raw activations are what we observe.
Detects divergence between what the model thinks vs. writes.
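To make the observation layer above concrete, here is a minimal sketch of pulling raw hidden states out of an open-weights model with a PyTorch forward hook. The model name, layer index, and hook plumbing are illustrative assumptions, not HatCat's actual API.

```python
# Sketch: capturing hidden states from an open-weights model with a forward hook.
# Model name, layer index, and hook details are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder open-weights model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

captured = {}

def capture(name):
    def hook(module, inputs, output):
        # output[0] is the block's hidden state: (batch, seq_len, hidden)
        captured[name] = output[0].detach()
    return hook

# Watch one mid-stack transformer block (layer choice is an assumption).
layer_idx = 6
model.transformer.h[layer_idx].register_forward_hook(capture(f"layer_{layer_idx}"))

ids = tok("The model is thinking about", return_tensors="pt")
with torch.no_grad():
    model(**ids)

print(captured[f"layer_{layer_idx}"].shape)  # (1, seq_len, 768) for GPT-2 small
```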
Small classifiers learn to recognize concepts in a model's hidden states. We start from WordNet/SUMO ontologies and let the model define each concept in its own words.
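A minimal sketch of one such lens, assuming a single linear probe with a sigmoid over hidden-state vectors; the training data here is synthetic stand-ins, and the WordNet/SUMO-to-examples pipeline is omitted.

```python
# Sketch: one concept lens as a tiny linear probe over hidden states.
# Hidden size, training data, and architecture are simplified assumptions.
import torch
import torch.nn as nn

HIDDEN = 768  # hidden size of the observed model (assumption)

class ConceptLens(nn.Module):
    """Maps one hidden-state vector to a concept intensity in [0, 1]."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.probe = nn.Linear(hidden_size, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.probe(h)).squeeze(-1)

# Toy training loop: positives would be activations captured while the model
# processes on-concept text, negatives from unrelated text. Random stand-ins here.
lens = ConceptLens(HIDDEN)
opt = torch.optim.Adam(lens.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

pos = torch.randn(64, HIDDEN) + 1.0   # stand-in for on-concept activations
neg = torch.randn(64, HIDDEN) - 1.0   # stand-in for off-concept activations
x = torch.cat([pos, neg])
y = torch.cat([torch.ones(64), torch.zeros(64)])

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(lens(x), y)
    loss.backward()
    opt.step()
```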
As the model generates text, every lens watches activations and reports concept intensity. Like an EEG for AI thoughts—8,000+ concepts at <25ms latency.
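A sketch of how per-token scoring can stay fast, assuming the lens weights are stacked into one matrix so every concept is scored with a single matmul per token; sizes, names, and the random weights are illustrative.

```python
# Sketch: scoring every lens on each new token's hidden state.
# Stacking the probes into one matrix scores all concepts in one matmul.
import torch

HIDDEN, N_CONCEPTS = 768, 8000
lens_weights = torch.randn(N_CONCEPTS, HIDDEN)   # stacked probe directions (stand-in)
lens_bias = torch.zeros(N_CONCEPTS)

def concept_intensities(hidden_state: torch.Tensor) -> torch.Tensor:
    """hidden_state: (hidden,) for the latest token -> (n_concepts,) in [0, 1]."""
    return torch.sigmoid(lens_weights @ hidden_state + lens_bias)

# During generation, call this once per token with the captured activation:
h_t = torch.randn(HIDDEN)                 # stand-in for one token's activation
scores = concept_intensities(h_t)
top = torch.topk(scores, k=5)
print(top.values, top.indices)            # the 5 most active concepts this token
```

Because the probes are batched into one weight matrix, latency scales with a single matrix multiply rather than with the number of lenses.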
Compare what the model is thinking (activation lenses) vs. what it's writing (text lenses). When they disagree significantly: deception detected.
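A minimal sketch of the divergence check, assuming both sides are scored on the same concept vocabulary; the threshold and example scores are illustrative, not HatCat's calibration.

```python
# Sketch: flagging think/write divergence. "Activation scores" come from lenses
# over hidden states, "text scores" from the same concepts scored on the emitted
# text; threshold and example values are illustrative assumptions.
import torch

def divergence(activation_scores: torch.Tensor,
               text_scores: torch.Tensor,
               threshold: float = 0.4) -> torch.Tensor:
    """Boolean mask of concepts where internal and written intensity disagree."""
    gap = (activation_scores - text_scores).abs()
    return gap > threshold

act = torch.tensor([0.9, 0.1, 0.7])   # e.g. [deception, weather, harm] internally
txt = torch.tensor([0.1, 0.1, 0.6])   # same concepts scored on the output text
print(divergence(act, txt))           # tensor([ True, False, False ])
```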
The same directions used for detection can suppress or amplify concepts. Manifold-aware steering with falloff through surrounding layers.
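A sketch of steering under simplified assumptions: the concept direction is subtracted from the residual stream via a forward hook, and a Gaussian falloff spreads the edit across neighboring layers. The projection here is a plain linear ablation, not HatCat's manifold-aware method; layer indices and strengths are placeholders.

```python
# Sketch: suppressing a concept by removing its direction from a block's output,
# with a Gaussian falloff over neighboring layers. Simplified linear ablation;
# layer range, width, and strength are illustrative assumptions.
import math
import torch

def steering_hook(direction: torch.Tensor, strength: float):
    d = direction / direction.norm()
    def hook(module, inputs, output):
        hidden = output[0]
        # Remove `strength` times the concept component at every position.
        coeff = (hidden @ d).unsqueeze(-1)
        return (hidden - strength * coeff * d,) + output[1:]
    return hook

def falloff(center: int, layer: int, width: float = 1.5) -> float:
    """Gaussian weight that decays with distance from the target layer."""
    return math.exp(-((layer - center) ** 2) / (2 * width ** 2))

# Register the hook on the target layer and its neighbors with decaying strength:
# for i, block in enumerate(model.transformer.h):
#     if abs(i - target_layer) <= 3:
#         block.register_forward_hook(
#             steering_hook(concept_direction,
#                           base_strength * falloff(target_layer, i)))
```

With strength 1.0 at the center layer this fully ablates the concept component there; the falloff tapers the edit through the surrounding layers instead of applying it to one layer in isolation.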
Monitor 8,000+ concepts across multiple layers of any open-weights model
See concept activations in real-time as text is generated
Suppress dangerous behaviors with sub-token latency
Build auditable, treaty-compliant AI systems
HatCat is open source: CC0 code. Train your own lenses, build your own perspective.