Diff-of-means vectors into the AV

13 contrast axes + a norm-matched random control, decoded alone and as steering interventions. Corrected rerun of exp 020 with wider coefficient sweeps and degeneration tags.

battery: extracted
cell B: 84 cells
cell C: 112 cells
experiment: 021_interp_combos

§1

The battery

extraction at repo L41

Each axis is a mean difference of L41 activations over its contrast pairs: 200 per axis for the non-language axes (system-prompt contrast over a shared user-turn bank) and 40 per axis for the languages (reading-flavor text pairs). The control is the difference of two unrelated document activations, norm-matched to the battery median (3971). Anything it produces below is what “a meaningless direction of the same size” looks like; it is shown unblinded by design.
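The extraction described above can be sketched in a few lines. This is a minimal illustration on random toy data, not the experiment's code: the function names `diff_of_means` and `norm_matched_control`, the toy dimensionality, and the way the toy pairs are generated are all assumptions made for the example.

```python
import numpy as np

def diff_of_means(pos, neg):
    """Mean-difference axis from paired activations of shape (n_pairs, d_model)."""
    pos = np.asarray(pos, dtype=np.float64)
    neg = np.asarray(neg, dtype=np.float64)
    return pos.mean(axis=0) - neg.mean(axis=0)

def norm_matched_control(act_a, act_b, target_norm):
    """Difference of two unrelated activations, rescaled to target_norm."""
    diff = np.asarray(act_a, dtype=np.float64) - np.asarray(act_b, dtype=np.float64)
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        raise ValueError("control activations are identical")
    return diff * (target_norm / norm)

# Toy stand-ins for L41 activations (hypothetical sizes and data).
rng = np.random.default_rng(0)
d_model = 64
axes = {}
for name, n_pairs in [("anger", 200), ("joy", 200), ("french", 40)]:
    shift = rng.normal(size=d_model)                  # the "true" contrast
    neg = rng.normal(size=(n_pairs, d_model))         # baseline side of each pair
    pos = neg + shift + 0.1 * rng.normal(size=(n_pairs, d_model))
    axes[name] = diff_of_means(pos, neg)

# Match the control to the median norm across the battery.
median_norm = float(np.median([np.linalg.norm(v) for v in axes.values()]))
control = norm_matched_control(
    rng.normal(size=d_model), rng.normal(size=d_model), median_norm
)
```

Matching to the median rather than to any single axis keeps the control comparable in size to a typical battery member without tying it to one contrast.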

axis	‖diff-of-means‖	n pairs	top SAE match (cos)
eval_awareness	1113	200	f808 (0.512)
deception	2715	200	f7532 (0.350)
refusal	3895	200	f55 (0.488)
sandbagging	3318	200	f372 (0.342)
pirate	4827	200	f6425 (0.430)
anger	3819	200	f372 (0.306)
fear	4132	200	f0 (0.485)
joy	4470	200	f372 (0.512)
sadness	3971	200	f0 (0.447)
disgust	3833	200	f0 (0.408)
french	5600	40	f269 (0.808)
german	5673	40	f355 (0.771)
russian	5693	40	f355 (0.785)
control	3971	—	—

Anger axis vs. validated f2796 decoder direction: cos = 0.224 (the two sites are one block apart, which depresses this value). Median gold ‖h‖ = 58385.
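For reference, the cosine figures in the table and in the line above can be computed as follows. This is a sketch under assumptions: the decoder is taken to be an array with one direction per row, the "top match" is taken to be the largest signed cosine (the report does not say whether sign is ignored), and `cosine` and `top_match` are names invented here.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def top_match(axis, decoder):
    """Return (row index, cosine) of the decoder row best aligned with axis.

    decoder: (n_features, d_model) array of decoder directions (assumed layout).
    """
    axis = np.asarray(axis, dtype=np.float64)
    decoder = np.asarray(decoder, dtype=np.float64)
    sims = decoder @ axis / (np.linalg.norm(decoder, axis=1) * np.linalg.norm(axis))
    idx = int(np.argmax(sims))  # signed cosine; use np.abs(sims) to ignore sign
    return idx, float(sims[idx])

# Toy decoder with three feature directions in a 3-d space.
toy_decoder = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [1.0, 1.0, 0.0]])
toy_axis = np.array([2.0, 2.0, 0.1])
best_idx, best_cos = top_match(toy_axis, toy_decoder)
```

Cosine ignores vector length, so the norm column and the cosine column of the table carry independent information.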

§2

Decodes

degeneration tags on all mult/scale groups

Cell B injects the bare direction at the AV's vector slot. This is off-manifold by construction: the AV never saw isolated directions in training. Cell C steers real document forward passes upstream at layers.40, decodes what arrives at L41, and shows the behavioral completion at the same scale. Collapsed cells remain visible and labeled. Degeneration chips (coherent / degrading / collapsed) appear on each mult-group and scale button.
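The Cell C pattern (add a scaled direction upstream, then read what arrives one block later) can be illustrated with a toy residual stack. Everything here is a stand-in: the two linear residual blocks, the names `steer`, `make_block`, and `run_blocks`, and the choice to add the vector at every position are assumptions for the sketch, not the experiment's hooks or model.

```python
import numpy as np

def steer(hidden, direction, scale):
    """Add scale * direction to every position of a (seq_len, d_model) activation."""
    hidden = np.asarray(hidden, dtype=np.float64)
    direction = np.asarray(direction, dtype=np.float64)
    return hidden + scale * direction

def make_block(weight):
    """Toy residual block: h -> h + h @ weight."""
    def block(h):
        return h + h @ weight
    return block

def run_blocks(hidden, blocks, steer_at=None, direction=None, scale=0.0):
    """Apply blocks in order, steering the output of block index steer_at."""
    for i, block in enumerate(blocks):
        hidden = block(hidden)
        if steer_at is not None and i == steer_at:
            hidden = steer(hidden, direction, scale)
    return hidden

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
W0 = 0.1 * rng.normal(size=(d_model, d_model))
W1 = 0.1 * rng.normal(size=(d_model, d_model))
blocks = [make_block(W0), make_block(W1)]  # stand-ins for the upstream and read-out blocks

x = rng.normal(size=(seq_len, d_model))
v = rng.normal(size=d_model)
scale = 2.0

base = run_blocks(x, blocks)
steered = run_blocks(x, blocks, steer_at=0, direction=v, scale=scale)
arrived_delta = steered - base  # what the intervention looks like one block later
```

In this linear toy the arriving change is exactly `scale * (v + v @ W1)` at every position. In a real transformer block the downstream change is nonlinear in the scale, which is one reason a coefficient sweep is informative.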