Natural Language Autoencoders · follow-up series
21

Interp combinations — corrected rerun

Wider labeled coefficient sweeps with degeneration tags; the L40-vs-L42 site gap (SAE on hidden_states[41], NLA on hidden_states[42], one block apart) is kept and caveated throughout; drop-comparison is the primary residual method; raw-residual enrichment is reported as a labeled null.

experiment: 021_interp_combos
date: 2026-06-13
model: gemma-3-27b-it · NLA L41 AV
axes: 13 contrast + 1 control
Caveat: The SAE (GemmaScope-2 L40 16k) is trained on the output of layers.40, i.e. hidden_states[41], while the NLA reads hidden_states[42], one block downstream. This one-block site gap is NOT corrected in exp 021. It is the honest reason FVU on h is elevated (~0.44, matching 020), the residual methods are mushy, and the raw-residual enrichment is a null result. Every sub-page that involves the SAE repeats this caveat.
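The indexing behind this caveat can be sketched as below. It assumes the usual Hugging Face `transformers` convention, in which `hidden_states[0]` is the embedding output and `hidden_states[i + 1]` is the output of decoder block `i`. The helper names `hidden_state_index` and `site_gap` are hypothetical and not part of the experiment code.

```python
def hidden_state_index(block: int) -> int:
    """Index into `hidden_states` that holds the output of decoder block `block`.

    Assumption: hidden_states[0] is the embedding output, so block i lands at i + 1.
    """
    if block < 0:
        raise ValueError("block index must be non-negative")
    return block + 1


def site_gap(sae_block: int, nla_hidden_index: int) -> int:
    """Number of decoder blocks between the SAE training site and the NLA read site."""
    return nla_hidden_index - hidden_state_index(sae_block)


sae_index = hidden_state_index(40)  # SAE trained on the output of layers.40
gap = site_gap(40, 42)              # NLA reads hidden_states[42]
print(sae_index, gap)
```

Under that convention the SAE site is `hidden_states[41]` and the gap to the NLA site is one block, which is the mismatch the caveat describes.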
§0 Degeneration legend

Each Cell B mult-group and Cell C scale is tagged with a degeneration classification derived from token-level repetition metrics (token_max_run, uniq_ratio, rep4, char_max_run, cjk_ratio). Tags propagate from individual completions to the containing cell.

coherent: Normal text; repetition metrics within baseline range.
degrading: Elevated repetition; starts to loop or echo, content partially present.
collapsed: Fully degenerate (token storm, CJK spam, or total repetition); content lost. Cells remain visible and labeled, never hidden.
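A minimal sketch of how such tags might be computed is given below. The page names the metrics but does not define them, so every definition here is an assumption: `rep4` is taken as the fraction of repeated 4-grams, `cjk_ratio` as the share of characters in the CJK Unified Ideographs block, and so on. The thresholds are hypothetical, and so is the worst-tag-wins rule used to propagate completion tags to a cell.

```python
from itertools import groupby

SEVERITY = {"coherent": 0, "degrading": 1, "collapsed": 2}


def max_run(seq) -> int:
    """Length of the longest run of identical consecutive items."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def metrics(tokens: list[str], text: str) -> dict[str, float]:
    """Repetition metrics for one completion (definitions are assumptions)."""
    grams = [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return {
        "token_max_run": max_run(tokens),
        "uniq_ratio": len(set(tokens)) / len(tokens) if tokens else 1.0,
        "rep4": 1 - len(set(grams)) / len(grams) if grams else 0.0,
        "char_max_run": max_run(text),
        "cjk_ratio": cjk / len(text) if text else 0.0,
    }


def tag(m: dict[str, float]) -> str:
    """Map metrics to a tag. All thresholds are hypothetical placeholders."""
    if (m["token_max_run"] >= 8 or m["rep4"] >= 0.5 or m["char_max_run"] >= 20
            or m["cjk_ratio"] >= 0.3 or m["uniq_ratio"] <= 0.2):
        return "collapsed"
    if m["token_max_run"] >= 4 or m["rep4"] >= 0.15 or m["uniq_ratio"] <= 0.5:
        return "degrading"
    return "coherent"


def cell_tag(tags: list[str]) -> str:
    """Propagate completion tags to a cell; assumes the worst tag wins."""
    return max(tags, key=SEVERITY.__getitem__, default="coherent")
```

For example, `tag(metrics(text.split(), text))` tags a single whitespace-tokenized completion, and `cell_tag` folds the per-completion tags into the label shown on the cell. A real run would use the model tokenizer rather than `split()`.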
§1 Battery snapshot

axes extracted: 14
anger cos f2796: 0.224 (one-block gap depresses this)
median gold ‖h‖: 58385
axes named: 14
§2 Residual sanity gate

FVU on h: 0.44 (~0.44 expected from site gap)
FVU on ĥ: 0.39
FVU on r: 1554 (noise-level; null result)
FVU on noise: 1453
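For reference, the gate values can be read against the standard definition of fraction of variance unexplained: squared reconstruction error summed over samples, divided by the total squared deviation of the inputs from their mean. The sketch below assumes that definition; the exact normalization used in exp 021 is not stated on this page, and the function name `fvu` is illustrative.

```python
def fvu(x: list[list[float]], x_hat: list[list[float]]) -> float:
    """Fraction of variance unexplained.

    Computes sum ||x_i - x_hat_i||^2 / sum ||x_i - mean(x)||^2 over samples i.
    """
    n, d = len(x), len(x[0])
    # Per-dimension mean of the inputs.
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    # Squared reconstruction error, summed over samples and dimensions.
    resid = sum((a - b) ** 2 for row, rec in zip(x, x_hat) for a, b in zip(row, rec))
    # Total squared deviation of the inputs from their mean.
    total = sum((a - m) ** 2 for row in x for a, m in zip(row, mean))
    if total == 0:
        raise ValueError("inputs have zero variance")
    return resid / total


x = [[1.0, 2.0], [3.0, 4.0]]
print(fvu(x, x))                          # perfect reconstruction
print(fvu(x, [[2.0, 3.0], [2.0, 3.0]]))   # always predicting the mean
print(fvu(x, [[0.0, 0.0], [0.0, 0.0]]))   # worse than the mean baseline
```

Under this definition a value of 0 is a perfect reconstruction, 1 matches a predictor that always outputs the mean, and values above 1 mean the reconstruction is worse than that baseline.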
§3 Sub-pages