Natural Language Autoencoders · follow-up series

Interp combinations — corrected rerun

Wider labeled coefficient sweeps with degeneration tags; the one-block site gap between the SAE (hidden_states[41]) and the NLA (hidden_states[42]) kept and caveated throughout; drop-comparison as the primary residual method; raw-residual enrichment reported as a labeled null.

experiment: 021_interp_combos
date: 2026-06-13
model: gemma-3-27b-it · NLA L41 AV
axes: 13 contrast + 1 control

Caveat: The SAE (GemmaScope-2 L40 16k) is trained on the output of layers.40, i.e. hidden_states[41], one block before the NLA extraction site (hidden_states[42]). This one-block site gap is NOT corrected in exp 021. It is the honest reason fvu_h is elevated (~0.44 in 020), the residual methods are mushy, and the raw-residual enrichment is a null result. Every sub-page that involves the SAE repeats this caveat.
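For readers who want the quantity behind fvu_h spelled out, here is a minimal sketch of the fraction of variance unexplained (FVU), using the standard definition: squared reconstruction error divided by total variance around the mean activation. The function names, the toy activations, and the pure-Python list representation are illustrative assumptions, not the code used in exp 020 or 021. The index helper restates the convention given in the caveat (output of layers.k is hidden_states[k + 1]).

```python
def hidden_state_index(layer_idx):
    """Index into hidden_states for the output of layers.<layer_idx>.

    Follows the convention stated in the caveat: layers.40 output is
    hidden_states[41], because index 0 holds the embedding output.
    """
    return layer_idx + 1


def fvu(h, h_hat):
    """Fraction of variance unexplained for a batch of activations.

    h, h_hat: lists of equal-length float vectors (originals and
    reconstructions). 0.0 is a perfect reconstruction; 1.0 is no better
    than predicting the mean activation.
    """
    n, d = len(h), len(h[0])
    mean = [sum(row[j] for row in h) / n for j in range(d)]
    resid = sum((a - b) ** 2 for row, rec in zip(h, h_hat) for a, b in zip(row, rec))
    total = sum((a - m) ** 2 for row in h for a, m in zip(row, mean))
    return resid / total


# Toy data (hypothetical): two 2-d activation vectors.
h = [[1.0, 0.0], [3.0, 2.0]]
sae_site = hidden_state_index(40)   # where the SAE was trained
nla_site = 42                       # where the NLA reads, per the caveat
site_gap = nla_site - sae_site      # blocks between the two sites
```

Under this definition, an SAE applied one block away from its training site would be expected to reconstruct less of the variance, which is the direction of the effect the caveat describes.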

§0

Degeneration legend

Each Cell B mult-group and Cell C scale is tagged with a degeneration classification derived from token-level repetition metrics (token_max_run, uniq_ratio, rep4, char_max_run, cjk_ratio). Tags propagate from individual completions to the containing cell.

coherent: Normal text; repetition metrics within baseline range.

degrading: Elevated repetition; starts to loop or echo, content partially present.

collapsed: Fully degenerate; token storm, CJK spam, or total repetition, content lost. Cells remain visible and labeled, never hidden.
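The legend can be sketched as a small classifier, shown here only to make the tagging logic concrete. Everything specific in it is an assumption: the report names the five metrics but does not define them, so the definitions below are plausible readings; whitespace tokens stand in for model tokens; the thresholds are hypothetical placeholders; and the "worst tag wins" rule is one guess at how tags propagate from completions to a cell.

```python
from collections import Counter
from itertools import groupby

SEVERITY = {"coherent": 0, "degrading": 1, "collapsed": 2}


def max_run(seq):
    """Length of the longest run of identical consecutive items."""
    return max((sum(1 for _ in g) for _, g in groupby(seq)), default=0)


def is_cjk(ch):
    # Rough coverage: CJK ideographs, Hiragana/Katakana, Hangul syllables.
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3040 <= cp <= 0x30FF or 0xAC00 <= cp <= 0xD7AF


def repetition_metrics(text):
    """Compute the five metrics under the assumed definitions."""
    tokens = text.split()  # assumption: whitespace tokens, not model tokens
    n = len(tokens)
    fourgrams = [tuple(tokens[i:i + 4]) for i in range(n - 3)]
    # Count every 4-gram occurrence beyond the first as a repeat.
    repeated = sum(c - 1 for c in Counter(fourgrams).values())
    return {
        "token_max_run": max_run(tokens),
        "uniq_ratio": len(set(tokens)) / n if n else 1.0,
        "rep4": repeated / len(fourgrams) if fourgrams else 0.0,
        "char_max_run": max_run(text),
        "cjk_ratio": sum(is_cjk(c) for c in text) / len(text) if text else 0.0,
    }


def tag_completion(m):
    """Map metrics to a legend tag. All thresholds are hypothetical."""
    if (m["token_max_run"] >= 16 or m["char_max_run"] >= 32 or m["rep4"] >= 0.6
            or m["cjk_ratio"] >= 0.5 or m["uniq_ratio"] <= 0.1):
        return "collapsed"
    if m["token_max_run"] >= 4 or m["rep4"] >= 0.2 or m["uniq_ratio"] <= 0.4:
        return "degrading"
    return "coherent"


def tag_cell(completions):
    """Assumed propagation rule: a cell takes the worst tag among its completions."""
    tags = [tag_completion(repetition_metrics(t)) for t in completions]
    return max(tags, key=SEVERITY.__getitem__, default="coherent")
```

With these placeholder thresholds, a cell containing one ordinary sentence and one completion that repeats a single token dozens of times would be tagged collapsed, and the cell would still be reported rather than dropped.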

§1