Abstract
We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment.
Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that:
- extracts a capability direction by contrasting activations between capability-present and capability-absent variants of the Source model,
- aligns it with a Target model through a low-rank linear transformation, and
- applies it at inference time to elicit the behavior.
Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, show substantial improvements across model scales without any training.
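To make the three UNLOCK steps concrete, the following is a minimal NumPy sketch on synthetic activations, not the implementation used in the experiments. Several choices here are assumptions that the abstract does not specify: the contrast is taken as a difference of mean activations, the low-rank transformation is a least-squares map truncated by SVD, and the direction is applied by adding a scaled unit vector to the Target hidden states. The function names (`capability_direction`, `fit_low_rank_map`, `steer`), the dimensions, and the scale `alpha` are hypothetical.

```python
import numpy as np


def capability_direction(acts_present, acts_absent):
    """Difference of mean activations, shape (d_src,). Assumed contrast rule."""
    return acts_present.mean(axis=0) - acts_absent.mean(axis=0)


def fit_low_rank_map(src_acts, tgt_acts, rank):
    """Fit W with src_acts @ W ~= tgt_acts, then keep the top `rank` singular directions."""
    W, *_ = np.linalg.lstsq(src_acts, tgt_acts, rcond=None)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]


def steer(hidden, direction, alpha):
    """Add alpha times the unit-normalized direction to each hidden state."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit


rng = np.random.default_rng(0)
d_src, d_tgt, n, rank = 16, 32, 200, 4  # hypothetical sizes

# Synthetic stand-ins for Source activations with and without the capability.
shift = rng.normal(size=d_src)
acts_absent = rng.normal(size=(n, d_src))
acts_present = rng.normal(size=(n, d_src)) + shift

# Synthetic paired Source/Target activations on the same unlabeled prompts.
true_map = rng.normal(size=(d_src, rank)) @ rng.normal(size=(rank, d_tgt))
src_acts = rng.normal(size=(n, d_src))
tgt_acts = src_acts @ true_map + 0.01 * rng.normal(size=(n, d_tgt))

# Step 1: extract the capability direction in Source space.
direction_src = capability_direction(acts_present, acts_absent)

# Step 2: align it with the Target through a rank-limited linear map.
W = fit_low_rank_map(src_acts, tgt_acts, rank)
direction_tgt = direction_src @ W

# Step 3: apply it to Target hidden states at inference time.
hidden = rng.normal(size=(5, d_tgt))
steered = steer(hidden, direction_tgt, alpha=4.0)
```

In this sketch no labels or gradient updates are involved: the map is fit only from paired activations, and the intervention is a single vector addition. Truncating the least-squares solution is the simplest way to impose low rank and is not the same as a full reduced-rank regression.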