Abstract

We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment.

[Figure: Source (small) model → align (low-rank W) → Target (large) model]

Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that:

  1. extracts a capability direction by contrasting activations between capability-present and capability-absent Source variants,
  2. aligns it with a Target model through a low-rank linear transformation, and
  3. applies it at inference time to elicit the behavior.
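
The three steps above can be illustrated with a minimal NumPy sketch on synthetic activations. This is not the UNLOCK implementation: the function names, the difference-of-means direction, the least-squares fit followed by SVD truncation, the availability of paired Source and Target activations on the same unlabeled prompts, and the additive steering with strength `alpha` are all assumptions made here for illustration.

```python
import numpy as np

# Hypothetical helpers; none of these names come from the paper.

def capability_direction(acts_present, acts_absent):
    """Step 1 (assumed form): difference of mean activations between the
    capability-present and capability-absent Source variants."""
    return acts_present.mean(axis=0) - acts_absent.mean(axis=0)

def fit_low_rank_map(src_acts, tgt_acts, rank):
    """Step 2 (assumed form): least-squares map W with src_acts @ W ~ tgt_acts,
    then truncated to `rank` via SVD. This is a simple stand-in for whatever
    low-rank alignment the method actually uses."""
    W, *_ = np.linalg.lstsq(src_acts, tgt_acts, rcond=None)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def steer(hidden, direction, alpha=1.0):
    """Step 3 (assumed form): add the aligned direction to Target hidden states."""
    return hidden + alpha * direction

# Synthetic data standing in for real model activations.
rng = np.random.default_rng(0)
d_src, d_tgt, n, rank = 8, 16, 200, 4

# Paired activations related by an exactly rank-4 linear map (toy assumption).
true_map = rng.normal(size=(d_src, rank)) @ rng.normal(size=(rank, d_tgt))
src_acts = rng.normal(size=(n, d_src))
tgt_acts = src_acts @ true_map

# The capability-present variant differs from the absent one by a fixed shift.
true_shift = rng.normal(size=d_src)
acts_absent = rng.normal(size=(n, d_src))
acts_present = acts_absent + true_shift

v_src = capability_direction(acts_present, acts_absent)  # Source-space direction
W = fit_low_rank_map(src_acts, tgt_acts, rank)           # low-rank alignment
v_tgt = v_src @ W                                        # direction in Target space
steered = steer(rng.normal(size=(5, d_tgt)), v_tgt, alpha=2.0)
```

In this toy setting the Source and Target activations are related by an exactly low-rank linear map, so the fitted `W` recovers it and the transferred direction matches the ground truth. Real activations would satisfy this only approximately, and the choice of layer, rank, and steering strength would need to be tuned.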

Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, demonstrate substantial improvements across model scales without any additional training.