The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

Rishab Balasubramanian · Pin-Jie Lin · Rituraj Sharma · Anjie Fang · Fardin Abdi · Viktor Rozgic · Zheng Du · Mohit Bansal · Tu Vu

Abstract

We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment.

Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that:

1. extracts a capability direction by contrasting activations between capability-present and capability-absent Source variants,
2. aligns it with a Target model through a low-rank linear transformation, and
3. applies it at inference time to elicit the behavior.
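
The three steps above can be illustrated with a minimal NumPy sketch on synthetic activations. This is not the authors' implementation: the difference-of-means direction, the least-squares fit truncated by SVD, the additive steering rule, and every name and parameter here (capability_direction, low_rank_alignment, steer, rank, alpha) are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def capability_direction(acts_present, acts_absent):
    """Step 1 (assumed form): difference of mean activations between the
    capability-present and capability-absent Source variants."""
    return acts_present.mean(axis=0) - acts_absent.mean(axis=0)


def low_rank_alignment(src_acts, tgt_acts, rank):
    """Step 2 (assumed form): fit W minimizing ||src_acts @ W - tgt_acts||
    by least squares, then keep only the top `rank` singular components."""
    W, *_ = np.linalg.lstsq(src_acts, tgt_acts, rcond=None)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]


def steer(hidden, direction, alpha=1.0):
    """Step 3 (assumed form): add the aligned direction to Target hidden
    states at inference time, scaled by a hypothetical strength alpha."""
    return hidden + alpha * direction


# Toy data: the Source has width 16, the Target has width 32, and the two
# are related by a synthetic rank-4 linear map (an assumption of this demo).
rng = np.random.default_rng(0)
d_src, d_tgt, n, rank = 16, 32, 200, 4
true_map = rng.normal(size=(d_src, rank)) @ rng.normal(size=(rank, d_tgt))

# Paired activations of both models on the same unlabeled prompts.
src_acts = rng.normal(size=(n, d_src))
tgt_acts = src_acts @ true_map

# Source variants differ by a fixed shift that stands in for the capability.
shift = rng.normal(size=d_src)
acts_absent = rng.normal(size=(n, d_src))
acts_present = acts_absent + shift

direction_src = capability_direction(acts_present, acts_absent)
W = low_rank_alignment(src_acts, tgt_acts, rank)
direction_tgt = direction_src @ W  # direction expressed in Target space

# Steer a batch of Target hidden states.
steered = steer(rng.normal(size=(3, d_tgt)), direction_tgt, alpha=2.0)
print(steered.shape)
```

No labels or gradient updates appear anywhere in the sketch: the only fitted object is the alignment matrix W, obtained in closed form from paired activations, which is one way to read "training-free and label-free".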

Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, demonstrate substantial improvements across model scales without any training.