The authors propose distilling large-scale Transformer models into hybrid linear RNNs such as Mamba, preserving much of the teacher's generative quality while substantially improving inference efficiency. The hybrid design combines the strengths of both architectures: because linear RNN layers carry a fixed-size recurrent state instead of a key-value cache that grows with sequence length, the distilled models generate with lower memory use and higher throughput than a pure Transformer. The authors demonstrate the effectiveness of the method on several benchmarks, including the OpenLLM Leaderboard, where the distilled hybrid models outperform open-source models of comparable scale in both quality and efficiency. They also propose a speculative decoding scheme to further accelerate inference for these models.
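
To make the distillation setup concrete, here is a minimal sketch of logit-level knowledge distillation, assuming a frozen Transformer `teacher` and a hybrid attention/Mamba `student` that share a tokenizer and expose HuggingFace-style `.logits` outputs. The names and the plain KL objective are illustrative simplifications, not the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, input_ids, optimizer, temperature=2.0):
    """One distillation step: match the student's next-token distribution
    to the teacher's via KL divergence on temperature-softened logits."""
    with torch.no_grad():
        teacher_logits = teacher(input_ids).logits  # frozen teacher forward
    student_logits = student(input_ids).logits

    # KL(teacher || student) on softened distributions; the T**2 factor
    # keeps gradient magnitudes comparable across temperature settings.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```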
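
For the speculative decoding component, the sketch below shows the general draft-and-verify idea in its simplest greedy form, with a small `draft` model proposing tokens and the distilled hybrid acting as `verifier`. It illustrates why acceptance of drafted tokens speeds up generation; it is not the paper's hardware-aware algorithm for managing linear RNN states, and the function names are assumptions.

```python
import torch

@torch.no_grad()
def speculative_step(draft, verifier, prefix, k=4):
    """Draft k tokens greedily, verify them with one parallel pass of the
    large model, and keep the longest agreeing prefix plus one correction."""
    tokens = prefix.clone()
    # 1) Draft k candidate tokens autoregressively with the cheap model.
    for _ in range(k):
        logits = draft(tokens).logits[:, -1, :]
        tokens = torch.cat([tokens, logits.argmax(-1, keepdim=True)], dim=-1)

    # 2) Score all k candidates in a single forward pass of the verifier;
    # logits at position i predict the token at position i + 1.
    verifier_logits = verifier(tokens).logits
    preds = verifier_logits[:, prefix.shape[1] - 1:-1, :].argmax(-1)
    drafted = tokens[:, prefix.shape[1]:]

    # 3) Accept the longest matching prefix of drafted tokens, then append
    # the verifier's own token at the first mismatch (or its next token
    # if all k drafts were accepted).
    agree = (preds == drafted).cumprod(dim=-1)
    n_accept = int(agree.sum())
    next_tok = verifier_logits[:, prefix.shape[1] - 1 + n_accept, :].argmax(
        -1, keepdim=True
    )
    return torch.cat([prefix, drafted[:, :n_accept], next_tok], dim=-1)
```

Each call advances the sequence by between one and k + 1 tokens for roughly one verifier forward pass, which is where the speedup comes from when the draft model agrees with the verifier often.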