Tag: residual connections
Inside DeepSeek’s Manifold-Constrained Hyper-Connections: How a Doubly Stochastic Trick Could Rewire LLM Scaling
DeepSeek’s new paper on Manifold-Constrained Hyper-Connections (mHC) proposes a mathematically disciplined way to stabilize and scale large language models by redesigning how residual pathways carry information through very deep networks. Instead of throwing more compute at bigger models, it attacks a core architectural weakness that has quietly limited how far standard and hyper-connected networks can…