#1 New LLM Trend: Streaming Experts Technique
The "Streaming Experts" technique lets large Mixture-of-Experts (MoE) models run on hardware with limited RAM. Instead of holding the whole model in memory, it streams the expert weights each token needs from an SSD. This works because an MoE router activates only a few experts per token, so each step needs only a small fraction of the total weights. Recent experiments have run the 1-trillion-parameter Kimi K2.5 model on a MacBook Pro with 96GB of RAM, and even the Qwen3.5-397B model on an iPhone. This optimization, driven by ongoing community research, is significantly lowering the hardware requirements for running massive LLMs locally.
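The core idea can be illustrated with a minimal Python sketch. It assumes a toy layout in which each expert's weight matrix is stored contiguously in a single file on disk. The names (`ExpertStreamer`, `route`, `moe_layer`, `demo`), the file layout, the sizes, and the small LRU cache are hypothetical choices made for illustration. They are not the implementation used in the experiments above. For each token, the router picks the top-k experts, and only those experts' weights are read from disk. Recently used experts stay cached in RAM.

```python
import array
import math
import os
import tempfile
from collections import OrderedDict

DIM = 4          # hypothetical hidden size (real models use thousands)
NUM_EXPERTS = 8  # hypothetical number of experts in the layer
TOP_K = 2        # hypothetical number of experts the router activates per token
FLOAT_BYTES = array.array("f").itemsize  # size of one stored float32 weight


def write_experts(path, num_experts, dim):
    """Store every expert's dim x dim weight matrix back to back in one file."""
    with open(path, "wb") as f:
        for e in range(num_experts):
            # Toy weights: expert e multiplies its input by (e + 1).
            w = array.array("f", [float(e + 1) if i == j else 0.0
                                  for i in range(dim) for j in range(dim)])
            w.tofile(f)


class ExpertStreamer:
    """Reads expert weights from disk on demand and keeps a small LRU cache in RAM."""

    def __init__(self, path, dim, cache_size):
        self.path = path
        self.dim = dim
        self.cache_size = cache_size
        self.cache = OrderedDict()  # expert_id -> weights, least recently used first
        self.disk_reads = 0

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as most recently used
            return self.cache[expert_id]
        n = self.dim * self.dim
        with open(self.path, "rb") as f:
            f.seek(expert_id * n * FLOAT_BYTES)  # jump straight to this expert's weights
            w = array.array("f")
            w.fromfile(f, n)
        self.disk_reads += 1
        self.cache[expert_id] = w
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used expert
        return w


def route(x, router, k):
    """Pick the top-k experts by router logit and softmax their scores into gates."""
    logits = [sum(r * xi for r, xi in zip(row, x)) for row in router]
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    m = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - m) for e in top]
    total = sum(exps)
    return [(e, v / total) for e, v in zip(top, exps)]


def moe_layer(x, router, streamer, k):
    """Gate-weighted sum of the selected experts' outputs; other experts are never read."""
    d = len(x)
    out = [0.0] * d
    for expert_id, gate in route(x, router, k):
        w = streamer.get(expert_id)  # only routed experts touch the disk
        for i in range(d):
            out[i] += gate * sum(w[i * d + j] * x[j] for j in range(d))
    return out


def demo(num_tokens=3):
    """Run the same toy token through the layer num_tokens times."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "experts.bin")
        write_experts(path, NUM_EXPERTS, DIM)
        # Toy router: expert e's logit is e * x[0], so experts 7 and 6 win when x[0] > 0.
        router = [[float(e)] + [0.0] * (DIM - 1) for e in range(NUM_EXPERTS)]
        streamer = ExpertStreamer(path, DIM, cache_size=TOP_K)
        y = None
        for _ in range(num_tokens):
            y = moe_layer([1.0, 2.0, 3.0, 4.0], router, streamer, TOP_K)
        return y, streamer.disk_reads


if __name__ == "__main__":
    y, reads = demo()
    print(y, reads)
```

The toy router always selects experts 7 and 6, so three tokens trigger only two disk reads. Later tokens are served from the cache. The cache size is a tunable trade-off between RAM use and SSD traffic. The pure-Python matrix math is only for readability.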