#1 NVIDIA Star Elastic: Single Checkpoint Contains 30B, 23B, 12B Reasoning Models
NVIDIA researchers introduced Star Elastic, a post-training method that embeds multiple nested submodels (30B, 23B, and 12B) within a single parent reasoning model, all stored in one checkpoint and produced by a single training run. This removes the need to train, store, and deploy each model size separately. The method uses importance estimation and a trainable router to select each submodel's architecture, supports nesting along several dimensions, and allows a different submodel to be used for the reasoning phase than for the answering phase.
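
To make the nesting idea concrete, here is a minimal sketch of how importance ranking can yield nested submodels from one set of weights. It is an illustration under assumptions, not NVIDIA's implementation: the importance score (mean absolute activation), the toy widths, and every function name are hypothetical, and the trainable router is omitted entirely.

```python
# Hypothetical sketch: nested submodels as prefixes of an importance-sorted layer.
# The scoring rule and all names are assumptions made for illustration only.

def importance_scores(activations):
    """Mean absolute activation per neuron; rows are samples, columns are neurons."""
    n_samples = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(abs(row[j]) for row in activations) / n_samples
        for j in range(n_neurons)
    ]

def rank_neurons(scores):
    """Neuron indices ordered from most to least important."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def nested_submodels(order, widths):
    """Each submodel keeps the top-`width` neurons of the shared ordering,
    so every smaller submodel is contained in every larger one."""
    return {name: order[:width] for name, width in widths.items()}

# Toy calibration batch: 2 samples, 6 neurons.
activations = [
    [0.1, -2.0, 0.5, 0.0, 1.5, -0.4],
    [0.3, 1.0, -0.5, 0.1, -2.5, 0.2],
]

scores = importance_scores(activations)
order = rank_neurons(scores)
subs = nested_submodels(order, {"large": 6, "medium": 4, "small": 2})

for name, kept in subs.items():
    print(name, kept)
```

Because every submodel is a prefix of the same ordering, one stored weight set serves all sizes; picking a size only changes how many of the ranked neurons are kept.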