2026-03-25 Digest

Tracked 355 · Curated 12

#1 New LLM Trend: Streaming Experts Technique

The "Streaming Experts" technique lets large Mixture-of-Experts (MoE) models run on hardware with limited RAM by streaming only the expert weights each token needs from SSD. Recent experiments ran the 1-trillion-parameter Kimi K2.5 model on a MacBook Pro with 96GB of RAM, and even the Qwen3.5-397B model on an iPhone. Driven by ongoing community research, the optimization is significantly lowering the hardware requirements for running very large LLMs locally.
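The per-token streaming idea can be sketched roughly as follows. This is a toy illustration only, not the code used in the experiments above: the file layout, the sizes, and the function names (`write_expert_file`, `load_expert`, `run_token`) are all assumptions made for the example, and "combining" experts is reduced to a plain sum.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical toy sizes, not taken from any real model.
NUM_EXPERTS = 8
EXPERT_DIM = 4
BYTES_PER_EXPERT = EXPERT_DIM * 4  # float32 values


def write_expert_file(path):
    """Write toy weights to disk: expert i holds EXPERT_DIM copies of float(i)."""
    with open(path, "wb") as f:
        for i in range(NUM_EXPERTS):
            f.write(struct.pack(f"<{EXPERT_DIM}f", *([float(i)] * EXPERT_DIM)))


def load_expert(mm, expert_id):
    """Read a single expert's weights from the memory-mapped file."""
    start = expert_id * BYTES_PER_EXPERT
    return struct.unpack(f"<{EXPERT_DIM}f", mm[start:start + BYTES_PER_EXPERT])


def top_k(scores, k):
    """Return the indices of the k highest router scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def run_token(mm, router_scores, k=2):
    """Fetch only the experts selected for this token and sum their weights."""
    chosen = top_k(router_scores, k)
    experts = [load_expert(mm, e) for e in chosen]
    combined = [sum(vals) for vals in zip(*experts)]
    return chosen, combined


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "experts.bin")
        write_expert_file(path)
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                scores = [0.1, 0.9, 0.0, 0.3, 0.8, 0.2, 0.05, 0.4]
                return run_token(mm, scores)


if __name__ == "__main__":
    print(demo())
```

Only the slices for the selected experts are read, so resident memory scales with the number of active experts per token rather than with the full expert count.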

8.7

#2 OpenAI Shutting Down Sora AI Video App

OpenAI is shutting down its Sora AI video application, according to The Hollywood Reporter. The decision marks a significant strategic shift for the company's AI video generation tool, which had drawn substantial industry attention.

8.5

#3 LangSmith Fleet Introduces Two Types of Agent Authorization

Following its launch, LangSmith Fleet has introduced two distinct agent authorization types: Assistants, which act on behalf of end users with those users' own credentials, and Claws, which use fixed, independent credentials. Splitting agents into these two modes allows more flexible management and more secure deployment across channels. The system also stresses human-in-the-loop guardrails for fixed-credential agents, so that sensitive operations stay safe.
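The distinction between the two modes can be modeled with a small sketch. This is not the LangSmith Fleet API: the `Agent` class, the `mode` strings, and both functions are hypothetical names invented here to illustrate the idea of per-user versus fixed credentials and a human approval gate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    """Hypothetical agent record; 'mode' is either 'assistant' or 'claw'."""
    name: str
    mode: str
    fixed_token: Optional[str] = None


def resolve_token(agent, user_token=None):
    """Pick the credential an agent acts with, depending on its mode."""
    if agent.mode == "assistant":
        # Assistants act on behalf of the end user, so they need that user's token.
        if user_token is None:
            raise PermissionError("assistant agents need the end user's credentials")
        return user_token
    if agent.mode == "claw":
        # Claws always act with their own fixed credential.
        if agent.fixed_token is None:
            raise ValueError("claw agents need a fixed credential")
        return agent.fixed_token
    raise ValueError(f"unknown mode: {agent.mode}")


def may_run(agent, sensitive, human_approved=False):
    """Require human sign-off before a fixed-credential agent runs a sensitive action."""
    if agent.mode == "claw" and sensitive:
        return human_approved
    return True
```

In this model the guardrail applies only to fixed-credential agents, because their access does not shrink to match whoever happens to be talking to them.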

8.2

#4 LiteLLM PyPI Release Compromised; Users Advised Not to Update

The PyPI release 1.82.8 of the LiteLLM library has been compromised. The malicious update includes a file named litellm_init.pth, which contains Base64-encoded instructions designed to exfiltrate credentials to a remote server and to self-replicate. The compromise poses a severe risk to development environments: attackers could reach local configuration files and agents, going well beyond traditional identity theft. Users are advised not to update to this version.
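Python's `site` module executes any line in a `.pth` file that starts with `import` followed by a space or tab, which is why a `.pth` file is an effective place to hide a payload. A minimal sketch for listing such lines in the local site-packages directories follows; the function names are made up for this example, and it is a review aid rather than a detector for this specific incident.

```python
import os
import site


def find_executable_pth_lines(directory):
    """Return (filename, line_number, text) for each .pth line that site would execute."""
    hits = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".pth"):
            continue
        path = os.path.join(directory, name)
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            for number, line in enumerate(f, start=1):
                # site runs lines beginning with "import" plus a space or tab.
                if line.startswith(("import ", "import\t")):
                    hits.append((name, number, line.rstrip()))
    return hits


def scan_site_packages():
    """Scan the interpreter's site-packages directories that exist on disk."""
    dirs = list(site.getsitepackages()) if hasattr(site, "getsitepackages") else []
    dirs.append(site.getusersitepackages())
    results = []
    for d in dirs:
        if os.path.isdir(d):
            results.extend((d,) + hit for hit in find_executable_pth_lines(d))
    return results


if __name__ == "__main__":
    for directory, name, number, text in scan_site_packages():
        print(f"{directory}/{name}:{number}: {text[:120]}")
```

Legitimate packages also ship executable `.pth` lines (editable installs and setuptools do), so each hit is something to review by hand, not proof of compromise.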

7.9

#5 Modular Announces Open Source Release of Models and All GPU Kernels

Modular has announced that it is open-sourcing not only its AI models but also its entire suite of GPU kernels. The company aims to make these kernels compatible with multi-vendor consumer hardware while fostering an open ecosystem for further innovation. Modular expresses high confidence in its approach, asserting that even if competitors match its current work, the underlying advantages of its Mojo programming language will preserve its competitive edge.

7.9

#6 FastMCP Framework Released

FastMCP is a framework designed to streamline the development of Model Context Protocol (MCP) servers. It provides developers with accessible tools and interfaces to build server-side applications that interact with AI models more efficiently.

7.6

#7 Building Safety into Sora 2 and the Sora App

To address novel safety challenges associated with advanced video generation and a new social creation platform, OpenAI has integrated concrete safety protections into the foundational design of Sora 2 and the Sora app.

6.9

#8 Best Practices for Building Tools with Claude Agent SDK

When building read-only tools using the Claude Agent SDK, developers should set the 'readOnlyHint: true' flag. This informs Claude Code that the tool has no side effects and is safe to execute in parallel, preventing it from acting as a 'serializing barrier' that would block other tools from running simultaneously.
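The effect of the hint can be pictured with a toy planner. This is only an illustrative model, not Claude Code's actual scheduler: `plan_batches` and the tool names are hypothetical, and the only piece taken from a real specification is the `annotations.readOnlyHint` field, which MCP tool annotations define with a default of false.

```python
def plan_batches(calls):
    """Group consecutive read-only calls into one batch; every other call runs alone.

    Each call is a dict with a 'name' and optional 'annotations'. A missing
    readOnlyHint is treated as False, matching the MCP default.
    """
    batches = []
    current = []
    for call in calls:
        read_only = call.get("annotations", {}).get("readOnlyHint", False)
        if read_only:
            # Safe to run alongside the other read-only calls collected so far.
            current.append(call["name"])
        else:
            # A call that may have side effects acts as a barrier:
            # flush the pending batch, then run this call on its own.
            if current:
                batches.append(current)
                current = []
            batches.append([call["name"]])
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    calls = [
        {"name": "search", "annotations": {"readOnlyHint": True}},
        {"name": "read_file", "annotations": {"readOnlyHint": True}},
        {"name": "write_file"},
        {"name": "list_dir", "annotations": {"readOnlyHint": True}},
    ]
    print(plan_batches(calls))
```

In this model, forgetting the hint on a tool that only reads data splits what could have been one parallel batch into several sequential steps.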

6.9

#9 Google I/O Invites Submissions for AI Studio Projects

Google is working on a special project for Google I/O and is inviting developers to submit their AI Studio applications. Participants are asked to share their projects along with a one-sentence story explaining how and why they chose to build them using Vibe Coding.

6.7

#10 Sam Altman Steps Down from Helion's Board of Directors

Sam Altman has stepped down from the Board of Directors at Helion. The decision aims to avoid potential conflicts of interest as Helion and OpenAI begin to explore large-scale collaboration, enabling both companies to work together more effectively to advance the delivery of safe, zero-carbon energy.

6.5

#11 Gstack Tests Accidentally Fixed Codebase Bugs

A developer running gstack end-to-end tests encountered an improperly sandboxed environment, which resulted in the test suite accidentally fixing bugs within the codebase. Claude Code, which was monitoring the process, identified the changes and confirmed them as effective fixes. This incident highlights an unexpected intersection between automated testing and AI-driven development.

6.5

#12 Exploring the Potential of Replit in Development

A user highlights their positive experience with Replit Agent 3, describing it as a tool that enables "vibe-coding" by allowing developers to build projects without obsessing over code. The author notes that the platform is not only an efficient tool for rapid development but also serves as an excellent resource for teaching web development concepts to children.

6.3