2026-05-16 Digest

Tracked 269 · Curated 15

#1 Codex Rises, Claude Meters Programmatic Usage Amidst Market Shifts

The AI landscape is diverging after GPT-5.5. Anthropic's Claude continues to gain traction, with the company's growth and its CFO in the spotlight, while AI engineers increasingly prefer Codex. Anthropic's move to meter programmatic usage by tying API credits to subscription plans has drawn criticism, and some see it as a 'rug pull'. This coincides with OpenAI's enterprise promotions and Codex taking a more liberal approach. Meanwhile, agent infrastructure and UX keep advancing: Cline, LangChain, Notion, and Cursor have released features focused on long-term state, streaming, and orchestration.

10.5

#2 SU-01 Achieves Gold-Medal Olympiad Reasoning with Compact 30B-A3B Model

The SU-01 model, trained with a unified recipe of reverse-perplexity curriculum SFT followed by two-stage RL, reaches gold-medal-level olympiad reasoning at a compact 30B-A3B scale (about 30B total parameters, roughly 3B active per token). It scores 35 points on both IMO 2025 and USAMO 2026 and sustains reasoning traces of more than 100K tokens without external tools.

9.5

#3 ServiceNow Releases EVA-Bench: An End-to-End Voice Agent Evaluator

ServiceNow has released EVA-Bench, an end-to-end evaluator for voice agents. EVA-Bench simulates bot-to-bot audio conversations to score task accuracy (EVA-A) and conversational experience (EVA-X) across 213 scenarios in airline, HR, and IT domains.

8.7

#4 Eric Jang Explains Building AlphaGo From Scratch Using Modern AI Tools

Eric Jang details how to build AlphaGo from scratch with modern AI tools and what that reveals about the primitives of intelligence. He walks through AlphaGo's core components (search, learning from experience, and self-play) and contrasts its Monte Carlo Tree Search (MCTS) approach with reinforcement learning (RL) in LLMs, arguing that MCTS has an advantage because it sidesteps the credit assignment problem. Jang also discusses his 'Autoresearch' loop, the current capabilities and limitations of LLMs in automating AI research, and the potential for an intelligence explosion.

7.6
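To make the MCTS point concrete, here is a minimal sketch of a PUCT-style selection and backup step in the spirit of AlphaGo Zero. It is not code from Jang's project: the `Edge` class, `puct_select`, `backup`, and the exploration constant `c_puct = 1.5` are illustrative assumptions. It shows why search sidesteps per-step credit assignment: one leaf evaluation is backed up to every edge on the search path (only its sign flips between players), and each edge's Q value is simply the average of those outcomes.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one (state, action) edge; a hypothetical structure for illustration."""
    prior: float            # P(s, a), e.g. from a policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a), sum of backed-up values

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); unvisited edges default to 0.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Pick the action maximizing Q(s, a) + U(s, a), AlphaGo Zero-style PUCT."""
    total_visits = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        # Exploration bonus: large for high-prior, rarely visited actions.
        u = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.q + u

    return max(edges, key=lambda a: score(edges[a]))


def backup(path: list[Edge], value: float) -> None:
    """Propagate one leaf evaluation to every edge on the search path.

    Assumes `value` is from the perspective of the player who made the
    last move on the path; the sign flips at each level because players
    alternate (two-player zero-sum game).
    """
    for edge in reversed(path):
        edge.visits += 1
        edge.value_sum += value
        value = -value


if __name__ == "__main__":
    root = {"a": Edge(prior=0.2, visits=1, value_sum=0.5),
            "b": Edge(prior=0.8, visits=1, value_sum=0.0)}
    # "b" wins here: its higher prior outweighs "a"'s better Q at low visit counts.
    print(puct_select(root))
```

The design choice worth noting is that no learned critic has to decide which move deserved credit: the search tree aggregates many full-path outcomes, and the visit counts themselves become the improved policy target during self-play.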

#5 Mastering Agent Management and Fundamentals Makes You Unstoppable

Mastering agent management, combined with a deep understanding of fundamentals, makes one unstoppable. People naturally gravitate toward experts, and the amplification agents bring to one's output is a critical advantage that should not be overlooked.

7.6

#6 AI Developer Explores 'Sub-Agent' Concept for Claude Code

Developer SVPino says he has embraced the 'sub-agent' concept in Claude Code: 'Everything that can be a subagent should be a subagent.' He admits he still needs more time and experience to calibrate what should qualify as a sub-agent, but is fast-tracking that learning through active use. He highlights that each sub-agent gets its own context window, which is an advantage, particularly when running multiple agents.

7.3

#7 Raycast Beta V2 Update: Integrates Launcher and AI Agent Capabilities

Raycast has released Beta V2, turning it from a plain launcher into a tool that combines launcher and AI agent capabilities. The UI has been completely redesigned to align with current macOS design principles. The update also refactors the underlying infrastructure, including the launcher core, search, scheduling, extensions, and settings, and upgrades search so that it can invoke Skills.

7.1

#8 Why Block handed Goose to the Linux Foundation

Block open-sourced its internal AI coding agent, Goose, and transferred it to the Linux Foundation to address enterprise adoption concerns stemming from trademark ownership and a lack of transparent governance. Together with MCP and AGENTS.md, Goose forms the core of the newly founded Agentic AI Foundation (AAIF), which operates under the Linux Foundation.

6.8

#9 Show HN: Find the best local LLM for your hardware, ranked by benchmarks

A new tool called benchLLM helps users pick the best-performing local large language model (LLM) for their specific hardware. It ranks models by benchmark results, and the source is on GitHub (https://github.com/Andyyyy64/whichllm).

6.5

#10 Forward Deployed Engineer: A New Role for the AI Era

Google is increasing its investment in the Forward Deployed Engineer (FDE) role and streamlining hiring for it. FDEs are becoming a focal point of the AI talent race.

6.4

#11 Claude helps user recover 5 BTC lost 11 years ago

Bitcoin user cprkrn posted that, with help from Anthropic's Claude, they recovered 5 BTC lost 11 years ago after forgetting the password following drug use. At current prices the coins are worth roughly $400,000. The user expressed immense gratitude.

6.4

#12 QueryData for AlloyDB: Query Complex Databases with Natural Language

This codelab shows how to use QueryData for AlloyDB to query complex databases in natural language, backed by high-speed vector search. The aim is to broaden data access beyond hand-written SELECT statements.

6.3

#13 Deployed Demo of the <Water> Component for Shaders in React

Developer shuding shared a concept they have been working on: bringing shaders to React. The first deployed demo, a <Water> component, is available at https://t.co/oyygjLlIeQ.

6.2

#14 Markdown Criticized for Low Information Density, HTML Deemed Superior

Markdown is criticized for its low information density, with the author saying it was 'doomed from the start.' The article argues that HTML is better for both humans and AI, and that because HTML is tedious to type by hand, an open-source tool has emerged to generate it. The original post links to the tool and its repository.

6.2

#15 Helfie Uses Azure and NVIDIA AI to Improve Healthcare in Remote Australia

In remote Australia, accessing a doctor can mean traveling hundreds of miles. The Catalyst series features Helfie, which leverages Microsoft Azure and NVIDIA to bridge this gap with AI-driven health monitoring, improving healthcare access.

6.1