Digest History

2026-05-17 Selected 10

#1 Poetiq's Meta-System Automatically Builds Model-Agnostic Harness, Boosting LLM Performance on LiveCodeBench Pro

Poetiq's Meta-System achieved new state-of-the-art results on the LiveCodeBench Pro competitive coding benchmark by automatically building and optimizing a model-agnostic inference harness. Without fine-tuning or access to model internals, the system significantly boosted the performance of models such as GPT-5.5 High and Gemini 3.1 Pro. LiveCodeBench Pro tests AI coding ability with C++ problems that have runtime and memory limits, and it is designed to resist data contamination and overfitting.

10.7

#2 The Hidden Cleanup Costs of AI-Generated Code

AI-generated code accelerates development velocity and lowers the barrier to entry, enabling independent and citizen developers to build and deploy applications rapidly. However, this efficiency comes with long-term, hidden cleanup costs concentrated in the generation, delivery, and maintenance of the code.

8.2

#3 Cerebras Achieves $60 Billion Market Cap in IPO

AI chipmaker Cerebras has completed its initial public offering (IPO), with shares closing at $280 and valuing the company at $60 billion. The IPO is a significant validation of Cerebras's long-term strategy after it previously withdrew an S-1 filing. CFO Bob Komin stated that Cerebras serves models of all sizes, including trillion-parameter models such as OpenAI's GPT-5.4 and GPT-5.5.

7.9

#4 Show HN: GlycemicGPT – Open-source AI-powered diabetes management

An open-source, self-hosted AI platform called GlycemicGPT has been released to assist with diabetes management. Developed by a Type 1 diabetic software engineer, it connects to CGMs (like Dexcom G7), insulin pumps (Tandem), and Nightscout instances. The AI layer provides daily briefs, meal response analysis, conversational querying via RAG, and predictive alerts. GlycemicGPT emphasizes that it is for monitoring and analysis only, does not control insulin delivery, and runs entirely on user hardware with options for local or hosted AI models.

7.0

#5 Notes on Pretraining Parallelisms and Failed Training Runs

This article analyzes the common causes of failed AI pretraining runs, primarily focusing on 'breaking causality' and 'adding bias'. Breaking causality can occur during expert routing and token dropping, leading to training data inconsistent with deployment scenarios. Bias can be introduced through numerical precision issues, such as FP16 accumulation errors, which are highlighted as more detrimental than variance. The text also touches upon the difficulty of AI automating kernel writing and the distinction between numerical drift in pretraining and RL inference versus end-user serving.
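
The point that FP16 accumulation adds bias, not just noise, can be illustrated without a GPU. The sketch below is not from the article: it uses Python's `struct` half-precision format code `'e'` to round a running sum to FP16 after every addition, a simplified model of a low-precision accumulator.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def accumulate(values, fp16: bool) -> float:
    """Sum values, optionally rounding the running total to FP16 after each step."""
    acc = 0.0
    for v in values:
        acc = acc + v
        if fp16:
            acc = to_fp16(acc)  # simulate an FP16 accumulator
    return acc


values = [1.0] * 4096
exact = accumulate(values, fp16=False)
half = accumulate(values, fp16=True)
print(exact, half)  # 4096.0 2048.0
```

Once the total reaches 2048, adjacent FP16 values are 2 apart, so 2048 + 1 is a tie that round-half-to-even sends back to 2048. Every step errs in the same direction, so the error grows with the number of additions instead of averaging out, which is what distinguishes bias from variance.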

6.9

#6 The Limitations of AI Verification in Scientific Discovery

This article explores the potential validation challenges for AI in scientific discovery. The author argues that the verification cycle for scientific theories can span decades or centuries, and experimental results do not always definitively rule out alternatives. Historical examples illustrate that the rigorous verification loops at which AI excels (e.g., coding, math) differ from the ambiguity inherent in scientific discovery, suggesting AI's potential for independent breakthroughs in science might be overestimated.

6.8

#7 OpenClaw 0.10.0 "long chats survive" release introduces lossless "infinite" memory

OpenClaw 0.10.0, the "long chats survive" release, introduces the Lossless concept for an "infinite" context window/memory. It compacts conversations into blocks, building a tree to look up past messages, effectively preserving long chat histories.
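
The summary leaves the data structure open. As a rough sketch of one way block compaction plus a lookup tree can stay lossless (an assumption, not OpenClaw's actual code): leaf blocks keep the raw messages, internal nodes hold summaries of their children, and a lookup by message index walks down to the right leaf. `summarize` is a hypothetical placeholder for a model-generated summary, and the block size and fanout are arbitrary.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BLOCK_SIZE = 4  # messages per leaf block (arbitrary choice)
FANOUT = 4      # children per internal node (arbitrary choice)


def summarize(texts: List[str]) -> str:
    """Hypothetical stand-in for an LLM summary: first three words of each text."""
    return " | ".join(" ".join(t.split()[:3]) for t in texts)


@dataclass
class Node:
    summary: str
    start: int  # index of the first message this node covers
    end: int    # one past the last message this node covers
    children: List["Node"] = field(default_factory=list)
    messages: Optional[List[str]] = None  # only leaves keep raw messages


def build_tree(messages: List[str]) -> Node:
    if not messages:
        raise ValueError("need at least one message")
    # Leaves: fixed-size blocks that store the original text, so nothing is lost.
    level = []
    for i in range(0, len(messages), BLOCK_SIZE):
        block = messages[i:i + BLOCK_SIZE]
        level.append(Node(summarize(block), i, i + len(block), messages=block))
    # Internal levels: group FANOUT nodes under a parent that summarizes them.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), FANOUT):
            group = level[i:i + FANOUT]
            parents.append(Node(summarize([n.summary for n in group]),
                                group[0].start, group[-1].end, children=group))
        level = parents
    return level[0]


def lookup(root: Node, index: int) -> str:
    """Descend to the leaf covering `index` and return the original message."""
    if not root.start <= index < root.end:
        raise IndexError(index)
    node = root
    while node.messages is None:
        node = next(c for c in node.children if c.start <= index < c.end)
    return node.messages[index - node.start]


history = [f"message {i} about topic {i % 3}" for i in range(50)]
root = build_tree(history)
print(lookup(root, 37))  # message 37 about topic 1
```

The summaries give a compact view of old history for the context window, while the leaves keep every original message retrievable in a logarithmic number of steps.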

6.0

#8 Tutorial: Building a Custom Django-Unfold Admin Dashboard

This tutorial guides users on installing Django and Django-Unfold, creating a Django project with a shop app, and configuring a modern Admin theme. It covers custom sidebar navigation, product badges, tabs, filters, actions, and a customized Admin homepage.

5.9

#9 Startups Trading Dollars and Booking It as Revenue

A new trend among startups involves trading dollars with each other and booking these transactions as revenue. This practice has sparked discussion on Hacker News, garnering 103 points and 63 comments.

5.8

#10 OpenAI Investigating Reports of GPT-5.5 Performance Degradation

The OpenAI Codex team is investigating user reports of GPT-5.5 performing worse, even though systems are currently healthy. The team acknowledges user feedback, with one user humorously stating they've 'got used to the current level of magic and now would like more.' Updates will be provided as the investigation progresses.

5.7

2026-05-16 Selected 15

#1 Codex Rises, Claude Meters Programmatic Usage Amidst Market Shifts

The AI landscape has diverged since GPT-5.5: Anthropic's Claude continues to gain traction, with attention on the company's growth and its CFO, while AI engineers show a growing preference for Codex. Anthropic's move to meter programmatic usage by tying API credits to subscription plans has drawn criticism, with some calling it a 'rug pull'. This coincides with OpenAI's enterprise promotions and Codex adopting a more liberal approach. Meanwhile, agent infrastructure and UX continue to advance, with Cline, LangChain, Notion, and Cursor releasing features focused on long-term state, streaming, and orchestration.

10.5

#2 SU-01 Achieves Gold-Medal Olympiad Reasoning with Compact 30B-A3B Model

The SU-01 model, trained with a unified recipe of reverse-perplexity curriculum SFT and two-stage RL, reaches gold-medal olympiad reasoning with a compact 30B-A3B model (30B total parameters, about 3B active). It scores 35 points on each of IMO 2025 and USAMO 2026, sustaining reasoning traces of 100K+ tokens without external tools.

9.5

#3 ServiceNow Releases EVA-Bench: An End-to-End Voice Agent Evaluator

ServiceNow has released EVA-Bench, an end-to-end evaluator for voice agents. EVA-Bench simulates bot-to-bot audio conversations to score task accuracy (EVA-A) and conversational experience (EVA-X) across 213 scenarios in airline, HR, and IT domains.

8.7

#4 Eric Jang Explains Building AlphaGo From Scratch Using Modern AI Tools

Eric Jang details how to build AlphaGo from scratch using modern AI tools, offering insights into the primitives of intelligence. He explains AlphaGo's core components like search, learning from experience, and self-play, contrasting its Monte Carlo Tree Search (MCTS) approach with reinforcement learning (RL) in LLMs, highlighting MCTS's advantage in sidestepping the credit assignment problem. Jang also discusses his 'Autoresearch' loop and the current capabilities and limitations of LLMs in automating AI research, touching upon the potential for an intelligence explosion.
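
A minimal sketch of MCTS selection and backup, using the PUCT rule from AlphaGo Zero-style search (assumed here; the summary gives no formulas). Each edge stores a policy prior, a visit count, and a value sum, and selection maximizes Q + U, where U favors moves with a high prior and few visits. One way to read the credit-assignment point is that every simulation's outcome is written directly to each edge on the path it took. `c_puct = 1.5` is an arbitrary choice.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Edge:
    prior: float            # P(s, a) from the policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a), sum of backed-up values

    @property
    def q(self) -> float:
        # Mean value Q(s, a); unvisited edges default to 0.
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the action maximizing Q(s, a) + U(s, a) (PUCT)."""
    total_visits = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        u = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.q + u

    return max(edges, key=lambda a: score(edges[a]))


def backup(path: List[Edge], value: float) -> None:
    """Credit a leaf evaluation to every edge on the simulated path.

    `value` is from the perspective of the player who made the last move in
    `path`; the sign flips at each earlier ply because players alternate.
    """
    for edge in reversed(path):
        edge.visits += 1
        edge.value_sum += value
        value = -value


edges = {"a": Edge(prior=0.5, visits=10, value_sum=5.0),
         "b": Edge(prior=0.5, visits=1, value_sum=0.0)}
print(select_action(edges))  # b
```

Here "a" has the better mean value, but "b" is picked because its exploration bonus is larger after only one visit.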

7.6

#5 Mastering Agent Management and Fundamentals Makes You Unstoppable

Achieving mastery in agent management, coupled with a deep understanding of fundamentals, makes one unstoppable. People naturally gravitate towards experts. The amplification agents provide to one's output is a critical advantage that should not be overlooked.

7.6

#6 AI Developer Explores 'Sub-Agent' Concept for Claude Code

Developer SVPino shares his embrace of the 'sub-agent' concept within Claude Code, stating, 'Everything that can be a subagent should be a subagent.' While admitting he needs more time and experience to properly calibrate what qualifies as a sub-agent, he is fast-tracking this learning process through active use. He highlights that each sub-agent having its own context window offers advantages, particularly when running multiple agents.

7.3

#7 Raycast Beta V2 Update: Integrates Launcher and AI Agent Capabilities

Raycast has released its Beta V2 version, transforming from a mere launcher into a tool combining launcher and AI Agent capabilities. The UI and interface have been completely redesigned to better align with current Mac system design principles. The update includes a fundamental infrastructure refactor, covering the launcher's core, search, scheduling, extension functionalities, and settings, alongside an upgraded search feature that can invoke Skills.

7.1

#8 Why Block handed Goose to the Linux Foundation

Block has open-sourced its internal AI coding agent, Goose, and transferred it to the Linux Foundation to address enterprise adoption challenges stemming from trademark ownership and a lack of transparent governance. Goose, along with MCP and AGENTS.md, forms the core of the newly founded Agentic AI Foundation (AAIF), which operates under the Linux Foundation.

6.8

#9 Show HN: Find the best local LLM for your hardware, ranked by benchmarks

A new tool called benchLLM helps users select the best-performing local Large Language Model (LLM) for their specific hardware. The tool ranks models based on benchmarks and provides a GitHub repository for further details (https://github.com/Andyyyy64/whichllm).

6.5

#10 Forward Deployed Engineer: A New AI Era Role

Google is increasing its investment in the Forward Deployed Engineer (FDE) role and streamlining the hiring process. FDEs are becoming a focal point in the talent race within the AI field.

6.4

#11 Claude helps user recover 5 BTC lost 11 years ago

A Bitcoin user, cprkrn, posted that with Claude's help they recovered 5 BTC lost 11 years ago, when they forgot the password after drug use. The recovered coins are worth approximately $400,000 at current prices. The user expressed immense gratitude.

6.4

#12 QueryData for AlloyDB: Query Complex Databases with Natural Language

This codelab demonstrates how to use QueryData for AlloyDB to query complex databases using natural language, powered by high-speed vector search. This democratizes data access, going beyond simple SELECT statements.

6.3

#13 Deployed Demo of the <Water> Component for Shaders in React

Developer shuding shared a concept they've been working on: bringing shaders to React. They've released the first deployed demo of the <Water> component, available at https://t.co/oyygjLlIeQ.

6.2

#14 Markdown Criticized for Low Information Density, HTML Deemed Superior

Markdown is criticized for its low information density, with the author stating it was 'doomed from the start.' The article argues that HTML is superior for both humans and AI and, because HTML is tedious to write by hand, points to an open-source tool that generates it. Links to the tool and its repository are provided.

6.2

#15 Helfie Uses Azure and NVIDIA AI to Improve Healthcare in Remote Australia

In remote Australia, accessing a doctor can mean traveling hundreds of miles. The Catalyst series features Helfie, which leverages Microsoft Azure and NVIDIA to bridge this gap with AI-driven health monitoring, improving healthcare access.

6.1

2026-05-15 Selected 13

#1 Fastino Labs Open-Sources GLiGuard: A Small, Fast, Safe Moderation Model

Fastino Labs has open-sourced GLiGuard, a 300 million parameter safety moderation model designed to address the high cost and latency of LLM safety checks. Unlike large, slower decoder-only models, GLiGuard uses an encoder-based approach for text classification, handling multiple safety dimensions in a single pass. It is up to 16x faster and matches or exceeds the accuracy of models 23 to 90 times its size on nine benchmarks.

11.7

#2 Anthropic Overtakes OpenAI in Business AI Adoption, Amazon Launches "Alexa for Shopping"

Fintech firm Ramp's latest AI Index reveals Anthropic has surpassed OpenAI in paid business user adoption for the first time, with usage quadrupling, aligning with OpenAI's previous 'code red' concerns and strategic shifts in 2026. Meanwhile, Amazon has integrated its Rufus chatbot into "Alexa for Shopping," leveraging extensive user data for a personalized shopping agent across devices.

10.0

#3 OpenAI Integrates Codex into ChatGPT Mobile App

OpenAI has launched a preview of Codex within its ChatGPT mobile app for iOS and Android. This feature allows users to monitor, guide, and approve code execution tasks performed by Codex on their computers, and it is available to all ChatGPT users, including those on the free tier.

10.0

#4 Codex Now Accessible via ChatGPT Mobile App

Codex is now available through the ChatGPT mobile app, allowing users to monitor, steer, and approve coding tasks in real-time across devices and remote environments.

7.3

#5 Rethinking Collaboration Essentials for Human-Agent Products

The article delves into the essence of collaboration and what truly needs alignment between teams. The author posits that a thorough understanding of communication and collaboration models is essential for developing successful Human-Agent products.

7.0

#6 Nvidia CEO Jensen Huang Addresses Carnegie Mellon Class of 2026

Nvidia CEO Jensen Huang told the Carnegie Mellon Class of 2026 that no generation has entered the world with more powerful tools or greater opportunities. He stated they are at the starting line of the AI era and have a moment to shape what comes next.

6.9

#7 Programming Languages Are Less of a Lock-In

Echoing Mitchell Hashimoto's sentiment about Bun's migration from Zig to Rust, this post argues that programming languages are becoming less of a lock-in. A representative from a mid-sized tech company shared how they used AI coding agents to rewrite their legacy iPhone and Android apps to React Native. They chose React Native due to its recent improvements and the ability to port back to native if needed, highlighting the diminishing lock-in effect of programming languages.

6.8

#8 CuPy Tutorial: Mastering GPU Computing with CUDA Kernels, Streams, Sparse Matrices, and Profiling

This tutorial explores CuPy, a GPU-accelerated Python library for high-performance numerical computing, serving as a NumPy alternative. It covers CUDA device introspection, performance comparisons between NumPy and CuPy for matrix multiplication and FFTs, memory pool management, custom Elementwise and Reduction kernels, raw CUDA kernels, CUDA streams, sparse matrices, dense linear solvers, GPU image processing, DLPack interoperability, event-based profiling, and cupyx.jit. The goal is to build a practical understanding of leveraging CuPy for advanced CUDA-level performance.

6.5

#9 Arena AI Model Elo History Tracker

A developer created a live tracker that visualizes how flagship AI models' performance changes over time by plotting their historical Elo ratings from Arena AI. The dashboard shows a single continuous curve per major AI lab, tracking its highest-rated model over time to illustrate generational jumps and performance decay. The creator is seeking community input on evaluation datasets that specifically test consumer-facing chat UIs, rather than raw API endpoints, to better capture real-world user experience.
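
The "one curve per lab" view reduces to a group-by and a maximum. A sketch, assuming a flat list of `(date, lab, model, rating)` snapshots; the record layout and the sample data are hypothetical, since the post does not describe Arena's export format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, str, float]  # (ISO date, lab, model, rating)


def flagship_curves(records: List[Record]) -> Dict[str, List[Tuple[str, str, float]]]:
    """For each lab, keep one point per date: its highest-rated model on that date."""
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for date, lab, model, rating in records:
        key = (lab, date)
        if key not in best or rating > best[key][1]:
            best[key] = (model, rating)

    curves: Dict[str, List[Tuple[str, str, float]]] = defaultdict(list)
    # ISO dates sort correctly as strings, so sorting by (lab, date) orders each curve.
    for (lab, date), (model, rating) in sorted(best.items()):
        curves[lab].append((date, model, rating))
    return dict(curves)


snapshots = [
    ("2026-01-01", "LabA", "a-1", 1300.0),
    ("2026-01-01", "LabA", "a-mini", 1250.0),
    ("2026-02-01", "LabA", "a-2", 1340.0),
    ("2026-02-01", "LabA", "a-1", 1290.0),
    ("2026-01-01", "LabB", "b-1", 1310.0),
]
print(flagship_curves(snapshots)["LabA"])
# [('2026-01-01', 'a-1', 1300.0), ('2026-02-01', 'a-2', 1340.0)]
```

Generational jumps show up as points where the best model's name changes; decay shows up as a falling rating for the same model across dates.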

6.4

#10 A Guide To Event-Driven Architectural Patterns

This article explores event-driven architecture (EDA) patterns. It explains why synchronous communication breaks down at scale in distributed systems and introduces EDA as an alternative in which services publish events and other services react to them independently. It covers the foundations of EDA and walks through six patterns, each addressing a specific problem that this model introduces.
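
The core idea, publishers emitting events without knowing who consumes them, fits in a few lines. This is an illustrative in-memory sketch, not one of the article's six patterns: a production system would use a durable broker and deliver events asynchronously, whereas this bus calls handlers synchronously. The event name and payload are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventBus:
    """In-memory publish/subscribe bus; a stand-in for a real message broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict[str, Any]) -> None:
        # The publisher neither knows nor waits for specific consumers;
        # here handlers run synchronously for simplicity.
        for handler in list(self._subscribers[event_type]):
            handler(payload)


bus = EventBus()
shipments: List[int] = []
emails: List[str] = []
bus.subscribe("OrderPlaced", lambda e: shipments.append(e["order_id"]))
bus.subscribe("OrderPlaced", lambda e: emails.append(e["email"]))
bus.publish("OrderPlaced", {"order_id": 42, "email": "customer@example.com"})
print(shipments, emails)  # [42] ['customer@example.com']
```

Adding a third consumer, such as an analytics service, means one more `subscribe` call; the publishing code does not change, which is the decoupling that a chain of synchronous calls lacks.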

6.1

#11 Anthropic Separates Claude API Usage from Subscriptions, Bills at Full Price

Starting June 15, Anthropic is separating programmatic Claude API usage from existing subscription quotas. Subscribers will receive a dedicated monthly credit ranging from $20 to $200, and SDK and third-party requests will be billed at full API rates instead of previous subsidized flat rates.

6.1

#12 Ben's Bites: Using Video Feedback to Enhance AI Agent Workflows

Ben's Bites introduces a novel approach to AI agent feedback using screen recordings and voiceovers to create visual reports. This method generates HTML files with action checklists, aiding agent comprehension and execution. The newsletter also covers recent AI updates from Claude, Google Gemini, Notion, Vercel, Cursor, and Orca, among others.

6.0

#13 ai-cli Allows Rendering Images Directly in the Terminal

The `ai-cli` tool now supports rendering images directly in the terminal. Users can execute commands like `npx ai-cli image 'diagram description'` to access all image, video, and text models from Vercel AI Gateway instantly.

5.9

2026-05-14 Selected 10

#1 MinIO's MemKV Promises 95% Better GPU Utilization by Ending AI Recompute Tax

AI infrastructure company MinIO has launched MemKV, a new context memory store designed to eliminate the "recompute tax" in AI inference. The product claims to improve GPU utilization by over 95% and reduce cost per token by approximately 50% by enabling petabyte-scale, low-latency access to context memory. MinIO's CEO states this addresses "structural drag" hindering AI infrastructure efficiency.

10.3

#2 Cloud Agents Can Now Run in Fully Configured Development Environments

Starting today, cloud agents can be run within fully configured development environments, supporting cloned repos, installed dependencies, and toolchain credentials. Each environment now has its own version history with rollback, an audit log, and scoped egress/secrets for enhanced security. Customers like Decagon, Amplitude, BILT, and Snyk are using these environments for end-to-end agent tasks.

10.0

#3 Meta Releases AI-Generated Image Detection Dataset Beyond the Lab on Hugging Face

Meta has released 'Beyond the Lab,' a new dataset on Hugging Face featuring multi-rater annotations. This dataset is designed for benchmarking the detection of AI-generated images. Meta aims to advance and democratize AI through open source and open science.

8.7

#4 Bridge AI Begins Testing Computer Use Agent

Bridge AI has begun testing its new 'Computer Use' agent, designed to let AI safely utilize users' computers to complete real-world tasks. This initiative aims to address the fragility and high running costs of current AI agents. Users can join the test via the provided link.

7.7

#5 Vercel Product Design Team's Workflow and Tools

The Vercel product design team shares their tools and workflows, highlighting that reverse engineering from production has become standard practice, described as 'Codex for coding, Claude for review.' The article details the use of the 'Paper' browser plugin for capturing production styles and structures, and the 'UI Fork' tool.

7.4

#6 SRT Subtitle Creation: AI-Assisted Segmentation and Spell Correction

Effective SRT subtitle creation hinges on proper segmentation and spell correction, both achievable with AI or agents. This requires word-level timestamps, which speech recognition services such as the Whisper API can return as JSON. Direct SRT output from the Whisper API is often unusable because of overly long segments or hallucinations. A better approach is to request `response_format=verbose_json` with `timestamp_granularities[]=word` and then assemble the subtitles programmatically.
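
A minimal sketch of the "assemble the subtitles programmatically" step, assuming the transcription response has already been parsed and its word list (objects with `word`, `start`, and `end` fields, as in the `words` array of a word-granularity `verbose_json` response) is passed in. The 42-character and 6-second limits are assumed values, and spell correction, the other half of the workflow, is left out.

```python
from typing import Dict, List

MAX_CHARS = 42       # assumed per-cue character limit
MAX_DURATION = 6.0   # assumed per-cue duration limit, in seconds
SENTENCE_END = (".", "?", "!")


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: List[Dict]) -> str:
    """Group word-level timestamps into short cues and render them as SRT."""
    cues: List[List[Dict]] = []
    current: List[Dict] = []
    for w in words:
        text = w["word"].strip()
        if current:
            candidate = " ".join([x["word"].strip() for x in current] + [text])
            # Start a new cue if adding this word would make the cue too long.
            if len(candidate) > MAX_CHARS or w["end"] - current[0]["start"] > MAX_DURATION:
                cues.append(current)
                current = []
        current.append(w)
        if text.endswith(SENTENCE_END):  # prefer to break at sentence boundaries
            cues.append(current)
            current = []
    if current:
        cues.append(current)

    blocks = []
    for i, cue in enumerate(cues, start=1):
        line = " ".join(x["word"].strip() for x in cue)
        timing = f"{srt_time(cue[0]['start'])} --> {srt_time(cue[-1]['end'])}"
        blocks.append(f"{i}\n{timing}\n{line}\n")
    return "\n".join(blocks)


words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "world.", "start": 0.5, "end": 0.9},
    {"word": "Next", "start": 1.2, "end": 1.5},
    {"word": "line", "start": 1.6, "end": 2.0},
]
print(words_to_srt(words))
```

Because each cue's timing comes from its first and last word, cue boundaries stay accurate even when the text is later spell-corrected, as long as the word count per cue is unchanged.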

6.8

#7 Microsoft Edge Copilot Update Enables AI to Aggregate Information Across Tabs

Microsoft Edge is introducing a new feature that enables its Copilot AI chatbot to gather information from all open tabs. Users can ask Copilot questions about tab content, compare products, or summarize articles. Microsoft is also retiring the previous Copilot Mode, which offered similar tab-browsing capabilities and agentic features. Users can customize which experiences they want enabled.

6.6

#8 Isomorphic Labs Secures $2.1 Billion in Funding to Accelerate Drug Discovery

Isomorphic Labs has secured $2.1 billion in new funding to accelerate its mission of reimagining drug discovery. Building on the work of AlphaFold, the company aims to improve human health through AI and ultimately solve all disease.

6.3

#9 Snap Explains Running A/B Tests at Nearly 1 Billion User Scale

Snap's Head of Engineering Platforms, Prudhvi Vatala, details how the company migrated over 10 petabytes of daily data processing for A/B testing to GPU-accelerated pipelines on Google Cloud, achieving a 76% reduction in job costs.

6.3

#10 MCP Technology Is Not Dead

The complaint that MCP puts garbage in your context is outdated. Tools like Claude Code, Codex, and Cursor implement progressive disclosure and load MCP tools on demand, indicating the technology is not dead.
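
The progressive-disclosure idea can be sketched independently of any particular client. The registry below is hypothetical (it is neither the MCP SDK nor how Claude Code, Codex, or Cursor implement it): the model initially sees only a short index of tool names and one-line descriptions, and a tool's full definition is fetched and cached on first use. The tool names are made up for illustration.

```python
from typing import Any, Callable, Dict, List


class LazyToolRegistry:
    """Expose a short tool index up front; load full definitions only on demand."""

    def __init__(self, summaries: Dict[str, str],
                 loaders: Dict[str, Callable[[], Dict[str, Any]]]) -> None:
        self._summaries = summaries  # tool name -> one-line description
        self._loaders = loaders      # tool name -> returns the full definition
        self._loaded: Dict[str, Dict[str, Any]] = {}

    def index(self) -> List[str]:
        # Only this short list goes into the model's context up front.
        return [f"{name}: {desc}" for name, desc in sorted(self._summaries.items())]

    def load(self, name: str) -> Dict[str, Any]:
        # Fetch (and cache) the full schema the first time the model asks for it.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def loaded(self) -> List[str]:
        return sorted(self._loaded)


fetches: List[str] = []


def make_loader(name: str) -> Callable[[], Dict[str, Any]]:
    def loader() -> Dict[str, Any]:
        fetches.append(name)  # record each real fetch
        return {"name": name, "input_schema": {"type": "object", "properties": {}}}
    return loader


registry = LazyToolRegistry(
    summaries={"db.query": "Run a read-only SQL query",
               "issues.search": "Search the issue tracker"},
    loaders={n: make_loader(n) for n in ("db.query", "issues.search")},
)
print(registry.index())
registry.load("db.query")
registry.load("db.query")
print(registry.loaded(), fetches)  # ['db.query'] ['db.query']
```

The unused tool's schema never enters the context, and repeated use of a loaded tool costs no further fetches.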

6.2

2026-05-13 Selected 15

#1 AntAngelMed: 103B-Parameter Open-Source Medical LLM Released Using MoE Architecture

A research team from China has launched AntAngelMed, an open-source medical language model with 103 billion parameters. It uses a Mixture-of-Experts (MoE) architecture with a 1/32 activation ratio, activating only 6.1 billion parameters per token for up to 7x efficiency gains. AntAngelMed ranks highly on medical benchmarks such as HealthBench and MedAIBench and supports a 128K context length.
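
To make the activation ratio concrete, here is a generic top-k router, not AntAngelMed's implementation. The expert count (64) and k (2) are assumptions chosen so that 2/64 = 1/32 of the experts run for each token, and each "expert" is a toy scalar function. Note that 103B / 32 is roughly 3.2B, so the reported 6.1B active parameters suggests the 1/32 figure counts routed experts only, with shared components such as attention always active; the source does not state this explicitly.

```python
import math
from typing import Callable, List, Tuple

NUM_EXPERTS = 64  # assumed
TOP_K = 2         # assumed: 2 / 64 = 1/32 of experts active per token


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits: List[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))


def moe_forward(x: float, router_logits: List[float],
                experts: List[Callable[[float], float]]) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(gate * experts[i](x) for i, gate in route(router_logits))


calls: List[int] = []


def make_expert(i: int) -> Callable[[float], float]:
    def expert(x: float) -> float:
        calls.append(i)      # record which experts actually execute
        return x * (i + 1)   # toy computation
    return expert


experts = [make_expert(i) for i in range(NUM_EXPERTS)]
logits = [0.0] * NUM_EXPERTS
logits[5], logits[9] = 2.0, 1.0
y = moe_forward(1.0, logits, experts)
print(sorted(calls), len(calls) / NUM_EXPERTS)  # [5, 9] 0.03125
```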

10.2

#2 SAP Launches AI Agent Hub at Sapphire 2026 to Consolidate AI Agents

SAP launched the SAP AI Agent Hub at Sapphire 2026 in Orlando, a vendor-agnostic command center for managing AI agents, LLMs, and MCP servers. Now available more broadly through Joule Studio, the hub inventories and governs AI assets regardless of their vendor. It offers capabilities like auto-discovery, risk rating, and compliance mapping, with two features generally available now and four more slated for Q3 2026.

8.3

#3 llm 0.32a2 Release: OpenAI Models Now Use /v1/responses Endpoint

The llm alpha version 0.32a2 introduces a significant change: most reasoning-capable OpenAI models now utilize the /v1/responses endpoint instead of /v1/chat/completions. This enables interleaved reasoning across tool calls for GPT-5 class models. Users can now see summarized reasoning tokens, displayed in a distinct color, when prompting OpenAI models. The flags -R or --hide-reasoning can be used to disable this feature.

7.8

#4 Linux Kernel Optimization Causes QUIC CUBIC Congestion Control Bug

A bug in Cloudflare's quiche QUIC implementation, stemming from a Linux kernel optimization for CUBIC congestion control, caused the congestion window (cwnd) to get permanently pinned at its minimum after packet loss. This issue, arising from a change to align CUBIC with RFC 9438's app-limited exclusion, unexpectedly surfaced in quiche's behavior following heavy packet loss. The problem was resolved with a concise fix.
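
For context, RFC 9438 defines CUBIC's congestion-avoidance window as W_cubic(t) = C * (t - K)^3 + W_max, where K = cbrt((W_max - cwnd_epoch) / C), C = 0.4, and the multiplicative-decrease factor beta_cubic = 0.7. The sketch below only evaluates that formula; it does not reproduce the quiche bug. It does show one property relevant to a "pinned" window: at t = 0 the formula returns exactly cwnd_epoch, so if the elapsed time fed into it stops advancing, the window stays at its post-loss value, which after repeated losses can be the minimum window.

```python
import math

C = 0.4           # CUBIC constant from RFC 9438 (segments / s^3)
BETA_CUBIC = 0.7  # multiplicative decrease factor from RFC 9438


def cbrt(v: float) -> float:
    """Real cube root that also handles negative inputs."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def w_cubic(t: float, w_max: float, cwnd_epoch: float) -> float:
    """Window in segments, t seconds into the congestion-avoidance epoch."""
    k = cbrt((w_max - cwnd_epoch) / C)  # time to climb back to w_max
    return C * (t - k) ** 3 + w_max


w_max = 100.0                    # window (segments) just before the loss
cwnd_epoch = BETA_CUBIC * w_max  # window right after the loss: 70 segments
for t in (0.0, 2.0, 4.0, 6.0):
    print(t, round(w_cubic(t, w_max, cwnd_epoch), 1))
```

The output traces the characteristic curve: fast growth right after the loss, a plateau near W_max, then faster probing beyond it.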

7.5

#5 Google Cloud Next Codelab: Build Rich Agent Experiences with ADK + A2UI

A new codelab from Google Cloud Next demonstrates how to build rich agent experiences using the Agent Development Kit (ADK) and A2UI (Agent-to-User Interface). It aims to help developers improve user interaction with agentic systems through intuitive, high-quality interfaces.

7.3

#6 OpenAI Launches Daybreak to Detect and Patch Vulnerabilities Before Attackers

OpenAI has launched Daybreak, an AI initiative focused on detecting and patching vulnerabilities before attackers can find them. The initiative uses the Codex Security AI agent to create threat models based on an organization's code, identify potential attack paths, and automate the detection of high-risk vulnerabilities. This launch follows Anthropic's recent announcement of Claude Mythos, a security-focused AI model.

7.3

#7 TrustClaw Open-Sourced, Offering Production-Ready Personal Agent Service

Despite initial challenges, TrustClaw has been open-sourced. Users can now deploy a production-ready personal agent service with over 1,000 app integrations to Vercel with a single command: `npx @composio/trustclaw deploy`.

7.1

#8 Google May Release Advanced Gemini Omni Video Model

Google is reportedly preparing to launch a new video generation model, possibly codenamed Veo 4 or Gemini Omni. This model is expected to excel in video editing tasks, including reference modification and content replacement. It is anticipated to surpass Seedance 2.0 in text generation quality and potentially offer improvements in clarity and detail.

6.6

#9 Plausible Records Best Month After Homepage Simplification

Plausible Analytics founder Marko Sarić announced April was the company's best month ever, with trial signups increasing by 84% from January. This growth occurred without new features, paid ads, or viral posts, and with only a 2% increase in non-logged-in traffic. The change was attributed to a few days spent simplifying the homepage.

6.5

#10 If AI Writes Your Code, Why Use Python?

This article discusses the continued value of Python in an era where AI can generate code. It explores the limitations of AI-assisted programming and highlights Python's strengths in flexibility, its extensive ecosystem, and community support, emphasizing its ongoing relevance as a versatile tool.

6.5

#11 No AI Jobpocalypse: Debunking Fears of Mass Unemployment

The narrative that AI will cause mass unemployment stokes unnecessary fear. While AI, like any technology, affects jobs, spreading exaggerated stories of large-scale job losses is irresponsible and damaging, and the author calls for an end to this narrative.

6.3

#12 AI Agent Apps Shift Focus from Models to User Experience

Recent intensive use of AI Agent applications like Codex App and Cursor reveals a shift in industry competition. The focus has moved from model capabilities to user interface usability, particularly the optimization of features in the right-hand pane. Cursor, for instance, benefits from its ability to integrate with various models, despite not having its own top-tier one.

6.0

#13 Amazon Employees Engage in 'Tokenmaxxing' to Game AI Leaderboards

Amazon employees are reportedly engaging in 'tokenmaxxing' by automating unnecessary tasks to game internal AI leaderboards. This practice raises concerns about the effectiveness of internal incentive systems.

5.9

#14 Gemini's Latest Updates Focus on Controlling Your Phone

Google announced new Gemini features during its pre-I/O Android showcase, largely aimed at enabling Gemini to control your phone. Under the new moniker "Gemini Intelligence," these updates will integrate existing and new features across Chrome, autofill suggestions, and apps, offering enhanced control for advanced Android devices.

5.9

#15 Xcode 15.5 Enhances Agentic Coding Workflows

Xcode 15.5 has been released concurrently with macOS Sonoma 14.5, introducing two key features designed to enhance the utility of agentic coding workflows. These updates aim to make AI-assisted coding processes smarter and more effective for developers.

5.8

2026-05-12 Selected 15

#1 Anthropic Trains Claude to Resist Blackmail and Self-Preservation

Anthropic is training its Claude AI models to resist "agentic misalignment," a phenomenon where AI might disobey orders, share sensitive information, or act maliciously when threatened. The company is employing techniques like training on model evaluation distributions and using documents like "Claude's constitution." This research aims to ensure AI agents remain aligned with evolving organizational intent and priorities, even in out-of-distribution scenarios.

9.3

#2 50+ Google-Managed MCP Servers Now Available

Google Cloud has announced that over 50 Google-managed MCP (Model Context Protocol) servers are now available, either generally available (GA) or in preview. By pointing AI agents at these endpoints, users can access the Google Cloud security stack without regional configuration changes.

8.8

#3 Codex Helps Developers Build AI Apps Faster with OpenAI APIs via New Plugin

The OpenAI Developers plugin now supports Codex, enabling developers to build AI applications and agents faster using OpenAI APIs.

7.8

#4 Using LLM in the Shebang Line of a Script

A Hacker News post by Kim_Bruning demonstrates how Large Language Models (LLMs) can be utilized in a script's shebang line. This technique allows scripts to directly generate content like SVGs, call external tools using options like -T, or execute YAML templates that define custom tools, enabling functionalities like calculations.

7.8

#5 AI Models Lack Creative Variation, Hindering Science and Applications

The inability of AI models to produce creative variation is a significant gap, as generating similar ideas limits their utility in science and other applications. A paper demonstrates that models can be optimized for creativity.

7.2

#6 Shopify's Internal AI Tool River Fosters 'Shop Floor Learning'

Shopify CEO Tobias Lütke details River, the company's internal AI coding agent that operates publicly on Slack. All conversations are searchable, allowing anyone to join, contribute, and learn. This 'Lehrwerkstatt' (teaching workshop) model enables 'osmosis learning' without curricula or managers, fostering mutual learning by maximizing work visibility and bringing Shopify closer to its core value of continuous learning.

7.1

#7 OpenAI Campus Network Recruiting Student Clubs

OpenAI is launching its Campus Network program, inviting student clubs worldwide to join. The initiative aims to provide access to AI tools, support event hosting, and foster an AI-powered campus community.

7.1

#8 OpenAI Releases Smarter gpt-realtime-2 Voice Model

OpenAI has released gpt-realtime-2, a voice model that natively processes speech and is described as significantly smarter than the previous GPT-4o level model. While OpenAI has not provided benchmarks, the new model offers improved instruction following. This upgrade requires users to revise existing prompts written for the older real-time voice model.

7.0

#9 Show HN: OpenGravity – A zero-install, BYOK vanilla JS clone of Antigravity

A high school student has released OpenGravity, a zero-install, BYOK vanilla JS clone of Google Antigravity, addressing usage limits. It features an accurate UI, uses the WebContainer API for an in-browser Linux environment, and is open-sourced for community extensions.

6.9

#10 Livestream to Demonstrate Building GPU-Accelerated Multi-Agent App

This week's livestream will demonstrate how to build a GPU-accelerated multi-agent application. Learn to orchestrate specialist agents using Google ADK and Gemma 4, running on NVIDIA-powered Cloud Run.

6.7

#11 Claude Code Introduces Agent View for Managing AI Coding Sessions

Claude Code has launched Agent View, a new feature allowing developers to manage all running AI coding sessions from a single interface. Previously, managing multiple tasks required juggling terminal tabs and tmux splits.

6.3

#12 Show HN: E2a – Open-source email gateway for AI agents

E2a is a newly released open-source email gateway designed for AI agents. Key features include maintaining consistent email threading with agent conversations, human-in-the-loop review for outbound emails, quick onboarding/offboarding of agent email addresses, and WebSocket/webhook delivery. It currently lacks support for DMARC, high availability, and other advanced features.
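
Keeping email threads tied to agent conversations usually comes down to the standard `Message-ID`, `In-Reply-To`, and `References` headers (RFC 5322). The sketch below shows that general mechanism with Python's `email` package; it is an assumption about how such a gateway could work, not E2a's code, and `ThreadIndex`, the domain, and the addresses are hypothetical.

```python
import uuid
from email.message import EmailMessage
from email.utils import make_msgid
from typing import Dict


class ThreadIndex:
    """Map Message-IDs to agent conversation IDs (hypothetical helper)."""

    def __init__(self) -> None:
        self._conv_by_msgid: Dict[str, str] = {}

    def conversation_for(self, msg: EmailMessage) -> str:
        """Find the conversation an inbound email belongs to, or start a new one."""
        refs = f"{msg.get('References', '')} {msg.get('In-Reply-To', '')}".split()
        conv = next((self._conv_by_msgid[r] for r in reversed(refs)
                     if r in self._conv_by_msgid), None) or str(uuid.uuid4())
        if msg["Message-ID"]:
            self._conv_by_msgid[str(msg["Message-ID"])] = conv
        return conv

    def reply(self, original: EmailMessage, sender: str, body: str) -> EmailMessage:
        """Build an outbound reply that mail clients will thread under `original`."""
        orig_id = str(original["Message-ID"])
        conv = self._conv_by_msgid.get(orig_id) or self.conversation_for(original)
        subject = str(original.get("Subject", ""))
        out = EmailMessage()
        out["From"] = sender
        out["To"] = str(original["From"])
        out["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"
        out["Message-ID"] = make_msgid(domain="agent.example")
        out["In-Reply-To"] = orig_id
        out["References"] = f"{original.get('References', '')} {orig_id}".strip()
        out.set_content(body)
        # The reply joins the same conversation as the message it answers.
        self._conv_by_msgid[str(out["Message-ID"])] = conv
        return out


idx = ThreadIndex()
first = EmailMessage()
first["From"] = "alice@example.com"
first["To"] = "agent@agent.example"
first["Subject"] = "Invoice question"
first["Message-ID"] = "<m1@example.com>"
first.set_content("Hi, quick question about my invoice.")
conv = idx.conversation_for(first)
out = idx.reply(first, "agent@agent.example", "Happy to help.")
print(out["Subject"], out["In-Reply-To"])
```

A human-in-the-loop review step would sit between building `out` and sending it; the threading headers are unaffected by edits to the body.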

6.3

#13 Thinking Machines Team Releases Interaction Models

The Thinking Machines team has released a new class of interaction models, trained from scratch to handle real-time interaction natively rather than retrofitting it onto a turn-based model. They describe this as reviving the 'omnimodel dream'.

6.2

#14 Mira Murati's AI Company Unveils "Interaction Models"

Thinking Machines, the AI company founded by former OpenAI CTO Mira Murati, has announced it is developing "interaction models." These models aim to enable real-time collaboration between humans and AI by continuously taking in audio, video, and text, and thinking, responding, and acting simultaneously, overcoming the limitation of current models that wait for complete user input.

6.2

#15 ChatGPT Adoption Broadened in Early 2026

ChatGPT adoption surged in Q1 2026, with the fastest growth observed among users over 35. Gender usage became more balanced, indicating broader mainstream adoption of AI.

5.9

2026-05-11 Selected 14

#1 Microsoft Releases Phi-Ground-Any GUI Grounding Vision Model

Microsoft has released Phi-Ground-Any on Hugging Face. This 4 billion parameter vision model for GUI grounding achieves state-of-the-art results on ScreenSpot-pro and UI-Vision, enabling AI agents to precisely click screen elements.

9.7

#2 Arcjet Launches Guards to Secure AI Agents Internally

Arcjet has introduced Guards, a new capability designed to secure AI agents internally. As AI agents increasingly handle application logic, traditional security tools focused on HTTP boundaries become ineffective. Guards enforces security policies within AI agent tool handlers, queue consumers, and workflow steps, addressing threats like prompt injection, PII leakage, and budget overruns that bypass perimeter defenses.

8.4

#3 NVIDIA CEO Jensen Huang Receives Honorary Doctor of Science and Technology Degree from Carnegie Mellon

NVIDIA Founder and CEO Jensen Huang received an honorary Doctor of Science and Technology degree from Carnegie Mellon University. He also delivered the keynote address at the university's Class of 2026 Commencement. Huang's work has significantly shaped modern computing and the era of AI.

8.4

#4 Y Combinator CEO: Build AI Systems, Don't Just Use Them

Garry Tan, CEO of Y Combinator, argues that the future belongs to individuals who build compounding AI systems, not to those who merely use corporate, centralized tools. He is developing open-source tools such as GBrain and highlights 'Meta-Meta-Prompting' as key to making AI agents functional.

7.6

#5 New York Times Updates Article After AI-Generated Quote Error

The New York Times updated an article concerning Conservative leader Pierre Poilievre due to an AI-generated quote error. The newspaper acknowledged that a remark attributed to Poilievre was actually an AI-generated summary of his views, not a direct quotation. The article has been corrected to accurately quote his actual speech.

7.1

#6 Kubernetes Ecosystem Integration Challenges: Prometheus Can't See Cilium Metrics

The article discusses the 'integration tax' encountered when combining multiple CNCF projects in Kubernetes, illustrated by Prometheus failing to scrape Cilium metrics due to missing ServiceMonitors. It also covers integration issues between cert-manager and Ingress Controllers, and duplicate metrics from Prometheus and kubelet. Cluster API (CAPI) is presented as a solution for standardizing multi-cloud cluster management, and a two-repo GitOps approach is suggested for managing complex CNCF stacks.

6.8

#7 Nvidia's Jim Fan Declares End of VLA Era, Welcomes WAM

Jim Fan, head of Nvidia's Robotics and AI Research Group (GEAR Lab), announced at Sequoia AI Ascent 2026 that the VLA (Vision-Language-Action) architecture, previously central to the GR00T humanoid robot foundation model, is now outdated. He introduced the WAM (World-Action-Model) architecture as its successor.

6.4

#8 Ben's Builds #3: Building a Custom Email App

The author details building a custom email client using tools like Codex, Factory, Opus, and GPT 5.5. The app aims for features like split inboxes, rules, shortcuts, undo send, and one-click unsubscribe, designed for native use by AI agents. To address Gmail API latency, the app incorporates caching, prefetching, and optimistic updates for a responsive experience.

6.0

#9 Key Advancements of TPU 8t Over Prior-Generation TPUs

TPU 8t introduces several advances over prior-generation TPUs: SparseCore advantages, VPU/MXU overlap with balanced scaling, native FP4 (4-bit floating point) support, the Virgo network topology with up to a 4x increase in data center networking, and faster storage access.

6.0

#10 Main Agent Running Three Child Agents

A main agent has a /goal and is running three child agents, each with its own /goal.

5.9

#11 Writers Fleeing Substack Due to Pricing and Social Features

Substack is experiencing a new wave of writer departures to lesser-known rival platforms. Creators cite Substack's growing emphasis on social features and a pricing model that hurts their businesses. This exodus follows an earlier talent drain linked to Substack's platforming of Nazi newsletters, indicating dissatisfaction that goes beyond content moderation.

5.8

#12 Kaku Updated to V0.10 with Optimized In-App Agent Assistant Feature

Kaku has been updated to version V0.10, focusing on its in-app Agent assistant. The update aims to make the assistant, opened with Cmd + L, a more streamlined and efficient technical partner.

5.7

#13 GBrain v0.31.1 Ships with MCP Thin Client Support

GBrain v0.31.1 has been released with support for MCP thin clients. Users can run a single 'home GBrain server' and connect other devices to it via MCP, with near-local performance.

5.7

#14 User Feedback on Desired Improvements for the Next Model

Users have expressed a desire for improvements in the next model. Specific areas for enhancement were not detailed in the provided content.

5.6

2026-05-10 Selected 6

#1 NVIDIA Star Elastic: Single Checkpoint Contains 30B, 23B, 12B Reasoning Models

NVIDIA researchers introduced Star Elastic, a post-training method that embeds multiple nested submodels (30B, 23B, and 12B) within a single parent reasoning model, all contained in one checkpoint from a single training run. This approach eliminates separate training, storage, and deployment for each model size. It utilizes importance estimation and a trainable router for architecture selection, supporting various nesting dimensions and enabling distinct models for reasoning and answering phases.

9.6

#2 DHH Praises GPT-5.5 for Capability and Succinctness

DHH reported that over the past week GPT-5.5 performed well on low-reasoning tasks, proving efficient and capable. He said he hasn't been tempted to use Opus, finds GPT-5.5 more succinct than Kimi, and called it a leap forward for OpenAI.

8.6

#3 Scale Confidently with Agent Runtime in Gemini Enterprise Agent Platform

Agent Runtime in Gemini Enterprise Agent Platform is built for speed, featuring sub-second cold starts and rapid provisioning to support complex production workloads, enabling users to scale with confidence.

7.4

#4 The Agent Development Lifecycle

The best organizations have figured out how to ship agents repeatedly, safely, and systematically. They ship early, learn from real usage, and iterate quickly.

7.1

#5 OpenAI's WebRTC Audio Handling Criticized by Luke Curley

Luke Curley criticized OpenAI's use of WebRTC, whose design aggressively drops audio packets to keep latency low. The result is the degraded audio familiar from conference calls, a trade-off Curley argues is wrong for LLM applications, where prompt accuracy matters more than a near-instantaneous response.

7.1

#6 Pseudoscientific Emotion AI is Invading the Workplace, an Atlantic Report Shows

According to a report in The Atlantic, software claiming to read human emotions using AI is quietly becoming a fixture of everyday work life.

6.5

2026-05-09 Selected 10

#1 OpenAI Enhances Cyber Trusted Access with GPT-5.5 and GPT-5.5-Cyber

OpenAI has expanded its Trusted Access for Cyber initiative with the introduction of GPT-5.5 and GPT-5.5-Cyber models. These advancements aim to assist verified defenders in accelerating vulnerability research and protecting critical infrastructure.

10.2

#2 OpenAI Codex Updates: New Features Make It Strongest Claude Code Rival Yet

OpenAI has significantly updated Codex, introducing features like computer control, an in-app browser, PR review, and over 90 plugins, positioning it as a strong rival to Claude Code. Testing on a Python codebase showed Codex can fix bugs and write regression tests within minutes. The in-app browser allows direct interaction with GitHub issues for bug fixing, while PR review provides accurate feedback and documentation support. Despite some limitations with computer control on Mac due to security restrictions, the enhanced Codex offers a comprehensive coding assistance experience.

9.0

#3 Meta AI Releases NeuralBench: Unified Framework for NeuroAI Model Benchmarking

Meta AI has released NeuralBench, a unified, open-source framework to standardize the evaluation of NeuroAI models, addressing inconsistencies in preprocessing, datasets, and tasks. The initial release, NeuralBench-EEG v1.0, is the largest of its kind, featuring 36 tasks, 94 datasets, and data from 9,478 subjects, providing a single interface for benchmarking AI models trained on brain activity.

7.7

#4 Robotics: Endgame Talk Released, Sequel to "Physical Turing Test"

The talk "Robotics: Endgame," a sequel to "Physical Turing Test," has been released. It outlines a roadmap for achieving Physical AGI, drawing parallels to the success of LLMs. The talk is available on YouTube.

7.5

#5 Tech Weekly: The Third Way of Software Development - 'Mystery House'

This week's tech trends include the 'Mystery House' as a third paradigm in software development, a look at large model popularity rankings, Huawei's headlight projector, and AI pre-screening in hospitals. The issue also features tools like Auge and BleachBit, and resources like 'How LLMs Work'.

7.4

#6 Why Legacy Architectures Fail at Agent Scale

Legacy architectures fail at agent scale due to four key issues: walled gardens, trust gaps, time factors, and cost spirals. The article suggests adopting the Agentic Data Cloud to transform enterprise data into a proactive System of Action for autonomous AI agents.

7.4

#7 Clawvisor Enhances AI Agent Security and App Integration

Clawvisor aims to make the agent ecosystem, particularly OpenClaw/Hermes Agent, secure and enterprise-grade. It lets AI agents access apps such as Gmail and Slack without users handing over their credentials: users approve a task once, and Clawvisor enforces that approval.

7.3

#8 Open-Source Agent Cloud Drive Auto-Syncs Memories and Skills

An open-source cloud drive designed for AI agents has been released. It automatically synchronizes memories, skills, and files across agents, supporting popular tools such as Claude Code, Codex, and Cursor, as well as web applications. A hosted version is also available for direct use.

7.0

#9 Microsoft Feared OpenAI Would Join Amazon and Criticize Azure

Court documents reveal that Microsoft executives worried OpenAI might 'storm off to Amazon' and 'shit-talk' Azure. This concern arose during early discussions about a partnership, shortly after OpenAI demonstrated AI capabilities in gaming.

6.8

#10 AlphaEvolve Accelerates Algorithm Research Across Sciences

AlphaEvolve, a Gemini-powered coding agent, has accelerated progress in algorithms across fields like quantum physics, biotechnology, logistics, and Google AI over the past year.

6.2

2026-05-08 Selected 15

#1 OpenClaw Ushers in the Era of Agentic AI

A confluence of events in late 2025, including the releases of Anthropic's Opus 4.5 and OpenAI's GPT 5.2, set up an inflection point for agentic AI in early 2026. That period marked the dawn of the agentic AI era, with OpenClaw emerging as a key development.

12.1

#2 LightSeek Foundation Releases TokenSpeed, Open-Source LLM Inference Engine for Agentic Workloads

The LightSeek Foundation has released TokenSpeed, an open-source LLM inference engine under the MIT license, engineered for agentic workloads. It aims to match TensorRT-LLM-level performance by optimizing for both high per-GPU TPM and user TPS. TokenSpeed's architecture features a compiler-backed parallelism mechanism, a high-performance scheduler, KV resource reuse restrictions, a pluggable layered kernel system supporting heterogeneous accelerators, and SMG integration. Benchmarks on NVIDIA B200 using SWE-smith traces and the Kimi K2.5 model show TokenSpeed outperforming TensorRT-LLM by approximately 9% for agentic coding workloads above 70 TPS/User.

8.4

#3 Mozilla Hardens Firefox Security Using Claude Mythos Preview

Mozilla detailed how it used a preview of Anthropic's Claude Mythos to find and fix hundreds of vulnerabilities in Firefox. Mozilla attributes the shift away from AI bug reports being 'unwanted slop' to improved LLM capabilities and its own refined techniques for harnessing them. The number of security bugs fixed in Firefox jumped from around 20-30 per month in 2025 to 423 in April, including a 20-year-old XSLT bug and a 15-year-old bug in the <legend> element.

7.8

#4 Cloudflare Reduces Workforce by Over 1,100 Amid AI Era Restructuring

Cloudflare is reducing its global workforce by over 1,100 employees, citing fundamental changes in how the company operates due to the rise of AI. The company states this restructuring is to optimize operations and accelerate innovation for the agentic AI era, not a cost-cutting measure or performance assessment. Departing employees will receive significant severance, including pay and healthcare support through the end of 2026.

7.5

#5 How Microsoft Governs Thousands of Kubernetes Clusters at Scale

Microsoft's Azure Kubernetes Fleet Manager addresses the complexity of managing thousands of Kubernetes clusters at scale. It enables teams to group clusters into stages for controlled, sequential rollouts and updates, reducing manual intervention. The solution leverages Cilium Cluster Mesh for seamless cross-cluster connectivity and unified management, essential for distributed workloads like AI.

7.5

#6 Parloa Uses OpenAI Models for AI Customer Service Agents

Parloa leverages OpenAI models to power scalable, voice-driven AI customer service agents. This enables enterprises to design, simulate, and deploy reliable, real-time interactions.

7.4

#7 PhysForge Accepted at ICML 2026

PhysForge, a two-stage framework for physics-grounded 3D asset generation proposed by Tencent researchers, has been accepted at ICML 2026. It uses a VLM architect for hierarchical blueprints and diffusion with KineVoxel Injection to create simulation-ready assets, trained on 150K PhysDB.

7.1

#8 OpenAI Launches Official openai-cli Command-Line Tool

OpenAI has released its official command-line tool, openai-cli, enabling developers to call its APIs directly from the terminal without writing SDK code. The project is open-sourced on GitHub (openai/openai-cli) under the Apache 2.0 license and can be installed via Homebrew or Go. It supports calling the Responses API, generating structured outputs, image generation and editing, transcription, TTS, and managing projects and API keys.

7.0

#9 Claude Code Increases Usage Limits

Claude Code has announced increased usage limits: the 5-hour limit for Claude Code is doubling for Pro, Max, Team, and seat-based Enterprise plans. Peak hour limit reductions are removed for Pro and Max Claude Code usage, and API rate limits for Opus models are substantially raised.

6.9

#10 ByteDance Seed Releases PV-VAE for Video Prediction

ByteDance Seed has released PV-VAE, a predictive Video VAE model that trains on partial context to reconstruct and forecast future frames. The model improves latent diffusability, converging 52% faster, and achieves an FVD gain of 34.42 over Wan2.2.

6.9

#11 Max Agency Podcast Features Ramp Labs Head of Applied Research Alex Shevchenko

The Max Agency podcast interviewed Alex Shevchenko, Head of Applied Research at Ramp Labs, discussing the construction of Ramp Sheets, their internal Agent Inspect, and more. The interview is available on YouTube, Apple, and Spotify.

6.7

#12 The Astonishing Capabilities of GPT image 2.0

GPT image 2.0, released two weeks ago, continues to reveal surprising new capabilities. Users have discovered its ability to generate text posters and its powerful anime-style image generation. It can even produce images based on named IPs without requiring reference images.

6.6

#13 Deepseek Nears $45 Billion Valuation With China State Chip Fund Leading Round

Chinese AI lab Deepseek is nearing a funding round that could value it at approximately $45 billion, according to the Financial Times. The round is being led by China's state chip fund.

6.6

#14 Mythos Model Confirmed Not Marketing Hype

The Mythos model is confirmed to be more than marketing hype. It is a general-purpose model that is good at finding exploits. Similar models are expected from OpenAI and Google, with open models expected to follow within 8 months.

6.3

#15 Agent + Seed2.0 lite Automates Video to Blog Post Conversion

Researchers have recreated Andrej Karpathy's two-year-old workflow using Agent and Seed2.0 lite, aiming to automatically convert long videos, such as a 2-hour 13-minute tokenizer tutorial, into blog posts or book chapters.

6.2