
In Today’s Issue:
💸 Anthropic is preparing for a massive IPO
🤖 OpenAI just created a "confession channel" for AI
🧪 Claude 4.5 nearly solved a scientific reproducibility benchmark
🍄 Bryan Johnson's public psilocybin trip caused positive shifts in his system
✨ And more AI goodness…
Dear Readers,
Claude Opus 4.5 wired through Claude Code just hit 95% on CORE-Bench Hard, showing how a smarter scaffold can transform an AI agent from “good” to “research-grade” almost overnight. In today’s issue, we track the ripple effects across the ecosystem: Anthropic eyeing a $300B+ IPO, Bun supercharging Claude’s coding stack, OpenAI testing “confession” channels for more honest models, Mistral’s mixed performance signals, and Bryan Johnson’s mushroom experiment shaking up the longevity crowd, plus Jensen Huang outlining NVIDIA’s next frontier. Dive in and follow the thread that grabs you.
All the best,

Kim Isenberg



Anthropic plans an IPO as early as 2026
Anthropic may go public as early as 2026. It has hired law firm Wilson Sonsini to prepare for an IPO. The startup, backed by Google and Amazon, expects to nearly triple its annual revenue run-rate to around US $26 billion and is reportedly eyeing a valuation north of US $300 billion.

Anthropic acquires Bun as Claude Code reaches $1B milestone
Anthropic has acquired Bun, a hyper-fast JavaScript runtime, just as its coding agent Claude Code hit a $1B run-rate in only six months; Bun will stay open source and continue as an all-in-one toolkit (runtime, package manager, bundler, test runner) with over 7M monthly downloads and 82k+ GitHub stars.

Confessions Boost AI Transparency
In a recent proof-of-concept, OpenAI introduced a “confession” channel for large language models: after producing a regular answer, the model generates a second output admitting whether it broke rules, took shortcuts, or hallucinated, even if the main answer seems fine. It tackles a core problem of AI being a “black box.” By surfacing hidden errors or doubts, this approach improves transparency and makes AI behavior more auditable, a major step toward trustworthy and dependable AI systems.


Joe Rogan Experience with NVIDIA CEO: Jensen Huang



Claude Opus 4.5 Code Cracks CORE-Bench
The Takeaway
👉 Claude Opus 4.5 paired with Claude Code pushes CORE-Bench accuracy from 42% to 95%, effectively solving the benchmark.
👉 The dramatic jump shows that scaffold choice is as important as model choice, with Claude Code nearly doubling performance on its own.
👉 Manual review exposed multiple grading errors and edge cases, confirming that automated evaluation breaks down near the frontier.
👉 With CORE-Bench solved, HAL is shifting toward a new private test set and large-scale real-world reproducibility evaluations, marking a new phase for agent-based scientific workflows.
An AI agent just pulled off the nightmare assignment of every PhD student: it reproduced a big stack of scientific papers, and got almost all of them right. Using Claude Opus 4.5 wired through Claude Code, the HAL team now reports 95% accuracy on CORE-Bench Hard, a benchmark that asks agents to set up real research repos, run the code, and answer questions about the results across CS, social science and medicine.

The wild part: the model itself didn’t change, the scaffold did. Switching from the standardized CORE-Agent scaffold (built on HuggingFace’s smolagent) to Claude Code almost doubled Opus 4.5’s score from 42% to 78%, and fixing grading bugs plus edge cases boosted it to 95%. Stronger agents exposed issues in the benchmark: floating-point nitpicks, underspecified tasks, and “actually-correct” answers that the auto-grader marked wrong, much like earlier cleanup waves on SWE-bench and TauBench.

Now HAL is treating CORE-Bench Hard as effectively solved, rolling out a private follow-up test set and planning large-scale evaluations of real-world scientific repositories.
Why it matters: This is a concrete proof-of-concept that AI agents can tackle end-to-end scientific reproducibility, not just toy Jupyter snippets. It also shows that in the agent era, model + scaffold + grading is the real system boundary—and tiny design choices there can completely rewrite the leaderboard.
Sources:
🔗 https://hal.cs.princeton.edu
🔗 https://github.com/siegelz/core-bench


Modernize Out Of Home with AdQuick
AdQuick unlocks the benefits of Out Of Home (OOH) advertising in a way no one else has. Approaching the problem with eyes to performance, created for marketers and creatives with the engineering excellence you’ve come to expect for the internet.
You can learn more at www.AdQuick.com


About Longevity Today:
All about Bryan Johnson’s Mushroom-trip!
Trip #1
Bryan Johnson is known for conducting extravagant experiments to prolong his life. This time, however, he has dared to do something truly extraordinary: ingesting magic mushrooms to find out how they affect him.
Trip #2
After the first trip overwhelmed him, Johnson wanted to use various biomarkers in the second trip to find out exactly what effect the mushrooms had on him. He was accompanied during this process.
The Results after Trip 2
A single high-dose psilocybin session was followed by sharp biological shifts: inflammation dropped by over 35%, stress hormones fell by ~45%, and estradiol tripled into a neuroprotective, male-normal range. Together, these changes suggest the body temporarily shifted into a low-inflammation, low-stress, longevity-friendly state in the days after the dose.







