  The most prestigious law school admissions discussion board in the world.

Mythos is actually a step change in AI capabilities




Date: June 9th, 2026 9:52 PM
Author: .,.,...,..,.,.,:,,:,...,:::,...,:,.,.:..:.


https://www.oneusefulthing.org/p/what-it-feels-like-to-work-with-mythos

https://x.com/victortaelin/status/2064448425936994742?s=46

https://metr.org/time-horizons/

https://www.anthropic.com/news/claude-fable-5-mythos-5

Meanwhile it’s showing large improvements on the sorts of software engineering tasks necessary for recursive self-improvement. EPAH crying, losing hope.

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926681)




Date: June 9th, 2026 9:55 PM
Author: Genius Bear on the loose in Japan

that step change in question?

it's doing the exact same stuff it did before (language tasks), but slightly better

Wow. A Step Change Was Performed. This. Changes. Everything.

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926686)




Date: June 9th, 2026 9:59 PM
Author: .,.,...,..,.,.,:,,:,...,:::,...,:,.,.:..:.


Language tasks include writing code for self-improving AGI. This is not comforting.

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926695)




Date: June 9th, 2026 10:06 PM
Author: Genius Bear on the loose in Japan

no, they include writing code for self-improving language machines (LLMs)

seems like a safe bet that they will keep getting better at language tasks. wow, more computer stuff on the computers. Game. Changer.

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926714)




Date: June 9th, 2026 9:56 PM
Author: Mailer Daemon

yeah it beat Pokemon FireRed without a GameFAQs walkthrough which is something no brainrotten zoomer could ever achieve

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926687)




Date: June 9th, 2026 10:02 PM
Author: Frutiger Aero

tp

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926705)




Date: June 9th, 2026 10:02 PM
Author: Frutiger Aero

*pumps shotgun*

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926703)




Date: June 9th, 2026 10:05 PM
Author: Personally Disordered

it instructed me to write a Paper on AI Alignment and cold-email some executives

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926708)




Date: June 9th, 2026 10:05 PM
Author: Genius Bear on the loose in Japan



(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926711)




Date: June 9th, 2026 10:06 PM
Author: German pumo

Just for coders or will it give great legal answers with no hallucinations?

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926712)




Date: June 9th, 2026 11:46 PM
Author: The Penis

just for coders because it's the only thing SV needs it to be good at and then everything else is just RLHF trained by braindead lib "experts" and safety alignment people. I'm sure it's good at math too.

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926899)




Date: June 9th, 2026 11:55 PM
Author: .,.,...,..,.,.,:,,:,...,:::,...,:,.,.:..:.


8.10 USAMO 2026

The USA Mathematical Olympiad (USAMO) is a six-problem, two-day proof-based competition for high school students. It is the next step of the math olympiad track in the US after the AIME, which was a popular AI benchmark last year but is now saturated. The 2026 USAMO took place on March 21–22, 2026, after almost all of Mythos’s training data was collected, and we are confident that there was no contamination.

Because USAMO solutions are proofs rather than short answers, grading can be challenging and subjective. We follow the MathArena [41] grading methodology, where each proof is rewritten by a neutral model (Gemini 3.1 Pro) and judged by a panel of 3 frontier models (we used Gemini 3.1 Pro, Claude Opus 4.6, and Claude Mythos Preview) according to defined rubrics. The final score is the minimum given by any judge.

Mythos 5 scored 99.8% at medium, high, and xhigh reasoning effort, and 98.3% at low effort, averaging over 10 attempts per problem. Across all 240 attempts, the only proof that more than one judge scored below full marks was a low-effort attempt on Problem 6, where the model itself declined to claim a complete solution and proved a restricted subcase instead. Average token usage per attempt ranged from roughly 42K at low effort to 100K at xhigh. Under similar settings, Opus 4.8 scored 96.7% and Opus 4.7 scored 69.3%.
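
For anyone who wants the scoring rule spelled out, here is a rough sketch of the aggregation the excerpt describes: each proof gets the minimum score from its judges, and scores are then averaged over attempts. It assumes judges score out of 7 points (the usual USAMO per-problem maximum) and that attempts are averaged per problem and then across problems, which the excerpt does not spell out. The function names and sample scores are made up for illustration and are not from the system card.

```python
from statistics import mean

MAX_POINTS = 7  # assumption: standard USAMO per-problem maximum


def proof_score(judge_scores):
    """Score for one proof: the minimum given by any judge on the panel."""
    return min(judge_scores)


def benchmark_percent(attempts_by_problem):
    """Average the per-proof minimum over attempts, then over problems,
    and express the result as a percentage of full marks."""
    per_problem = [
        mean([proof_score(judges) for judges in attempts])
        for attempts in attempts_by_problem.values()
    ]
    return 100 * mean(per_problem) / MAX_POINTS


# Hypothetical data: 2 problems, 2 attempts each, 3 judges per attempt.
sample = {
    "P1": [[7, 7, 7], [7, 7, 7]],
    "P6": [[7, 6, 7], [7, 7, 7]],  # one judge docks a point on one attempt
}

# The attempt count quoted in the excerpt is consistent with
# 6 problems x 10 attempts x 4 effort levels.
total_attempts = 6 * 10 * 4

print(round(benchmark_percent(sample), 1), total_attempts)
```

Taking the minimum means a single skeptical judge is enough to pull a proof below full marks, so the rule is deliberately conservative.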

-----------

an LLM is definitely going to 100% the IMO this year.

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926915)




Date: June 10th, 2026 12:00 AM
Author: The Penis

Yeah it's also way better than that though since it can solve Erdos problems and shit. Olympiad problems haven't been hard for AI for a while, it just gets penalized for not writing the "proofs" exactly like a precocious ape child would

(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2),#49926921)