  The most prestigious law school admissions discussion board in the world.

ChatGPT o1 qualifies for USAMO. People continue to act like this is normal


Date: September 21st, 2024 7:54 PM
Author: \'\"\'\'\"\'\"\'\"\'\'\'

The amount of cope going on with AI is unbelievable at this point. There are still serious people claiming it doesn’t understand anything simply because there are gaps in its understanding. Never mind that it accurately describes almost all visual scenes. Never mind that it accurately summarizes, makes reasonable inferences, and shows clear indications of having developed a world model. Now it gets a 13 on the AIME and it’s no big deal. Most people are idiots in all sorts of ways that ChatGPT isn’t. It’s clear to me now that these critics will be in denial all the way until AGI.

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118023)




Date: September 21st, 2024 8:50 PM
Author: Kefka

It has read terabytes of data, including almost certainly solutions to past AMCs, AIMEs, USAMOs, Putnams, etc.

There is something going on in large language models, but it's pattern matching against more text than any one person could read, not necessarily learning problem-solving skills outside of written exams.

I tested it on a few obscure board and card games and it was still poor. It has to have hundreds of examples that "look like" what it's supposed to be doing.

Interesting? Yes. AGI? No.

--Someone who got 13 on the AIME in the late 1990s.

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118199)




Date: September 21st, 2024 8:58 PM
Author: Risten

"I tested it on a few obscure board and card games and it was still poor"

Exactly the cope mentioned in the OP. Yeah, it probably just read the answers to those AIME questions somewhere else on the internet. And the AI that beat humans' ass at Go probably read its moves somewhere on the internet too.

Look, maybe AI won't kill us all (or worse), but at this point it's retarded to pretend it's just a really fast search engine.

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118228)




Date: September 21st, 2024 9:10 PM
Author: \'\"\'\'\"\'\"\'\"\'\'\'

Ok, but 4o can’t do AIME problems nearly as well. If this were simply a matter of the solutions being memorized (and AIME solutions would be a small part of its training set), then you would expect similar performance between plain 4o and a 4o trained with RL and chain-of-thought prompting. Memorization is also not consistent with the fact that performance improves neatly as a function of the number of generations. I tried some non-trivial actuarial problems that are almost certainly not in its training corpus; it got 2 out of 3 right, and its mistake on the one it got wrong was pretty understandable.
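The scaling-with-generations point has a simple toy model behind it. Under the (illustrative, not o1-specific) assumption that each independent sample solves a problem with probability above one half, majority voting over more samples pushes accuracy up smoothly, which memorization alone wouldn't do:

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent samples is correct,
    given each sample is correct with probability p (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# A 60%-accurate sampler compounds under majority voting over 1, 5, 25 generations.
for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(0.6, n), 4))
```

The numbers and the voting scheme are a sketch of why accuracy can rise with the number of generations, not a description of how o1 actually aggregates samples.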

This sort of argument, "it's just copying from the training set!", is looking more and more strained. I think people are setting themselves up to be extremely surprised in 2 to 10 years when all the denials about what is occurring come crashing down.

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118278)




Date: September 21st, 2024 9:11 PM
Author: Zoobooks

This is obviously correct.

The whole "reasoning" thing is clearly just iterating the LLM against its own answers a bunch of times, maybe having it ask itself questions about its answer. That's what they mean by adding compute at the prompt stage. This isn't interesting at all, even if the answers appear better and better.
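The loop being described looks roughly like this. A hedged sketch: `call_model` is a stand-in stub that "fixes" one flaw per pass, not any real model API, and the critique prompt is invented for illustration:

```python
def call_model(prompt: str, draft: str) -> str:
    """Stand-in for an LLM call. Here it just repairs one marked flaw
    per pass; in the real setting this would be a model API call."""
    return draft.replace("FLAW", "ok", 1)

def iterate_on_answer(question: str, draft: str, rounds: int = 4) -> str:
    """Repeatedly feed the model its own answer plus a critique prompt,
    the pattern the post describes as 'reasoning'."""
    for _ in range(rounds):
        critique = f"Question: {question}\nDraft: {draft}\nFind and fix any errors."
        revised = call_model(critique, draft)
        if revised == draft:  # fixed point: the model stopped changing its answer
            break
        draft = revised
    return draft

print(iterate_on_answer("toy question", "step1 FLAW step2 FLAW"))
```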

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118280)




Date: September 21st, 2024 9:16 PM
Author: \'\"\'\'\"\'\"\'\"\'\'\'

How are you confident this doesn’t replicate the results of human reasoning?

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118298)




Date: September 21st, 2024 9:18 PM
Author: Zoobooks

What do you mean, replicate? Maybe "reasoning" just is iterating thinking over the outputs of thinking and asking ourselves questions. But it's an LLM; it's still just a statistically probable next-token generator.
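"Statistically probable next-token generator" in its most literal form is a toy bigram model. Real LLMs condition on far more context through a neural network, but the decode loop has exactly this shape:

```python
from collections import Counter, defaultdict

corpus = "the model predicts the next token and the next token follows".split()

# Count bigrams: how often does each word follow each word?
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(word: str, steps: int) -> list:
    """Greedy decode: always emit the statistically most probable next token."""
    out = [word]
    for _ in range(steps):
        if word not in follows:
            break
        word = follows[word].most_common(1)[0][0]
        out.append(word)
    return out

print(" ".join(generate("the", 4)))
```

The debate in the thread is over whether, at scale, the machinery that fills in `follows` ends up encoding something like a world model rather than a lookup table.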

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118305)




Date: September 21st, 2024 9:36 PM
Author: \'\"\'\'\"\'\"\'\"\'\'\'

"Statistically probable next token generator" means nothing on its own. A model trained to perfectly predict the next token has to learn the underlying generative process. A small training sample only allows the learning of superficial patterns, because the model has many parameters and many different functions could be fitted to the data. But neural networks are biased toward simple functions (approximating Occam's razor), and when they are optimized on large amounts of data they start to accurately model human thoughts.
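The "many possible functions fit a small sample" point is exact in the discrete case: over n binary inputs there are 2^(2^n) boolean functions, and each distinct observed input/output pair pins down one row of the truth table. A quick count:

```python
def consistent_functions(n_inputs: int, n_examples: int) -> int:
    """Number of boolean functions on n_inputs bits that agree with
    n_examples distinct observed input -> output pairs."""
    total_rows = 2 ** n_inputs            # rows in the truth table
    assert n_examples <= total_rows
    # Observed rows are pinned; every unobserved row is a free binary choice.
    return 2 ** (total_rows - n_examples)

# 3-bit inputs: 256 functions total; 2 examples still leave 64 candidates.
print(consistent_functions(3, 0), consistent_functions(3, 2), consistent_functions(3, 8))
```

Only with all 8 rows observed does a single function remain; before that, a learner needs an inductive bias (the simplicity prior the post appeals to) to choose among the survivors.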

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118364)




Date: September 21st, 2024 9:43 PM
Author: Zoobooks

You're saying a CSV file containing decimal node weights "accurately model[s] human thoughts"?
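For what it's worth, a flat list of decimal weights plus an architecture really does pin down a function. A hand-made illustration (these weights are chosen by hand, not learned): six numbers are enough for a two-layer ReLU net to compute XOR:

```python
def relu(x: float) -> float:
    return max(0.0, x)

# The "CSV of node weights": hidden-layer weights/biases, then output weights.
W1 = [[1.0, 1.0], [1.0, 1.0]]   # both hidden units sum the two inputs
b1 = [0.0, -1.0]                # second unit only fires when both inputs are 1
W2 = [1.0, -2.0]                # output subtracts twice the "both on" signal

def net(x1: float, x2: float) -> float:
    h = [relu(W1[i][0] * x1 + W1[i][1] * x2 + b1[i]) for i in range(2)]
    return W2[0] * h[0] + W2[1] * h[1]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, net(a, b))  # nonzero exactly when one input is 1, i.e. XOR
```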

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118379)




Date: September 21st, 2024 9:56 PM
Author: \'\"\'\'\"\'\"\'\"\'\'\'

It would sound pretty stupid too if someone claimed that synaptic connection strengths in brains modeled reality.

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118419)




Date: September 21st, 2024 9:19 PM
Author: Risten

It's missing THE INDEFINABLE SPARK OF HUMAN INGENUITY

*poasts 'im gay' for next 10 hours before jacking off to deranged porn and passing out*

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118308)




Date: September 21st, 2024 8:40 PM
Author: ,.,....,.,.,.,...,.,.,...,.,.,.,,


I don't notice too much of a difference with 4o.

(http://www.autoadmit.com/thread.php?thread_id=5599365&forum_id=2#48118172)