The crazy thing about LLMs is nobody even theorized they were possible
Date: November 2nd, 2025 12:39 PM
Author: ,.,.,.,.,.,.,..,:,,:,,.,:::,.,,.,:.,,.:.,:.,:.::,.
not even the people who built them. they were pretty much created by accident
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2#49395317)
Date: November 2nd, 2025 1:04 PM
Author: ,.,...,.,.,...,.,,.
people had been training LSTMs for language modeling for years before GPT-1 was released, so that's not true. the idea that training for predictive loss on text gradually produces language fluency and understanding makes deep intuitive sense if you think about it for two seconds. many people in ML were surprised and disappointed that the architectures for doing this could look stupidly simple, and that all you needed to do was train at scale (everything that focused on clever model design instead of scale failed). certainly GPT-3 was not surprising at all given GPT-2
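to make "training for predictive loss on text" concrete, here's a minimal sketch, assuming a toy PyTorch LSTM language model (vocab size, dimensions, and the random "text" are all made-up stand-ins, not anything from a real system):

```python
# Next-token prediction: the shared training objective of LSTM LMs and GPTs.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim, seq_len, batch = 100, 32, 64, 16, 8

class TinyLSTMLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.lstm(self.embed(tokens))
        return self.head(hidden)  # logits over the next token at each position

model = TinyLSTMLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # stand-in for real text
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens up to t

opt.zero_grad()
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
opt.step()
```

swap the LSTM for a transformer and scale everything up and you have the GPT recipe; the loss function doesn't change.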
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2#49395372)
Date: November 2nd, 2025 1:20 PM
Author: ,.,.,.,.,.,.,..,:,,:,,.,:::,.,,.,:.,,.:.,:.,:.::,.
yes, but nobody predicted they'd be capable of what they do. they didn't understand the power of machine learning, and they still don't even understand how these models work. they were basically trying to create a language assistant you could incorporate into Microsoft Word, and what they stumbled on is the most valuable technology humans have ever created
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2#49395411)
Date: November 2nd, 2025 1:38 PM
Author: .,.,,..,..,.,.,:,,:,...,:::,...,:,.,.:..:.
cr, lots of people thought we needed hand-engineered syntactic processing. Some of them are still deluded enough to believe their precious cognitive theories will be worth something in the long run.
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2#49395459)
Date: November 2nd, 2025 2:36 PM
Author: ,.,...,.,.,...,.,,.
seems premature to conclude that given the progress we have seen in the last year. within the last year LLMs got gold at the IMO, gold at the International Olympiad in Informatics, gold-medal-level results at the International Collegiate Programming Contest, and saw rapid growth on math and software-engineering benchmarks. GPT-5 hallucination rates are also down compared to prior models. people now have weird expectations for this technology given the rate of improvement - 3 or 4 months without sizeable improvements means it has "plateaued". there's now a long history of people being wrong about this.
compute-based models of AI progress predicted recent history well. if you project those models out to the sort of datacenters and chips that are currently being built, it's very hard to buy the notion of AI progress meaningfully stalling. compute buys algorithmic improvements too, since model designers can rapidly test and iterate over architectures, so model quality per unit of compute cost is likely to keep increasing.
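for a feel of what a "compute-based model" of progress looks like, here's a minimal sketch using a Chinchilla-style loss law L(N, D) = E + A/N^alpha + B/D^beta with the fitted constants reported by Hoffmann et al. 2022; the (N, D) points below are made-up illustrations, not real models:

```python
# Chinchilla-style scaling law: predicted loss as a function of
# parameter count N and training tokens D (constants from the
# Hoffmann et al. 2022 fit).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

# Hypothetical models at increasing compute budgets.
for n, d in [(1e9, 2e10), (7e10, 1.4e12), (1e12, 2e13)]:
    print(f"N={n:.0e}, D={d:.0e} -> predicted loss {predicted_loss(n, d):.3f}")
```

more compute buys both bigger N and bigger D, so the predicted loss keeps falling smoothly; projecting that curve onto the datacenters being built now is exactly the exercise described above.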
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2#49395596)
Date: November 2nd, 2025 2:40 PM
Author: ,.,.,.,.,.,.,..,:,,:,,.,:::,.,,.,:.,,.:.,:.,:.::,.
I don’t think their potential is maxed out, since there are structures in the data that humans aren’t aware of. connecting those dots is how AI was able to discover new chess moves and strategies that were previously unknown to humans. It’s easy to imagine an AI trained on nothing but text inferring solutions to problems that humans haven’t figured out. For example, it’s not far-fetched to think a super-advanced AI could infer a way to unify quantum theory and relativity, similar to how Einstein inferred relativity from what was known at the time, as a way to resolve discrepancies between Newtonian mechanics and Maxwell’s equations. It will probably solve many smaller problems along the way before it gets to the big fish.
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2#49395605)
Date: November 2nd, 2025 2:58 PM
Author: blasting classical music at the welfare office
It is absolutely far-fetched to imagine that LLMs can make novel discoveries in physics. LLMs cannot do this. They cannot make out-of-distribution inferences.
The reason AI was able to discover novel chess strategy and understanding is that that form of AI is reinforcement learning, not LLMs. AlphaZero and similar chess engines train against themselves, creating novel data and testing against it for feedback, resulting in the synthesis of genuinely new "understanding" that lies entirely outside the distribution of known human chess understanding and experience.
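a minimal sketch of that self-play loop, using tabular Monte Carlo learning on one-pile Nim in place of AlphaZero's network-plus-search (everything here is a toy stand-in chosen just to show the shape of the loop):

```python
# Self-play: the agent generates its own games, scores the outcomes by
# the rules, and learns from that fresh, self-created data.
import random
from collections import defaultdict

Q = defaultdict(float)   # Q[(pile, move)] = value for the player about to move
ALPHA, EPS, EPISODES = 0.5, 0.2, 20000

def legal(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def pick(pile):
    if random.random() < EPS:                             # explore
        return random.choice(legal(pile))
    return max(legal(pile), key=lambda m: Q[(pile, m)])   # exploit

for _ in range(EPISODES):
    pile, history = 10, []
    while pile > 0:              # both "players" share one policy
        move = pick(pile)
        history.append((pile, move))
        pile -= move
    reward = 1.0                 # whoever took the last stone wins
    for state, move in reversed(history):
        Q[(state, move)] += ALPHA * (reward - Q[(state, move)])
        reward = -reward         # perspective flips every ply

# The greedy policy should roughly recover the known winning strategy:
# leave your opponent a multiple of 4 (piles 4 and 8 are lost anyway).
print({p: max(legal(p), key=lambda m: Q[(p, m)]) for p in range(1, 11)})
```

the key ingredient: the training data is generated by the agent itself and scored by the rules of the game, so it isn't bounded by any fixed human-written corpus, which is the contrast with LLM pretraining being drawn here.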
LLMs don't work like that and aren't capable of it. They will get better, yes, and people will figure out clever ways to set up agentic superstructures that let them do practically useful, economically productive tasks, but LLMs are absolutely not *ever* going to be capable of novel inference. At best they can assist with and facilitate the training of other RL-based AIs that can.
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2#49395642)
Date: November 2nd, 2025 2:43 PM
Author: ,.,.,.,.,.,.,..,:,,:,,.,:::,.,,.,:.,,.:.,:.,:.::,.
explain what this is, like I’m an Eric Trump-level retard
edit: nvm. I looked this up and know what it is but didn’t know that’s what it’s called
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2#49395611)
Date: November 2nd, 2025 2:59 PM
Author: ,.,...,.,.,...,.,,.
classical statistical inference models are highly focused on preventing something called overfitting.
imagine you have a data sample that you are trying to fit a function to, in order to predict new datapoints. if you have a function approximator (such as a neural network) with many tunable parameters, you can often fit many, many functions to your data. most of them will have very little ability to predict data points that aren't in your sample, so you end up with an overfitted function that models your data well but is essentially useless. most people predicted this as the standard failure mode of neural networks that are either 1) very large, with many tunable parameters (more opportunity to fit a complicated function with no generalization power), or 2) trained very hard on the data (so the model fits the data too closely and doesn't extract broadly useful patterns). they thought you would then need to keep the number of parameters suitably low and stop training after a certain point to prevent overfitting.
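here's a minimal demonstration of that classical picture, assuming the usual polynomial-regression toy setup (a sine curve plus noise; nothing here comes from the thread itself):

```python
# Classical overfitting: a flexible model nails the training sample
# and falls apart on everything else.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0, 1, 200)
true_f = lambda x: np.sin(2 * np.pi * x)
y_train = true_f(x_train) + rng.normal(0, 0.2, x_train.size)

for degree in (1, 3, 9, 14):
    # numpy may warn about conditioning at high degree; that wildness is the point
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - true_f(x_test)) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

train error falls monotonically with degree while test error falls and then blows up once the fit starts chasing the noise - exactly the failure mode described above.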
it turns out that if you just keep making neural networks larger, or train over very long time horizons, you'll often see model generalization error drop to a new, lower level. this is deeply counterintuitive for standard statistical models, and why it happens is not completely understood. weight decay, which penalizes model complexity, is likely a big part of it: the function approximator wanders around the loss landscape for a while and then eventually finds a simple encoding of the data, and simple models have lower generalization error.
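and a follow-up sketch of the weight-decay intuition: the same degree-14 fit as above, but with an L2 penalty on the coefficients (the linear-regression analogue of weight decay; the lambda values are arbitrary illustrations):

```python
# Ridge regression: penalizing coefficient size pulls the wild
# interpolating solution back toward a simple function.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0, 1, 200)
true_f = lambda x: np.sin(2 * np.pi * x)
y_train = true_f(x_train) + rng.normal(0, 0.2, x_train.size)

def design(x, degree=14):
    return np.vander(x, degree + 1, increasing=True)

X, X_test = design(x_train), design(x_test)
for lam in (1e-12, 1e-6, 1e-3, 1e-1):   # 1e-12 ~ the unpenalized interpolator
    # ridge solution: w = (X'X + lam*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y_train)
    test_mse = np.mean((X_test @ w - true_f(x_test)) ** 2)
    print(f"lambda {lam:g}: test MSE {test_mse:.4f}")
```

a tiny penalty already collapses the test error relative to the near-zero-lambda fit, which is the "weight decay finds the simple encoding" story in miniature.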
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2#49395644)