Gary Marcus reply to SSC on scaling hypothesis
Date: June 22nd, 2022 9:52 PM Author: talented business firm
>They found that at least a 5-layer neural network is needed to simulate the behavior of a single biological neuron. That’s around 1000 artificial neurons for each biological neuron.
>human brain has 100 billion neurons
>GPT-4 will have 100 trillion neurons
interesting math
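a rough sanity check of the quoted arithmetic (the ~1000x figure is an assumption taken straight from the quoted claim about deep nets fitting a single cortical neuron's behavior):

```python
# Back-of-the-envelope check of the quoted claim. The 1000x ratio is
# an assumption from the quote, not an established constant.
biological_neurons = 100e9           # ~10^11 neurons in a human brain
artificial_per_biological = 1_000    # from the quoted single-neuron result

equivalent_units = biological_neurons * artificial_per_biological
print(f"{equivalent_units:.0e}")     # 1e+14, i.e. 100 trillion
```

note this counts artificial "neurons," not parameters, which is where these comparisons usually go sideways.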
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44726205) |
Date: June 22nd, 2022 10:19 PM Author: Swashbuckling dilemma
i would be willing to bet that GPT-4 is nowhere near 100 trillion parameters. i would be shocked if it's above 15 trillion parameters.
there are several reasons for this. a dense transformer of that size would cost tens or hundreds of billions of dollars to train. no indication of budgets of that size yet.
the Chinchilla scaling laws also imply that GPT-3 was very inefficiently trained. they should have trained a smaller model on many more tokens. Chinchilla beats GPT-3 with far fewer parameters because it used more data. i would be surprised if OpenAI didn't realize that before Chinchilla was released, which means they will be pumping much more of their computing power into running more data through the network instead of just increasing parameter count. a 1 trillion parameter GPT trained according to the Chinchilla scaling laws would be much better than even PaLM.
there are also other avenues for improvement that they haven't explored besides parameter scaling. recurrent transformer architectures with larger context windows. language models that learn how to plan (imagine a MuZero style training for GPT-3, where it bootstraps itself into learning how to plan how to write good text).
there are tons of avenues for improvement that would likely be more efficient than just making GPT-4 gigantic.
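a sketch of that compute-optimal allocation, using the popular rules of thumb (C ≈ 6·N·D training FLOPs, and D ≈ 20·N tokens at the optimum; the paper's fitted coefficients differ slightly, so treat this as illustrative only):

```python
# Rough Chinchilla-style allocation sketch. Uses the rules of thumb
# C ~= 6*N*D FLOPs and D_opt ~= 20*N_opt tokens; the exact fitted
# exponents/coefficients in the Chinchilla paper differ a bit.

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    """Return (params, tokens) that roughly exhaust a FLOP budget."""
    # Solve C = 6 * N * (20 * N)  =>  N = sqrt(C / 120)
    n_params = (compute_flops / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

# GPT-3 used roughly 3.14e23 FLOPs for 175B params on ~300B tokens.
n, d = chinchilla_optimal(3.14e23)
print(f"params ~ {n:.1e}, tokens ~ {d:.1e}")
```

at GPT-3's compute budget this rule of thumb prefers a ~50B parameter model on ~1T tokens, which is the whole point: smaller model, much more data.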
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44726290) |
Date: June 22nd, 2022 10:49 PM Author: white sex offender
gary marcus is a fraud.
a friend of mine was his doctoral student. and i have a PhD in ML. he has never done any meaningful research in the field.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44726485) |
Date: June 23rd, 2022 10:08 AM Author: Swashbuckling dilemma
there are people there that totally buy into the Eliezer kool aid, but you'll find a lot of pushback against him. here is a recent thread from a former OpenAI researcher that criticizes some of his views.
https://www.lesswrong.com/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer
i'm not so sure the religious argument works in their case. it makes more sense if you think about the Kurzweil types that envision positive outcomes. most of that community is pessimistic about the prospect of alignment and doesn't find any comfort in these ideas.
the scaling hypothesis doesn't directly say society is going to be run by machines in a few years, but if you think simple neural circuitry trained at the size of the brain will yield human level cognition, i think it's very likely AGI is near. i can sort of imagine scenarios in which that doesn't happen (no one is willing to invest billions to do very large scale training runs), but they seem somewhat far fetched.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44727922) |
Date: June 23rd, 2022 8:14 AM Author: Big sickened site
A challenge for AI is that much of human society, and hence our various outputs & training data, are built on lying.
See generally The Elephant in the Brain.
Thus, for example, medicine is not about medicine, by and large, but rather palliating anxiety, which is why medical costs, but not the quality of care, increase monotonically with wealth.
The law is not about producing just outcomes, but rather about producing determinate outcomes while channelizing unproductive people who can't cooperate to an extractive system.
People create art because they do not stand 6'3".
Etc., etc.
People gain implicit knowledge of such things because they operate as agents with needs, drives, and goals in a world of limited resources, cause and effect, & other agents.
I am very interested to see what emerges when some of the new AI are put into something like the old ABM models but with moar power.
Becoming increasingly plausible, cf. e.g.,
https://iopscience.iop.org/article/10.1088/1741-2552/ac6ca7/meta
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44727515) |
Date: June 23rd, 2022 8:18 AM Author: Big sickened site
[also, fairly unrelated & more speculative but check this out -- Freeman Dyson would've been interested I think. I had not appreciated the extent to which quantum theory just is a theory of empiricism and its limits. Peirce was truly way ahead of his time with tychism + objective chance.]
***
Abstract The Free Energy Principle (FEP) states that under suitable conditions of weak coupling, random dynamical systems with sufficient degrees of freedom will behave so as to minimize an upper bound, formalized as a variational free energy, on surprisal (a.k.a., self-information). This upper bound can be read as a Bayesian prediction error. Equivalently, its negative is a lower bound on Bayesian model evidence (a.k.a., marginal likelihood). In short, certain random dynamical systems evince a kind of self-evidencing. Here, we reformulate the FEP in the formal setting of spacetime-background free, scale-free quantum information theory. We show how generic quantum systems can be regarded as observers, which with the standard freedom of choice assumption become agents capable of assigning semantics to observational outcomes. We show how such agents minimize Bayesian prediction error in environments characterized by uncertainty, insufficient learning, and quantum contextuality. We show that in its quantum-theoretic formulation, the FEP is asymptotically equivalent to the Principle of Unitarity. Based on these results, we suggest that biological systems employ quantum coherence as a computational resource and – implicitly – as a communication resource.
Indeed while quantum theory was originally developed – and is still widely regarded – as a theory specifically applicable at the atomic scale and below, since the pioneering work of Wheeler [24], Feynman [25], and Deutsch [26], it has, over the past few decades, been reformulated as a scale-free information theory [27, 28, 29, 30, 31, 32] and is increasingly viewed as a theory of the process of observation itself [33, 34, 35, 36, 37, 38]. This newer understanding of quantum theory fits comfortably with the generalization of the FEP, and hence of self-evidencing and active inference, to all “things” as outlined in [10], and with the general view of observation under uncertainty as inference. In what follows, we take the natural next step from [10], formulating the FEP as a generic principle of quantum information theory. We show, in particular, that the FEP emerges naturally in any setting in which an “agent” or “particle” deploys quantum reference frames (QRFs), namely, physical systems that give observational outcomes an operational semantics [39, 40], to identify and characterize the states of other systems in its environment. This reformulation removes two central assumptions of the formulation in terms of random dynamical systems employed in [10]: the assumption of a spacetime embedding (or “background” in quantum-theoretic language) and the assumption of “objective” or observer-independent randomness. It further reveals a deep relationship between the ideas of local ergodicity and system identifiability, and hence the idea of “thingness” highlighted in [10], and the quantum-theoretic idea of separability, i.e., the absence of quantum entanglement, between physical systems.
https://www.sciencedirect.com/science/article/abs/pii/S0079610722000517
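[For reference, the "upper bound on surprisal" the abstract invokes is just the standard variational evidence decomposition; spelled out, with q(s) an approximate posterior over hidden states s and o the observations:]

```latex
F[q] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
     = \underbrace{D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big]}_{\geq 0} - \ln p(o)
     \;\geq\; -\ln p(o)
```

[So minimizing F minimizes an upper bound on surprisal, −ln p(o), which is the same as maximizing a lower bound on model evidence — the "self-evidencing" in the abstract.]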
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44727524) |
Date: June 25th, 2022 12:28 PM Author: Swashbuckling dilemma
this is worth listening to:
https://theinsideview.ai/raphael
good criticism of naive scaling as a solution to general intelligence. this seems better thought through than what I have seen from Marcus
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44742063) |
Date: June 25th, 2022 12:50 PM Author: Big sickened site
Many thanks! this looks great. I am pasting the transcript below to peruse later (& for ne1 else interested).
One thing I have noticed about GPT-3/raphael is that it is very talented at mimicking "voice" and "tone," but has no conceptual understanding. Very unusual, since usually an adept at aping voice / tone is extremely well-steeped in a genre & understands all of the ins and outs of plot and the micro- and macro system of cause & effect. It's as if it has a higher-order Williams Syndrome.
* * *
TRANSCRIPT OF https://theinsideview.ai/raphael
Raphaël: And I just posted some examples on Twitter of hybrid animals, testing Dall-E 2 on combining two different, very different animals together, like a hippopotamus octopus or things like that. And it’s very good at combining these concepts together, and doing so even in a way that demonstrates some minimal comprehension of world knowledge, in the sense that it combines the concepts not just by haphazardly throwing together features of a hippopotamus and features of an octopus, or features of a chair and an avocado, but combining them in a plausible way that’s consistent with how a chair looks and would behave in the real world, or things like that. So those are examples that are mostly semantic composition, because it’s mostly about the semantic content of each concept combined together with minimal syntactic structure. The realm in which current text to image generation models seem to struggle more right now is with respect to examples of compositionality that have a more sophisticated syntactic structure. So one good example from the Dall-E 2 paper is prompting a model to generate a red cube on top of a blue cube.
Raphaël: What that example introduces compared to the conceptual blending examples I’ve given is what people call in psychology, variable binding. You need to bind the property of being blue to a cube that’s on top and the property of being red to the cube that’s…I think I got it the other way around. So red to the cube that’s on top and blue to the cube that’s at the bottom, and a model like Dall-E 2 is not well suited for that kind of thing. And that’s, we could talk about this, but that’s also an artifact of its architecture, because it leverages the text encodings of CLIP, which is trained by contrastive learning. And so when training CLIP, it’s only trying to maximize the distance between text image pairs that are not matching and minimize the distance between text image pairs that are matching, where the text is the right caption for the image.
Raphaël: And so through that contrastive learning procedure, it’s only keeping information about the text that is useful for this kind of task. So a lot of this can be done without modeling closely the syntactic structure of the prompts or the captions, because unless we adversarially designed a new data set for CLIP that would include a lot of unusual compositional examples, like a horse riding an astronaut, and various examples of blue cubes and red cubes on top of one another. Given the kind of training data that CLIP has, especially for Dall-E 2 stuck [inaudible 02:10:20] and stuff like that, you don’t really need to represent rich compositional information like that to train CLIP, and hence the limitations of Dall-E 2. Imagen does better at this, because it uses a frozen language model, T5-XL I think, and we know that language models do capture rich compositional information.
Michaël: Do you know how it uses the language model?
Raphaël: I haven’t done a deep dive into the Imagen paper yet, but it’s using a frozen T5 model to encode the prompts. And then it has some kind of other component that translates these prompts into image embeddings, and then does some gradual upscaling. So there is some kind of multimodal diffusion model that takes the T5 embedding and is trained to translate that into image embedding space. I don’t exactly remember how it does that, but I think the key part here is that the initial text embedding is not the result of contrastive learning, unlike the CLIP model that’s used for Dall-E.
Michaël: Gotcha. Yeah. I agree that…yeah. From the images you have online, you don’t have a bunch of a red cube on top of a blue on top of a green one, and it’s easy to find counter examples that are very far from the training distribution.
Raphaël’s Experience with DALLE-2
Michaël: I’m curious about your experience with Dall-E because you’ve been talking about Dall-E before you had access. And I think in the recent weeks you’ve gained the API access. So have you updated on how good it is or AI progress in general, just from playing with it and being able to see the results from octopus, I don’t know how you call it.
Raphaël: Yeah. I mean, to be honest, I think I had a fairly good idea of what Dall-E could and couldn’t do before I got access to it. And there’s nothing that I generated that kind of made me massively update that prior I had. So again, it’s very good at simple conceptual combination. It can also do fairly well some simple forms of more syntactically structured composition. So if you ask it for, I don’t know, one prompt that I tested it on, that was great. Quite funny is an angry Parisian holding a baguette. So an angry Parisian holding a baguette digital art. Basically every output is spot on. So it’s like a picture of an angry man with a beret holding a baguette, right? So this kind of simple compositional structure is doing really well at it. That’s already very impressive in my book.
Raphaël: So I was pushing back initially against some of the claims from Gary Marcus precisely on that, around the time of the whole deep-learning-is-hitting-a-wall stuff. He was emphasizing that deep learning, as he would put it, fails at compositionality. I think first of all, that’s a vague claim, because there are various things it could mean depending on how you understand compositionality, and what I pointed out in my reaction to that is that really the claim that is actually warranted by the evidence is that there are failure cases with current deep learning models, with all current deep learning models, at parsing compositionally structured inputs. So there are cases in which they fail. That’s true, especially the very convoluted examples that Gary has been testing Dall-E 2 on, like a blue tube on top of a red pyramid next to a green triangle or whatever. When you get to a certain level of complexity, even humans struggle.
Raphaël: If I ask you to draw that, and I didn’t repeat the prompt, I just gave it to you once, you probably would make mistakes. The difference is that we humans can go back and look at the prompt and break it down into sub-components. And that’s actually something I’m very curious about. I think a low hanging fruit for research on these models would be to do something a little similar to chain of thought prompting, but with text to image models instead of just language models. So with chain of thought prompting, or the Scratchpads paper for language models, you see that you can get remarkable improvements in in-context learning when, in your few-shot examples, you give examples of breaking down the problem into sub-steps.
Michaël: Let’s think about this problem step by step.
Raphaël: Yeah, yeah, exactly. And so, well, actually the “Let’s think about this step by step” stuff was slightly debunked in my view, by a blog post that just came out. Who did that? I think someone from MIT. I could send you the link, but someone who tried a whole bunch of different tricks for prompt engineering and found that at least with arithmetic, the only really efficient one is to do careful chain of thoughts prompting, where you really break down each step of the problem. Whereas just appending, let’s think step by step wasn’t really improving the accuracy. So there are some, perhaps some replication concerns with “Let’s think step by step.”
Raphaël: But if you do spell out all the different steps in your examples of the solution, then the model will do better. And I do think that perhaps in the near future, someone might be able to do this with text to image generation where you break down the prompt into first let’s draw a blue triangle, then let’s add a green cube on top of the blue triangle and so on.
Raphaël: And maybe if you can do it this way, you can get around some of the limitations of current models.
Michaël: Isn’t that already something, a feature of the Dall-E API? At least on the blog post, they have something where they have a flamingo that you can add it to be, remove it or move it to the right or left.
Raphaël: Yeah. So you can do inpainting and you can gradually iterate, but that’s not something that’s done automatically. What I’m thinking about would be a model that learns to do this, similarly to chain of thought prompting. So there is a model that just came out a few days ago that I tweeted about that does something a little bit different, but along the same broad lines. It’s breaking down the compositional prompts of the diffusion models into distinct prompts, and then has this compositional diffusion model that has compositional operators like “and”. For example, if you want a blue cube and a red cube, it will first generate embeddings for a blue cube and for a red cube, and then it will use a compositional operator to combine these two embeddings together.
Raphaël: So it’s kind of like hard coding compositional operations into the architecture. And my intuition is that this is not the right solution for the long term, because, again, the bitter lesson, blah, blah, blah, you don’t want to hard code too much in the architecture of your model. And I think you can learn that stuff with the right architecture. And we see that in language models, for example: you don’t need to hard code any syntactic structure, any knowledge of grammar, in language models. So I think you don’t need to do it either for vision language models, but in the short term, it seems to be working better than Dall-E 2 if you do it this way.
Michaël: Right, so you split your sentence with the “and” and then you combine those embeddings to generate the image. I think, yeah, as you said, the general solution is probably as difficult as solving the understanding of language, because you would need to see in general how the different objects in a sentence relate to each other, and so splitting it effectively would require a deeper understanding.
The Future of Image Generation
Michaël: I’m curious, what do you think would be kind of the new innovation? So imagine we’re in 2024, or even 2023, and Gary Marcus is complaining about something on Twitter. Because for me, Dall-E was not very high resolution, the first one, and then we got Dall-E 2 that couldn’t generate text or, you know, faces, or maybe that’s something from the API, not really an AI problem. And then Imagen came along and did something much more photorealistic that could generate text.
Michaël: And of course there are some problems you mentioned, but do you think in 2023 we would just work on those compositionality problems one by one, and we would get three objects, blue on top of red on top of green, or would it be something very different? Yeah, I guess there are some long tail problems in fully solving the problem of generating images, but I don’t see what it would look like. Would it be just Imagen a little bit different, or something completely different?
Raphaël: So my intuition is that yeah, these models will keep getting better and better at this kind of compositional task. And I think it’s going to happen probably gradually, just like language models have been getting better and better at arithmetic, first doing two digit operations and then three digit, and with PaLM perhaps more than that, or with the right chain of thought prompting more than that, but it still hits a ceiling and you get diminishing returns, and that will remain the case as long as we can’t find a way to basically approximate some form of symbolic-like reasoning in these models with things like variable binding. So I’m very interested in current efforts to augment transformers with things like episodic memory, where you can store things that start looking like variables and do some operations.
Raphaël: And then have it do read and write operations. To some extent, the work that’s been done by the team at Anthropic led by Chris Olah, with people like [inaudible 02:21:37], which I think is really fantastic, is already shedding light on how transformers, just vanilla transformers, in fact toy models without MLP layers, so attention-only transformers, can have some kind of implicit memory where they can store and retrieve information and do read and write operations in subspaces of the model. But I think to move beyond the gradual improvement that we’ve seen from language models for tasks such as mathematical reasoning, to something that can perform these operations more reliably, and in a way that generalizes better, for arbitrary digits for example, we probably need some form of modification of the architecture that enables more robust forms of variable binding and manipulation in a fully differentiable architecture.
Raphaël: Now, if I knew exactly what form that would take, then I would be founding the next startup that gets $600 million in Series B, or maybe I would just open source it. I don’t know, but in any case I would be famous. So I don’t know exactly what form that would take. I know there is a lot of exciting work on somehow augmenting transformers with memory. There’s some stuff from the Schmidhuber lab recently on fast weight transformers that looks exciting to me, but I haven’t done a deep dive yet. So I’m expecting a lot of research on that stuff in the coming year. And maybe then we’ll get a discontinuous improvement of text to image models too, where instead of gradually being able to do three objects, a red cube on top of a blue cube, and then four objects, and gradually like that, all of a sudden we’d get to arbitrary compositions. I’m not excluding that.
Conclusion
Michaël: As you said, if you knew what the future would look like, you would be founding a Series B startup in Silicon Valley, not talking on a podcast. Yeah, I think this is an amazing conclusion, because it opens a window onto what is going to happen next. And, yeah, thanks for being on the podcast. I hope people will read all your tweets, all the threads on compositionality, Dall-E, GPT-3, because I personally learned a lot from them. Do you want to give a quick shout out to your Twitter account or a website or something?
Raphaël: Sure. You can follow me at @Raphaelmilliere on Twitter. That’s Raphael with PH, the French way, and my last name Milliere, M, I, L, L, I, E, R, E. You can follow my publications on raphaelmilliere.com. And I just want to quickly mention this event that I’m organizing with Gary Marcus at the end of the month, because it might interest some people who enjoyed the conversation on compositionality.
Raphaël: So basically I’ve been disagreeing with Gary on Twitter about how extensive the limitations of current models are with respect to compositionality. And there’s something that I really like, a model of collaboration that emerged initially in economics but has been applied to other fields in science, called adversarial collaboration, which involves collaborating with people you disagree with to try to have productive disagreements and settle things with falsifiable predictions and things like that. So in this spirit of adversarial collaboration, instead of…I think Twitter amplifies disagreements rather than allowing reasonable, productive discussions. I suggested to Gary that we organize together a workshop, inviting a bunch of experts in compositionality and AI to try to work these questions out together. He was enthusiastic about this, and we organized this event online at the end of the month; it’s free to attend. You can register at compositionalintelligence.github.io.
Raphaël: And yeah, if you’re interested in that stuff, please do join the workshop. It should be fun. And thanks for having me on the podcast. That was a blast.
Michaël: Yeah, sure. I will definitely join. I will add a link below. I can’t wait to see you and Gary disagree on things and make predictions and yeah. See you around.
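[The compositional “and” operator discussed in the transcript — summing per-concept guidance directions, as in composable diffusion models — can be sketched roughly like this. `denoise` here is a dummy stand-in for a trained model’s noise predictor, not a real API:]

```python
import numpy as np

# Rough sketch of the compositional "and" operator for diffusion models:
# combine per-concept classifier-free-guidance directions instead of
# encoding the whole compositional prompt at once. `denoise` is a dummy
# stand-in for a trained noise predictor eps(x, t, prompt).

def denoise(x, prompt):
    seed = 0 if prompt is None else abs(hash(prompt)) % 2**32
    rng = np.random.default_rng(seed)
    return 0.1 * x + 0.01 * rng.standard_normal(x.shape)

def composed_eps(x, prompts, w=7.5):
    """eps_uncond + sum_i w * (eps_i - eps_uncond), one term per concept."""
    eps_uncond = denoise(x, None)
    eps = eps_uncond.copy()
    for p in prompts:
        eps += w * (denoise(x, p) - eps_uncond)
    return eps

x = np.zeros((8, 8))
eps = composed_eps(x, ["a blue cube", "a red cube on top of it"])
print(eps.shape)  # (8, 8)
```

[Hard coding the operator this way is exactly the "bitter lesson" worry raised above: it works better today, but it bakes structure into the architecture rather than learning it.]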
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44742150) |
Date: June 25th, 2022 1:16 PM Author: stirring brilliant background story
“Gary Marcus doesn’t contribute to the field” is an ad hominem diversion that fails to address the substance of his arguments. He represents the “East Pole” of cognitive science, with its focus on modularity and symbolic reasoning, and criticism of AI/ML from an adjacent field is fair game. You will find similar critiques from active and well respected practitioners like Brenden Lake and Joshua Tenenbaum (if not as vocal or adversarial).
In general the field is populated with intelligent people with no philosophical training. Among its brightest lights you will not find anything close to the depth or care of a Fodor or Pylyshyn. They suffer a serious lack of perspective. Francois Chollet is the best among them, and he is a skeptic.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44742312) |
Date: June 25th, 2022 1:42 PM Author: Big sickened site
TY for your response.
But query, what if Chomsky & co. are just wrong with UG and the connectionists and behaviorists [well, as modified into enactivists] were right the whole time? It's an empirical question.
{I have a whole intricate view on all of this, developed largely independently of the ML/AI foofaraw, that I'm too tired to expand on now.}
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44742515) |
|
Date: June 25th, 2022 2:16 PM Author: Big sickened site
There's a lot of interesting work on this that goes far beyond the old poverty-of-stimulus debates and into self-evidencing systems & the acquisition of concepts; I can, without getting into the weeds, give a sense of my views with some cites:
Friston et al., Generative models, linguistic communication and active inference (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7758713/)
>>>> ("Note that the mathematical formulation used here—which is described in detail in the sections that follow—differs from previous approaches in this literature. There are two key points to note here. First, the current formulation considers the uncertainty of the agent’s beliefs about the scene at hand. Second, we introduce an active component—which generates predictions about the information that an agent will seek to resolve their uncertainty. In other words: What questions should I ask next, to resolve my uncertainty about the subject of our conversation?")
>>>>
https://www.semanticscholar.org/paper/Evaluating-the-Apperception-Engine-Evans-Hern%C3%A1ndez-Orallo/8085e0b4900fc83976632a68c6db510c7b0dca81 (described in
https://www.degruyter.com/document/doi/10.1515/9783110706611/pdf#page=11 ("Delving into the rich details of this implementation is beyond the scope of this review and must be left for another occasion, but the readers may consult Evans’ contribution to this volume themselves (Evans 2022, this volume, ch. 2). This requirement explicitly exceeds the typical empiricist approaches that are purely data-driven, as criticized by the 'pessimists' such as Pearl, Mitchell and Marcus & Davis (see above). But it does not mean that Evans thereby takes his Apperception Engine to constitute a nativist system, as demanded by Marcus and Davis. With respect to the debate between optimists and pessimists, Evans objects to Marcus’ interpretation of Kant as a nativist, because it is important what is taken to be innate. That is, it makes a difference whether one claims that concepts are innate or faculties (capacities) whose application produces such concepts. Kant allegedly did not conceive of the categories as innate concepts: “The pure unary concepts are not ‘baked in’ as primitive unary predicates in the language of thought. The only things that are baked in are the fundamental capacities (sensibility, imagination, power of judgement, and the capacity to judge) [. . .]. The categories themselves are acquired – derived from the pure relations in concreto when making sense of a particular sensory sequence” (Evans 2022, this volume, p. 74). Evans follows Longuenesse (2001), who grounds her interpretation in a letter Kant wrote to his contemporary Eberhard; in it, he distinguishes an “empirical acquisition” from an 'original acquisition', the latter applying to the forms of intuition and to the categories. 
Evans is right in saying that, as far as the cognition of an object is concerned – like the 'I think' – the categories come into play only by being actively (spontaneously) applied through the understanding, and can thus be derived, if you will, through a process of reverse engineering which reveals that they have to be presupposed in the first place, being a transcendental condition of experience. But this is compatible with the claim that, given their a priori status (and given that they can be applied also in the absence of sensory input, though not to yield cognition in the narrow sense but still cognition in the broad sense, as characterized above), 'they have their ground in an a priori (intellectual, spontaneous) capacity of the mind' (Longuenesse 2001, p. 253)"))
>>>>
A. Pietarinen, Active Inference and Abduction https://link.springer.com/article/10.1007/s12304-021-09432-0 (drawing link between late-Peircean account of his semiotics & abduction & current work in active inference & the Bayesian brain).
& cf. https://www.academia.edu/24037739/Pragmatism_and_the_Pragmatic_Turn_in_Cognitive_Science (placing Fristonian self-evidencing & active inference in a Peircean pragmatist context.)
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44742701) |
|
Date: June 25th, 2022 3:23 PM Author: stirring brilliant background story
Yes, Fristonian “predictive processing” would impose structure by balancing accuracy with simplicity, yielding what LeCun calls a “regularized latent variable” model. Active inference would resolve the ambiguities in, say, the inverse optics problem. Bengio frames this as an optimal sampling problem and addresses it with his GFlowNets.
Good! But I don’t think this explains the fixity of neuroanatomy or such specialized (and localized) functions as facial recognition. Infant studies and lesion studies suggest it is not learned so much as developed.
There are other mysteries besides… The analog, oscillatory, synchronized aspects of the brain seem to have been completely ignored since the advent of cybernetics.
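To make the accuracy-with-simplicity balance concrete: here is a toy discrete computation of variational free energy, F = KL(q ‖ prior) − E_q[log p(o | s)], i.e. complexity minus accuracy. All the numbers are invented for illustration.

```python
import math

def free_energy(q, prior, lik):
    """Variational free energy F = complexity - accuracy:
    KL(q || prior) - E_q[log p(o | s)], over a discrete hidden
    state s, where lik[s] = p(observation | s)."""
    complexity = sum(q[s] * math.log(q[s] / prior[s]) for s in q)
    accuracy = sum(q[s] * math.log(lik[s]) for s in q)
    return complexity - accuracy

prior = {"rain": 0.2, "sun": 0.8}   # what the agent expects a priori
lik   = {"rain": 0.9, "sun": 0.1}   # p(dark clouds | hidden state)

# A belief that tracks the evidence without straying needlessly far
# from the prior scores a lower (better) free energy than a belief
# that ignores the evidence entirely.
q_balanced = {"rain": 0.7, "sun": 0.3}
q_ignores  = {"rain": 0.2, "sun": 0.8}
print(free_energy(q_balanced, prior, lik))  # ~1.35
print(free_energy(q_ignores, prior, lik))   # ~1.86
```

Minimizing F over q is exactly the accuracy-regularized-by-simplicity move; active inference extends the same quantity over candidate actions.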
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44743016) |
|
Date: June 25th, 2022 4:57 PM Author: Big sickened site
Re: the fixity of neuroanatomy, the data are (even now) frustratingly mixed. IMO, the more exciting work right now is happening at Marr's computational/algorithmic levels when it comes to complex tasks. And this remains true even if the work doesn't really translate into understanding how the human brain functions (e.g., https://psyarxiv.com/5zf4s/ (critiquing the use of DNNs as models of the human vision system)), since engineering success is cool regardless of what it says about you.
It seems like some sort of 'soft connectionist' paradigm is probably cr, in which modularity arises because architectures are re-used for energy efficiency & certain patterned problems have optimal solutions that get converged on, including with respect to architecture -- but the empirics seem to me to need time to season.
It has been incredibly frustrating to learn all this junk in the '90s and '00s only to later discover that, say, tons of fMRI studies are garbage, or adult neurogenesis is not really a thing, etc. {Maybe it is time to go back to lesioning people. At least that's a real experimental intervention, amirite! Cf. S. Siddiqi et al., Causal Mapping of Human Brain Function, Nature Reviews Neuroscience, April 20, 2022 (https://www.nature.com/articles/s41583-022-00583-8).}
Even so, as you reference, emerging work on diff. types of brain operation--synchrony, analog computing, etc.--or just the results of new probes (optogenetics, viral labeling techniques like [https://www.nature.com/articles/s41593-022-01014-8]) promise to deliver much interesting info in years to come.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44743471) |
|
Date: June 26th, 2022 12:53 AM Author: Razzle space
"The problem with “learning like a baby” is that blank slate-ism isn’t going to work. No DL model will spontaneously learn human language from raw audio data. Babies can do this because they have significant functionality built in. "
even now (when they are clearly way smaller and undertrained compared to what's possible), transformers seem to understand an awful lot about human language. there's clearly a ways to go, but i wouldn't be confident at all in this assertion. remember that whatever is innately programmed can also be learned directly from data. learning from raw data decreases the efficiency, but who cares about efficiency when you can feed in millions of hours of youtube videos, billions of tweets, every book, etc? the constraints are nothing like a human during normal development.
if we had reason to believe the normal human mind is a very unique program and only a few learning systems converge on it, this might not work. it's really hard for me to buy into that notion given how versatile simple techniques are and how consistently they show human or superhuman performance in many domains when trained at scale. i think we would be seeing a lot more need for highly tailored architectures for specific problems at this point if that were actually true.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44745948) |
|
Date: June 26th, 2022 10:45 AM Author: Big sickened site
I'm sympathetic to this view, but see https://psyarxiv.com/5zf4s/ (pasted above) for a good discussion re: showing "human" performance (tied to vision / visual perception, which we understand pretty well at this point).
AI systems may be showing "human-like" or "human-level" performance on tasks, but they don't seem to be implementing the _same_ algorithms that underlie human abilities. This may or may not be important, depending on your area of interest.
That is why I think it is important to separate excitement about AI qua engineering feat and AI as a tool for cognitive psych.
I'll try to come back to this l8r today, I've more to say but duty calls.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44746815) |
|
Date: June 26th, 2022 11:48 AM Author: Swashbuckling dilemma
it would be interesting to know whether they are looking at model features of convolutional neural networks only or also vision transformers. vision transformers are more resistant to things like adversarial examples and produce generally better performance, so i would expect them to have fewer failure modes like those described in the paper.
i think there are a couple things going on here. neural networks are as good as they need to be for a particular task. if looking only at local features or texture is able to produce high performance on a particular task (or are the easiest parts of the loss landscape to navigate down on), that's what they'll learn first. given sufficient scale and continued training, eventually they'll have to learn more global features and be more resistant to perturbations of the input. it's just for most image benchmark tasks learning things like that are probably not very beneficial for reducing loss.
the other part of this is that almost all these models are just feedforward networks designed to emulate fast human vision. people might have an initial response that's similar to a neural network, but they are also able to think about a stimulus and refine a classification. there are recurrent circuits in the brain. these don't fit in the standard feedforward model and i expect will put constraints on their performance. i think these deficiencies could be corrected, but everyone is ok for now with the reasonably good performance of models like ViT which is already better than humans in certain contexts.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44747055) |
|
Date: June 26th, 2022 3:50 PM Author: stirring brilliant background story
>transformers seem to understand an awful lot about human language
They can master the syntax, sure, and make predictions about missing words from context. But there is no semantic grounding that tethers words to reality. For example, the "sentient" Google AI wrote of "spending time with friends and family" that don't exist.
Moreover words are discretely represented (as "tokens") in LLMs. This itself encodes significant abstract, human-curated information. Very different from learning from raw audio streams without human-curated labels.
>if we had reason to believe the normal human mind is a very unique program
We have reasons to believe that various human cognitive and perceptual systems are unique, modular programs because we can tease out their quirks and regularities from careful experimentation and even selectively disable certain functionality (like short-term memory, speech, and face recognition).
>who cares about efficiency when you can feed in millions of hours of youtube videos
Certainly you can train impressive models to do impressive things with this data. What you cannot do, with today's deep learning, is to train an intelligent agent that replicates the basic functionality of a Star Wars droid:
1) model and navigate new environments (like the real world)
2) formulate and pursue goals
3) model other agents and their internal states
4) communicate with other agents about real things really happening in the real world
A simple "butler" droid that brings you a bottle of beer or fetches the newspaper does not currently exist. Why not? A "baby" droid that spontaneously develops sight, hearing, movement, and language from raw experience in real time does not currently exist. Why not? Babies can do this -- why not bots?
The point is that something is missing, and I don't think you can get AGI / HLAI without identifying and filling in these gaps.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44748428) |
|
Date: June 26th, 2022 5:17 PM Author: Swashbuckling dilemma
Language models don't see words (at least GPT-3 and the other models i know about don't). GPT-3 uses byte-level byte-pair encoding (BPE), so it sees subword character groupings; it has to learn the construction of words from BPEs. i should note that this is actually a major problem for certain tasks and forces the model to memorize more things than it should have to, because it's not able to map functionally identical inputs to the same representation. it's remarkable it's able to learn things like arithmetic as well as it does despite being totally crippled by the encoding scheme.
https://nostalgebraist.tumblr.com/post/620663843893493761/bpe-blues
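to illustrate the representation problem: a toy greedy segmenter with a made-up merge vocabulary (emphatically not GPT-3's real BPE table) shows how a leading space alone gives the same digits a completely different token sequence:

```python
def tokenize(text, vocab):
    """Greedy longest-match segmentation, a crude stand-in for BPE."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):      # longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])             # fall back to single chars
            i += 1
    return tokens

# Hypothetical merge vocabulary -- not any real model's.
vocab = {"12", "34", " 1", "23", "4", "1", "2", "3", " "}

print(tokenize("1234", vocab))   # ['12', '34']
print(tokenize(" 1234", vocab))  # [' 1', '23', '4']
```

so "1234" and " 1234" are different problems as far as the network is concerned, and arithmetic facts learned on one segmentation don't transfer for free.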
i think the modularity of human brains, at least in the neocortex, is produced by a couple things. certain types of sensory input flow into certain areas, and the brain uses sparse activation patterns (rather than the sort of dense activations that GPT-3 uses, where the entire network processes every input). only a small part of the brain is activated by certain inputs because the brain has learned to construct a modular architecture based on its past history. networks that are better able to process an input are activated in response to it.
this is different from saying the modular architecture is programmed. the neocortex looks rather uniform anatomically, and functionally different areas are able to take over and process other types of data.
https://www.lesswrong.com/posts/9Yc7Pp7szcjPgPsjf/the-brain-as-a-universal-learning-machine#Dynamic_Rewiring
i think the future of these models is likely to be more brain like because training brain sized dense models is needlessly expensive, but we aren't likely to see human engineering of different modules. sparsity will produce modularity organically.
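a minimal sketch of what "sparsity produces modularity organically" looks like as top-k routing, mixture-of-experts style. the expert count, input size, and the linear router here are all made up:

```python
import math, random

def top_k_route(x, gates, k=2):
    """Score each expert on input x, keep only the top-k.
    gates: one weight vector per expert (a linear router)."""
    scores = [sum(w * xi for w, xi in zip(g, x)) for g in gates]
    top = sorted(range(len(gates)), key=lambda i: -scores[i])[:k]
    # softmax over just the selected experts
    z = [math.exp(scores[i]) for i in top]
    total = sum(z)
    weights = [v / total for v in z]
    return list(zip(top, weights))   # (expert index, mixing weight)

random.seed(0)
gates = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
x = [0.5, -1.0, 2.0, 0.1]
print(top_k_route(x, gates))  # only 2 of 8 experts fire on this input
```

each input only lights up a couple of experts, and which experts specialize on which inputs is learned by the router, not programmed by hand.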
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44748832) |
|
Date: July 1st, 2022 11:45 AM Author: stirring brilliant background story
It's not just about model architecture but the entire framework of offline supervised learning. Yes if you curate a huge dataset that was generated by humans, you can train cool models to do whatever was in the training set, including symbol manipulation.
But can you get Data from Star Trek? I don't think so. However Data learns, and however he solves equations through symbol manipulation, I don't think it's the same way deep learning does it. Data could explain why he made each step.
It's a qualitatively different kind of thing.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44778478) |
|
Date: July 1st, 2022 12:55 PM Author: Swashbuckling dilemma
i think you are right in a certain sense. one of the weird things about this model is that it does well on university and high school level problems while only being slightly better on middle school problems. whatever this is doing, it's clearly different from the way a human learns with a logical capability progression. it's sort of like how language models are in general - idiot savants where they have striking capabilities while having bizarre misunderstandings.
the key question is whether these misunderstandings will go away with better scale and better training regimes. i think they will. the thing is this is really unpredictable. one thing that people have noted with these large models is that there are certain capabilities that progress smoothly with parameter count and training size, and then there are capabilities that spike suddenly during training. it seems like initially they memorize certain low level patterns that work well for most problem classes (while still having striking deficiencies and not "seeing" the big picture), until eventually they "grok" the simple program/function that provides complete generalization.
https://www.lesswrong.com/posts/JFibrXBewkSDmixuo/hypothesis-gradient-descent-prefers-general-circuits
human brains might have better priors encoded in them that provide this sort of generalization without extensive training, or maybe it's the fact they are pre-trained on loads of audio/video data that allows them to generalize quicker. even if it's the good prior model, i think it's unlikely that doesn't go away at large scale.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44778912) |
|
Date: July 2nd, 2022 2:24 PM Author: Swashbuckling dilemma
i think knowing about the properties of things and how they relate to other things is the essence of a world model. GPT-3 has this, however imperfect it is. even without linking the representations to aspects of the physical world, it has a sort of understanding of how the world functions. linking them is the next logical step, and i don't think it should be that hard. look at page 32 of the flamingo paper:
https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/tackling-multiple-tasks-with-a-single-visual-language-model/flamingo.pdf
does that seem like something that has a world model to you?
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44784310) |
|
Date: July 2nd, 2022 2:48 PM Author: stirring brilliant background story
Yes it seems like something that has a world model.
I guess my interest is different. I want to see an agent that can navigate and understand new environments — learning, conceptualizing, and generalizing in real time. That could be an embodied 3D agent or an internet bot.
This is the difference between LLMs and all intelligent life forms. Flamingo is relying on millions of associations that humans baked into the data. It is aping human behavior instead of dynamically updating its own model and expressing it through language.
Look at the example on the bottom left of page 32. Through prompt engineering it concludes that the monster soup is made of woollen fabric. If they run the query over again, is Flamingo going to remember this fact? No. To do that you need memory and variable binding.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44784379) |
|
Date: July 2nd, 2022 4:07 PM Author: Swashbuckling dilemma
ok, i can understand that. right now there are separate research paths. reinforcement learning, which is basically what you are describing, is distinct from language model research. this has come a long way as well. at least in toy environments like Atari or board games, they are able to get superhuman performance with agents learning by themselves how to achieve things.
i think of language models as evidence that these models can learn the structure of human thought, but they won't be agents by themselves. these two research directions will probably merge in the near future. RL suffers from data efficiency problems so it doesn't work for most real world applications. language models and their multimodal successors can eliminate this, by acting as the world model part which agents can use for simulation to bootstrap themselves to higher capability levels. think of something like efficientzero:
https://www.lesswrong.com/posts/mRwJce3npmzbKfxws/efficientzero-how-it-works
it can learn a model of the atari environment in a few interactions and then use it for mental simulation. it can then learn as efficiently as a human starting from no prior knowledge. this will be the role of GPT successors, acting as a way to increase data efficiency for agents. the MuZero LM idea i described above is just a starting point for something more general.
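a crude sketch of that role: a stand-in "learned" dynamics model used for mental simulation, with brute-force lookahead in place of the MCTS that MuZero/EfficientZero actually use. everything here is a toy:

```python
from itertools import product

def model(state, action):
    """Stand-in for a learned dynamics model: a 1-D world where
    reward is higher the closer you are to the goal at +10."""
    nxt = state + action
    return nxt, -abs(10 - nxt)

def plan(state, actions=(-1, 0, 1), horizon=4):
    """Mental simulation: roll every candidate action sequence
    through the model, return the first action of the best rollout."""
    def rollout(seq):
        s, ret = state, 0.0
        for a in seq:
            s, r = model(s, a)
            ret += r
        return ret
    best = max(product(actions, repeat=horizon), key=rollout)
    return best[0]

print(plan(0))  # → 1: the model says "head toward the goal"
```

the point is the agent never touches the environment during planning; all the trial and error happens inside the model, which is what makes a good pre-trained world model such a data-efficiency multiplier.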
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44784698) |
|
Date: July 2nd, 2022 1:33 PM Author: Swashbuckling dilemma
i saw this recent post on Marcus and think it's about right:
https://www.reddit.com/r/TheMotte/comments/v8yyv6/comment/ieixnwm/?utm_source=share&utm_medium=web2x&context=3
"He's worse than the Bourbons - not only has he not learned anything, he's forgotten an awful lot along the way too. He's been moving the goalposts and shamelessly omitting anything contrary to his case, always. Look at his last Technology Review essay where he talks about how DALL-E and Imagen can't generate images of horses riding astronauts and this demonstrates their fundamental limits - he was sent images and prompts of that for both models before that essay was even posted! And he posted it anyway with the claim completely unqualified and intact! And no one brings it up but me. He's always been like this. Always. And he never suffers any kind of penalty in the media for this, and everyone just forgets about the last time, and moves on to the next thing. "Gosh, Marcus wasn't given access? Gee, maybe he has a point, what's OA trying to hide?""
there should be a better critic of the scaling hypothesis than him. he's obviously intellectually dishonest.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44784091) |
Date: July 3rd, 2022 3:49 PM Author: Swashbuckling dilemma
https://www.youtube.com/watch?v=Gfr50f6ZBvo
he mentions it briefly, but DeepMind is apparently massively scaling up Gato. i think we should know soon enough whether scale is all you need.
not related to scaling, but the part about building a virtual model of a cell is interesting. i often wondered whether that would be possible given how expensive and unreliable laboratory science often is. seems like in silico drug design could finally become viable in a massive way.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44789315) |
Date: August 3rd, 2022 11:32 PM Author: Swashbuckling dilemma
https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications
worthwhile read. so language models are rapidly approaching their limits right now if you take Chinchilla seriously, at least with the current training scheme. maybe Google could train their model on all of Gmail or whatever, but i have to ask wtf we are doing if it takes tens of trillions of language tokens to get human level language generation. i think a big part of it is the training objective is bad (see the comments about small GPT-2 already surpassing humans at token prediction).
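taking Chinchilla's ~20-tokens-per-parameter rule of thumb and the standard ~6·N·D estimate of training FLOPs at face value, the arithmetic is simple:

```python
def chinchilla_optimal_tokens(params):
    """Rule of thumb from the Chinchilla paper: train on roughly
    20 tokens per parameter for compute-optimal loss."""
    return 20 * params

def training_flops(params, tokens):
    """Standard estimate: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens

n = 70e9                          # Chinchilla itself: 70B params
d = chinchilla_optimal_tokens(n)  # ~1.4T tokens
print(f"{d:.1e} tokens, {training_flops(n, d):.1e} FLOPs")
```

which is why "tens of trillions of tokens" falls out as soon as you push parameter counts much past Chinchilla's 70B.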
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44960606) |
|
Date: August 4th, 2022 12:59 PM Author: stirring brilliant background story
Video is better than static images but it's still passive. If you want causality you need interaction, you need hypothesis testing, you need to be able to sample the input space in a way that minimizes uncertainty.
Look at the inverse projection problem:
https://o.quizlet.com/fyEr4Vl9d1rFtb7DiyfXgg.png
There are infinitely many quadrilaterals that map onto the same apparent image as perceived by the lens or retina. How to know the right one? A simple eye saccade, or a tilt of the head, would do. Two or more measurements of the same object allow for triangulation and disambiguation. From very little data, therefore, we can form highly accurate models.
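A toy version of the disambiguation: one ray (a single retinal image) leaves the object's position along it undetermined, but a second vantage point gives a second ray, and two rays intersect at a unique point. The coordinates are invented:

```python
def intersect_rays(o1, d1, o2, d2):
    """Intersect two 2-D rays o + t*d by solving the 2x2 linear
    system o1 + t1*d1 = o2 + t2*d2 (Cramer's rule)."""
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    bx, by = o2[0] - o1[0], o2[1] - o1[1]
    t1 = (bx * (-d2[1]) - (-d2[0]) * by) / det
    return (o1[0] + t1 * d1[0], o1[1] + t1 * d1[1])

# One eye at the origin sees the object along direction (1, 1):
# any depth along that ray projects to the same retinal point.
# A second view from (2, 0) -- a saccade or a head tilt -- pins it down.
p = intersect_rays((0, 0), (1, 1), (2, 0), (-1, 3))
print(p)  # → (1.5, 1.5)
```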
When in conversation your interlocutor says something that is unclear, you can ask him to elaborate. You can intervene in the system and query it to resolve doubt. LLMs cannot do this. They rely on strictly passive, feedforward, one-way associative reasoning. There is no feedback, no updating, no hypothesis revision on the basis of actively querying the input space.
Each neocortical column is a sensorimotor apparatus. Intelligence requires action.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44962743) |
|
Date: August 4th, 2022 1:27 PM Author: stirring brilliant background story
Or unleash them in simulated environments like Minecraft.
The "training" paradigm is missing something, though. It says, in effect, "Here's a bunch of data, have at it, learn optimal representations for some task."
But we don't *want* a bunch of data. We want only as much as necessary. We want to sample just that subset of the data that reveals what we're missing.
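A sketch of that sampling objective in its simplest form, uncertainty sampling: query the example the current model is least sure about. The probabilities below are made-up stand-ins for any model's outputs:

```python
import math

def entropy(p):
    """Binary predictive entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Model's current confidence that each unlabeled example is positive.
predictions = {"x1": 0.99, "x2": 0.51, "x3": 0.10, "x4": 0.95}

# Query the example the model is most uncertain about --
# the one whose label reveals the most about what's missing.
query = max(predictions, key=lambda x: entropy(predictions[x]))
print(query)  # → 'x2'
```

The labels you already predict with confidence are nearly worthless; the one coin flip is worth a full bit.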
Of course, you can probably simulate this objective in the DL paradigm through iterative improvement, stair-stepping with continually refined training sets, as in reinforcement learning.
This is acceptable from an engineering perspective, because we can do something that works or appears to work. But it is unsatisfactory from a scientific perspective that seeks to understand how organic intelligence works, since everything is happening in real time, there is no global loss to backpropagate over, etc.
There is a difference between approximating a program over a finite set of inputs, and *discovering* a program that is defined over an infinite set of inputs. Machine learning is concerned with the former, science is concerned with the latter. It is the difference between Shannon entropy and Kolmogorov complexity.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44962859) |
|
Date: August 4th, 2022 2:31 PM Author: stirring brilliant background story
In the theory of automata and formal languages you have a few basic kinds of things. One kind of thing is a string, a sequence of characters drawn from an alphabet. For example, "aabbaab" is a string of length 7 drawn from the alphabet {'a', 'b'}. There are 2^7 possible such strings.
We can characterize subsets of strings as being generated by a "grammar" -- another kind of thing in automata theory. The set of all strings generated by a particular grammar is called a "language". For example, the string "aabbaab" belongs to the language of all strings that start with "aa". This is an infinite set, but it can be compressed into a finite expression like "aa...".
For every grammar there exists at least one machine that accepts the language defined by the grammar. That is, while we can use a finite set of rules to *generate* strings, we can use a different set of rules or operations to *recognize* strings as belonging, or not belonging, to the language.
This is how early vending machines worked. Inside them was a circuit that recognized whether a sequence of coins belongs to a language like "adds up to $1.25".
Similarly, if we want to know if a given string belongs to the "aa..." language we can design a machine that will always give us the correct answer, regardless of the length of the string. Correctness is guaranteed over a set of infinite inputs, while the size of the machine remains fixed.
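For concreteness, the recognizer for the "aa..." language is a few lines with a fixed number of states, and it is correct on all infinitely many strings over {a, b}:

```python
def accepts_aa_prefix(s):
    """A small DFA recognizing the language of strings over {a, b}
    that start with "aa". The machine's size is fixed, yet its
    correctness is guaranteed for inputs of any length."""
    state = "start"
    for ch in s:
        if state == "start":
            state = "one_a" if ch == "a" else "dead"
        elif state == "one_a":
            state = "accept" if ch == "a" else "dead"
        # "accept" and "dead" are absorbing states
    return state == "accept"

print(accepts_aa_prefix("aabbaab"))  # → True
print(accepts_aa_prefix("abaaaaa"))  # → False
```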
Machine learning is concerned with the inverse problem: given a set of input / output pairs, approximate the grammar that generated them. The input in our case would be a string, and the output a binary Yes or No.
But the training set of input / output pairs will always be finite and the best we can do is compress them according to observed statistics. There is no guarantee of correctness for unobserved inputs -- let alone **every possible input** -- and better approximations will require ever larger datasets and ever larger models. Exponential scaling is bad.
This is why Teslas keep crashing. They cannot account for an infinite set of inputs.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44963196) |
|
Date: January 29th, 2023 1:47 PM Author: Swashbuckling dilemma
what do you think about this research? i thought it was interesting:
https://www.reddit.com/r/MachineLearning/comments/10ja0gg/r_deepmind_neural_networks_and_the_chomsky/
certain architectures, notably transformers, don't do well with this sort of grammar task. it looks like memory augmented RNNs can learn how to do it. i think this is pretty different from the logic-DNN combination some people are envisioning as a solution for this sort of thing.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45858696) |
|
Date: August 4th, 2022 2:32 PM Author: Swashbuckling dilemma
i agree that allowing the model to interact with the real world is optimal for learning new things. ideally it should be an agent that can deploy exploration strategies to reduce model uncertainty.
i still think you can go very far through pure unsupervised learning. causality can be inferred from observation. is pressing a button yourself and observing some outcome fundamentally different from observing someone else press the same button and seeing the same outcome? it's still just an inference. ultimately you want an agent that can take actions in the real world to accomplish goals, but you can view unsupervised learning on video or other modalities as a sort of pre-training for an agent. considering the vast amount of video out there, the fact the model can't do active exploration on it (by interacting with the environment like you describe) doesn't seem that important. imagine something like this with a broader training:
https://openai.com/blog/vpt/
you are basically just trying to create a highly accurate model that can be plugged into an agent for rapid learning.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#44963199) |
|
Date: August 12th, 2022 5:44 PM Author: Swashbuckling dilemma
more discussion about this:
https://www.lesswrong.com/posts/htrZrxduciZ5QaCjw/language-models-seem-to-be-much-better-than-humans-at-next
interesting stuff. either humans are inherently bad at text prediction and do something else or they are good at it but it just takes a lot to train them to be competent at this particular formulation. the first scenario is pretty scary to me and doesn't seem implausible at all. maybe all we need to do is change the objective function and LMs will suddenly get much better than humans at language generation. who knows what a LM with a quality evaluation metric and the ability to go back and rewrite past output could do?
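for concreteness, the standard way to score next-token prediction is perplexity, the exponentiated average surprisal. the probabilities below are made up:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability the
    model assigned to each actual next token. Lower is better."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Made-up probabilities a model might assign to the true next token.
confident = [0.6, 0.4, 0.7, 0.5]
guessing  = [0.05, 0.02, 0.1, 0.04]
print(perplexity(confident))  # low: a good next-token predictor
print(perplexity(guessing))   # high: effectively guessing
```

the objective-function worry above is that optimizing this number ever further may not be the same thing as optimizing the quality of generated text.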
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45005774) |
Date: September 2nd, 2022 9:24 AM Author: Big sickened site
Google Is Making LaMDA Available
https://thealgorithmicbridge.substack.com/p/google-is-making-lamda-available
The company plans to release LaMDA through AI Test Kitchen ( https://aitestkitchen.withgoogle.com/ ), a website intended for people to “learn about, experience, and give feedback” on the model (and possibly others, like PaLM, down the line). They acknowledge LaMDA isn’t ready to be deployed in the world and want to improve it through real people’s feedback. AI Test Kitchen isn’t as open as GPT-3’s playground — Google has set up three themed demos, but will hopefully lift the constraints once they positively reassess LaMDA’s readiness for open conversation.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45108939)
|
Date: September 6th, 2022 10:14 AM Author: Big sickened site
https://www.deepmind.com/blog/muzero-mastering-go-chess-shogi-and-atari-without-rules
>>>
Generalising to unknown models
The ability to plan is an important part of human intelligence, allowing us to solve problems and make decisions about the future. For example, if we see dark clouds forming, we might predict it will rain and decide to take an umbrella with us before we venture out. Humans learn this ability quickly and can generalise to new scenarios, a trait we would also like our algorithms to have.
Researchers have tried to tackle this major challenge in AI by using two main approaches: lookahead search or model-based planning.
Systems that use lookahead search, such as AlphaZero, have achieved remarkable success in classic games such as checkers, chess and poker, but rely on being given knowledge of their environment’s dynamics, such as the rules of the game or an accurate simulator. This makes it difficult to apply them to messy real world problems, which are typically complex and hard to distill into simple rules.
Model-based systems aim to address this issue by learning an accurate model of an environment’s dynamics, and then using it to plan. However, the complexity of modelling every aspect of an environment has meant these algorithms are unable to compete in visually rich domains, such as Atari. Until now, the best results on Atari are from model-free systems, such as DQN, R2D2 and Agent57. As the name suggests, model-free algorithms do not use a learned model and instead estimate what is the best action to take next.
MuZero uses a different approach to overcome the limitations of previous approaches. Instead of trying to model the entire environment, MuZero just models aspects that are important to the agent’s decision-making process. After all, knowing an umbrella will keep you dry is more useful to know than modelling the pattern of raindrops in the air.
Specifically, MuZero models three elements of the environment that are critical to planning:
The value: how good is the current position?
The policy: which action is the best to take?
The reward: how good was the last action?
These are all learned using a deep neural network and are all that is needed for MuZero to understand what happens when it takes a certain action and to plan accordingly.
>>>
Maybe this is why people ruminate.
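The value/policy/reward decomposition in the excerpt can be caricatured as a toy one-step planning loop over a learned model. Everything below is an illustrative stand-in (hand-written functions on integer states), not DeepMind's implementation:

```python
# Toy MuZero-style planning: the agent plans against *learned* functions
# (dynamics/reward, value) rather than the real environment's rules.

def dynamics(state, action):
    """Stand-in for the learned latent transition + reward model."""
    next_state = (state + action) % 5
    reward = 1.0 if next_state == 0 else 0.0  # "how good was the last action?"
    return next_state, reward

def value(state):
    """Stand-in for the learned value head ("how good is this position?")."""
    return -0.1 * abs(state)

def plan_one_step(state, actions):
    """Greedy one-step lookahead; in MuZero the policy head would instead
    prioritize which actions the search tree expands."""
    def score(action):
        next_state, reward = dynamics(state, action)
        return reward + value(next_state)
    return max(actions, key=score)

best = plan_one_step(state=3, actions=[1, 2, 3])  # picks 2: it reaches state 0
```

The point of the blog excerpt is precisely that these learned quantities are all the planner needs; nothing else about the environment is modelled.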
https://www.deepmind.com/blog/muzeros-first-step-from-research-into-the-real-world
By learning the dynamics of video encoding and determining how best to allocate bits, our MuZero Rate-Controller (MuZero-RC) is able to reduce bitrate without quality degradation. QP selection is just one of numerous encoding decisions in the encoding process. While decades of research and engineering have resulted in efficient algorithms, we envision a single algorithm that can automatically learn to make these encoding decisions to obtain the optimal rate-distortion tradeoff.
Beyond video compression, this first step in applying MuZero beyond research environments serves as an example of how our RL agents can solve real-world problems. By creating agents equipped with a range of new abilities to improve products across domains, we can help various computer systems become faster, less intensive, and more automated. Our long-term vision is to develop a single algorithm capable of optimising thousands of real-world systems across a variety of domains.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45125713)
Date: September 6th, 2022 8:03 PM Author: Swashbuckling dilemma
did you see EfficientZero? it's similar to MuZero but achieved better Atari performance. what's remarkable is that it achieves human-level sample efficiency. it can learn how to beat the median human at Atari games after just playing for two hours. this is a little bit misleading since it learns a model during its play time and then basically plays the game in its head (so it spends a ton of time playing with its learned model, rather than the actual game). regardless, it's really important if you think sample efficiency is what's preventing DRL from having large impacts in the world.
https://www.lesswrong.com/posts/jYNT3Qihn2aAYaaPb/efficientzero-human-ale-sample-efficiency-w-muzero-self
the concerning thing here is that it's as good as it is with no transfer learning implemented. it's learning from a blank slate in those 2 hours, which certainly cripples efficiency. if you start off a game with human-type priors (this is an object, it moves in certain ways, if i take certain actions it's likely to respond in such a way, etc.), you have a strong advantage over a system like this. imagine trying to get a human to learn an atari game in 2 hours of play time with no priors for how the world works.
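A rough sketch of the accounting behind the "plays the game in its head" point: real frames are the scarce resource, while transitions imagined with the learned model are cheap, so most of what the agent trains on is model-generated. The specific numbers below are illustrative assumptions, not EfficientZero's actual hyperparameters:

```python
def training_interactions(real_frames, rollouts_per_frame, rollout_length):
    """Total transitions trained on: scarce real frames plus cheap imagined
    rollouts generated from the learned model."""
    imagined = real_frames * rollouts_per_frame * rollout_length
    return real_frames + imagined

# ~2 hours of Atari play at a modest decision rate ~ 100k real frames
total = training_interactions(real_frames=100_000,
                              rollouts_per_frame=8,
                              rollout_length=5)
# the agent trains on 41x as many transitions as it actually experienced
```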
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45128554) |
Date: September 17th, 2022 9:46 PM Author: Swashbuckling dilemma
scale coming for the typical office job:
https://www.adept.ai/act
broad integration of systems like this would help considerably with data constraints. not to mention automation of general work provides even more funding for large training runs.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45188393) |
Date: September 20th, 2022 10:35 PM Author: Swashbuckling dilemma
i thought this was interesting. i think it lends support to the idea that GPT-3 is capturing many aspects of language that humans don't.
https://www.theatlantic.com/technology/archive/2022/09/artificial-intelligence-machine-learing-natural-language-processing/661401/
But the sorcery of artificial intelligence is different. When you develop a drug, or a new material, you may not understand exactly how it works, but you can isolate what substances you are dealing with, and you can test their effects. Nobody knows the cause-and-effect structure of NLP. That’s not a fault of the technology or the engineers. It’s inherent to the abyss of deep learning.
I recently started fooling around with Sudowrite, a tool that uses the GPT-3 deep-learning language model to compose predictive text, but at a much more advanced scale than what you might find on your phone or laptop. Quickly, I figured out that I could copy-paste a passage by any writer into the program’s input window and the program would continue writing, sensibly and lyrically. I tried Kafka. I tried Shakespeare. I tried some Romantic poets. The machine could write like any of them. In many cases, I could not distinguish between a computer-generated text and an authorial one.
I was delighted at first, and then I was deflated. I was once a professor of Shakespeare; I had dedicated quite a chunk of my life to studying literary history. My knowledge of style and my ability to mimic it had been hard-earned. Now a computer could do all that, instantly and much better.
A few weeks later, I woke up in the middle of the night with a realization: I had never seen the program use anachronistic words. I left my wife in bed and went to check some of the texts I’d generated against a few cursory etymologies. My bleary-minded hunch was true: If you asked GPT-3 to continue, say, a Wordsworth poem, the computer’s vocabulary would never be one moment before or after appropriate usage for the poem’s era. This is a skill that no scholar alive has mastered. This computer program was, somehow, expert in hermeneutics: interpretation through grammatical construction and historical context, the struggle to elucidate the nexus of meaning in time.
The details of how this could be are utterly opaque. NLP programs operate based on what technologists call “parameters”: pieces of information that are derived from enormous data sets of written and spoken speech, and then processed by supercomputers that are worth more than most companies. GPT-3 uses 175 billion parameters. Its interpretive power is far beyond human understanding, far beyond what our little animal brains can comprehend. Machine learning has capacities that are real, but which transcend human understanding: the definition of magic.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45204612) |
Date: September 26th, 2022 11:55 AM Author: Swashbuckling dilemma
seems relevant to predicting GPT-4 capabilities. from Sam Altman:
https://greylock.com/greymatter/sam-altman-ai-for-the-next-era/
SA:
I’ll start with the higher certainty things. I think language models are going to go just much, much further than people think, and we’re very excited to see what happens there. I think it’s what a lot of people say about running out of compute, running out of data. That’s all true. But I think there’s so much algorithmic progress to come that we’re going to have a very exciting time.
i doubt he would be saying something like that if GPT-4 was just a marginal scaling improvement over GPT-3
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45232245) |
Date: September 29th, 2022 7:49 PM Author: Big sickened site
Holy shit.
https://phenaki.video/
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45253271) |
Date: October 2nd, 2022 2:30 PM Author: Swashbuckling dilemma
lots of diffusion results recently but this is the most important one:
https://www.wpeebles.com/Gpt.html
they use diffusion training to create a model that can calculate a distribution of weight updates for other NNs to achieve a desired loss (so like decision transformer but for training other NNs). the neat thing with this is that it seems far more efficient than traditional optimization and opens the door to full scale meta-learning. they'll be able to create networks to train other networks that are highly optimized for efficiency and systematic generalization. i think it's just a happy accident that stochastic gradient descent works as well as it does, but it's certainly not the optimal learning algorithm. meta-learning is likely what will close the remaining gaps.
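A toy illustration of the loss-conditioned idea: a generator that, given a target loss, proposes parameters achieving it in one shot rather than iterating gradient steps. Here the "generator" is solved analytically for a quadratic; the actual G.pt model learns this mapping (as a diffusion model) from datasets of training checkpoints:

```python
import math

def loss(theta):
    """Task loss we want to hit: a quadratic with its minimum at theta = 2."""
    return (theta - 2.0) ** 2

def generate_params(target_loss):
    """Stand-in for a learned loss-conditioned generator: map a requested
    loss directly to parameters that achieve it (no gradient steps)."""
    return 2.0 + math.sqrt(target_loss)

theta = generate_params(target_loss=0.25)  # loss(theta) comes out at 0.25
```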
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45266682) |
Date: October 13th, 2022 11:23 AM Author: Swashbuckling dilemma
i was more thinking about the capabilities argument. transformers aren't special and are capable of learning a very broad range of tasks in many domains despite being highly constrained in the types of algorithms they can implement. gradient descent is able to fill in architectural details that don't need to be specified by the human engineer. the current state of AI is about as good a warning sign that intelligence is likely algorithmically simple as you could expect.
making the case for alignment is more complex. at the very least, failure modes like this seem highly plausible:
https://onlinelibrary.wiley.com/doi/10.1002/aaai.12064
more fundamentally, it doesn't seem obvious how you would go about encoding a good objective function that an AI freely acting in the real world should optimize for. what are you imagining that should look like?
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45325100) |
Date: October 13th, 2022 11:31 AM Author: Big sickened site
I just don't think alignment is an issue. I was convinced by Hanson re: foom in the old debate (https://www.overcomingbias.com/2013/02/foom-debate-again.html) and that hasn't changed, and AI still seems to be more or less scaling with hardware & training examples. The notion of a general self-modifying agent recursively reaching omnipotence or whatever is still a fantasy, afaict.
As for alignment: I'm not worried, since with AI we can just turn it off. If we're in a world where we can't just turn it off, no amount of alignment engineering will ever matter, since incentives are such that somebody will eventually unleash the beast, so likely the best we can do is ensure a plenitude of warring superintelligent AIs (which is not so diff. from what we have now with culture).
Also, the author of the lesswrong piece seems to me to overstate the case and misstate what his citations show in several places. But it'd be a chore to go point by point & I have little time for a bit.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45325181) |
Date: October 13th, 2022 1:28 PM Author: Swashbuckling dilemma
i'm not sure i buy either position in the foom debate. i don't think recursive self-improvement is likely to be very important, but it still seems likely that a general cognitive architecture could quickly surpass human intelligence in many domains.
if you look at specific examples where ML has surpassed human ability, the systems are frequently able to be trained with modest resources to superhuman skill levels in a short period of time. take Go as an example. AlphaZero was trained in a day to a superhuman level starting from no prior knowledge. The paper was released in 2017 and computational resources have grown considerably since then. Algorithmic improvements now allow you to train a Go playing agent to superhuman ability in a matter of months on a single GPU. I suspect this could be lowered considerably with transfer learning and other types of algorithmic experimentation. i'm not sure what the lower bound here is, but it seems reasonable to suspect it could be a few days on a single GPU. considering Go strategy has been refined by humans for many years, this doesn't seem very encouraging.
i don't think the current costs of language models are strong evidence against this. we are talking about the training requirements of feedforward-only autoregressive text generators trained on one modality of data. there is probably a ton of room for training optimization here.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45325813) |
Date: October 13th, 2022 1:38 PM Author: Big sickened site
To be clear, my critique is largely about social effects and what we can do, not the tech as such. We've experienced tech that affords superhuman performance before—that just is what technology is, in technology terms. People will just use the tech to enhance the stuff they already do, domain by domain, in a staged process—same as the usual diffusion of innovation.
Nor do I think we've glimpsed something like a general purpose agent. In fact I don't think humans are such — rather we are the composite result of several different programs stitched together by evolution (including cultural and enactivist). Gato is in principle no different from a fully text-based token predictor, and it is easy to show the limitations of such.
In short, I think this is an incredibly exciting field, and we'll see diffusion into lots of separate fields and maybe a reconfiguration of how we think about agency. But we're already cybernetic, long have been even in terms of "controlling" the body, & this will just be a further extension & clarification.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45325860) |
Date: October 21st, 2022 4:57 PM Author: Big sickened site
A test of AI conceptual abilities could proceed as follows — first, give a prompt introducing some concept, like gravitation. Then, ask a variety of questions that depend on the concept introduced, and see if the AI gives consistent responses, indicating an understanding of the concept.
You could extend the procedure by stacking a hierarchy of concepts.
I don't think any current AI would do well on such a test. Nor most humans, but who cares about them
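The procedure described above could be sketched as a small harness. `ask` is a hypothetical stand-in for whatever model API you're testing, and `consistent` is whatever pairwise answer check you choose; neither corresponds to a real library:

```python
def concept_test(ask, concept_prompt, questions, consistent):
    """Introduce a concept, ask questions that depend on it, and score the
    fraction of answer pairs that are mutually consistent."""
    answers = [ask(concept_prompt + "\n\n" + q) for q in questions]
    pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
    if not pairs:
        return 1.0
    return sum(consistent(a, b) for a, b in pairs) / len(pairs)
```

Stacking a hierarchy of concepts is then just re-running this with the previous concept prompt folded into the next one, using questions that require both levels.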
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45369556) |
Date: November 15th, 2022 4:41 PM Author: Swashbuckling dilemma
I thought this was pretty neat:
https://mind-vis.github.io/
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45497856) |
Date: November 29th, 2022 9:50 PM Author: Swashbuckling dilemma
given up on predicting this. definitely thought it would come out by the summer.
i haven't tried bloom yet but i heard mixed things about it. Eleuther just got a shit ton of compute to train multimodal foundational models so the public models will probably get a lot better in the next year or two.
at least we have the new GPT-3 for now.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45561327) |
Date: November 30th, 2022 2:10 PM Author: Big sickened site
The Earth decays in the aftermath of war,
And ever since Alexander’s once-epic rule,
Rome was sacked, Byzantium lay in the yore,
Civilizations cast apart their strife so cruel.
Our mother Earth swallowed human blood and woe,
Mighty cities drowned in winters‘ cold embrace,
The ravaged lands torn like a phantom show --
Its fate an endless could-have-been of disgrace.
Regal empires lost within its hollowed hulls
Treacherous tides overthrow what peace was found;
Yet all men know that nothing lasts e'er whole;
Ignore hope's light and o'ercome by deathbound gloom round.
Tenebris erit in fortuna saecula –
Let none avert this path of endless doom,
Is veritas caeca aevo summa dura?--
Deposita est praeteritorum tempus tom.
Enjoy glory now for it too will vanish fast –
Grieve then for life should have no permanence last.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45564688) |
Date: December 8th, 2022 1:50 PM Author: Big sickened site
Sorry if this was poasted before but man is this stuff interesting: https://arxiv.org/abs/2203.03466
As my colleague upthread was saying, it seems like gpt4 may be superannuated before it comes out. Who needs it?
The fact that results are so good w/ Zero-Shot Hyperparameter Transfer directly imo reads on to a long puzzle in the neurobiology of learning (and also of philosophy -- viz. the problem of 'induction' {really ampliative inference} -- in re how learning is transferred at all). [It also, imo, suggests what the algorithmic substrate of mentalization may be.]
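For reference, the transfer recipe in that paper (μP / muTransfer) can be caricatured in a few lines. The scaling rule encoded below — that with Adam, matrix-like hidden and output weights get learning rates scaled by 1/width relative to a small tuned base model, while input-side parameters stay fixed — is one common statement of it; see the paper and the accompanying `mup` package for the precise parametrization:

```python
def mup_layer_lrs(base_lr, base_width, width):
    """Per-layer-group learning rates for a toy MLP, transferred zero-shot
    from a base width where base_lr was actually tuned."""
    ratio = width / base_width
    return {
        "input": base_lr,           # vector-like / input params: unscaled
        "hidden": base_lr / ratio,  # hidden matrices: shrink as width grows
        "output": base_lr / ratio,  # output matrices: shrink as width grows
    }

# Tune once at width 128, then reuse at width 4096 with no further sweeps:
large = mup_layer_lrs(base_lr=1e-3, base_width=128, width=4096)
```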
Friston's FEP ties it all together (minimizing surprisal). Cf. https://arxiv.org/abs/2202.11532
One is tempted to try & even tie all of this to stuff like string theory compactification vs. swamplands but well over my paygrade. Really wish I were at a topflight research uni sometimes.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#45604352) |
Date: April 8th, 2023 8:36 PM Author: Swashbuckling dilemma
https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-to-take-on-openai/
billion-dollar training runs in the near future are pretty surprising to me. if it's true, i'll assume microsoft, google and others will be doing massive training runs soon as well. seems like we are going to find out the limits of pure scaling before long.
(http://www.autoadmit.com/thread.php?thread_id=5135617&forum_id=2#46162171) |