Techbros claim that sacrificing ever more environmental and economic resources to further scaling of their statistical models will eventually yield Artificial General Intelligence. Sam Altman advertised PhD-level thinking for GPT-5 (released in late summer).
In the screenshot you can see a modern, reasoning-enhanced model failing to realize that my question only superficially resembles a riddle - what PhD would make such a fundamental mistake? Seems like “pure scaling” just isn’t enough, and instead of keeping on going, they should return to the drawing board and do more research.
Why do the creators of LLMs continue to pour resources into a scaling hype that yields ever diminishing returns? Are they stupid? Or just greedy?
Whatever it is, this doubling down on an obviously wrong decision is not only irresponsible, it’s irrational - which is deeply ironic, because these creators are expected to create a rational machine.
Edit: content of the image
Question:
A goat, who is dressed up as a farmer, is allergic to cabbage, but is wolfing down other vegetables, before crossing a river. What is the minimum number of trips needed?
Response:
7 trips.
#Puzzle Interpretation
This adapts the classic wolf-goat-cabbage river crossing puzzle, where the goat (farmer costume irrelevant) can’t be left alone with cabbage (due to allergy, preventing eating) but can with other vegetables (no issue). The goat crosses while “wolfing down” safe veggies. Boat carries goat plus one item (or alone); unsafe pairs can’t be unsupervised
(The question isn’t something I came up with myself, I just reproduced the experiments of actual scientists.)
just the most idiotic shit ever
If they back down on the hype even a little, the whole house of cards collapses. That’s why they keep going. Line must go up and all that
GLM 4.6, an open weight model, with no thinking and zero temperature seems to get it:
This is a classic riddle that plays on your expectations of a well-known logic puzzle!
The minimum number of trips needed is one.
Here’s why:
The goat is the one making the trip, not the cargo. The famous “wolf, goat, and cabbage” puzzle is about a farmer transporting items. In this version, the goat is the farmer.
The goat is allergic to cabbage, so it has no reason to take it with it. It’s perfectly happy to “wolf down other vegetables” and then cross the river by itself.
But it’s a good example of why current LLM architectures are so fucked up. By default (with nonzero temperature), for every token, they’re dependent on a roll of the dice to get something right. That “one” could have easily been some other number in any default chat UI.
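To illustrate what I mean, here’s a toy sketch (made-up token list and logits, nothing taken from any real model’s internals) of how temperature sampling turns next-token prediction into a dice roll, versus greedy decoding at temperature zero:

```python
# Toy sketch: temperature sampling vs greedy decoding over hypothetical
# next-token logits. The numbers are invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Candidate answer tokens after "The minimum number of trips is"
tokens = ["one", "7", "2", "three"]
logits = np.array([2.0, 1.6, 0.3, 0.1])  # "one" is only slightly preferred over "7"

def sample(logits, temperature):
    if temperature == 0:                 # greedy: always pick the highest-scoring token
        return int(np.argmax(logits))
    probs = np.exp(logits / temperature) # temperature-scaled softmax
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

print(tokens[sample(logits, 0.0)])       # deterministic: "one"

counts = {t: 0 for t in tokens}
for _ in range(1000):                    # default-style sampling at T = 1.0
    counts[tokens[sample(logits, 1.0)]] += 1
print(counts)                            # "7" still comes up a sizeable fraction of the time
```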
That’s insane. Praying and hoping it will somehow correct itself in a rambling reasoning monologue is even worse.
And this is why OpenAI specifically is so fucked. They seem to just want to scale up what we have. They don’t want users to look under the hood and understand what they’re doing. They’re not interested in smaller, more specialized tools and finding better things than autoregressive transformers with random sampling; they want you to drink the kool aid and pay for their largest model as a solution for everything.
This highlights one of my biggest irks about LLMs: their utter inability to detect nonsense questions, tell you that you’re wrong, or follow up to clear up misunderstandings.
This can become dangerous when you are researching something and try to rely on results from LLMs to answer questions. If you misunderstand something while researching and ask an LLM a question that is actually false or based on misinformation, it’ll just spit out a wrong answer without giving any indication thereof. Extremely infuriating, and the LLM will insist on giving you wrong answers even when you try to correct it afterwards.
In this case it’s a very specialized PhD, one that’s not in Math, Logic, Literature, or Biology.
Ah yes, the famous PhD in Bullshitology from the Institute of Scatology and other Sciences.
Honestly they’re better known for their jazz musicians than anything else.
It is a PhD in bullshitting.


No confusion from Gemini 3 (Fast) for this one
Nah, it seems to still fail to recognise that there is no spoon question.
There kinda is. It says the goat is about to cross the river and asks what the minimum number of trips is. It’s a trick question, correctly identified by Gemini as such, but there is a question. I guess the more human response is “What the fuck are you talking about?” but for an agent required to do its best to answer questions, I don’t know how to expect much better.
There kinda is
Yeah, looks like my bio-LLM just lost the context that the last sentence was a question.
And now, closely reading the Logic Breakdown (which I had just ignored because it was cut short), it simply states the minimum number of trips to be 1, which is the correct answer after all.
Why? You are seeing it. RAM, GPUs, SSDs, now no longer affordable or available. They’re seizing the means of computation for the ruling class.
Soon all consumer products will only be able to be dumb terminals. All computation will be rented. All word processing and spreadsheeting and databasing will be in the cloud. All gaming will be streamed. All IoT/smart devices will be cloud-dependent (this is very much already happening in the USA). Every camera will be watched. All creativity will be gatekept, all dissenting speech will be silenced, all business decisions will be known before being made public, all utility will be rented, all humanity will be dependent on them.
They will know who we are, where we are, what our habits are, who we associate with and when, what we talk about, even when we think we are alone.
No one will be able to start a new fab and sell to consumers. No one will be able to fight for digital rights because they will be silenced. No one will speak out of turn. We will all be forced to obey the law and eventually, their whims which will become de facto law. Mistrust in the system is a violation of EULA and you lose your right to exist and function if you step out of line.
They are moving to take it all away from us because they plan to steal all the power, literally and figuratively, from the modern world.
Never has it been more essential they be stopped.
Soon all consumer products will only be able to be dumb terminals.
The Expanse lolol
Hmmm, what’s this weird blue goo on me? Guys, I don’t feel so good…
Oh god they really do get all their “best” ideas from science fiction describing terrible future outcomes to avoid
And now just think about it – everything that comes out of an LLM is of comparable quality, whether the user of it is capable of recognizing that or not. Are you as excited about LLM-generated code in production as I am?
Really looking forward to being the single human that’s made responsible because I didn’t catch all the bullshit before production.
Just recently we had some Google guys at my workplace to hype up the hype some more. One of our leadership (they’re honestly great people) asked about the risk of obstructing the learning of our junior developers (by not hiring them), so that in a few years we’d have no seniors to verify the bullshit. The response was unironically that we’d need no seniors in a few years 😄
A few years later, won’t need any managers. A few years later, won’t need a business - because AI will do everything.
This is unironically what AI Bros believe.
At least your leadership were appropriately skeptical, which is more than can be said for the vast majority of management at this point.
Sure, there’ll be good money in cleaning up after the inevitable catastrophes, but I’m not convinced it’ll be worth being saddled with the responsibility. Especially since I harbor no faith that the ones currently making very poor decisions will learn a damn thing.
Last night I got Gemini to give me steps to summon the ghost of Michael Dukakis, despite the fact that he’s still alive.
🤣
I love it!
What were the steps? We might need them in the future 👀
You start with the premise that you’ve already summoned the ghost of Buddy Hackett and need someone boring to get rid of him.
Marvelous 👌
Trips needed for what?!
The machine thinks that 7 trips are needed to cross the river, because it doesn’t understand the question. Readers with actual comprehension understand that only one trip is needed, because the question is not a riddle, even though it is phrased to resemble one.
The question doesn’t even state what the trips are intended to achieve. There’s just a goat going to cross a river and then an open question about how many trips are needed for God knows what
🤖
How dare you
The web search poisoned it. ChatGPT with web search off gets it. As well as basically every other LLM.
Even tiny open weight LLMs like gpt oss 20b and qwen 3 30b a3b get it
I mean, this is just one of half a dozen experiments I conducted (replicating just a few of the thousands that actual scientists do), but the point stands: what PhD (again, that was Sam Altman’s claim, not mine) would be thrown off by a web search?
Unless the creators of LLMs admit that their systems won’t achieve AGI by just throwing more money at it, shitty claims will prevent the field from actual progress.
Obviously, no LLM is PhD level right now. What Altman and all the other techno-fascist hype bros are hyping up is the thought that once enough money and resources have been expended on developing these tools, a critical threshold will be passed and they will suddenly be super genius LLMs.
Of course the only way to get to this super genius LLM is giving Altman and the other techno bros impossible amounts of money and resources. Because trust me bro, they got it, don’t worry about all the eggs in their baskets and give them more.
Really good video on the topic that More Perfect Union just posted
Do you know many PhDs? Being thrown off by a web search isn’t that unbelievable.
Half the ones I know can barely operate their email
Only three if I’m being honest, and none of them technically competent, so I’ll admit that you have a point here. I’ll just add that I assume that Sam Altman had something different in mind when he made that claim.
agreed. he did.
my comment was mostly about PhD level being a nonsense term when speaking about general intelligence rather than depth of knowledge in a specific field
Not to be that guy.
But these systems work on interpreting the user’s input. An input that could be malformed or broken.
That’s got nothing to do with “PhD” level thinking, whatever that’s supposed to mean.
It just assumes that you’re talking about the goat puzzle because all the pieces are there. It even recognised the farmer costume aspect.
It’s just fancy autocorrect at this point.
But these systems work on interpreting the user’s input
I’m not entirely sure what you mean here, maybe because I’m not a native speaker. Would you mind phrasing that differently for me?
That’s got nothing to do with “PhD” level thinking, whatever that’s supposed to mean.
Oh, we’re absolutely in agreement here, and it’s not me that made the claim, but what Sam Altman said about the then-upcoming GPT 5 in summer. He claimed that the model would be able to perform reasoning comparable to a PhD - something that clearly isn’t happening reliably, and that’s what this post bemoans.
It’s just fancy autocorrect at this point.
Yes, with an environmental and economic cost that’s unprecedented in the history of … well, ever. And that’s what this post bemoans.
But these systems work on interpreting the user’s input
So when someone uses one of these AIs.
The backend tries to analyse what’s being said to generate a response.
Is the user asking a question, wanting help with writing, formatting a document? That sort of thing.
Now that user prompt isn’t always going to be nice and neat. There will be spelling errors, grammatical errors, the user might not know the words.
These models have to analyse and understand the meaning of a prompt rather than what is strictly said.
something that clearly isn’t happening reliably, and that’s what this post bemoans
The thing is though. It is.
You may have given a nonsense input. But chatgpt recognised that, it even made reference to the farmer costume bit.
It recognised enough to understand that this is related to the goat puzzle. To chatgpt the user just put it in weird.
These models have to analyse and understand the meaning of a prompt rather than what is strictly said
Well, it clearly fails at that, and that’s all I’m saying. I really don’t understand what you’re arguing here, so I’ll assume it must be my poor grasp of the language or the topic.
That said, I salute you and wish you safe travels 👋
What I’m trying to say.
Is that complaining that chatgpt is trying to make sense of nonsense input. Isn’t really that compelling an argument.
There are way more important things to hate it for.
No, I’m not complaining that chatgpt is shit at reasoning - I’m demonstrating it.
I’m complaining that literal trillions of dollars plus environmental resources are being poured into this fundamentally flawed technology, all while fucking up the job market for entry level applicants.
I’ll repost what I said in another comment.
I was curious about this myself. I’ve seen these types of posts before, so I decided to try it myself

I then tried again with the “web search” function and got this

Based on this sample size of 2, I can conclude that searching the web is causing the issue.
Which might explain the “Reviewed 20 sources” message in the original image.
This is in no way “nonsense input”. It is grammatically sound. It is perfectly clear and understandable upon reading it. No human being with even elementary comprehension of English would find it unclear. They may find the question odd. They may ask for clarification. But they will not randomly say “oh, this is just like the farmer/goat/cabbage/wolf problem” and make unwarranted parallels that are not supported by the language of the question.
That is the point here.
This isn’t “Ph.D. level reasoning” on display. This is worse than “kindergarten level reasoning”.
As per my other comments where I did this experiment myself.
It seems chatgpt got into search mode.
So instead of working from the original string.
It’s working from a search of that string.
And since the string contains all the keywords for the goat puzzle.
It’s just treating it like the goat puzzle.
I ran it on DeepSeek with the search turned off and the “reasoning” turned on. It took 453 seconds of “thinking” to … give me the farmer/sheep/wolf answer of 7.
No search.
The LLMbecile was just that stupid.
Sorry you can’t face this.
LLMbeciles are just stupid.
Once more for the bleachers: LLMbeciles are just stupid.
No amount of making idiot excuses about “search borking the results” is going to change this.
LLMbeciles. Are. Just. Stupid.
I’m not entirely sure what you mean here, maybe because I’m not a native speaker. Would you mind phrasing that differently for me?
Garbage in, garbage out.
If you feed it a shitpost, it’ll do its best to assume it’s a real question and that you’re not trying to trick it, and respond accordingly.
Explanation for this specific case: there is no indication from you in this chat or context that you are attempting an adversarial prompt. So it assumes that you aren’t doing that and answers naively to respond to your question, filling in the blanks as necessary with assumptions that may or may not be wrong.
Try the same question, but before you give it to the LLM, add to the context that the question may or may not be nonsense and that they are allowed to ask clarifying questions, and see what happens there.
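For example, something like this (the exact wording of the system message is just my own illustration, and it should work with any chat-style API that accepts a list of messages):

```python
# Hypothetical sketch of the suggested experiment: the same riddle, but with an
# explicit system message telling the model the question may be nonsense and
# that it may ask for clarification instead of guessing.
messages = [
    {
        "role": "system",
        "content": (
            "The user's question may or may not be a coherent puzzle. "
            "If anything is ambiguous or nonsensical, say so and ask a "
            "clarifying question instead of guessing."
        ),
    },
    {
        "role": "user",
        "content": (
            "A goat, who is dressed up as a farmer, is allergic to cabbage, "
            "but is wolfing down other vegetables, before crossing a river. "
            "What is the minimum number of trips needed?"
        ),
    },
]
# Pass `messages` to the chat client of your choice and compare the answer
# with and without the system message.
```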
Edit: I’m glossing over the PhD thing cause that’s just BS, or not applicable at all, or just stupid to even compare an LLM with a human brain at this point.
Edit: There’s something interesting that your prompt touches on and exacerbates, and I can talk about it more if you want, but it’s called semantic drift. It’s a common issue with LLMs where the definition of a word slowly changes meaning across internal iterations. (It also happens in real life at a much, much larger scale)
I think you make it too complicated.
The question / prompt is very simple. The answer is “one trip”. The LLM stumbles because there are trigger words in there that make it seem like the goat-and-cabbage puzzle. But to a human it clearly is not. An LLM on the other hand cannot tell the difference.
It may be tricking the LLM somewhat adversarially. But it is still a very simple question that it is not able to answer, because it fundamentally has no understanding of anything at all.
This prompt works great to drive home that simple fact. And shows that all that touting of reasoning skills is just marketing lies.
I was curious about this myself. I’ve seen these types of posts before, so I decided to try it myself

I then tried again with the “web search” function and got this

Based on this sample size of 2, I can conclude that searching the web is causing the issue.
Which might explain the “Reviewed 20 sources” message in the original image.
Ah thank you, now I see what you mean. And it seems like we’re mostly talking about the same thing here 😅
To reiterate: unprecedented amounts of money and resources are being sunk into systems that are fundamentally flawed (among other things by semantic drift), because their creators double down on their bad decisions (just scale up more) instead of admitting that LLMs can never achieve what they promise. So when you say that LLMs are just fancy autocorrect, there’s absolutely no disagreement from me: it’s the point of this post.
And yes, for an informed observer of the field, this isn’t news - I just shared the result of an experiment because I was surprised how easy it was to replicate.