• 0 Posts
  • 147 Comments
Joined 2 years ago
Cake day: August 29th, 2023


  • After GPT-3 failed to be it, they aimed at five iterations instead because that sounded like a nice number to give to investors, and GPT-3.5 and GPT-4o are very much responses to an inability to actually manifest that AGI on a VC-friendly timetable.

    That’s actually more batshit than I thought! Like I thought Sam Altman knew the AGI thing was kind of bullshit and the hesitancy to stick a GPT-5 label on anything was because he was saving it for the next 10x scaling step up (obviously he didn’t even get that far because GPT-5 is just a bunch of models shoved together with a router).


    1. Even if it was noticeably better, Scam Altman hyped up GPT-5 endlessly, promising a PhD in your pocket and an AGI, and warning that he was scared of what he created. Progress has kind of plateaued, so it isn’t even really noticeably better; it scores a bit higher on some benchmarks, and they’ve patched some of the more meme’d tests (like counting the r’s in strawberry… except it still can’t count the r’s in blueberry, so they’ve probably patched the more obvious flubs with loads of synthetic training data as opposed to inventing some novel technique that actually improves it all around). The other reason the promptfondlers hate it is that, for the addicts using it as a friend/therapist, it got a much drier, more professional tone, and for the people trying to put it to actual serious use, losing all the old models overnight was really disruptive.

    2. There are a couple of speculations as to why… one is that the GPT-5 variants are actually smaller than the previous generation’s variants and they are really desperate to cut costs so they can start making a profit. Another is that they noticed their naming scheme was horrible (4o vs o4) and confusing, and have overcompensated by trying to cut things down to as few models as possible.

    3. They’ve tried to simplify things by using a routing model that decides for the user which model actually handles each interaction… except they’ve apparently screwed that up (Ed Zitron thinks they’ve screwed it up badly enough that GPT-5 is actually less efficient, despite their goal of cost saving). Also, even if this technique worked, it would make ChatGPT even more inconsistent: some minor word choice could make the difference between getting the thinking model or not, and that in turn would drastically change the response.

    4. I’ve got no rational explanation lol. And now they overcompensated by shoving a bunch of different models under the label GPT-5.
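    The routing setup described in point 3 can be sketched roughly like this. Everything here is hypothetical (the model names, the keyword heuristic, the threshold are all made up for illustration); the point is just how a small wording change can flip which backend model answers:

    ```python
    # Hypothetical sketch of an LLM model router: a cheap heuristic decides
    # which backend model handles each request. Model names and the keyword
    # scoring are invented for illustration, not OpenAI's actual method.

    HARD_KEYWORDS = {"prove", "derive", "step by step", "debug", "optimize"}

    def route(prompt: str) -> str:
        """Pick a backend model based on a crude difficulty heuristic."""
        text = prompt.lower()
        score = sum(1 for kw in HARD_KEYWORDS if kw in text)
        # A tiny wording change moves the score across the threshold,
        # which is why routing can make responses feel inconsistent.
        return "big-thinking-model" if score >= 1 else "small-fast-model"

    print(route("What's the capital of France?"))    # small-fast-model
    print(route("Debug this function step by step"))  # big-thinking-model
    ```

    Note how "Debug this" versus "Fix this" would already change the route under this toy heuristic, which is the inconsistency problem in miniature.
    
    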


  • There are techniques for caching some of the steps involved with LLMs. Like I think you can cache the tokenization and maybe some of the attention computation if you have a static, known prompt? But I don’t see why you couldn’t just do that caching separately for each model your router might direct things to? And if you have multiple prompts, you just do a separate cache for each one? This creates a lot of memory usage overhead, but not excessively more computation… well, you do need to do the computation to generate each cache. I don’t find it implausible that OpenAI managed to screw all this up somehow, but I’m not quite sure the exact explanation of the problem Zitron has given fits together.

    (The order of the prompts vs. user interactions does matter, especially for caching… but I think you could just cut and paste the user interactions to separate them from the old prompt and stick a new prompt on in whatever order works best? You would get wildly varying quality in the output as it switches between models and prompts, but this wouldn’t add in more computation…)
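    The per-model caching idea above can be sketched as a cache keyed on the (model, prompt) pair, so each combination gets its own precomputed state. This is a toy stand-in: in a real system the cached value would be the attention KV state for the static prompt prefix, not a placeholder tuple:

    ```python
    # Toy sketch of per-(model, prompt) prefix caching. The cached value here
    # is a placeholder; in a real serving stack it would be the tokenization
    # plus the attention KV state for the static prompt prefix.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def build_prefix_cache(model: str, system_prompt: str) -> tuple:
        # Stand-in for the expensive one-time prefix computation,
        # done once per (model, prompt) combination.
        return (model, hash(system_prompt))

    def handle(model: str, system_prompt: str, user_msg: str) -> str:
        cache = build_prefix_cache(model, system_prompt)  # cache hit after first call
        return f"{cache[0]} answering: {user_msg}"

    # Each model/prompt combination gets its own cache entry: memory overhead
    # grows with the number of combinations, but each is computed only once.
    handle("model-a", "You are terse.", "hi")
    handle("model-b", "You are terse.", "hi")  # different model -> own entry
    print(build_prefix_cache.cache_info().currsize)  # 2
    ```

    The memory-vs-computation tradeoff in the comment falls out directly: two models sharing one prompt still means two cache entries, but each is built exactly once.
    
    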

    Zitron mentioned a scoop, so I hope/assume someone did some prompt hacking to get GPT-5 to spit out some of its behind-the-scenes prompts and he has solid proof about what he is saying. I wouldn’t put anything past OpenAI, for certain.












  • The quirky eschatologist that you’re looking for is René Girard, whom he personally met at some point. For more details, check out the Behind the Bastards episode on him.

    Thanks for the references. The quirky theology was so outside the range of even the weirder Fundamentalist Christian stuff I didn’t recognize it as such. (And didn’t trust the EA summary because they try so hard to charitably make sense of Thiel).

    In this context, Thiel fears the spectre of AGI because it can’t be influenced by his normal approach to power, which is to hide anything that can be hidden and outspend everybody else talking in the open.

    Except the EAs are, on net, opposed to the creation of AGI (albeit ineffectual in their opposition). So going after the EAs doesn’t make sense if Thiel is genuinely opposed to inventing AGI faster. So I still think Thiel is just going after the EAs because he’s libertarian and EA has shifted in the direction of trying to get more government regulation (as opposed to a coherent theological goal beyond libertarianism). I’ll check out the BtB podcast and see if it changes my mind as to his exact flavor of insanity.


  • So… apparently Peter Thiel has taken to co-opting fundamentalist Christian terminology to go after Effective Altruism? At least it seems that way from this EA post (warning, I took psychic damage just skimming the lunacy). As far as I can tell, he’s merely co-opting the terminology; Thiel’s blather doesn’t have any connection to any variant of Christian eschatology (whether mainstream or fundamentalist or even obscure wacky fundamentalist). But of course, the majority of the EAs don’t recognize that, or the fact that he is probably targeting them for their (kind of weak, to be honest) attempts at getting AI regulated at all, and instead they charitably try to steelman him and figure out if he has a legitimate point. …I wish they could put a tenth of this effort into understanding leftist thought.

    Some of the comments are… okay actually, at least by EA standards, but there are still plenty of people willing to defend Thiel.

    One comment notes some confusion:

    I’m still confused about the overall shape of what Thiel believes.

    He’s concerned about the antichrist opposing Jesus during Armageddon. But afaik standard theology says that Jesus will win for certain. And revelation says the world will be in disarray and moral decay when the Second Coming happens.

    If chaos is inevitable and necessary for Jesus’ return, why is expanding the pre-apocalyptic era with growth/prosperity so important to him?

    Yeah, it’s because he is simply borrowing Christian Fundamentalist eschatological terminology… possibly to try to turn the Christofascists against EA?

    Someone actually gets it:

    I’m dubious Thiel is actually an ally to anyone worried about permanent dictatorship. He has connections to openly anti-democratic neoreactionaries like Curtis Yarvin, he quotes Nazi lawyer and democracy critic Carl Schmitt on how moments of greatness in politics are when you see your enemy as an enemy, and one of the most famous things he ever said is “I no longer believe that freedom and democracy are compatible”. Rather I think he is using “totalitarian” to refer to any situation where the government is less economically libertarian than he would like, or “woke” ideas are popular amongst elite tastemakers, even if the polity this is all occurring in is clearly a liberal democracy, not a totalitarian state.

    Note this commenter still uses non-confrontational language (“I’m dubious”) even when directly calling Thiel out.

    The top comment, though, is just like the main post, extending charity to complete technofascist insanity. (Warning for psychic damage)

    Nice post! I am a pretty close follower of the Thiel Cinematic Universe (ie his various interviews, essays, etc)

    I think Thiel is also personally quite motivated (understandably) by wanting to avoid death. This obviously relates to a kind of accelerationist take on AI that sets him against EA, but again, there’s a deeper philosophical difference here. Classic Yudkowsky essays (and a memorable Bostrom short story, video adaptation here) share this strident anti-death, pro-medical-progress attitude (cryonics, etc), as do some philanthropists like Vitalik Buterin. But these days, you don’t hear so much about “FDA delenda est” or anti-aging research from effective altruism. Perhaps there are valid reasons for this (low tractability, perhaps). But some of the arguments given by EAs against aging’s importance are a little weak, IMO (more on this later) – in Thiel’s view, maybe suspiciously weak. This is a weird thing to say, but I think to Thiel, EA looks like a fundamentally statist / fascist ideology, insofar as it is seeking to place the state in a position of central importance, with human individuality / agency / consciousness pushed aside.

    As for my personal take on Thiel’s views – I’m often disappointed at the sloppiness (blunt-ness? or low-decoupling-ness?) of his criticisms, which attack the EA for having a problematic “vibe” and political alignment, but without digging into any specific technical points of disagreement. But I do think some of his higher-level, vibe-based critiques have a point.





  • I would give it credit for being better than the absolutely worthless approach of “scoring well on a bunch of multiple choice question tests”. And it is possibly vaguely relevant for the pipe-dream end goal of outright replacing programmers. But overall, yeah, it is really arbitrary.

    Also, given how programming is perceived as one of the more in-demand “potential” killer apps for LLMs, and how it is also one of the applications where it is relatively easy to churn out and verify synthetic training data (write really precise, detailed test cases, and then you can automatically verify attempted solutions and keep them as synthetic data), even if LLMs are genuinely improving at programming it likely doesn’t indicate general improvement in capabilities.
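    The verify-with-tests loop described above can be sketched simply: generate candidate solutions, run each against precise test cases, and keep only the ones that pass as synthetic training data. The task and candidates here are made up to illustrate the filtering step:

    ```python
    # Sketch of filtering synthetic programming data with test cases:
    # only candidate solutions that pass every test get kept.

    def run_tests(candidate, tests):
        """Return True if the candidate function passes all (input, expected) tests."""
        try:
            return all(candidate(x) == expected for x, expected in tests)
        except Exception:
            return False  # crashing candidates count as failures

    # Hypothetical task: "return the number of 'r' characters in a word"
    tests = [("strawberry", 3), ("blueberry", 2), ("cranberry", 3)]

    candidates = [
        lambda w: w.count("r"),   # correct solution
        lambda w: len(w) // 3,    # wrong heuristic, happens to pass one test
    ]

    verified = [c for c in candidates if run_tests(c, tests)]
    print(len(verified))  # 1
    ```

    Because verification is fully automatic, this loop scales to huge amounts of data, which is exactly why programming is an unusually easy domain to grind out synthetic training examples for.
    
    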