cm0002@lemmy.world to Artificial Intelligence@lemmy.world · English · 1 month ago
LLMs Can Think While Idle: Researchers from Letta and UC Berkeley Introduce 'Sleep-Time Compute' to Slash Inference Costs and Boost Accuracy Without Sacrificing Latency (www.marktechpost.com)
jwmgregory@lemmy.dbzer0.com · English · 22 days ago
I mean… your brain essentially does this. It's just that compute and memory are one system there, and that system is about as physically optimized as it gets. This strategy is less stupid than it sounds if you abandon von Neumann purism, imo. It can be more efficient than just waiting for the input and inferring once based on that… you are an example of this in real life.
vrighter@discuss.tchncs.de · English · 22 days ago
Faster != more efficient. And you cannot compare brains to computers. Speculative execution improves speed in the CPU at the cost of efficiency.
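The tradeoff both commenters are circling can be made concrete with a toy sketch. This is not the paper's method, just a minimal illustration of the general idea under assumed names: an agent precomputes answers to anticipated queries while idle, so a correct guess gives near-zero response latency, while a wrong guess is wasted work (the efficiency cost the second comment points at). `expensive_answer` is a hypothetical stand-in for an LLM inference call.

```python
def expensive_answer(query: str) -> str:
    # Hypothetical stand-in for a costly LLM inference call.
    return f"answer({query})"

class SpeculativeResponder:
    """Toy sketch of sleep-time / speculative compute: precompute
    answers to anticipated queries while idle, trading possibly
    wasted work for lower latency when a guess turns out right."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}
        self.precomputed = 0  # total speculative inferences performed

    def idle_tick(self, anticipated_queries: list[str]) -> None:
        # Work done "while sleeping"; wasted if the query never arrives.
        for q in anticipated_queries:
            if q not in self.cache:
                self.cache[q] = expensive_answer(q)
                self.precomputed += 1

    def respond(self, query: str) -> tuple[str, bool]:
        # Cache hit -> near-zero latency; miss -> pay full cost now.
        if query in self.cache:
            return self.cache.pop(query), True
        return expensive_answer(query), False

responder = SpeculativeResponder()
responder.idle_tick(["q1", "q2", "q3"])       # speculate on three queries
answer, hit = responder.respond("q1")          # anticipated: instant
_, hit_other = responder.respond("q9")         # unanticipated: full cost
wasted = responder.precomputed - 1             # q2, q3 never asked for
```

Here three speculative inferences buy one fast response and leave two wasted, which is exactly the speed-versus-efficiency tradeoff: total compute goes up, but latency on correctly guessed queries drops to a cache lookup.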