Physical AI’s Trillion-Dollar Data Problem
When people talk about why language models worked, they usually point to scale: bigger models, more compute, better optimizers. The truth is that the internet did most of the work in creating that scale. By the time GPT-3 arrived, decades of human writing had already been digitized, indexed, and made cheaply available. The training data wasn’t free, but it existed.
