For companies that use ML, labeled data is the key differentiator

AI is driving a paradigm shift: the software industry’s transition from writing logical statements to data-centric programming. Data is now oxygen. The more training data a company gathers, the brighter its AI-powered products will burn.

Why is Tesla so far ahead in advanced driver assistance systems (ADAS)? Because no one else has collected as much information. It has data on more than ten billion driven miles, helping it pull ahead of competitors like Waymo, which has only about 20 million miles. But any company that is considering machine learning (ML) cannot overlook one technical choice: supervised or unsupervised learning.

There is a fundamental difference between the two. For unsupervised learning, the process is fairly straightforward: The acquired data is fed directly to the models, and if all goes well, they will identify patterns.
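To make that concrete, here is a minimal sketch of unsupervised learning in Python. It assumes a toy one-dimensional k-means clustering routine, which is only one of many unsupervised techniques. The function name `kmeans_1d` and the speed readings are invented for illustration and are not taken from any company's pipeline. Note that the data carries no labels: the model is handed raw numbers and has to find the groups on its own.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group unlabeled numbers into k clusters with plain k-means (hypothetical helper)."""
    # Seed the centers at evenly spaced positions in the sorted data.
    ordered = sorted(values)
    step = (len(ordered) - 1) / max(k - 1, 1)
    centers = [float(ordered[round(i * step)]) for i in range(k)]

    assignments = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        assignments = [
            min(range(k), key=lambda c: abs(v - centers[c])) for v in values
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # Converged: nothing moved.
        centers = new_centers
    return centers, assignments


# Made-up speed readings in km/h, with no labels attached.
speeds = [28, 31, 30, 33, 29, 98, 102, 95, 101, 99]
centers, groups = kmeans_1d(speeds, k=2)
print(centers, groups)  # [30.2, 99.0] [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The routine separates the slow readings from the fast ones without ever being told what "slow" or "fast" means. A supervised model, by contrast, would need each reading tagged with the correct answer before training.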

Elon Musk compares unsupervised learning to the human brain, which takes in raw data from the senses and interprets it. He recently shared that making unsupervised learning work for ADAS is a major challenge that hasn’t been solved yet.

A major part of real-world AI has to be solved to make unsupervised, generalized full self-driving work, as the entire road system is designed for biological neural nets with optical imagers

— Elon Musk (@elonmusk) April 29, 2021

Supervised learning is currently the most practical approach for most ML challenges. O’Reilly’s 2021 report on AI Adoption in the Enterprise found that 82% of surveyed companies use supervised learning, while only 58% use unsupervised learning. Gartner predicts that enterprises will continue to favor supervised learning through 2022, arguing that “most of the current economic value gained from ML is based on supervised learning use cases.”