We were days away from IPO. We had raised $100 million in funding and exploded from a team of 50 in a garage to 600 in 18 months. One million technologists joined our platform. We were the next big deal.
It was the turn of the millennium. And anything was possible. We were betting on a big deal called the internet.
Fast-forward to today. AI is the new internet. Cloud is currency. And “data is the new dollar.”[1]
By the end of this decade, economic output from AI is poised to eclipse the entire economies of China and India combined—nearly $16 trillion.
And yet, to CEOs and technology leaders, the promise of AI can feel at times both overwhelming and underwhelming. Underwhelming given the track record of many AI projects. Overwhelming because leaders are drowning in a sea of data. The sea is deep. And it’s roaring with noise.
A massive paradigm shift is accelerating among artificial intelligence gurus: to solve precision problems, a tiny, precise dataset (as few as 50 images) beats millions of noisy ones.
All AI is custom. The challenge is, how do you make it systematic and scalable? The answer: good data, not big data.
Andrew Ng, co-founder of Google Brain and one of Time's 100 most influential people, really nailed it at the ScaleUp:AI conference. At its most basic level, AI is simply data + code. The problem is that nearly every company out there is scrambling to fix the code. Engineers are working furiously to write better algorithms. Better algorithms can bring incremental improvement. And yet if you want exponential, scalable gains, you need to focus on the data.
Perhaps most astonishing was Andrew’s admission that even though he’s compiled one of the largest datasets in the world (over 350 million facial images), this massive amalgamation of data is often a liability when it comes to accurate problem-solving.
Because if you want to answer a highly specific question or address a specialized business challenge (say, identifying a defect in a part on a manufacturing floor), millions of extraneous data points create noise, and noise undermines the effectiveness of any AI.
If you want to solve a business problem in a meaningful way, in a scalable way, start with a data-centric approach.
And what Andrew found was fascinating: In interviewing the top manufacturing companies in the world, he discovered that in more than half of their computer vision projects, defect detection could be solved with fewer than 50 images. And in nearly 90% of business cases, no more than 200 images were required.
To wit, Ng revealed that in one steel manufacturing operation, 18 months of focused work on the model's algorithms yielded zero improvement in accuracy. After the team shifted to a data-centric approach, the quality rate jumped from 76.2% to 93.1%. And it happened in a matter of weeks.
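To make the shift concrete, here is a minimal sketch of what a small-data, data-centric workflow can look like in code. It is purely illustrative, not the pipeline Ng or the steel manufacturer used: it assumes PyTorch and torchvision (0.13 or later) are installed, and the folder name defect_data/, the class labels, and the hyperparameters are all hypothetical.

```python
# Illustrative sketch of a data-centric, small-data workflow:
# fine-tune a pretrained image classifier on a few dozen carefully
# labeled defect photos instead of millions of noisy images.
# Assumes a hypothetical folder "defect_data/train" with images sorted
# into sub-folders such as "ok/" and "defect/".
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Modest augmentation stretches a tiny, precise dataset without adding noise.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# ImageFolder infers labels from the sub-folder names ("ok", "defect").
train_set = datasets.ImageFolder("defect_data/train", transform=train_tf)
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)

# Start from a model pretrained on ImageNet; only the final layer is new.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False          # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# With ~50 clean, well-labeled images, a handful of epochs is often
# enough for a proof of concept.
model.train()
for epoch in range(10):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch + 1}: loss {loss.item():.3f}")
```

Freezing the pretrained backbone and training only the final layer is what lets roughly 50 clean, well-labeled images go a long way: the general visual representation is already learned, so the small, precise dataset only has to teach the model what a defect looks like.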
When my team first started working with Daimler two years ago, we found a remarkably similar phenomenon at play.
As VentureBeat reported, Daimler took a data-centric approach: using only a few dozen images, in 90 days and on a six-figure budget, it built a synthetic-data prototype so accurate that it surprised even the company’s most senior data scientists.
How can leaders achieve comparable results from their own rapid-innovation efforts?
- Identify your biggest “unsolvable” challenge. Pick one problem that, if solved, will radically change your profits and/or create a long-term competitive advantage.
- Identify a small data window for a proof of concept. You’ll be amazed at what you can achieve with two weeks of data. Keep your POC short: no more than three to six months.
- Rinse and repeat and/or build it out for production.
It really is about good data, not big data. The sea is deep. And yet, a miracle may be waiting just offshore.
You may be only days away—if you cut through the noise.
David Yunger is CEO of AI and software development firm Vaital.
[1] Rokeya Jones, Innovations in Testing Conference: Closing General Session, Orlando, FL, March 23, 2022