Artificial intelligence (AI) may be eating the world as we know it, but experts say AI is starving itself — and needs to change its diet. One company says synthetic data is the answer.
“Data is food for AI, but AI is malnourished today,” said Kevin McNamara, CEO and co-founder of synthetic data platform provider Parallel Domain, which just raised $30 million in a Series B round led by March Capital. “That’s why it’s growing slowly. But if we can feed that AI better, models will grow faster and in a healthier way. Synthetic data is like food for training AI.”
Research has shown that approximately 90% of AI and machine learning (ML) implementations fail. A Datagen report from earlier this year pointed out that many failures are due to a lack of training data. It found that 99% of computer vision professionals say they have had an ML project abandoned specifically because of a lack of data to see it through. Even projects that are not canceled outright for lack of data experience significant delays that knock them off track, 100% of respondents reported.
In that vein, Gartner predicts that synthetic data will increasingly be used to supplement AI and ML training. The research giant projects that by 2024 synthetic data will be used to accelerate 60% of AI projects.
Synthetic data is generated by machine learning algorithms that ingest real data to learn its behavior patterns and then create simulated data that preserves the statistical properties of the original data set. The resulting data replicates real-world conditions but, unlike standard anonymized data sets, it is not vulnerable to the same flaws as real data.
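To make the idea of "preserving statistical properties" concrete, here is a minimal Python sketch. It is an illustration only, not Parallel Domain's method or any vendor's API: it assumes the "real" data is a list of two-column numeric rows, fits a simple two-dimensional Gaussian to it, and samples new rows with the same means, variances and correlation. The function names `fit_gaussian_2d`, `sample_synthetic` and `correlation` are hypothetical, and production systems rely on far richer generative models or simulation.

```python
import math
import random
import statistics


def fit_gaussian_2d(rows):
    """Estimate means, variances and covariance of two-column data."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    var_x, var_y = statistics.variance(xs), statistics.variance(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (len(rows) - 1)
    return mean_x, mean_y, var_x, var_y, cov_xy


def sample_synthetic(params, n, rng):
    """Draw n synthetic rows from the fitted Gaussian.

    Uses the Cholesky factor of the 2x2 covariance matrix so that the
    synthetic columns keep the correlation seen in the real data.
    """
    mean_x, mean_y, var_x, var_y, cov_xy = params
    l11 = math.sqrt(var_x)
    l21 = cov_xy / l11
    l22 = math.sqrt(max(var_y - l21 * l21, 0.0))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        rows.append((mean_x + l11 * z1, mean_y + l21 * z1 + l22 * z2))
    return rows


def correlation(params):
    """Pearson correlation implied by fitted parameters."""
    _, _, var_x, var_y, cov_xy = params
    return cov_xy / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for "real" measurements (made-up numbers for illustration):
    # vehicle speed and a braking distance that depends on it.
    real = []
    for _ in range(2000):
        speed = rng.gauss(50.0, 10.0)
        real.append((speed, 0.8 * speed + rng.gauss(0.0, 5.0)))

    real_params = fit_gaussian_2d(real)
    synthetic = sample_synthetic(real_params, 5000, rng)
    synth_params = fit_gaussian_2d(synthetic)

    print("real correlation:     ", round(correlation(real_params), 3))
    print("synthetic correlation:", round(correlation(synth_params), 3))
```

None of the synthetic rows is copied from the original rows, yet refitting the model on them recovers nearly the same summary statistics. That is the property the paragraph above describes, shown here on a deliberately tiny model.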
Bringing AI out of the ‘Stone Age’
It might sound unusual to hear that a technology as advanced as AI is stuck in a “stone age” of sorts, but that’s what McNamara sees — and without synthetic data adoption, it will remain that way, he says.
“At the moment, AI development is a bit like computer programming was in the ’60s or ’70s when people used punched card programming — a manual, labor-intensive process,” he said. “Well, the world eventually moved away from this and into digital programming. We want to do that for the development of AI.”
According to McNamara, the three biggest bottlenecks keeping AI in the Stone Age are:
- Collecting real-world data – which is not always feasible. Even for something like jaywalking, which is quite common in cities around the world, if you need millions of samples to train your algorithm, gathering them from the real world quickly becomes out of reach for companies.
- Labeling – which often takes thousands of hours of human time and can be inaccurate because people make mistakes.
- Iterating on the data once it’s labeled – which requires you to adjust sensor configurations and the like before you can actually start training your AI.
“That whole process is so slow,” McNamara said. “If you can change those things really quickly, you can discover better settings and better ways to develop your AI in the first place.”
Enter synthetic data
Parallel Domain works by generating virtual worlds from maps, which it calls “digital cousins” of real-world scenarios and geographies. These worlds can be changed and manipulated to have more jaywalking or rain, for example, to help train autonomous vehicles.
Because the worlds are digital cousins, not digital twins, they can be customized to simulate data that is essential for training but sometimes hard to obtain, and that companies would normally have to go out and collect themselves. The platform lets users tailor it to their needs via an API, moving or manipulating factors exactly as they please. This speeds up the AI training process and removes time and labor barriers.
The company claims it can deliver ready-to-use training datasets to its customers within hours. Those customers include the Toyota Research Institute, Google, Continental and Woven Planet.
“Clients can go into the simulated world and make things happen or pull data from that world,” McNamara said. “We have buttons for different types of asset classes and scenarios that can happen, as well as ways for customers to plug in their own logic for what they see, where they see it, and how those things behave.”
Customers then need a way to pull data from that world in a form that matches their own setup, he explained.
“With our sensor configuration tools and label configuration tools, we can replicate the exact camera settings or the exact lidar and radar and label settings that a customer would see,” he said.
Synthetic data, generative AI
Not only is synthetic data useful for AI and ML model training, it can also be applied to accelerate generative AI, an already fast-growing use of the technology.
Parallel Domain is eyeing that field as it enters 2023 with fresh capital. It hopes to multiply the data that generative AI needs to train on so it can become an even more powerful tool for content creation. The company’s R&D team is focused on the variety and detail of the synthetic data simulations it can provide.
“I’m excited about generative AI in our space,” McNamara said. “We are not here to create an artistic interpretation of the world. We are here to actually create a digital cousin of the world. I think generative AI is very powerful in looking at sample images from around the world, then pulling those in and creating interesting examples and new information in synthetic data. Therefore, generative AI will be a big part of the technological advancements we invest in over the coming year.”
The value of synthetic data is not limited to AI. Given the sheer amount of data required to create realistic virtual environments, it is also the only practical approach to moving the metaverse forward.
Parallel Domain is part of the fast-growing synthetic data startup sector, which Crunchbase previously reported is seeing a streak of funding. Competitors Datagen, Gretel AI and Mostly AI have also each raised millions in the past year.
VentureBeat’s mission is to be a digital town square where technical decision-makers can gain knowledge about transformative enterprise technology and transact.