DatologyAI raises $46M to streamline AI model training data diets

Artificial intelligence data curation startup DatologyAI said today it has closed on a $46 million early-stage round of funding, which comes just three months after it first announced it had raised $11.65 million in seed funding.

The company said today’s Series A round was led by Viv Faga and Astasia Myers from Felicis Ventures. Existing investors Radical Ventures and Amplify Partners also participated, as did new investors including Elad Gil, M12 and the Amazon Alexa Fund. All told, DatologyAI has now raised about $57.7 million.

According to the startup, it aims to democratize data research to help solve one of the major headaches of generative AI development: the need to curate the extremely large, well-suited datasets used to train large language models such as OpenAI’s GPT-4 and Google LLC’s Gemini Pro.

DatologyAI provides tools that can automate much of the work involved in creating these datasets. They work by identifying which information within a dataset is most appropriate, based on what the AI model is designed to do. The tools can also suggest ways to augment existing datasets with additional information, determine the best way to batch that information, or split it into more manageable chunks to streamline the model training process.

The startup says it’s challenging to create datasets for generative AI because developers must take care that their models don’t start generating toxic content or exhibiting biases that stem directly from the content they’re trained on. The problem is that prejudicial patterns in the data can be difficult for humans to spot, partly because AI training datasets tend to be enormous and complex, spanning many different formats and containing a great deal of noise and unnecessary information that won’t improve the model.

“Models are what they eat, and the data models ingest determines everything about their capabilities,” the company explained in a short blog post announcing today’s round.

Founder and Chief Executive Ari Morcos believes that by using more efficient training datasets, it’s possible to improve the quality and performance of AI models without making them excessively large and expensive to train and run.

Smaller AI models have much lower compute costs, a key consideration given that some AI companies are spending millions of dollars each month on training and running their models.

The challenge for AI developers is that they often have so much information that they don’t know where to begin, so rather than attempting to work it out, they simply select a random subset of the available data. This might save time and hassle, but it also means the model is likely being trained on redundant data, resulting in slower training and higher costs. Some of that data may even degrade the model’s performance.

DatologyAI’s tools enable developers to identify the most useful information within a given dataset. The less useful information is then filtered out, leaving a leaner dataset of higher-quality samples that’s ready for training.

The company’s toolset can also help label unlabeled data, a painstaking job that’s normally done manually. Finally, it can identify any data that might be harmful or cause the model to behave in unexpected ways.

The startup said today’s funding round will enable it to “substantially scale the size of our team,” with a particular focus on adding more researchers and engineers. It also plans to expand its compute resources in order to “push the frontier of what is possible with data curation.”

Image: SiliconANGLE/Microsoft Designer
