last updated 9 July 2024, San Francisco
This FAQ is assembled from investor interviews and is constantly updated as we progress. Please come back from time to time, or send an email to [email protected] if you have further questions!
Topics covered include what we do, how we do it, the market, the competition, why DiffuseDrive will win in this market, defensibility, the customer journey, and technical topics such as model collapse.
First, it's important to clarify that generative AI is more than products like GPT.
GPT-4 is a textual model (and GPT-5 will probably be a textual one as well), while we at DiffuseDrive specialize in vision data: images and, later, videos. So instead of comparing ourselves to GPT-5, a more relevant question is: what sets us apart from models like DALL-E and Sora (keeping within the OpenAI context)?
<aside> 💡 We are using and building on top of diffusion models, hence the name DiffuseDrive.
</aside>
To answer this concisely: while general pre-trained models like DALL-E, Sora, Stable Diffusion, etc., serve as foundational building blocks of our proprietary process, we differentiate ourselves by customizing them for hardware-specific computer vision data: imagery that looks as if it came out of the customer's camera, with the exact same camera characteristics.
The base/general models excel at generating visually appealing data, ideal for marketing, design, and artistic purposes. However, they fall short when it comes to producing data tailored to the safety-critical requirements of computer vision applications.
<aside> 💡 Let us illustrate this further: Imagine you're developing AI software for an autonomous drone's recognition system. The drone has a camera installed on it, and your goal is to train the AI using data that closely resembles what that camera would capture in real-world scenarios. This means the training data needs to match the camera's position, angle, and characteristics, minimizing data post-processing and making sure that the incoming data plugs straight into the existing machine learning pipelines (a minimal sketch of what "camera characteristics" means follows below the callout).
</aside>
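To make the camera-characteristics point concrete, here is a minimal sketch (plain NumPy, with made-up intrinsics for an imaginary 1920x1080 drone camera, not real DiffuseDrive or customer parameters) of how a pinhole camera model maps a 3D point to a pixel:

```python
import numpy as np

# Hypothetical intrinsics for a drone camera -- illustrative values only.
fx, fy = 1200.0, 1200.0      # focal lengths in pixels
cx, cy = 960.0, 540.0        # principal point (center of a 1920x1080 sensor)

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def project(point_cam: np.ndarray) -> np.ndarray:
    """Project a 3D point given in the camera frame to pixel coordinates."""
    uvw = K @ point_cam
    return uvw[:2] / uvw[2]

# A point 20 m in front of the camera, 2 m right and 1 m below the optical axis.
pixel = project(np.array([2.0, 1.0, 20.0]))
print(pixel)  # approx. [1080., 600.] -- where that object lands on this specific sensor
```

Synthetic training images need to place and shape objects consistently with this projection (plus lens distortion and mounting pose), which general-purpose text-to-image models do not guarantee out of the box.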
In essence, we specialize in fine-tuning general pre-trained models to adapt them to our customers' specific use cases (domain-specific and customer-specific data). These models serve as foundational elements that we heavily modify to meet the unique application-critical needs of our clients. Therefore, they are not competitors but rather essential components that we leverage and embrace to deliver tailored solutions.
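As a rough, hypothetical illustration of what "foundational building block" means here, the sketch below uses the open-source Hugging Face diffusers library to load a public Stable Diffusion checkpoint and generate an image from a text prompt. The model ID, prompt, and settings are illustrative assumptions, and this is not DiffuseDrive's actual pipeline; in practice one would swap in a checkpoint heavily adapted to the customer's domain and camera.

```python
# A minimal sketch, assuming the Hugging Face `diffusers` library and a GPU.
import torch
from diffusers import StableDiffusionPipeline

# Load a public base model as the building block; a customer-specific,
# fine-tuned checkpoint would be loaded here instead in a real workflow.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
).to("cuda")

# The prompt describes a scene the customer's perception stack needs
# more examples of (e.g. a rare edge case).
prompt = "forklift partially occluded by pallets, warehouse ceiling camera view"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("synthetic_sample.png")
```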
A visual data acquisition solution: a developer tool for all computer vision development
<aside> 💡 horizontally ranging from cars, through drones, manufacturing, warehousing, logistics, defense, industrial and consumer robotics, to security, healthcare, monitoring, diagnostics, and agriculture.
</aside>
Today, getting the data means manual collection (driving the car, flying the drone, operating the robot itself) or rendering-based simulation. The developer is typically neither the one operating the robot nor the one running the simulation, so there is always friction and resistance from having an extra human present in the data acquisition:
we put control over data acquisition directly in the developers' hands, so they can interact with our tool in natural language and get what they want without any middleman.
Instead of outsourcing (manual collection or simulation) with a human in the loop, control goes to the developer. We designed our system so that developers get access to a generative model that fulfills their needs (high-quality, high-quantity, photorealistic imagery of their "world") with the least friction possible.
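To give a feel for what "least friction possible" could look like from the developer's seat, here is a hypothetical request sketch; the endpoint, field names, and camera-profile identifier are invented for illustration and do not describe a published DiffuseDrive API.

```python
# Hypothetical request to a synthetic-data generation service.
# URL, authentication, and payload schema are assumptions for illustration.
import requests

payload = {
    "prompt": "rainy night, pedestrian stepping off the curb, seen from the front bumper camera",
    "camera_profile": "customer-front-bumper-v2",  # hypothetical profile capturing the real camera's characteristics
    "count": 500,                                  # how many labeled images to generate
    "labels": ["pedestrian", "curb"],
}

response = requests.post(
    "https://api.example.com/v1/generate",         # placeholder endpoint
    json=payload,
    headers={"Authorization": "Bearer <API_KEY>"},
    timeout=60,
)
response.raise_for_status()
print(response.json()["job_id"])                   # hypothetical async job handle
```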
<aside> 💡 Analogy: The bakers (developers) know the recipe (machine learning model), they know how to make it (machine learning pipeline), and now they get to grow their own ingredients (data for vision AI development).
</aside>