Turning a photo or sketch of an object into a fully realized 3D model, one that a 3D printer could duplicate or a video game could render, normally requires the skills of a digital modeler working from a stack of images. Nvidia, however, has successfully trained a neural network to generate fully textured 3D models from just a single photo.
We have already seen similar approaches to automatically generating 3D models, but they either required input from a human user to help the software figure out the shape and dimensions of a specific object, or relied on a series of photos snapped from many different angles to produce accurate results. Any improvement to the task of 3D modeling is more than welcome, since it puts such tools in the hands of a broader audience, including people who lack advanced skills. But those requirements limit the potential uses for the software.
This week at the annual Conference on Neural Information Processing Systems in Vancouver, British Columbia, researchers from Nvidia will present a new paper, "Learning to Predict 3D Objects with an Interpolation-Based Renderer." It details the creation of a new graphics tool called a differentiable interpolation-based renderer, or DIB-R for short, which sounds less intimidating.
Nvidia's researchers trained the DIB-R neural network on multiple datasets, including 3D models presented from various angles, sets of photos that focused on a particular subject from multiple angles, and pictures that had previously been turned into 3D models.
2D Images Into 3D Models
Training the neural network to extrapolate the extra dimensions of a given subject, for example a bird, takes roughly two days. Once training is complete, DIB-R can churn out a 3D model from a 2D photo it has never analyzed before in less than 100 milliseconds.
This impressive processing speed is what makes the tool particularly interesting. It has significant potential to improve how machines like autonomous cars and robots see the world and understand what lies before them. Still images pulled from a live camera feed could be instantaneously converted into 3D models, allowing an autonomous car, for example, to accurately gauge the size of a large truck it needs to avoid, or a robot to accurately predict how to pick up an object based on its estimated shape. DIB-R could even improve the performance of security cameras that identify and track people: as soon as a person moves through the camera's field of view, an instantly generated 3D model would make it easier to perform image matches.
Many machine learning models operate on images but ignore the fact that those images are 2D projections formed by 3D geometry interacting with light, in a process called rendering. Making that rendering step differentiable is what lets gradients flow from the output image back to the underlying 3D shape.
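The principle can be sketched with a toy example. The following is a minimal, hypothetical illustration, not DIB-R's actual pipeline: a "renderer" draws a soft-edged circle, and because the soft edge is differentiable with respect to the shape parameter (here, a single radius), gradient descent on an image-space loss can recover the shape that produced a target image. The soft sigmoid edge loosely stands in for DIB-R's interpolation-based rasterization; all names and parameters below are illustrative.

```python
import numpy as np

def render_silhouette(radius, size=32):
    """Toy differentiable "renderer": a soft-edged circle silhouette.

    A hard inside/outside test would give zero gradient almost
    everywhere; the sigmoid edge makes boundary pixels smoothly
    differentiable with respect to the shape parameter (radius).
    """
    ys, xs = np.mgrid[0:size, 0:size]
    d = np.sqrt((xs - size / 2) ** 2 + (ys - size / 2) ** 2)
    return 1.0 / (1.0 + np.exp(d - radius))

def fit_radius(target, radius=4.0, lr=0.05, steps=300):
    """Recover the radius behind `target` by gradient descent on an
    image-space squared-error loss, backpropagated through the renderer."""
    for _ in range(steps):
        s = render_silhouette(radius)
        # Closed-form gradient: d s / d radius = s * (1 - s) for the
        # sigmoid edge, so d loss / d radius is available analytically.
        grad = np.sum(2.0 * (s - target) * s * (1.0 - s))
        radius -= lr * grad
    return radius

target = render_silhouette(10.0)   # the "photo", rendered from a known shape
fitted = fit_radius(target)        # should land close to 10.0
```

In DIB-R the recovered parameters are a full textured mesh rather than a single radius, and the renderer assigns foreground pixel values by interpolating over mesh faces, but the underlying idea of optimizing 3D parameters through a differentiable rendering step is the same.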
The research paper comes from the NVIDIA Research team, which consists of 200 scientists around the globe. Their areas of focus include computer vision, self-driving cars, graphics, and, of course, artificial intelligence.