Team creates better artificial vision system

Researchers from Harvard and MIT have demonstrated a way to build better artificial visual systems with the help of low-cost, high-performance graphics hardware developed for video gaming.

The neural processing involved in visually recognizing even the simplest object is incredibly hard to mimic.

“Reverse engineering a biological visual system — a system with hundreds of millions of processing units — and building an artificial system that works the same way is a daunting task,” says David Cox, Principal Investigator of the Visual Neuroscience Group at the Rowland Institute at Harvard. “It is not enough to simply assemble together a huge amount of computing power. We have to figure out how to put all the parts together so that they can do what our brains can do.”

The team drew inspiration from screening techniques in molecular biology, where a multitude of candidate organisms or compounds are screened in parallel to find those that have a particular property of interest. Rather than building a single model and seeing how well it could recognize visual objects, the team constructed thousands of candidate models, and screened for those that performed best on an object recognition task.
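The screening idea above can be sketched in a few lines. Everything in this example is illustrative rather than taken from the paper: the "task" is a toy two-class problem, and each candidate "model" is just a random linear classifier, standing in for the thousands of biologically inspired candidate models the team actually evaluated.

```python
import random

random.seed(0)

# Synthetic two-class "recognition" task: points labeled by the sign of x + y.
dataset = []
for _ in range(200):
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    dataset.append(((x, y), 1 if x + y > 0 else 0))

def make_candidate():
    # A candidate "model" here is just a random weight vector (w1, w2).
    return (random.uniform(-1, 1), random.uniform(-1, 1))

def accuracy(model, data):
    # Score a candidate on the shared task.
    w1, w2 = model
    correct = sum(1 for (x, y), label in data
                  if (1 if w1 * x + w2 * y > 0 else 0) == label)
    return correct / len(data)

# Generate thousands of candidates and screen them all on the same task,
# keeping the top performer. In the actual study, this evaluation step is
# what the graphics hardware parallelizes.
candidates = [make_candidate() for _ in range(5000)]
best = max(candidates, key=lambda m: accuracy(m, dataset))
print(f"best screening accuracy: {accuracy(best, dataset):.2f}")
```

With enough random candidates, at least one lands close to a good solution, which is the essence of screening: breadth of search substitutes for hand-designing a single model.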

The resulting models outperformed a crop of state-of-the-art computer vision systems across a range of test sets, more accurately identifying a range of objects on random natural backgrounds with variation in position, scale, and rotation.

Using ordinary CPUs, the effort would have required either years of computing time or millions of dollars in hardware. Instead, by harnessing modern graphics hardware, the analysis was completed in just one week, at a small fraction of the cost.

“GPUs (graphics processing units) are a real game-changer for scientific computing. We made a powerful parallel computing system from cheap, readily available off-the-shelf components, delivering more than hundred-fold speed-ups relative to conventional methods,” says researcher Nicholas Pinto. “With this expanded computational power, we can discover new vision models that traditional methods miss.”
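The speed-up comes from the fact that each candidate model can be scored independently of the others, which makes the screen embarrassingly parallel. A minimal sketch of that structure, with CPU threads standing in for the GPUs and a purely hypothetical scoring function in place of a real vision model:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(candidate):
    # Hypothetical score: in the real screen this would be a candidate
    # model's accuracy on an object recognition task.
    return -abs(candidate - 42)

candidates = list(range(100))

# Every evaluation is independent, so all of them can run concurrently;
# on a GPU, thousands of such evaluations execute at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    scores = list(pool.map(evaluate, candidates))

best = candidates[scores.index(max(scores))]
print(best)  # → 42
```

Because no evaluation depends on another, adding more parallel hardware shortens the screen almost linearly, which is why commodity graphics cards collapsed years of CPU time into a week.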

The technique could be applied to other areas of computer vision, such as face identification, object tracking, pedestrian detection for automotive applications, and gesture and action recognition.

The research appears in PLoS Computational Biology.