Parallel database is the bee’s whiskers

MIT researcher Todd Mostak has invented a new parallel database that allows for crunching complex spatial and GIS data in milliseconds.

Dubbed MapD, the database uses off-the-shelf gaming GPUs in the same way that you would use a rack of mini supercomputers. Mostak reports that it runs upwards of 70 times faster than CPU-based systems.

According to Data Informed, it all started in 2012, when he was at the Center for Middle Eastern Studies at Harvard and trying to map tweets for his thesis project on Egyptian politics during the Arab Spring uprising.

He found that it was taking hours or even days to process the 40 million tweets he was analysing. While he saw the value of geolocated tweets for socio-economic research, he did not have access to a system that would allow him to map the large dataset quickly for interactive analysis.

MapD will be released to the great unwashed under an open source business model similar to that of 10gen and its MongoDB database.

Mostak said that while people had written little research pieces about algorithms, no one had tried to build an end-to-end system.

What was strange for MIT was that Mostak was not really a techie and had no background in computer science.

Mostak wanted to test the theory that poorer neighbourhoods in Egypt are more likely to be Islamist. He looked at geocoded tweets from around Cairo during the Arab Spring uprising. He examined whether the tweet writer followed known Islamist politicians or clerics.

He cross-referenced the language in the tweets with forums and message boards he knew to be Islamist to measure sentiment. He also checked the time stamps to see if Twitter activity stopped during the five daily prayers.

He then plotted the Islamist indicators from 40 million tweets, ranging from August 2011 through March 2012, against 5,000 political districts from the Egyptian census.
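
To make the plotting step concrete, here is a minimal Python sketch of the kind of brute-force scan involved: each geocoded tweet is tested against each district outline, and the share of flagged tweets is tallied per district. It is not Mostak's code. The function names, the toy square districts and the four sample tweets are all hypothetical, and a real run would use census boundaries and GPU parallelism rather than a Python loop.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def share_by_district(tweets, districts):
    """Brute-force scan: locate each tweet's district and tally the indicator."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for lon, lat, is_flagged in tweets:
        for name, polygon in districts.items():
            if point_in_polygon(lon, lat, polygon):
                totals[name] += 1
                flagged[name] += int(is_flagged)
                break  # a tweet is counted in at most one district
    return {name: flagged[name] / totals[name] for name in totals}


# Hypothetical toy data: two square "districts" and four tweets
# given as (longitude, latitude, flagged-as-indicator).
districts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
tweets = [(0.5, 0.5, True), (0.2, 0.8, False), (1.5, 0.5, True), (1.7, 0.2, True)]

shares = share_by_district(tweets, districts)
print(shares)
```

Every tweet is checked independently of the others, which is what makes this sort of workload a natural fit for the thousands of cores on a graphics card.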

The system was based around a $200 mid-level consumer graphics card, with two GeForce Titan GPUs made by Nvidia.

It was able to crunch data at the same speed as the world’s fastest supercomputer in the year 2000 and cost $5,000 to build. He said that it uses SQL queries to access the data, and with its brute force GPU approach, it could work not only for geographic and mapping applications but also for machine learning, trend detection and analytics for graph databases.
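
As a rough illustration of what accessing such data through SQL might look like, the sketch below runs an aggregate query over a small table of tweets. It uses Python's built-in sqlite3 module purely as a stand-in, since the article does not describe MapD's own SQL dialect or interface. The table name, column names and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real system.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per tweet, already assigned to a district.
conn.execute("CREATE TABLE tweets (district TEXT, flagged INTEGER)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?)",
    [("A", 1), ("A", 0), ("B", 1), ("B", 1)],
)

# A full-table scan with grouping: count the tweets in each district
# and compute the share that carry the indicator.
rows = conn.execute(
    """
    SELECT district, COUNT(*) AS n, AVG(flagged) AS share
    FROM tweets
    GROUP BY district
    ORDER BY district
    """
).fetchall()

for district, n, share in rows:
    print(district, n, share)

conn.close()
```

A query of this shape touches every row, which is the sort of scan a brute force approach would spread across GPU cores.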