To demonstrate that DNA is suitable for use as a large-scale data storage medium, a team of scientists has encoded a 5.27-megabit book using DNA microchips, and then read it back.
The book, Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, was written by genetics professor George Church of Harvard’s Wyss Institute.
It is the largest amount of data ever encoded in DNA, about 1,000 times more than any previous attempt, at a data density of 1 million gigabits per cubic millimeter.
“The information density and scale compare favorably with other experimental storage methods from biology and physics,” says team member Sriram Kosuri.
In theory, about four grams of DNA could store all the digital data humankind creates in one year.
Whereas other projects have encoded data in the DNA of living bacteria, the Wyss team used commercial DNA microchips to synthesize standalone DNA outside any cell.
“We purposefully avoided living cells,” says Church.
“In an organism, your message is a tiny fraction of the whole cell, so there’s a lot of wasted space. But more importantly, almost as soon as a DNA goes into a cell, if that DNA doesn’t earn its keep, if it isn’t evolutionarily advantageous, the cell will start mutating it, and eventually the cell will completely delete it.”
The team also rejected so-called ‘shotgun sequencing’, in which long DNA sequences are reassembled by finding overlaps between short strands. Instead, they split the book into 96-bit data blocks, each tagged with a 19-bit address that records the block’s position so the pieces can be put back in order.
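The block-and-address idea can be sketched in a few lines of Python. This is a minimal illustration, not the team’s code: the function names are hypothetical, the address is assumed to sit in front of the data, and bits are assumed to map to bases one per base (0 as A or C, 1 as G or T), choosing whichever base differs from the previous one to avoid runs of the same letter. Real strands also carry extra sequence for synthesis and reading, which is left out here.

```python
import random

BLOCK_BITS = 96    # data bits per block, as described in the article
ADDRESS_BITS = 19  # address bits per block, as described in the article


def bits_to_bases(bits: str) -> str:
    """Map '0'/'1' characters to bases, one bit per base (assumed scheme).

    0 becomes A or C and 1 becomes G or T; of the two candidates we pick
    whichever differs from the previous base, so no base repeats back to back.
    """
    out = []
    prev = ""
    for b in bits:
        pair = "AC" if b == "0" else "GT"
        base = pair[0] if pair[0] != prev else pair[1]
        out.append(base)
        prev = base
    return "".join(out)


def bases_to_bits(seq: str) -> str:
    """Invert the mapping: A/C read as 0, G/T read as 1."""
    return "".join("0" if base in "AC" else "1" for base in seq)


def encode(bitstream: str) -> list:
    """Split a bitstream into 96-bit blocks and prefix each with its 19-bit address."""
    n_blocks = -(-len(bitstream) // BLOCK_BITS)  # ceiling division
    if n_blocks > 2 ** ADDRESS_BITS:
        raise ValueError("bitstream too long for a 19-bit address space")
    strands = []
    for addr in range(n_blocks):
        block = bitstream[addr * BLOCK_BITS:(addr + 1) * BLOCK_BITS]
        block = block.ljust(BLOCK_BITS, "0")  # zero-pad the final block
        address = format(addr, f"0{ADDRESS_BITS}b")
        strands.append(bits_to_bases(address + block))
    return strands


def decode(strands: list, total_bits: int) -> str:
    """Reassemble the bitstream from strands that may arrive in any order."""
    blocks = {}
    for s in strands:
        bits = bases_to_bits(s)
        blocks[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    return "".join(blocks[a] for a in sorted(blocks))[:total_bits]


# Round trip: encode random bits, shuffle the strands, and decode.
rng = random.Random(0)
data = "".join(rng.choice("01") for _ in range(1000))
strands = encode(data)
shuffled = strands[:]
rng.shuffle(shuffled)
print(len(strands), len(strands[0]), decode(shuffled, len(data)) == data)
# prints: 11 115 True
```

Because every strand carries its own address, the order in which strands are read back does not matter, which is why no overlap-based reassembly is needed. A 19-bit address allows up to 2^19 = 524,288 distinct blocks.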
Including JPEG images and HTML formatting, the encoded book required 54,898 of these data blocks, each stored as a unique DNA sequence.
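As a rough check of these figures, multiplying the block count by the block size gives the size of the payload. The first two lines count data bits only; the last also includes the 19 address bits per block.

```latex
\begin{aligned}
54{,}898 \times 96\ \text{bits} &= 5{,}270{,}208\ \text{bits} \approx 5.27\ \text{megabits} \\
5{,}270{,}208\ \text{bits} \div 8 &= 658{,}776\ \text{bytes} \approx 0.66\ \text{MB} \\
54{,}898 \times (96 + 19)\ \text{bits} &= 6{,}313{,}270\ \text{bits including addresses}
\end{aligned}
```

On this count the book’s data amounts to about 5.27 megabits, or roughly 0.66 megabytes.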
The team discussed including a DNA copy with each print edition of Regenesis – but decided this was a bit risky. “Maybe the next book,” says Church.