Nowadays, all of our memories are stored as photos, videos and other forms of digital data. Globally, every minute, 3.8 million searches are handled by Google, 4.33 million videos are watched on YouTube, 159 million emails are sent, and 49 thousand photos are posted on Instagram. This has brought our digital footprint into active discussion. We have collectively generated more data in the last few years of digitalization than in all of our preceding history. It is estimated that from 2020 onwards, a global population of 7.8 billion will create 418 zettabytes of data per year. Moreover, there is an illusion that the Cloud has permanently solved the problem of data storage, but the Cloud is, after all, only a huge collection of hard drives. The storage systems that currently hold this volume of data are expected to be exhausted within a century, if not sooner. This will give rise to a severe data storage problem for future generations.
Fun Fact: In 1956, IBM announced the first ever hard drive, which weighed about one ton but had a storage capacity of only 3.75 MB, roughly the size of an average MP3 song!
Storage devices have evolved over the years as technology has advanced. Devices with larger storage capacities and smaller sizes have been developed, and this evolution is still in progress. We have indeed come a long way, from floppies to CDs to pen drives and microchips. It seems that every device eventually becomes obsolete and is replaced by a more sophisticated form of storage.
What is the alternative form of data storage that can address the shortcomings of the prevailing devices?
DNA is Nature’s oldest storage device. After all, it has accurately stored the information needed to produce and maintain life forms for billions of years. Data can be stored in artificially synthesized sequences of the DNA bases A, T, C and G, turning DNA into a new form of information technology. DNA is also extremely durable: scientists have been able to recover DNA from specimens that are hundreds of thousands of years old. It is not surprising, then, that we have a better chance of recovering data from an ancient human than from an old phone!
Beyond its unmatched stability, it is the incredible storage capacity of DNA that makes it a star candidate for data storage. DNA can accurately store massive amounts of data at a density far exceeding that of any storage device to date. All the movies that the world has produced so far could be stored in a single Eppendorf tube! Besides these qualities, we can be sure that as long as human life exists, DNA will not become obsolete as a data storage medium, unlike other storage devices.
How do we encode and decode digital data in DNA?
Every new storage format requires a new way to read it. DNA can easily be read by sequencing. In fact, we have the ability to read, write and copy DNA (in technical terms, to sequence, synthesize and amplify it). Any information that can be stored as 0s and 1s can be stored in DNA as well.
We already know that text, images, sounds and video clips are all stored as binary code in our devices. Each pixel in a black and white photo, for instance, is stored as a 1 or a 0. Since there are 4 different nucleic acid bases (A, T, G and C), each base can represent 2 bits. Therefore, we can map 00 to A, 01 to T, 10 to G and 11 to C. We convert our binary data into a sequence of As, Ts, Gs and Cs and send this ‘meaningful’ DNA for synthesis. Once synthesized, the DNA can be stored for years at -80°C, and the sequence can be amplified using PCR whenever we want multiple copies of our data. Yes, all your data can be stored in tiny tubes of DNA! To restore the data, the DNA needs to be sequenced, which is a common and easy process these days. To decode the sequence of DNA bases, all the As in the sequence are replaced with 00, the Ts with 01 and so on, to get back our machine-readable binary data, and voila! Your data has been recovered!
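The mapping described above can be tried out in a few lines of Python. This is only a minimal sketch of the 2-bits-per-base idea from this article: the function names `encode` and `decode` are made up for illustration, and real DNA storage schemes add extra constraints and error correction that are not shown here.

```python
# Mapping taken from the article: 00 -> A, 01 -> T, 10 -> G, 11 -> C.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA bases, two bits per base (illustrative helper)."""
    # Write every byte as 8 binary digits, then read the digits two at a time.
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Turn a string of DNA bases back into the original bytes (illustrative helper)."""
    # Replace every base with its two bits, then regroup the bits into bytes.
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "Hi".encode("utf-8")
strand = encode(message)
print(strand)                          # TAGATGGT
print(decode(strand).decode("utf-8"))  # Hi
```

Each byte becomes exactly four bases, so the two-letter message "Hi" turns into a strand that is eight bases long.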
As appealing as this new technology sounds, it unfortunately has its own disadvantages. First, although DNA is more robust than any man-made device, we lose the DNA once we retrieve the data from it, because the sample is consumed as part of the sequencing (decoding) process. Luckily, copying DNA is cheaper and easier than synthesizing it. By making multiple copies of the data-carrying DNA beforehand, we can make sure our data is not lost.
Secondly, synthesizing DNA is error-prone, though the chance of encountering an error is small. Nature has ways of dealing with erroneous sequences within our cells, but our data is stored in synthesized DNA in tubes, where no such repair takes place. The key is to recover as many 0s and 1s correctly as possible from the DNA base sequence while decoding.
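One simple way to picture how errors could be tolerated is to sequence several copies of the same strand and take a majority vote at every position. The sketch below is only an illustration under strong assumptions (all reads have the same length, and errors are single-base substitutions); it is not the method used by any of the groups mentioned in this article, and the `consensus` function and the example reads are hypothetical.

```python
from collections import Counter
from typing import Sequence


def consensus(reads: Sequence[str]) -> str:
    """Return the most common base at each position (illustrative helper).

    Assumes every read has the same length and that errors are substitutions only.
    """
    if len({len(read) for read in reads}) != 1:
        raise ValueError("all reads must have the same length")
    # zip(*reads) walks through the reads column by column, one position at a time.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


# Three hypothetical reads of the same strand; the second one contains an error.
reads = [
    "TAGATGGT",
    "TAGCTGGT",  # hypothetical substitution: A read as C
    "TAGATGGT",
]
print(consensus(reads))  # TAGATGGT
```

Using an odd number of reads avoids ties when only two different bases compete at a position.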
Fun Fact: ‘L’Arrivée d’un train en gare de La Ciotat’, made in 1896, is the first movie to have been copied 200 trillion times on DNA.
Writing and reading data stored in DNA is far more time-consuming than storing information on a hard drive, at least for the time being. Therefore, the use of DNA-based storage should be limited to long-term storage of important information.
Lastly, encoding and decoding bits as DNA bases is an expensive process. The costs need to drop much further if this technology is to compete with electronic storage devices. One thing we do know is that even if DNA cannot become a ubiquitous storage material in the near future, it will certainly be used for generating and storing certain types of data over the longer term.
In 2017, a research group at Harvard University used the CRISPR DNA-editing technique to store the image of a human hand in the genome of E. coli. Microsoft and Twist Bioscience are working to improve DNA-based data storage technology. The University of Washington, in partnership with Microsoft, has developed an automated system to write, store and read DNA-based data.
Recent advancements in next-generation sequencing technologies have opened up the possibility of using DNA sequences as barcodes. Currently, the applications of these molecular identification ‘tags’ are limited to tracking experimental data.
In 2019, a group from Israel and Switzerland introduced the concept of the DNA of Things (DoT), named by analogy with the Internet of Things (IoT). The idea is to encode digital data in DNA molecules, which can in turn be embedded in devices. However, unlike IoT, which consists of interrelated devices, DoT can be used to create independent storage devices.
In conclusion, DNA-based data storage is no longer far off, and it can handle massive amounts of information in a very limited space. Once established, this technology will be far from becoming redundant or obsolete since, as long as humans prevail, we will surely keep finding cheaper and faster ways to sequence DNA.
I completed my B.Sc. in Microbiology (Hons.) at St. Xavier’s College, Kolkata. I now manage my independent blog, ‘The Bio Bee’, where I write mostly about interesting developments in the field of biosciences and raise awareness about environmental issues. I spend most of my time solving sudokus, puzzles and riddles, and I also love reading fiction.
Other blogs of the author can be found here.
This blog marks the second collaboration between The Qrius Rhino and The Bio Bee. The Bio Bee is a science blog like ours, run by Anuja Bothra, the author of this article. She writes on various interesting and thought-provoking topics, and we are proud to introduce her to our readers through her unique content. You should visit her blog page, The Bio Bee. Also, do follow the Instagram page for updates regarding the latest posts. Other collaboratively published blogs can be found here.