Writer: Rishabh Shukla*
Some 50 years from now, optical, magnetic and flash drives will become obsolete, and deoxyribonucleic acid (DNA) will be used for large-scale data storage. In July 2016, a team from Microsoft Research and the University of Washington, working with Twist Bioscience, a San Francisco start-up, reached a milestone by successfully storing 200 MB of digital data in DNA. The synthesized DNA holding the data was comparable in size to the tip of a pencil.
DNA has several properties that make it attractive for data storage. First, it is very stable: synthetic DNA can remain intact for thousands of years. Second, DNA is never going to become obsolete, because it holds the blueprint of living systems. Third, it has a very high packing density: by some estimates, 1 kg of DNA would be enough to store all the data available in the world.
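The packing-density claim can be sanity-checked with a back-of-the-envelope calculation. The sketch below computes only an upper bound, under two assumptions that are not from the article: each nucleotide stores at most 2 bits (it is one of four bases), and an average single-stranded nucleotide weighs about 330 g/mol. The constant and function names are illustrative.

```python
# Back-of-the-envelope upper bound on DNA storage density.
# Assumptions (not from the article): at most 2 bits per nucleotide, and an
# average single-stranded nucleotide mass of about 330 g/mol. Redundancy,
# error correction and addressing overhead are ignored.

AVOGADRO = 6.02214076e23        # molecules per mole
NUCLEOTIDE_G_PER_MOL = 330.0    # assumed average mass of one nucleotide
BITS_PER_NUCLEOTIDE = 2         # log2(4) for the bases A, C, G, T


def max_bytes_per_gram() -> float:
    """Theoretical maximum number of bytes in one gram of single-stranded DNA."""
    nucleotides_per_gram = AVOGADRO / NUCLEOTIDE_G_PER_MOL
    return nucleotides_per_gram * BITS_PER_NUCLEOTIDE / 8


if __name__ == "__main__":
    per_gram = max_bytes_per_gram()
    print(f"~{per_gram / 1e18:.0f} exabytes per gram")
    print(f"~{per_gram * 1000 / 1e21:.0f} zettabytes per kilogram")
```

Under these assumptions the bound works out to roughly 456 exabytes per gram, or about 456 zettabytes per kilogram. Practical schemes store considerably less, because they spend bases on error correction, addressing and avoiding sequences that are hard to synthesize, so the figure is a ceiling rather than a prediction.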
DNA consists of nucleotides, each of which contains a phosphate group, a sugar and a nitrogenous base. There are four bases: adenine (A), thymine (T), guanine (G) and cytosine (C). The sequence of these bases is a kind of genetic code that is passed from parents to children. Oligonucleotides are short DNA molecules; these small pieces of nucleic acid can be synthesized in the laboratory as single-stranded molecules with any user-specified sequence. Engineers and biologists use this fact to store information.
Storing data is nothing new for DNA. Long before the advent of semiconductors, DNA was carrying genetic data from one generation to the next. The only difference is the format of the data: DNA carries it as a sequence of bases, for example GATCAG, whereas semiconductor memories carry it as binary digits, for example 11010.
Let's look at the mechanism. Suppose we want to store an image in DNA. The image is broken down into pixels, and the brightness value of each pixel, expressed as a binary number, is mapped to a unique sequence of bases; for example, 11010 might be mapped to GATCAG. Once the complete DNA map is ready, the corresponding DNA can be synthesized in a laboratory, a step analogous to writing data to a DVD. Once synthesized, the DNA can be stored in test tubes for hundreds of years. To retrieve the data, we simply read the synthesized DNA with a DNA sequencing machine. This recovers the exact sequence of bases, which can be translated back into binary data, from which the image can be regenerated.
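The mapping and read-back steps can be sketched in a few lines of Python. This is a minimal illustration assuming the simplest possible code, in which every pair of bits becomes one base (00 to A, 01 to C, 10 to G, 11 to T). It is not the scheme used in the Microsoft or EBI work, it does not reproduce the article's illustrative 11010/GATCAG pairing, and the names `encode` and `decode` are hypothetical.

```python
# Minimal sketch of a binary-to-DNA mapping, assuming 2 bits per base:
# 00 -> A, 01 -> C, 10 -> G, 11 -> T. Real systems use more elaborate codes
# with error correction, addressing and constraints on base runs.

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn raw bytes (e.g. 8-bit pixel brightness values) into a base sequence."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Translate a sequenced strand back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    # Hypothetical 2x2 grayscale image: four 8-bit brightness values.
    pixels = bytes([0, 85, 170, 255])
    strand = encode(pixels)
    print(strand)  # AAAACCCCGGGGTTTT
    assert decode(strand) == pixels  # the round trip recovers the image
```

The long runs of identical bases in that output hint at why practical encoders are more elaborate: repeated bases are harder to synthesize and sequence accurately, and real systems also add error correction and addresses so that many short strands can be reassembled in order.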
However, there are bottlenecks too. Semiconductor memories read and write data in microseconds and are very economical, whereas encoding and decoding data in DNA is a complex task that takes much more time and money. In 2013, researchers at the European Bioinformatics Institute (EBI) in Hinxton, UK, estimated the cost of encoding and decoding data in DNA at $12,400 per MB and $220 per MB, respectively. That is expensive compared with conventional semiconductor memory, but the technology is advancing rapidly and the cost of DNA synthesis is falling.
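To put the 2013 EBI estimates in perspective, the sketch below simply multiplies them by a file size. The per-MB rates are the ones quoted above; the sizes (1 MB, and 200 MB to match the 2016 Microsoft experiment) are hypothetical inputs, current prices will differ, and the function name is illustrative.

```python
# Rough cost estimate using the 2013 EBI figures:
# $12,400 per MB to encode (write) and $220 per MB to decode (read).
# File sizes below are hypothetical examples; today's prices differ.

ENCODE_USD_PER_MB = 12_400
DECODE_USD_PER_MB = 220


def dna_storage_cost(size_mb: float) -> tuple[float, float]:
    """Return (encoding cost, decoding cost) in US dollars for size_mb megabytes."""
    return size_mb * ENCODE_USD_PER_MB, size_mb * DECODE_USD_PER_MB


if __name__ == "__main__":
    for size_mb in (1, 200):
        write, read = dna_storage_cost(size_mb)
        print(f"{size_mb} MB: write ${write:,.0f}, read ${read:,.0f}")
```

At those rates, writing 200 MB would cost about $2.48 million and reading it back about $44,000, which shows that synthesis, rather than sequencing, dominates the cost.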
About 2.5 quintillion bytes of data are generated every day. Storing data in digital form is easy, but archiving it is a complex task that requires continuing maintenance and regular migration between storage media. DNA could provide an alternative to conventional semiconductor storage for secure, long-term data archiving.
(Rishabh Shukla is a freelance writer based in Kolkata.)