By Dr Pierre Meulien
Human, animal and plant DNA is being sequenced faster, and in far greater volumes, than we could ever have imagined, and the scale invites some striking comparisons. For example, as Compute Canada recently noted, DNA sequencing machines "will be capable of producing 85 petabytes of data this year worldwide", roughly 33 times the data storage taken up by all the movies on Netflix.
This ocean of data has incredible potential for breakthroughs in human health, agriculture, forestry management, bioenergy, aquaculture and other sectors. In many cases, turning that potential into reality requires massive collaborations spanning jurisdictions within and across nations.
The International Cancer Genome Consortium, for instance, has 85 teams in 17 countries now studying over 25,000 tumour genomes across 50 different cancer types. Their findings will help researchers develop personalized cancer treatments for use worldwide. The Global Microbial Identifier project is building a global system of DNA databases for microbes implicated in infectious diseases in order to shape better responses to new outbreaks of disease.
The success of such projects depends on finding ways for trillions of bits of data to be stored, read meaningfully and shared among researchers within Canada and around the globe. Three key elements must be in place: high-performance computing to store and analyze the data; sophisticated software tools to interpret the data in ways that researchers can use; and the digital as well as institutional networks enabling researchers to collaborate in sharing and using data.
Today, the oceans of DNA data produced by Canadian genomics researchers are running into a bottleneck on the digital and informatics fronts. While technologies for sequencing DNA are advanced, technologies for storing and analyzing the information produced are far less developed. Mining, accessing, sharing and analyzing these vast quantities of genomic information pose a major challenge for the research community — a challenge that Canada is well placed to address.
Canada is not alone in facing these problems. Across the globe, all organizations involved in big data research are struggling with the challenges of gathering, storing, interpreting and sharing immense amounts of information. These challenges include where to store the data (in separate data warehouses or in a shared repository in the cloud) and how to ensure that researchers in many countries can join forces in data analysis.
Budget 2015 specifically addresses the need for a national Digital Research Infrastructure strategy. This is an urgently needed step if we wish to ensure that Canada is equipped to compete and collaborate in solving these global problems. Continued national investment is needed to build world-class capacities in data analysis, data interpretation and data sharing. We need better infrastructure in high-performance computing, large-scale storage and software so that researchers can interpret genomic data and share it through networks.
Investment is also needed in institutional coordination, to ensure that high-performance computing is harmonized rather than trapped in silos. Equally important are training programs that bring computational scientists, mathematicians and biologists together to produce data-analysis software tailored to genomics researchers' needs.
In some respects, a bottom-up approach will be required to reach these goals, particularly to produce sophisticated algorithms for researchers' use. In many other respects, a top-down focus is called for, so that all of Canada's provincial and national-level funders and institutions collaborate in putting the right infrastructure in place and ensuring the pieces fit together productively.
From treating disease and stopping pandemics to managing forests and raising crop yields, Canadian genomics researchers are turning bytes into quality-of-life breakthroughs. Fostering our capacity to engage in cutting-edge global research demands continued investments in world-class infrastructure.
Dr Pierre Meulien is the former president and CEO of Genome Canada.