The amount of data produced is exploding. It is estimated that 2.5 quintillion bytes of data (a quintillion is 1 followed by 18 zeros) are created every day. The volume of data is growing so quickly that 90 percent of the world's data has been produced in the last two years. This explosion is also occurring in all areas of biomedical research. A single human genome sequence contains roughly six billion base pairs, and a single research study may require analyzing the genome sequences of tens of thousands of patients. Processing and managing these huge data sets, from capture, curation, and storage to searching, sharing, transferring, and analysis, is at the forefront of modern science. New approaches will help to expand the impact of informatics technologies on health and disease.
Classically, scientific progress has been anchored on two pillars: Theory and Experimentation. The Big Data revolution has now hit science as the sheer volume of scientific data increases exponentially. Advances in scientific computing technology, together with Big Data, have created a third pillar: Computation. Data Science brings together these three pillars to accelerate discoveries. Recently, the Harvard Business Review declared data scientist the "sexiest job of the 21st century." The role combines deep domain knowledge, a solid foundation in statistical and mathematical methods, advanced computation and visualization technology, and a desire to tackle "wicked problems."
CDSI by the Numbers:
- NMEDW has supported 858 research projects and more than 150,000 report executions -- a 258% increase since 2011
- 70 packages, with 164 forks, on GitHub
- Over 60,000 Bioconductor downloads in 2014 alone