Whole genome sequencing has introduced a new set of research challenges in the biomedical field. The vast amounts of data produced by sequencing the human genome necessitate a new computational strategy as well. For many institutions it is no longer feasible to build out datacenters with adequate computational power and storage capacity for whole genome sequencing. Cloud computing lowers the barrier to entry and provides access to the required resources at a fraction of the cost of building a datacenter with equivalent capacity.
Currently, the leading provider in the cloud computing arena is Amazon Web Services (AWS), which offers a multitude of resources for storing and analyzing the data produced by whole genome sequencing. Amazon S3 stores the data in a secure, encrypted, redundant environment. Amazon EC2 provides a computational environment that is flexible, scalable and stable. Users can create virtual machines of various sizes, with up to 60 GB of RAM and 88 EC2 Compute Units, and can spin up multiple instances so that workflows can be parallelized. Elastic MapReduce (EMR) provides a framework for parallelizing jobs, so that tasks that previously took days can be completed in a matter of hours. Together, these services give research institutions the capacity needed to store and analyze the onslaught of next-generation sequencing data.
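The map-and-reduce style of parallelization mentioned above can be illustrated with a minimal local sketch in Python. It is not the EMR API and not an actual sequencing pipeline: the function names, the toy `shards` data, and the use of a thread pool in place of separate cloud instances are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_variants(shard):
    """Map step: count variant records per chromosome within one shard."""
    return Counter(chrom for chrom, _pos in shard)


def merge_counts(partials):
    """Reduce step: sum the per-shard counters into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def variants_per_chromosome(shards, workers=4):
    """Run the map step over all shards concurrently, then reduce.

    A local thread pool stands in for a cluster here; in a cloud setting
    each shard would typically be handled by a separate worker instance.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_variants, shards))
    return merge_counts(partials)


# Hypothetical toy data: each shard is a list of (chromosome, position) records.
shards = [
    [("chr1", 101), ("chr1", 202), ("chr2", 303)],
    [("chr2", 404), ("chrX", 505)],
]
totals = variants_per_chromosome(shards)
```

Because each shard is processed independently, adding workers shortens the map step, while the reduce step stays a cheap merge of small summaries. That independence is the property that lets such jobs spread across many instances.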
ICBI uses Amazon Web Services to store and analyze hundreds of whole genome sequences in a secure and scalable environment. Its data analysis pipelines leverage the elastic nature of the cloud, allowing the center to scale to thousands of whole genome sequences.