IBM POWER8 HPC System Accelerates Genomics Analysis with SMT8 Multithreading

Advances in a data- and compute-intensive scientific use case of Hadoop

Published August 2016

Today's compute technologies and scientific methods for Big Data analysis demand more compute cycles per processor than ever before, along with extreme I/O performance to match. A collaboration between Louisiana State University (LSU) and IBM has demonstrated dramatic speedups in genomic analysis.

Despite multiple earlier attempts at Hadoop-based genome analysis on LSU's existing 120-node Intel Xeon-based HPC cluster, a large metagenome dataset could not be analyzed in a reasonable amount of time. Aware of the big data analysis capabilities offered by IBM Power Systems, the staff and researchers of the LSU Center for Computational Technologies turned to IBM for help.

The existing Hadoop-based method was ported to a 40-node POWER8 cluster at an IBM Customer Center, running Ubuntu 14.10 and IBM Spectrum Scale (formerly GPFS). The result was astonishing: the first phase of the Hadoop analysis of the 3.2 TB metagenome dataset completed in 6.25 hours, using only 40 nodes. LSU and IBM are excited to report on these promising advances in this data- and compute-intensive scientific use case of Hadoop.
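To put the reported figures in perspective, the short Python sketch below derives effective processing rates from the three numbers stated above (3.2 TB, 6.25 hours, 40 nodes). It assumes decimal terabytes (10^12 bytes) and an even spread of the input across nodes; both are assumptions made for illustration, not details from the report. The function name `effective_rates` is likewise hypothetical. The results describe how fast the input dataset was consumed, not measured disk or network I/O, since a Hadoop job typically reads and writes far more intermediate data than its input size.

```python
# Back-of-the-envelope rates derived from the figures reported in the text.
DATASET_TB = 3.2   # metagenome dataset size, as reported
WALL_HOURS = 6.25  # first-phase run time, as reported
NODES = 40         # POWER8 nodes used, as reported

BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def effective_rates(dataset_tb, wall_hours, nodes):
    """Return input-consumption rates; these are not measured I/O figures."""
    total_bytes = dataset_tb * BYTES_PER_TB
    seconds = wall_hours * 3600
    aggregate_mb_per_s = total_bytes / seconds / 10**6
    return {
        "tb_per_hour": dataset_tb / wall_hours,
        "aggregate_mb_per_s": aggregate_mb_per_s,
        # assumption: input is spread evenly across all nodes
        "per_node_mb_per_s": aggregate_mb_per_s / nodes,
        "gb_per_node": dataset_tb * 1000 / nodes,
    }


rates = effective_rates(DATASET_TB, WALL_HOURS, NODES)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the cluster worked through roughly half a terabyte of input per hour, with each node responsible for about 80 GB of the dataset.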