
Los Alamos-led team to sequence-search entire NT biological database on GreenGene distributed supercomputer

DOE/Los Alamos National Laboratory : 18 November, 2005  (Technical Article)
Award-winning Los Alamos National Laboratory-developed software is helping researchers here and elsewhere better understand a database of biological information and enable a plethora of biological studies from organism 'barcoding' to gene function and evolution.
The software, mpiBLAST, coupled with a supercomputer assembled over a high-speed network and distributed across the country just for this purpose, will make the biological information stored in large databases more useful for researchers by enabling a Google-like indexing structure that tracks relationships among the sequences in these large databases. Such an indexing structure can increase search speed by a factor of 100 while also compressing the database by up to 20-fold.

mpiBLAST, an open-source project led by Los Alamos researcher Wu Feng, is being tapped to harvest the NT biological database in order to create the Google-like indexing structure. Los Alamos researchers announced at this week's Supercomputing 2005 Conference that they will lead a large-scale nationwide effort to sequence-search the entire NT database.

The NT biological database is akin to a 'biological dictionary organized as a flat file.' When biologists need to know if a particular genomic sequence has already been catalogued, they look through this dictionary for that genomic sequence. If they can't find the desired sequence, they add the new information to the end of the file, making the unordered file larger and larger.

Because a database built with a structure that can be searched in a non-linear manner would be far more useful, Feng at Los Alamos and other scientists intend to use the 'GreenGene' supercomputer to give this huge database that structure by sequence-searching the entire database against itself.
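The difference between scanning an unordered flat file and consulting an index can be illustrated with a toy example. The sketch below is only an assumption-laden illustration, not the indexing structure the GreenGene project plans to build and not how mpiBLAST works internally: the function names, the sample sequences, and the choice of a simple k-mer lookup table are all hypothetical.

```python
from collections import defaultdict


def linear_search(records, query):
    """Scan every record in order, as with an unordered flat file."""
    return [i for i, seq in enumerate(records) if query in seq]


def build_kmer_index(records, k):
    """Map each length-k substring to the ids of the records containing it."""
    index = defaultdict(set)
    for i, seq in enumerate(records):
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].add(i)
    return index


def indexed_search(records, index, k, query):
    """Use the index to select candidate records, then verify only those."""
    if len(query) < k:
        # The index cannot help with queries shorter than k.
        return linear_search(records, query)
    candidates = index.get(query[:k], set())
    return sorted(i for i in candidates if query in records[i])


# Hypothetical sample data, for illustration only.
records = ["ACGTACGT", "TTGACCA", "GGACGTT", "CCCCAAAA"]
index = build_kmer_index(records, k=4)

hits_linear = linear_search(records, "ACGT")
hits_indexed = indexed_search(records, index, 4, "ACGT")
```

Both searches return the same record ids, but the indexed version inspects only the records that share the query's first four letters instead of reading the whole collection. Building the index costs one pass over the data up front, which is the trade-off the article describes.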

'If this endeavor to sequence-search the entire NT database succeeds, the result of this experiment will provide critical information to the biology community, including insightful evolutionary, structural, and functional relationships between every sequence and family in the NT database,' notes Feng of Los Alamos' Advanced Computing Laboratory, principal investigator for the project. In all, the large-scale experiment is expected to generate 100 terabytes of output - enough to fill up roughly 2,000 iPods, he added.

mpiBLAST, as distributed by Los Alamos National Laboratory, won an R&D 100 Award in 2004. It is a search tool that enables biologists to characterize an unknown sequence by comparing it against a database of known sequences. The similarity between sequences then enables biologists to detect evolutionary relationships and infer biological properties of the unknown sequence. On a 128-processor supercomputing cluster, mpiBLAST can deliver a speed-up of 305-fold, thus decreasing the search time of a representative 300-kilobyte query file from nearly 24 hours down to only 5 minutes. Additional speed-up, as provided by a parallelized I/O version of mpiBLAST called mpiBLAST-pio, reduces the search time further and allows the code to scale to larger system configurations.
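The speed-up figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers in the article (a serial run of roughly 24 hours, a 305-fold speed-up, 128 processors); the function names are hypothetical.

```python
def parallel_time_minutes(serial_hours, speedup):
    """Run time in minutes after dividing the serial time by the speed-up."""
    return serial_hours * 60 / speedup


def parallel_efficiency(speedup, processors):
    """Speed-up obtained per processor; 1.0 would be ideal linear scaling."""
    return speedup / processors


minutes = parallel_time_minutes(24, 305)
efficiency = parallel_efficiency(305, 128)

print(f"search time: about {minutes:.1f} minutes")
print(f"speed-up per processor: {efficiency:.2f}")
```

A 305-fold speed-up brings a 24-hour search to a little under 5 minutes, consistent with the article. The speed-up per processor comes out above 1, which is usually called super-linear scaling.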

Led by Los Alamos National Laboratory, the nationwide team working together on this collaborative endeavor includes industrial participants from Intel, Panta Systems and Foundry Networks. Academic and government participants are from North Carolina State University, Oak Ridge National Laboratory, Utah and Virginia Tech universities. In addition, the team will use the high-speed experimental facilities of the National LambdaRail(tm) network to connect the 2,200-processor System X supercomputer at Virginia Tech to the SC|05 Supercomputing showroom floor to create the distributed, heterogeneous supercomputer dubbed 'GreenGene.'