Compile and run MPIBLAST 1.6.0 in the Intel® Cluster Ready Reference Design S5520UR-ICR1.1-ROCKS5.3-CENTOS5.4-C2 v1.0

Prerequisites

You need to have deployed the latest Intel® Cluster Ready Reference Design S5520UR-ICR1.1-ROCKS5.3-CENTOS5.4-C2 v1.0.
This reference design targets the following components:

Set up the environment

•  MVAPICH2 will be used to build MPIBLAST. To configure the environment to use MVAPICH2, run the following command:

[shell]mpi-selector-menu[/shell]

•  A selection menu will be displayed. Select "mvapich2_gcc-1.4.1" for the current user with the option (1u) and exit.

[shell]
Current system default: <none>
Current user default: mvapich2_gcc-1.4.1

"u" and "s" modifiers can be added to numeric and "U" commands to specify "user" or "system-wide".

1. mvapich2_gcc-1.4.1
2. mvapich_gcc-1.2.0
3. openmpi_gcc-1.4.1
U. Unset default
Q. Quit

Selection (1-3[us], U[us], Q): 1u
…
WARNING: Changes made to mpi-selector defaults will not be visible until you start a new shell!
[/shell]

 

•  Log out from the terminal and then log back in for the changes to take effect.

MPIBLAST Compilation

  • Log in to the cluster as your regular user.
  • Create a staging directory named "src" to build the tools and an "opt" directory to install them.

[shell]
mkdir /home/<user>/opt
mkdir /home/<user>/src
[/shell]


NOTE: the "<user>" variable must be replaced with your username.

  Before starting the process, make sure your proxy settings are correct.

[shell]
export http_proxy=proxy_url:port
export ftp_proxy=proxy_url:port
[/shell]


  Change to the directory created for the source code of the application.

[shell]cd /home/<user>/src[/shell]

  Download the MPIBLAST source code and extract it.

[shell]
wget http://www.mpiblast.org/downloads/files/mpiBLAST-1.6.0.tgz
tar xf mpiBLAST-1.6.0.tgz
cd mpiblast-1.6.0
[/shell]


  Configure the build to install the binaries in the /home/<user>/opt/mpiblast folder.

[shell]./configure --prefix=/home/<user>/opt/mpiblast[/shell]

  Turn off MOTIF support by adding "set HAVE_MOTIF=0" at line 388 of /home/<user>/src/mpiblast-1.6.0/ncbi/make/makedis.csh.

[shell]vim ncbi/make/makedis.csh +388[/shell]

NOTE: for further information about this issue please refer to this forum
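
If you prefer not to edit the file interactively, the same change can be scripted. The following is a minimal sketch, assuming GNU sed (the one-line "388i text" form and the -i option are GNU extensions) and assuming that inserting the setting as a new line 388 matches the intent of the step above. The function name disable_motif is hypothetical.

```shell
#!/bin/sh
# Insert "set HAVE_MOTIF=0" as a new line 388 of the given file.
# Assumption: GNU sed is available (one-line "i" form and -i).
disable_motif() {
    file=$1
    # Skip the edit if the setting is already present, so reruns are harmless.
    if grep -q '^set HAVE_MOTIF=0$' "$file"; then
        return 0
    fi
    sed -i '388i set HAVE_MOTIF=0' "$file"
}

# Usage, from the mpiblast-1.6.0 source directory:
#   disable_motif ncbi/make/makedis.csh
#   sed -n '388p' ncbi/make/makedis.csh   # shows the line that was inserted
```

Review the file after running the function, since the right position for the setting depends on the contents of makedis.csh in your copy of the sources.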
 
•  First, build the NCBI toolkit included in the source tree.

[shell]make ncbi[/shell]

NOTE: if the command succeeds, its output should end with lines like these:

[shell]
make[1]: Leaving directory `/home/<user>/src/mpiblast-1.6.0/ncbi/build'
Put the date stamp to the file ../VERSION
*********************************************************
*The new binaries are located in ./ncbi/build/ directory*
[/shell]

 
•  Run the following commands to build and install MPIBLAST:

[shell]
make
make install
[/shell]


NOTE: For further information on how to configure and build MPIBLAST, please refer to http://www.mpiblast.org/.

Test

This section shows how to test the new installation of MPIBLAST 1.6.0. The test runs the input file from the white paper "NCBI Blast on AMD Magny-Cours, Istanbul, and Intel Nehalem" against the standard FASTA "month" database.

•  Create a test directory in the user's home folder.

[shell]
cd /home/<user>
mkdir tests
[/shell]

•  Download the workload and the database to be used in the test.

[shell]
cd tests
wget http://files.scalableinformatics.com/whitepapers/ncbi_mc/a_thal_1168.fsa
wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/month.nt.gz
[/shell]


•  Extract the downloaded database.

[shell]gunzip month.nt.gz[/shell]

•  Generate the /home/<user>/.ncbirc file.

[shell]
HOME_FILE=/home/<user>/.ncbirc
echo "[mpiBLAST]" > $HOME_FILE
echo "Shared=/home/<user>/tests" >> $HOME_FILE
echo "Local=/home/<user>/tests" >> $HOME_FILE
[/shell]
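
The same file can also be written in one step with a here-document, which keeps the three lines together and makes the contents easier to check at a glance. This is a minimal sketch; the function name write_ncbirc is hypothetical, and it uses the same directory for both Shared and Local, as in the step above. Like the commands above, it overwrites any existing file at the target path.

```shell
#!/bin/sh
# Write an [mpiBLAST] section to an .ncbirc file.
#   $1: path of the file to create (for example /home/<user>/.ncbirc)
#   $2: directory used for both Shared and Local storage
write_ncbirc() {
    cat > "$1" <<EOF
[mpiBLAST]
Shared=$2
Local=$2
EOF
}

# Usage (replace <user> with your username):
#   write_ncbirc "/home/<user>/.ncbirc" "/home/<user>/tests"
```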


•  Add the MPIBLAST binaries to the user's PATH.

[shell]export PATH=$PATH:/home/<user>/opt/mpiblast/bin[/shell]

•  Format the database to divide it into fragments. The parameter passed to --nfrags ("<N>") defines the number of fragments in the database. Execute this command:

[shell]mpiformatdb -i month.nt --nfrags=<N> -pF --quiet[/shell]


NOTE: a valid value for the "<N>" parameter could be the number of nodes in the cluster.

   The output of this command should look like this:

[shell]
Created <N> fragments.
<<< Please make sure the formatted database fragments are placed in /home/<user>/tests/ before executing mpiblast. >>>
[/shell]


•  Copy the system's ICR nodelist and modify it by removing any comment such as "#type: head". Use the fully qualified domain name (FQDN) for the head node.

[shell]
cp /etc/intel/clck/nodelist .
vi nodelist
# Modify to obtain similar results
cat nodelist
head-node.your-domain.com
compute-1
compute-2
…
[/shell]
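
The comment removal can also be scripted instead of done by hand in vi. The sketch below assumes that comments in the nodelist start with "#" and run to the end of the line, as in the "#type: head" example; the function name clean_nodelist is hypothetical. It does not rewrite the head node name, so the FQDN still has to be set manually.

```shell
#!/bin/sh
# Print a nodelist with "#" comments, trailing whitespace, and
# empty lines removed. The input file is left unchanged.
#   $1: path of the original nodelist
clean_nodelist() {
    sed -e 's/[[:space:]]*#.*$//' -e '/^[[:space:]]*$/d' "$1"
}

# Usage, from the tests directory:
#   clean_nodelist /etc/intel/clck/nodelist > nodelist
#   vi nodelist   # replace the head node name with its FQDN
```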


•  Launch the MPD daemons on the entire cluster. Replace <N> with the number of nodes in the cluster.

[shell]mpdboot -n <N> -f nodelist -r ssh[/shell]

•  Run mpiblast using the "-np" parameter to define the number of processes to start. A simple formula for this number is number_nodes x number_physical_cores. In the example, the output of the command is written to results.txt.

[shell]mpiexec -machinefile nodelist -np <N> mpiblast -d month.nt -i a_thal_1168.fsa -p blastn -o results.txt[/shell]
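
The formula for "-np" can be evaluated from the nodelist file rather than worked out by hand. This is a minimal sketch that assumes the nodelist contains one host per line and that every node has the same number of physical cores; the function name compute_np and the value 8 in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print number_nodes x number_physical_cores.
#   $1: nodelist file with one host per line
#   $2: physical cores per node (assumed identical on every node)
compute_np() {
    # Count non-empty lines so that blank lines are not counted as nodes.
    nodes=$(grep -c '[^[:space:]]' "$1")
    echo $((nodes * $2))
}

# Usage, from the tests directory (8 is a placeholder core count):
#   mpiexec -machinefile nodelist -np "$(compute_np nodelist 8)" \
#       mpiblast -d month.nt -i a_thal_1168.fsa -p blastn -o results.txt
```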

•  An example of the expected output is shown below:

[shell]
BLASTN 2.2.20 [Feb-08-2009]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 25094 Lambda-PRL2 Arabidopsis thaliana cDNA clone 251F2T7, mRNA sequence.
         (604 letters)

Database: month.nt
           127,522 sequences; 512,131,274 total letters

Sequences producing significant alignments:                         (bits) Value

gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence   367   e-100
ref|XM_001160875.2| PREDICTED: Pan troglodytes eukaryotic initia...    68   1e-09
ref|XM_511724.3| PREDICTED: Pan troglodytes eukaryotic translati...    68   1e-09
ref|XM_001151057.2| PREDICTED: Pan troglodytes eukaryotic initia...    60   3e-07
ref|XM_516936.3| PREDICTED: Pan troglodytes eukaryotic translati...    56   4e-06
ref|XM_001370153.2| PREDICTED: Monodelphis domestica eukaryotic ...    54   2e-05
ref|XM_003358747.1| PREDICTED: Sus scrofa eukaryotic initiation ...    46   0.004
ref|XM_001370908.2| PREDICTED: Monodelphis domestica eukaryotic ...    44   0.016
ref|XM_508726.3| PREDICTED: Pan troglodytes DCN1, defective in c...    40   0.25
ref|XM_001146090.2| PREDICTED: Pan troglodytes Nipped-B homolog ...    36   3.8
tpg|BK008000.1| TPA_inf: Anolis carolinensis von Willebrand fact...    36   3.8
gb|CP002743.1| Bifidobacterium breve ACS-071-V-Sch8b, complete g...    36   3.8
gb|CP002687.1| Arabidopsis thaliana chromosome 4, complete sequence    36   3.8

>gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence
          Length = 23459830

 Score = 367 bits (185), Expect = e-100
 Identities = 202/205 (98%), Gaps = 2/205 (0%)
 Strand = Plus / Plus

Query: 187     agggtgttgcaatcaactttgttaaaagcgacgacatcaagattctcagagacattgagc 246
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 6866127 agggtgttgcaatcaactttgttaaaagcgacgacatcaagattctcagagacattgagc 6866186

………………… ……………. ………….

  Database: /home/<user>/tests/month.nt.005
    Posted date: Jun 21, 2011 11:43 AM
  Number of letters in database: 64,010,690
  Number of sequences in database: 18,924

  Database: /home/<user>/tests/month.nt.006
    Posted date: Jun 21, 2011 11:43 AM
  Number of letters in database: 64,010,686
  Number of sequences in database: 17,765

  Database: /home/<user>/tests/month.nt.007
    Posted date: Jun 21, 2011 11:43 AM
  Number of letters in database: 64,044,706
  Number of sequences in database: 12,195
[/shell]
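
As a quick sanity check of the run, the number of query sections in results.txt can be compared with the number of sequences in the input file. This sketch assumes the default text report shown above, where each query section starts with a "Query= " line, and a FASTA input where each sequence starts with a ">" header line. Both function names are hypothetical.

```shell
#!/bin/sh
# Count the query sequences in a FASTA file (header lines start with ">").
count_fasta_queries() {
    grep -c '^>' "$1"
}

# Count the per-query sections in a BLAST text report
# (each section starts with a "Query= " line, as in the sample output).
count_report_queries() {
    grep -c '^Query= ' "$1"
}

# Usage, from the tests directory:
#   if [ "$(count_fasta_queries a_thal_1168.fsa)" -eq "$(count_report_queries results.txt)" ]; then
#       echo "all queries reported"
#   else
#       echo "query count mismatch"
#   fi
```

Matching counts only show that every query produced a report section; they say nothing about the quality of the alignments themselves.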

 
NOTE: For further information about MPIBLAST, please refer to http://www.mpiblast.org/.
