June 26, 2011 12:00 AM PDT
Prerequisites
You need to have deployed the latest Intel® Cluster Ready Reference Design S5520UR-ICR1.1-ROCKS5.3-CENTOS5.4-C2 v1.0.
This reference design targets the following components:
- Intel® Xeon® Processors X5660
- Intel® Server Board S5520UR
- Rocks* 5.3
- CentOS* 5 Update 4
Set up the environment
• MVAPICH2 will be used to build MPIBLAST. To configure the environment to use MVAPICH2, run the following command:
mpi-selector-menu
• A selection menu will be displayed. Select "mvapich2_gcc-1.4.1" for the current user with the option (1u) and exit.
Current system default: <none>
Current user default: mvapich2_gcc-1.4.1
"u" and "s" modifiers can be added to numeric and "U"
commands to specify "user" or "system-wide".
1. mvapich2_gcc-1.4.1
2. mvapich_gcc-1.2.0
3. openmpi_gcc-1.4.1
U. Unset default
Q. Quit
Selection (1-3[us], U[us], Q):1u
…
WARNING: Changes made to mpi-selector defaults will not be visible until
you start a new shell!
• Log out of the terminal and log back in for the changes to take effect.
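Once you are back in a new shell, you can confirm that the selection took effect. The sketch below is a simple check, not part of mpi-selector: the function name mpi_is_mvapich2 is hypothetical, and it assumes the MVAPICH2 install path contains the string "mvapich2", as the mpi-selector entry name suggests.

```shell
#!/bin/sh
# Hypothetical check: confirm that the mpicc found on PATH belongs to the
# MVAPICH2 installation. Assumes its install path contains "mvapich2".
mpi_is_mvapich2() {
    mpicc_path=${1:-$(command -v mpicc)}
    case "$mpicc_path" in
        *mvapich2*)
            echo "OK: $mpicc_path"
            return 0 ;;
        *)
            echo "mpicc is '${mpicc_path:-not found}', not MVAPICH2" >&2
            return 1 ;;
    esac
}

# Usage, in the new login shell:
#   mpi_is_mvapich2
```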
MPIBLAST Compilation
• Log in to the cluster with your regular user account.
• Create a staging directory named "src" to build the tools and an "opt" directory to install them.
mkdir /home/<user>/opt
mkdir /home/<user>/src
NOTE: the "<user>" variable must be replaced with your username.
• Before starting, make sure your proxy settings are correct:
export http_proxy=proxy_url:port
export ftp_proxy=proxy_url:port
• Change to the directory created for the source code of the application.
cd /home/<user>/src
• Download the MPIBLAST source code and uncompress it.
wget http://www.mpiblast.org/downloads/files/mpiBLAST-1.6.0.tgz
tar xf mpiBLAST-1.6.0.tgz
cd mpiblast-1.6.0
• Configure the build environment to install the binaries in the /home/<user>/opt/mpiblast folder.
./configure --prefix=/home/<user>/opt/mpiblast
• Turn off Motif support by adding the line "set HAVE_MOTIF=0" at line 388 of /home/<user>/src/mpiblast-1.6.0/ncbi/make/makedis.csh:
vim ncbi/make/makedis.csh +388
NOTE: for further information about this issue please refer to this forum.
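If you prefer a non-interactive edit over vim, the same change can be scripted with sed. This is a minimal sketch: it assumes GNU sed (as shipped with CentOS), that it runs from /home/<user>/src/mpiblast-1.6.0, and that line 388 is still the right insertion point for your copy of the file, so inspect the surrounding lines afterwards. The function name disable_motif is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: insert "set HAVE_MOTIF=0" before line 388 of makedis.csh,
# so the new line becomes line 388. Assumes GNU sed (-i and the one-line "i" form).
disable_motif() {
    file=${1:-ncbi/make/makedis.csh}
    line=${2:-388}
    # Skip the edit if the setting is already present, so re-running is harmless.
    if grep -q '^set HAVE_MOTIF=0$' "$file"; then
        return 0
    fi
    sed -i "${line}i set HAVE_MOTIF=0" "$file"
}

# Usage, from /home/<user>/src/mpiblast-1.6.0:
#   disable_motif
#   sed -n '385,390p' ncbi/make/makedis.csh   # inspect the result
```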
• First, build the NCBI toolkit included in the MPIBLAST source tree.
make ncbi
NOTE: if this command completes successfully, its output should end with lines similar to:
make[1]: Leaving directory `/home/<user>/src/mpiblast-1.6.0/ncbi/build'
Put the date stamp to the file ../VERSION
*********************************************************
*The new binaries are located in ./ncbi/build/ directory*
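Because the NCBI build prints a lot of output, and piping make through tee hides its exit status, it can help to capture the output in a log and check for the final banner shown above. A minimal sketch: the log name ncbi-build.log and the function ncbi_build_ok are hypothetical, and the check only looks for the banner text quoted in this guide.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the NCBI build log contains the
# completion banner quoted in this guide.
ncbi_build_ok() {
    log=${1:-ncbi-build.log}
    grep -q 'The new binaries are located in ./ncbi/build/ directory' "$log"
}

# Usage, from /home/<user>/src/mpiblast-1.6.0:
#   make ncbi 2>&1 | tee ncbi-build.log
#   ncbi_build_ok && echo "NCBI build OK" || echo "NCBI build failed, see ncbi-build.log"
```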
• Run the following commands to build and install MPIBLAST:
make
make install
NOTE: For further information on how to configure and build MPIBLAST, please refer to http://www.mpiblast.org/.
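After make install, the two programs used in the Test section, mpiformatdb and mpiblast, should be in the bin directory under the configured prefix. A minimal sketch with a hypothetical function name, check_install; it assumes your home directory is /home/<user> and only tests that the files exist and are executable.

```shell
#!/bin/sh
# Hypothetical check: verify that the binaries used later in this guide were
# installed under the --prefix given to ./configure.
check_install() {
    prefix=${1:-/home/$(id -un)/opt/mpiblast}
    missing=0
    for prog in mpiformatdb mpiblast; do
        if [ -x "$prefix/bin/$prog" ]; then
            echo "found: $prefix/bin/$prog"
        else
            echo "missing: $prefix/bin/$prog" >&2
            missing=1
        fi
    done
    return $missing
}

# Usage:
#   check_install /home/<user>/opt/mpiblast
```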
Test
This section shows how to test the new MPIBLAST 1.6.0 installation. The test runs the input file from the white paper "NCBI Blast on AMD Magny-Cours, Istanbul, and Intel Nehalem" against the standard FASTA "month" database.
• Create a test directory in the user's home folder.
mkdir /home/<user>/tests
• Download the workload and the database to be used in the test.
cd /home/<user>/tests
wget http://files.scalableinformatics.com/whitepapers/ncbi_mc/a_thal_1168.fsa
wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/month.nt.gz
• Extract the downloaded database.
gunzip month.nt.gz
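Before formatting, a quick sanity check is to count the FASTA header lines (lines starting with ">") in the extracted file. This is a sketch with a hypothetical helper name; the count is only a rough check and changes as NCBI refreshes the month database (the sample output later in this guide reports 127,522 sequences for the copy used there).

```shell
#!/bin/sh
# Hypothetical helper: count sequences in a FASTA file by counting header lines.
count_fasta_seqs() {
    grep -c '^>' "${1:-month.nt}"
}

# Usage, in /home/<user>/tests:
#   count_fasta_seqs month.nt
```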
• Generate the /home/<user>/.ncbirc file.
HOME_FILE=/home/<user>/.ncbirc
echo "[mpiBLAST]" > $HOME_FILE
echo "Shared=/home/<user>/tests" >> $HOME_FILE
echo "Local=/home/<user>/tests" >> $HOME_FILE
• Add the MPIBLAST binaries to the user's PATH.
export PATH=$PATH:/home/<user>/opt/mpiblast/bin
• Format the database to divide it into fragments. The value passed to --nfrags (<N>) sets the number of fragments in the database. Execute this command:
mpiformatdb -i month.nt --nfrags=<N> -pF --quiet
NOTE: a reasonable value for <N> is the number of nodes in the cluster.
The output of this command should be similar to:
Created <N> fragments.
<<< Please make sure the formatted database fragments are placed in /home/<user>/tests/ before executing mpiblast. >>>
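mpiBLAST looks for the fragments in the Shared directory from ~/.ncbirc, so it is worth confirming that they are there before launching a run. A minimal sketch under stated assumptions: the helper name count_fragments is hypothetical, it reads the Shared= line written earlier, and it assumes fragment files are named month.nt.NNN.* with a three-digit suffix, matching fragment names such as month.nt.005 in the sample output further down (check with ls if your version names them differently).

```shell
#!/bin/sh
# Hypothetical check: count the distinct database fragments in the Shared
# directory configured in ~/.ncbirc.
# Assumes fragment files are named <db>.NNN.<ext>, e.g. month.nt.005.nsq.
count_fragments() {
    db=${1:-month.nt}
    ncbirc=${2:-$HOME/.ncbirc}
    shared=$(sed -n 's/^Shared=//p' "$ncbirc" | head -n 1)
    ls "$shared" 2>/dev/null \
        | sed -n "s/^\(${db}\.[0-9][0-9][0-9]\)\..*/\1/p" \
        | sort -u | wc -l
}

# Usage: the result should equal the <N> passed to --nfrags.
#   count_fragments month.nt
```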
• Copy the system's ICR node list and edit it to remove any comments such as "#type: head". Use the fully qualified domain name (FQDN) for the head node.
cp /etc/intel/clck/nodelist .
vi nodelist   # edit until the file looks like the listing below
cat nodelist
head-node.your-domain.com
compute-1
compute-2
…
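Instead of editing the node list by hand, the comments can be stripped with sed. A minimal sketch: clean_nodelist is a hypothetical helper, and it assumes comments start with "#" (as in "#type: head") and may follow the host name on the same line. It does not turn the head node's short name into an FQDN, so that part of the step still needs a manual check. Counting the remaining lines also gives the node count used for mpdboot -n.

```shell
#!/bin/sh
# Hypothetical helper: remove '#' comments, surrounding whitespace and empty
# lines from an ICR node list, writing the result to stdout.
clean_nodelist() {
    sed -e 's/#.*//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1"
}

# Usage:
#   clean_nodelist /etc/intel/clck/nodelist > nodelist
#   NODES=$(wc -l < nodelist)        # value for mpdboot -n
```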
• Launch the MPD daemons on the entire cluster. Replace <N> with the number of nodes in the cluster.
mpdboot -n <N> -f nodelist -r ssh
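To confirm that the ring came up on every node, the output of mpdtrace (an MPD tool that lists the hosts in the ring) can be compared with the node count. A minimal sketch: ring_size_ok is a hypothetical helper, and it assumes mpdtrace prints one host name per line and that nodelist already contains one host per line.

```shell
#!/bin/sh
# Hypothetical check: compare the number of hosts read from stdin
# (one per line, e.g. from mpdtrace) with the expected node count.
ring_size_ok() {
    expected=$1
    actual=$(grep -c .)
    if [ "$actual" -eq "$expected" ]; then
        echo "MPD ring has $actual hosts"
    else
        echo "MPD ring has $actual hosts, expected $expected" >&2
        return 1
    fi
}

# Usage, after mpdboot:
#   mpdtrace | ring_size_ok "$(wc -l < nodelist)"
```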
• Run mpiblast with the "-np" parameter set to the number of processes to start (<NP>). A simple rule for this number is number_nodes x number_physical_cores_per_node. In this example, the output is written to results.txt.
mpiexec -machinefile nodelist -np <NP> mpiblast -d month.nt -i a_thal_1168.fsa -p blastn -o results.txt
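The process count can be derived rather than typed in. The sketch below uses hypothetical helper names: it counts physical cores as distinct (physical id, core id) pairs in /proc/cpuinfo, so Hyper-Threading siblings are not counted twice, and multiplies by the node count. It assumes every node has the same CPU configuration as the node where it runs and that /proc/cpuinfo exposes the "physical id" and "core id" fields.

```shell
#!/bin/sh
# Hypothetical helper: number of physical cores, counted as distinct
# (physical id, core id) pairs so Hyper-Threading siblings are not counted twice.
physical_cores() {
    awk -F': *' '
        /^physical id/ { phys = $2 }
        /^core id/     { seen[phys "-" $2] = 1 }
        END            { n = 0; for (k in seen) n++; print n }
    ' "${1:-/proc/cpuinfo}"
}

# Hypothetical helper: -np value = number_nodes x number_physical_cores.
mpiblast_np() {
    nodes=$1
    cores=${2:-$(physical_cores)}
    echo $((nodes * cores))
}

# Usage, assuming a node list with one host per line:
#   NP=$(mpiblast_np "$(wc -l < nodelist)")
#   mpiexec -machinefile nodelist -np "$NP" mpiblast -d month.nt -i a_thal_1168.fsa -p blastn -o results.txt
```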
• The output should look similar to the following excerpt:
BLASTN 2.2.20 [Feb-08-2009]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= 25094 Lambda-PRL2 Arabidopsis thaliana cDNA clone 251F2T7, mRNA
sequence.
(604 letters)
Database: month.nt
127,522 sequences; 512,131,274 total letters
Sequences producing significant alignments: (bits) Value
gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence 367 e-100
ref|XM_001160875.2| PREDICTED: Pan troglodytes eukaryotic initia... 68 1e-09
ref|XM_511724.3| PREDICTED: Pan troglodytes eukaryotic translati... 68 1e-09
ref|XM_001151057.2| PREDICTED: Pan troglodytes eukaryotic initia... 60 3e-07
ref|XM_516936.3| PREDICTED: Pan troglodytes eukaryotic translati... 56 4e-06
ref|XM_001370153.2| PREDICTED: Monodelphis domestica eukaryotic ... 54 2e-05
ref|XM_003358747.1| PREDICTED: Sus scrofa eukaryotic initiation ... 46 0.004
ref|XM_001370908.2| PREDICTED: Monodelphis domestica eukaryotic ... 44 0.016
ref|XM_508726.3| PREDICTED: Pan troglodytes DCN1, defective in c... 40 0.25
ref|XM_001146090.2| PREDICTED: Pan troglodytes Nipped-B homolog ... 36 3.8
tpg|BK008000.1| TPA_inf: Anolis carolinensis von Willebrand fact... 36 3.8
gb|CP002743.1| Bifidobacterium breve ACS-071-V-Sch8b, complete g... 36 3.8
gb|CP002687.1| Arabidopsis thaliana chromosome 4, complete sequence 36 3.8
>gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence
Length = 23459830
Score = 367 bits (185), Expect = e-100
Identities = 202/205 (98%), Gaps = 2/205 (0%)
Strand = Plus / Plus
Query: 187 agggtgttgcaatcaactttgttaaaagcgacgacatcaagattctcagagacattgagc 246
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 6866127 agggtgttgcaatcaactttgttaaaagcgacgacatcaagattctcagagacattgagc 6866186
…
Database: /home/<user>/tests/month.nt.005
Posted date: Jun 21, 2011 11:43 AM
Number of letters in database: 64,010,690
Number of sequences in database: 18,924
Database: /home/<user>/tests/month.nt.006
Posted date: Jun 21, 2011 11:43 AM
Number of letters in database: 64,010,686
Number of sequences in database: 17,765
Database: /home/<user>/tests/month.nt.007
Posted date: Jun 21, 2011 11:43 AM
Number of letters in database: 64,044,706
Number of sequences in database: 12,195
NOTE: For further information about MPIBLAST, please refer to http://www.mpiblast.org/.
Victor Rosales (Intel)
Emanuel Ravera (Intel)

