Introduction
MSAProbs is a well-established state-of-the-art multiple sequence alignment algorithm for protein sequences. Its design combines pair hidden Markov models and partition functions to calculate posterior probabilities. Assessed on the popular benchmarks BAliBASE, PREFAB, SABmark and OXBENCH, MSAProbs achieves statistically significant accuracy improvements over existing top-performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons and Probalign. In addition, MSAProbs is optimized for shared-memory CPUs through a multi-threaded design, and is further parallelized for distributed-memory systems using MPI to overcome the high memory overhead barrier and achieve good parallel and data-size scalability.
Downloads
- MSAProbs v0.9.7
Multithreaded and parallelized for shared-memory systems. Changes are listed in the changelog.
- MSAProbs-MPI v1.0.5
MSAProbs-MPI is a parallelization of MSAProbs (v0.9.7) using MPI for distributed-memory systems. By using distributed-memory systems, we overcome the high memory overhead barrier for multiple alignment of thousands of protein sequences. By scaling to hundreds of cores, we achieve higher speed on large-scale protein sequence datasets. The manual for MSAProbs-MPI is available here.
Citation
- Yongchao Liu, Bertil Schmidt, Douglas L. Maskell: "MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities". Bioinformatics, 2010, 26(16): 1958-1964
- Yongchao Liu and Bertil Schmidt: "Multiple protein sequence alignment with MSAProbs". Methods in Molecular Biology, 2014, 1079: 211-218
- Jorge Gonzalez-Dominguez, Yongchao Liu, Juan Tourino and Bertil Schmidt: "MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems". Bioinformatics, 2016, 32(24): 3826-3828
Other related papers
- Yongchao Liu, Bertil Schmidt, and Douglas L. Maskell: "MSA-CUDA: multiple sequence alignment on graphics processing units with CUDA". 20th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2009), 2009, 121-128
- Yongchao Liu, Bertil Schmidt, and Douglas L. Maskell: "Parallel reconstruction of neighbor-joining trees for large multiple sequence alignments using CUDA". 23rd IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2009), 2009, 1-8
Parameters
- -o, --outfile <string> specify the output file name (STDOUT by default)
- -num_threads <integer> specify the number of threads to use (detected automatically if not specified)
- -clustalw use CLUSTALW output format instead of FASTA format
- -c, --consistency REPS use 0 <= REPS <= 5 (default: 2) passes of consistency transformation
- -ir, --iterative-refinement REPS use 0 <= REPS <= 1000 (default: 10) passes of iterative-refinement
- -v, --verbose report progress while aligning (default: off)
- -annot FILENAME write annotation for multiple alignment to FILENAME
- -a, --alignment-order print sequences in alignment order rather than input order (default: off)
- -version print out version of MSAPROBS
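The options above can be assembled into a command line programmatically. The following is a minimal sketch in Python, not part of MSAProbs itself: the helper name build_msaprobs_command and its keyword arguments are hypothetical, and it assumes the executable is called "msaprobs" and that input files are given last, as in the usage examples on this page. It only uses flags listed above and checks the documented ranges for -c and -ir.

```python
from typing import List, Optional


def build_msaprobs_command(
    infiles: List[str],
    outfile: Optional[str] = None,
    num_threads: Optional[int] = None,
    clustalw: bool = False,
    consistency: Optional[int] = None,
    refinement: Optional[int] = None,
    verbose: bool = False,
    annot: Optional[str] = None,
    alignment_order: bool = False,
    executable: str = "msaprobs",  # assumption: binary name on PATH
) -> List[str]:
    """Build an argv list for msaprobs (hypothetical helper)."""
    if not infiles:
        raise ValueError("at least one input file is required")
    # Ranges taken from the parameter list: 0 <= REPS <= 5 and 0 <= REPS <= 1000.
    if consistency is not None and not 0 <= consistency <= 5:
        raise ValueError("consistency REPS must satisfy 0 <= REPS <= 5")
    if refinement is not None and not 0 <= refinement <= 1000:
        raise ValueError("iterative-refinement REPS must satisfy 0 <= REPS <= 1000")
    if num_threads is not None and num_threads < 1:
        raise ValueError("num_threads must be a positive integer")

    cmd = [executable]
    if outfile is not None:
        cmd += ["-o", outfile]
    if num_threads is not None:
        cmd += ["-num_threads", str(num_threads)]
    if clustalw:
        cmd.append("-clustalw")
    if consistency is not None:
        cmd += ["-c", str(consistency)]
    if refinement is not None:
        cmd += ["-ir", str(refinement)]
    if verbose:
        cmd.append("-v")
    if annot is not None:
        cmd += ["-annot", annot]
    if alignment_order:
        cmd.append("-a")
    # Input files go last, following the usage examples on this page.
    return cmd + list(infiles)


if __name__ == "__main__":
    cmd = build_msaprobs_command(
        ["infile1.fa", "infile2.fa"], outfile="outfile", num_threads=4
    )
    print(" ".join(cmd))
    # prints: msaprobs -o outfile -num_threads 4 infile1.fa infile2.fa
```

Options left as None or False are omitted, so MSAProbs falls back to its own defaults (2 consistency passes, 10 refinement passes, automatic thread detection). The resulting list can be passed to subprocess.run once the program is installed.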
Installation and Usage
Compile on Windows and Linux
- Linux and Windows are supported, with a Makefile and a Visual Studio 2005 project co-existing in the source code tarball.
- On Linux, change to the sub-directory MSAProbs and then type the command "make" to compile the program.
- The default compiling options enable OpenMP support to fully utilize the compute capability of multi-core CPUs, which are now commonplace.
Compile on Mac OS X
The steps to compile MSAProbs on Mac OS X Mavericks are as follows.
- Install gcc49 using MacPorts (this requires MacPorts to be installed) by executing the command sudo port install gcc49. This will install the latest GCC 4.9 compiler into the directory /opt/local/bin.
- Modify the Makefile provided for Linux (as mentioned above) to point to GCC 4.9, which has support for OpenMP. Two macros should be modified as follows.
- Change the macro CXX to CXX = /opt/local/bin/g++-mp-4.9.
- Add -I /opt/local/include to the macro COMMON_FLAGS.
Users can download an example Makefile for Mac OS X from here. We thank Andrei LIHU for contributing this solution.
Typical Usage
- "msaprobs -help" or "msaprobs -?"
Get the command line options
- "msaprobs infile >outfile" or "msaprobs infile -o outfile"
Output the multiple alignments in FASTA format to file "outfile"
- msaprobs -o outfile -num_threads 4 infile1.fa infile2.fa
Use 4 threads to accelerate the multiple alignment execution
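Since the default output is FASTA, a quick sanity check on a result file is to confirm that every aligned row has the same length. The sketch below is an illustration in Python and is not shipped with MSAProbs: the function names read_aligned_fasta and alignment_length are hypothetical, and the two toy sequences are made up for the demonstration rather than taken from real MSAProbs output.

```python
from typing import Dict, Iterable


def read_aligned_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # the whole header line is used as the key
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name] += line
    return records


def alignment_length(records: Dict[str, str]) -> int:
    """Return the common row length, or raise if the rows differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("rows have different lengths: %s" % sorted(lengths))
    return lengths.pop()


if __name__ == "__main__":
    # Made-up toy alignment; '-' marks a gap.
    toy = """>seq1
MK-LV
>seq2
MKALV
"""
    records = read_aligned_fasta(toy.splitlines())
    print(len(records), alignment_length(records))  # prints: 2 5
```

To check a real result, pass an open file instead of the toy text, for example the file "outfile" written by the commands above.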
BioPerl usage
MSAProbs is supported by BioPerl. Click here for details on how to use the program from BioPerl.
Change Log
- May 02, 2016 (in release 1.0.5)
- The first version of MSAProbs-MPI is released. This is a parallelization of MSAProbs v0.9.7 using the MPI parallel programming model for distributed-memory systems.
- July 3, 2012 (in release 0.9.7)
- Add a new option "-o" to allow users to specify the output file of multiple alignments, instead of the default STDOUT
Contact
If you have any questions or suggestions for improvement, please feel free to contact Liu, Yongchao.