Seminar: Scalable Algorithms for Multiple Sequence Alignment in
Computational Biology
Abstract:
Multiple Sequences Alignment (MSA) of biological sequences is a
fundamental problem in computational biology due to its critical
significance in wide ranging applications including haplotype
reconstruction, sequence homology, phylogenetic analysis, and
prediction of evolutionary origins. The MSA problem is considered
NP-hard and known heuristics for the problem do not scale well with
increasing number of sequences. On the other hand, with the advent
of new breed of fast sequencing techniques such as pyrosequencing/solexa
it is now possible to generate large number of sequences very
quickly and can reach upto a million sequences. For rapid sequence
analysis, it is therefore desirable to develop fast MSA algorithms
that scale well with the increase in the dataset size. In this talk,
we discuss a novel domain decomposition based technique to solve the
MSA problem on multiprocessing platforms. The domain decomposition
based technique, in addition to yielding better quality, gives
enormous advantage in terms of execution time and memory
requirements. The proposed strategy allows to decrease the time
complexity of any known heuristic of O(N)^x complexity by a factor
of O(1/p)^x, where N is the number of sequences, x depends on the
underlying heuristic approach, and p is the number of processing
nodes. In particular, we discuss a highly scalable algorithm,
Sample-Align-D, for aligning biological sequences using Muscle
system as the underlying heuristic. The algorithm is proposed to be
implemented on a cluster of workstations using MPI library.
Experimental results for different problem sizes are analyzed in
terms of quality of
alignment, execution time and speed-up.
Bio:
Ashfaq A. Khokhar received his M.S. in computer engineering from
Syracuse University, in 1989 and Ph.D. in computer engineering from
University of Southern California, in 1993. After his Ph.D., he
spent two years as a Visiting Assistant Professor in the Department
of Computer Sciences and School of Electrical and Computer
Engineering at Purdue University. In 1995, he joined the Department
of Electrical and Computer Engineering at the University of
Delaware, where he first served as Assistant Professor and then as
Associate Professor. In Fall 2000, Dr. Khokhar joined UIC in the
Department of Computer Science and Department of Electrical and
Computer Engineering, where he currently serves on the rank of
Professor.
Dr. Khokhar has published over 180 technical papers and book
chapters in refereed conferences and journals in the areas of
wireless networks, multimedia systems, data mining, and high
performance computing. He is a recipient of the NSF CAREER award in
1998. His paper entitled "Scalable S-to-P Broadcasting in Message
Passing MPPs" has won the Outstanding Paper award in the
International Conference on Parallel Processing in 1996. He has
served as the Program Chair of the 17th Parallel and Distributed
Computing Conference (PDCS), 2004, Vice Program Chair for the 33rd
International Conference on Parallel Processing (ICPP), 2004, and
Program Chair of the
IEEE Golbecomm Symposium on Adhoc and Sensor Networks, 2009.
He is a Fellow of
IEEE for his contributions to multimedia computing and
databases.
His research interests include: wireless and sensor networks,
multimedia systems, data mining, and high performance computing.
Contact Information
Venue: A-12, LUMS, DHA, Lahore Cantt. Ishtiaq A Bhatti Office
Manager,
IEEE-Lahore Section C/o CS Dept. LUMS, DHA Lahore Cantt.
Ph: +92 42 3572 7616 Fax: +92 42 3589 8315