Claude Shannon

Wednesday, November 15th, 2017

Room 202 in Packard Bldg., Stanford University
Parking Generally Free In Nearby Lots After 4:00 P.M.

Refreshments and Conversation at 6:00 P.M.
Presentation at 6:30 P.M.

DNA Sequencing: From Information Limits to Genome Assembly Software

Ilan Shomorony, PhD
Member, Data Science Team, Human Longevity, Inc.

Please register here.


Emerging long-read sequencing technologies promise to enable near-perfect reconstruction of whole genomes. Assembly of long reads is usually accomplished using a read-overlap graph, in which the true sequence corresponds to a Hamiltonian path. As such, the assembly problem becomes NP-hard under most formulations, and most of the known algorithmic approaches are heuristic in nature.
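To make the read-overlap graph concrete, here is a minimal sketch assuming error-free reads and exact suffix-prefix matches (real long reads are noisy, so real assemblers use approximate overlaps). The helper names `overlap`, `overlap_graph`, and `hamiltonian_assemblies` and the `min_len` threshold are hypothetical. Each read is a node, an edge i → j records how far read i's suffix overlaps read j's prefix, and a candidate genome is spelled out by any path that visits every read exactly once. The brute-force search over all read orderings is exponential, which is why this formulation does not scale.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest proper suffix of a equal to a prefix of b, if at least min_len; else 0."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def overlap_graph(reads, min_len=3):
    """Directed edges (i, j) -> overlap length of read i's suffix onto read j's prefix."""
    graph = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap(a, b, min_len)
                if k:
                    graph[(i, j)] = k
    return graph


def hamiltonian_assemblies(reads, min_len=3):
    """Brute force: spell out every Hamiltonian path in the overlap graph.

    Tries all n! orderings of the reads, so it is exponential in the number of reads.
    """
    graph = overlap_graph(reads, min_len)
    results = set()
    for order in permutations(range(len(reads))):
        if all((u, v) in graph for u, v in zip(order, order[1:])):
            seq = reads[order[0]]
            for u, v in zip(order, order[1:]):
                seq += reads[v][graph[(u, v)]:]  # append only the non-overlapping tail
            results.add(seq)
    return results


# Three overlapping 6-base reads tiling the toy genome "ACGTTGCATGTC".
print(hamiltonian_assemblies(["CATGTC", "ACGTTG", "TTGCAT"]))
```

Because this toy genome has no repeats, exactly one Hamiltonian path exists and it spells the original sequence. When repeats longer than the overlaps are present, several paths, and therefore several candidate genomes, can appear.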

In this talk, we show that by focusing on the informational limits of this problem, rather than the computational ones, one can design assembly algorithms with correctness guarantees. We begin with a basic feasibility question: when does the set of reads contain enough information to allow unambiguous reconstruction of the genome? We show that in most instances of the problem where the reads contain enough information for assembly, the read-overlap graph can be sparsified, allowing the problem to be solved in linear time. To study the remaining, information-infeasible instances, we formulate the partial assembly problem from a rate-distortion perspective. We introduce a notion of assembly graph distortion and propose an algorithm that seeks to minimize this quantity. Finally, we describe how these ideas formed the theoretical foundation of our long-read assembly software HINGE, which outperforms existing tools and is currently being used by genomics research groups and companies.
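The payoff of sparsification can be illustrated with a much simpler rule than the one described in the talk. The sketch below is an assumption, not HINGE's method: it uses the generic "best overlap" heuristic, in which each read keeps only its single longest outgoing overlap. It also assumes error-free reads and exact matches, and the names `overlap`, `best_overlap_walk`, and `min_len` are hypothetical. After sparsification every node has out-degree at most one, so if the graph collapses to a single chain the genome is read off in one linear pass instead of a Hamiltonian-path search. The all-pairs overlap step here is still brute force; only the final walk is linear.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Longest proper suffix of a equal to a prefix of b, if at least min_len; else 0."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def best_overlap_walk(reads, min_len=3):
    """Keep each read's longest outgoing overlap, then walk the resulting chain.

    Returns the assembled sequence, or None if the sparsified graph is not a
    single chain covering every read (e.g. a repeat caused a collision).
    """
    n = len(reads)
    best = {}  # read index -> (successor index, overlap length)
    for i, a in enumerate(reads):
        cands = [(overlap(a, b, min_len), j) for j, b in enumerate(reads) if j != i]
        k, j = max(cands, default=(0, -1))
        if k:
            best[i] = (j, k)

    # With out-degree <= 1, a single chain has exactly one read with no incoming edge.
    targets = {j for j, _ in best.values()}
    starts = [i for i in range(n) if i not in targets]
    if len(starts) != 1:
        return None

    cur = starts[0]
    seq, seen = reads[cur], {cur}
    while cur in best:  # linear walk along the sparsified graph
        nxt, k = best[cur]
        if nxt in seen:
            return None  # cycle: no unambiguous chain
        seq += reads[nxt][k:]
        seen.add(nxt)
        cur = nxt
    return seq if len(seen) == n else None


print(best_overlap_walk(["CATGTC", "ACGTTG", "TTGCAT"]))
```

Returning `None` rather than guessing mirrors the feasibility question in the abstract, though this heuristic carries no correctness guarantee of the kind the talk describes.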


Ilan Shomorony is part of the Data Science team at Human Longevity, Inc., where he conducts research on computational methods for analyzing genomic data. Previously he was a postdoctoral fellow at the NSF Center for Science of Information (CSoI), working with Prof. Tom Courtade (UC Berkeley) and Prof. David Tse (Stanford University). He obtained his PhD in Electrical and Computer Engineering from Cornell University in August 2014, under the supervision of Prof. Salman Avestimehr. He received a Simons research fellowship for Spring 2014 and the Qualcomm Innovation Fellowship in 2013.


