
Abstract

Grant Number: 1R43GM067276-01A1
PI Name: DARASELIA, NIKOLAI D.
PI Email: nikolai@ariadnegenomics.com
PI Title:
Project Title: Flexible NLP system for MEDLINE information extraction

Abstract: DESCRIPTION (provided by applicant): This Small Business Innovation Research Phase I project focuses on the development of a fully automatic system for extracting protein function information from MEDLINE abstracts and converting it into a conceptual graph. All existing protein function databases depend on human experts, who cannot keep up with the exponential growth of protein function information freely available in MEDLINE. There is an urgent need for an automatic system capable of extracting protein function information from the literature. The system we propose will be based on advanced natural language processing (NLP) technologies and will use them as a fast and reliable way to extract information about protein function from human-readable sources. To this end, we have developed and tested MedScan, a prototype of such a system that parses scientific abstracts and converts protein function information into a conceptual graph. It consists of a preprocessor module that selects candidate sentences from MEDLINE, an NLP module that uses a proprietary linguistic model to parse the selected sentences, and an information extraction module that uses an ontology we developed to extract and validate protein function information. The results of the MedScan evaluation indicate that it is a feasible candidate for the proposed task. In Phase II, the software system will be developed to help researchers quickly access, search, and navigate MEDLINE content, and to visualize and analyze large volumes of protein function data. We will also extend our approach to other areas, including pharmacogenomics and the extraction of clinically relevant information.
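
Illustrative sketch (not part of the applicant's abstract): the three-module layout described above can be pictured with a toy Python pipeline. Everything in it is an assumption made for illustration: the protein lexicon, the relation ontology, and all function names are hypothetical, and the adjacency-based pattern match is only a stand-in for the proprietary linguistic model, whose details the abstract does not give.

```python
import re

# Hypothetical resources for illustration only; the real lexicon and
# ontology used by MedScan are not described in the abstract.
PROTEIN_NAMES = {"p53", "MDM2", "AKT1", "GSK3B"}
RELATION_ONTOLOGY = {
    "binds": "Binding",
    "phosphorylates": "Phosphorylation",
    "inhibits": "Inhibition",
    "activates": "Activation",
}


def tokenize(sentence):
    """Split a sentence into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def select_candidates(abstract):
    """Preprocessor: keep sentences mentioning at least two known proteins."""
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [
        s for s in sentences
        if len(PROTEIN_NAMES.intersection(tokenize(s))) >= 2
    ]


def parse_sentence(sentence):
    """Stand-in for the NLP module: find (protein, word, protein) runs.

    A real parser would build a syntactic or semantic structure; this
    only checks three adjacent tokens.
    """
    tokens = tokenize(sentence)
    triples = []
    for i in range(1, len(tokens) - 1):
        if tokens[i - 1] in PROTEIN_NAMES and tokens[i + 1] in PROTEIN_NAMES:
            triples.append((tokens[i - 1], tokens[i], tokens[i + 1]))
    return triples


def extract_graph(triples):
    """Extraction module: keep triples whose verb is in the ontology.

    Each surviving triple becomes one labeled edge of the graph.
    """
    edges = []
    for subj, verb, obj in triples:
        relation = RELATION_ONTOLOGY.get(verb.lower())
        if relation is not None:
            edges.append((subj, relation, obj))
    return edges


def run_pipeline(abstract):
    """Chain the three modules and return the graph as a list of edges."""
    edges = []
    for sentence in select_candidates(abstract):
        edges.extend(extract_graph(parse_sentence(sentence)))
    return edges
```

In this sketch the graph is simply a list of (source, relation, target) edges. A sentence that names two proteins but links them with a word outside the ontology passes the preprocessor and is then rejected at the validation step, which loosely mirrors the "extract and validate" role the abstract assigns to the third module.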

Thesaurus Terms:
computer program /software, computer system design /evaluation, information retrieval, information system, molecular biology information system
information dissemination, protein structure function

Institution: ARIADNE GENOMICS, INC.
58 CALABASH CT
ROCKVILLE, MD 20850
Fiscal Year: 2003
Department:
Project Start: 01-AUG-2003
Project End: 31-JAN-2004
ICD: NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
IRG: ZRG1

