Information integration typically requires the construction of complex artifacts such as federated databases, ETL scripts, data warehouses, applications that access multiple data sources, and applications that ingest or publish XML. For many companies, it is one of the most complicated IT tasks they face today. Intelligent tools are needed to simplify this task and reduce its overall cost. Clio is a schema mapping system for information integration developed at IBM Almaden Research Center. It helps users define relationships (formally, mappings) between two relational or XML data sources. It then interprets the mapping semantics and automatically generates transformation queries (SQL, SQL/XML, or XQuery), scripts (XSLT), or Java code. Some Clio functionality is now available in the IBM Rational Data Architect product. In addition, many advanced features and components that better support information integration tasks have been built into Clio. In this talk, we will give an overview of the Clio system and describe a few related research problems that we have studied or are studying.
Dr. Ho joined IBM Almaden Research Center as a Research Staff Member in 1989. He was manager of the Foundations of Massively Parallel Computing group from 1994 to 1996, where he led the development of collective communication, as part of IBM MPL and MPI, for the IBM SP-1 and SP-2 parallel systems. He has been manager of the Intelligent Information Integration group and the Clio Project since March 2001. His current research interests include XML and databases, especially schema mapping and information integration, and data mining. His past research interests include on-line analytical processing (OLAP), communication issues for interconnection networks, algorithms for collective communications, graph embeddings, fault tolerance, and parallel algorithms and architectures. He has published over 30 journal papers and over 50 conference papers in these areas.
Dr. Ho is a co-recipient of the 1986 "Outstanding Paper Award" of the International Conference on Parallel Processing. He has received an IBM Outstanding Innovation Award, two IBM Outstanding Technical Achievement Awards, and five IBM Plateau Invention Achievement Awards. He has filed 15 patents. He served on the Editorial Board of the IEEE Transactions on Parallel and Distributed Systems for four years. He was an Area Editor of the Journal of Interconnection Networks for three years and has been one of its three Managing Editors for three years. He has twice served as program vice-chair for parallel processing conferences. He co-edited the book "Large-Scale Parallel Data Mining", published by Springer-Verlag in 2000. He served as Program Co-Chair of the Workshop on Large-Scale Parallel KDD Systems in 1999. He has served on over 30 program committees of conferences and workshops in parallel processing, data mining, and databases. He is a member of the ACM and the IEEE Computer Society.