Interdisciplinary Initiatives Program Round 1 - 2000

Robert Shafer, Medicine
Daphne Koller, Computer Science

We have created the HIV RT and Protease Sequence Database (HIVRT&PrDB) to represent, store, and analyze the diverse forms of data underlying drug resistance knowledge. These data are available online to researchers studying HIV drug resistance and to clinicians using HIV drug resistance tests.

The database correlates sequence changes in reverse transcriptase (RT) and protease with other forms of data, including antiviral treatment history, phenotypic (drug susceptibility) data, and clinical outcome (the virologic and immunologic response to a new treatment regimen). These correlations have been used to develop a rules-based online expert system (HIVdb) that helps physicians choose antiviral drugs based on the RT and protease mutations in a clinical virus sample.

To encode HIVdb, we have developed a programming platform, the Algorithm Specification Interface (ASI), that provides a uniform approach to encoding, implementing, and comparing algorithms for HIV genotypic interpretation. ASI consists of an XML format for specifying an algorithm and a compiler that transforms the XML into executable code. Novel approaches have been developed to reduce the high dimensionality of HIV sequence data. Several machine learning projects have begun to train HIVdb on clinical data sets as they are collected and submitted to HIVRT&PrDB.
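
The idea of specifying an interpretation algorithm in XML and compiling it into executable code can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the element and attribute names are not the actual ASI schema, DRUG_A and DRUG_B are placeholder drugs, and the mutation scores and level thresholds are invented values with no clinical meaning. The "compiler" here simply turns the XML into a Python function that adds up mutation scores per drug and maps the total to a resistance level.

```python
import xml.etree.ElementTree as ET

# Hypothetical algorithm specification. The schema, drug names, scores, and
# thresholds are invented for this sketch; they are NOT the real ASI format.
SPEC = """
<algorithm name="demo">
  <drug name="DRUG_A" gene="RT">
    <rule mutation="M184V" score="60"/>
    <rule mutation="K65R" score="30"/>
  </drug>
  <drug name="DRUG_B" gene="PR">
    <rule mutation="L90M" score="45"/>
  </drug>
  <level min="0" label="susceptible"/>
  <level min="30" label="intermediate"/>
  <level min="60" label="resistant"/>
</algorithm>
"""


def compile_spec(xml_text):
    """Turn an XML rule specification into an executable interpreter function."""
    root = ET.fromstring(xml_text)

    # Resistance levels sorted by ascending score threshold.
    levels = sorted(
        (int(lv.get("min")), lv.get("label")) for lv in root.findall("level")
    )

    # Per drug: the gene it targets and a mutation -> score table.
    drugs = {}
    for drug in root.findall("drug"):
        rules = {
            rule.get("mutation"): int(rule.get("score"))
            for rule in drug.findall("rule")
        }
        drugs[drug.get("name")] = (drug.get("gene"), rules)

    def interpret(mutations):
        """mutations maps a gene name to the set of mutations found in a sample."""
        result = {}
        for name, (gene, rules) in drugs.items():
            present = mutations.get(gene, set())
            score = sum(s for m, s in rules.items() if m in present)
            # Pick the highest level whose threshold the score reaches.
            label = [lab for low, lab in levels if score >= low][-1]
            result[name] = (score, label)
        return result

    return interpret


interpret = compile_spec(SPEC)
report = interpret({"RT": {"M184V"}, "PR": set()})
for drug_name, (score, label) in sorted(report.items()):
    print(drug_name, score, label)
```

Because the rules live in data rather than in code, two algorithms written in the same format can be compiled the same way and run on the same sample, which is what makes a side-by-side comparison straightforward.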