Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps

Elena Nabieva, Kam Jim, Amit Agarwal, Bernard Chazelle, Mona Singh

Abstract

Motivation: Determining protein function is one of the most important problems in the post-genomic era. For the typical proteome, there are no functional annotations for one-third or more of its proteins. Recent high-throughput experiments have determined proteome-scale protein physical interaction maps for several organisms. These physical interactions are complemented by an abundance of data about other types of functional relationships between proteins, including genetic interactions, knowledge about co-expression and shared evolutionary history. Taken together, these pairwise linkages can be used to build whole-proteome protein interaction maps.

Results: We develop a network-flow based algorithm, FunctionalFlow, that exploits the underlying structure of protein interaction maps in order to predict protein function. In cross-validation testing on the yeast proteome, we show that FunctionalFlow has improved performance over previous methods in predicting the function of proteins with few (or no) annotated protein neighbors. By comparing several methods that use protein interaction maps to predict protein function, we demonstrate that FunctionalFlow performs well because it takes advantage of both network topology and some measure of locality. Finally, we show that performance can be improved substantially as we consider multiple data sources and use them to create weighted interaction networks.

[Link to Bioinformatics]

Attention: Correct version of the flow update rule (typo in the paper)

Biological Process Predictions for Uncharacterized ORFs

We provide confidence estimates for the best FunctionalFlow prediction for each ORF. The confidence values are obtained by leave-one-out cross-validation. The cross-validation scores were divided into 16 bins, each containing scores for a roughly equal number of proteins. The fraction of predictions in each bin that were correct serves as the confidence value for all scores in that bin. A curve was then fitted to these per-bin values, and it forms the basis of the confidence estimates for novel predictions.
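The binning step can be illustrated with a minimal sketch. It assumes the cross-validation results are available as two parallel lists, one of scores and one of correct/incorrect flags. The function names `bin_confidences` and `confidence_for` are hypothetical, and the sketch uses a simple per-bin lookup in place of the fitted curve, whose functional form is not specified here.

```python
def bin_confidences(scores, correct, n_bins=16):
    """Split scored predictions into roughly equal-sized bins.

    Returns a list of (lowest_score, highest_score, fraction_correct),
    one entry per non-empty bin, ordered by increasing score.
    """
    # Sort the cross-validation results by score so bins cover score ranges.
    paired = sorted(zip(scores, correct))
    n = len(paired)
    bins = []
    for b in range(n_bins):
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        chunk = paired[start:end]
        if not chunk:
            continue  # fewer predictions than bins
        n_correct = sum(1 for _, ok in chunk if ok)
        bins.append((chunk[0][0], chunk[-1][0], n_correct / len(chunk)))
    return bins


def confidence_for(score, bins):
    """Return the confidence of the first bin whose upper bound covers score.

    Assumption: a step-function lookup stands in for the fitted curve.
    Scores above every bin fall back to the top bin.
    """
    for _low, high, fraction in bins:
        if score <= high:
            return fraction
    return bins[-1][2]
```

For example, calling `bin_confidences(scores, correct)` on the leave-one-out results and then `confidence_for(new_score, bins)` gives a rough confidence for a novel prediction; a smooth curve fitted through the bin values would remove the jumps at bin boundaries.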

Top-scoring prediction(s) for FunctionalFlow, Majority rule, and Generalized Multicut by ORF, sorted by the confidence estimate for the FunctionalFlow prediction.

Tab-delimited text file. Multiple predictions for a method are separated by semicolons.
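A file of this kind could be read along the following lines. The column layout is an assumption made for illustration (the ORF name in the first column, followed by one column per method); the actual file may contain a header line or additional columns such as the confidence estimate, in which case the hypothetical `columns` argument should be adjusted.

```python
import csv
import io


def parse_predictions(text, columns=("FunctionalFlow", "Majority", "GeneralizedMulticut")):
    """Parse tab-delimited prediction text into {orf: {column: [predictions]}}.

    Assumed layout (not confirmed by the page): first field is the ORF name,
    remaining fields follow the order given in `columns`.
    """
    records = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines
        orf, fields = row[0].strip(), row[1:]
        # Multiple predictions for one method are separated by semicolons.
        records[orf] = {
            name: [p.strip() for p in field.split(";") if p.strip()]
            for name, field in zip(columns, fields)
        }
    return records
```

Each method maps to a list, so a method with a single top-scoring prediction yields a one-element list and a tie yields several entries.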

