The proposed method predicts interactions between two proteins of unknown structure using Averaged One-Dependence Estimators (AODE) and three features calculated for the protein pair: (a) sequence similarities to a known interacting protein pair (FSeq), (b) statistical propensities of domain pairs observed in interacting proteins (FDom) and (c) the sum of edge weights along the shortest path between homologous proteins in a PPI network (FNet). Feature vectors were defined to lie in a half-space of the symmetrical high-dimensional feature space, so that they are independent of the order in which the two proteins are given.
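
The order-independence requirement can be illustrated with a small sketch. It is not the authors' construction: it assumes, purely for illustration, that each protein contributes its own numeric vector and that the pair vector is their concatenation. The function name `canonical_pair_vector` is hypothetical. Of the two possible concatenations, the sketch always keeps the one whose first half is lexicographically larger, so every pair maps into the same half of the symmetric feature space.

```python
from typing import Sequence, Tuple


def canonical_pair_vector(u: Sequence[float], v: Sequence[float]) -> Tuple[float, ...]:
    """Return one fixed concatenation for the unordered pair {u, v}.

    Assumption (not from the source): each protein is described by its own
    vector, and the pair is described by concatenating the two vectors.
    Swapping the proteins would normally give a different vector; picking
    the lexicographically larger vector first removes that ambiguity.
    """
    a, b = tuple(u), tuple(v)
    if len(a) != len(b):
        raise ValueError("both proteins need vectors of the same length")
    first, second = (a, b) if a >= b else (b, a)
    return first + second


# Hypothetical per-protein vectors for proteins A and B.
protein_a = [0.2, 0.9]
protein_b = [0.7, 0.1]

# Both orderings of the pair produce the same feature vector.
ab = canonical_pair_vector(protein_a, protein_b)
ba = canonical_pair_vector(protein_b, protein_a)
print(ab == ba)
```

Any other deterministic tie-breaking rule would serve equally well; the point is only that (A, B) and (B, A) are never presented to the classifier as two different examples.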

The predictive performance of the method was assessed by 10-fold cross-validation on a recently created human PPI dataset with randomly sampled negative data, and the best model achieved an Area Under the Curve (AUC) of 0.79 (pAUC0.5% = 0.16). In addition, the AODE trained on all three features (named PSOPIA) showed better prediction performance on an independent dataset than a recently reported homology-based method.
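
As a rough illustration of the two reported measures, the sketch below computes the AUC and a partial AUC restricted to low false positive rates. It assumes the curve is a ROC curve and that pAUC0.5% is the area up to a false positive rate of 0.5% divided by that cutoff (a raw area could not exceed 0.005, so the reported 0.16 suggests some normalization); the source does not spell this out. The function names and the toy labels and scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return ROC points (FPR, TPR) obtained by lowering the score threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Examples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def partial_auc(labels: Sequence[int], scores: Sequence[float],
                max_fpr: float = 1.0, normalize: bool = True) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr].

    With normalize=True the area is divided by max_fpr (an assumption about
    how pAUC is scaled), so a perfect ranking scores 1.0.
    """
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr if normalize else area


# Toy data: 1 = interacting pair, 0 = non-interacting pair.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
print(partial_auc(toy_labels, toy_scores))                  # full AUC
print(partial_auc(toy_labels, toy_scores, max_fpr=0.005))   # pAUC at FPR <= 0.5%
```

Restricting the area to very low false positive rates matters for this kind of data because negatives vastly outnumber positives, so only the top-ranked predictions are of practical use.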

The datasets used for the evaluation of PSOPIA are available from the following URLs:
(1) Dset2_pos_4430 (a set of 4,430 interacting sequence pairs)
(2) Dset2_neg_1772000 (a set of 1,772,000 non-interacting sequence pairs)

Yoichi Murakami and Kenji Mizuguchi

