- Dataset -

 

The datasets used for training and testing are freely available.

186 protein sequences used for training: (Dset186)

72 protein sequences used for testing: (Dtestset72)

 

An example of the data format:

# PDB file  example.pdb
# Target chain : I
# Interacting chains : E
# Cut-off for surface: 5.0 Percentage
# Cut-off for interface 1.0 Angstrom^2
#
ITF 1 - 1 I LYS 8 175.80 87.50
ITF 2 - 1 I SER 9 57.52 49.40
ITF 3 - 0 I PHE 10 5.63 2.80
ITF 4 - 1 I PRO 11 78.15 57.40
:
:
ITF 60 - 0 I PRO 67 4.48 3.30
ITF 61 + 1 I HIS 68 113.49 62.10
ITF 62 + 1 I VAL 69 21.95 14.50
ITF 63 + 1 I GLY 70 15.49 19.30
#
# Interface resdieue = 17
# Interface ALL = 4227.36
# Interface BSA = 814.13
# Interface Polar BSA = 252.37
# Interface Non-Polar BSA = 561.76
# Interface Polarity = 31.00
#
# chain I and resid 40 41 42 43 44 45 46 47 48 49 53 55 65 66 68 69 70

 

Each ITF record in the data file contains nine whitespace-separated columns:

  1. record name "ITF"
  2. record number
  3. interface residue; + ... interface, - ... non-interface
  4. surface residue; 1 ... surface, 0 ... non-surface
  5. chain identifier of the target chain
  6. residue name
  7. residue number
  8. absolute solvent accessibility (SA; calculated by NACCESS)
  9. relative solvent accessibility (rSA; calculated by NACCESS)
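
Each ITF record in the example above splits on whitespace into nine fields, so the files can be read with a few lines of code. The following is a minimal Python sketch, not part of PSIVER itself: the names `ItfRecord` and `parse_itf` are hypothetical, and it assumes that residue numbers are plain integers (no insertion codes) and that every line not starting with "ITF" (comments, blank lines, ":" elision markers) can be skipped.

```python
from typing import List, NamedTuple


class ItfRecord(NamedTuple):
    """One ITF line; field names are illustrative, not from PSIVER."""
    number: int       # record number
    interface: bool   # True for "+", False otherwise
    surface: bool     # True for "1", False otherwise
    chain: str        # chain identifier
    res_name: str     # three-letter residue name
    res_num: int      # residue number (assumed to be a plain integer)
    sa: float         # absolute solvent accessibility
    rsa: float        # relative solvent accessibility


def parse_itf(text: str) -> List[ItfRecord]:
    """Collect the ITF records from the text of one data file."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip comment lines, blank lines, and ":" elision markers.
        if len(fields) != 9 or fields[0] != "ITF":
            continue
        records.append(ItfRecord(
            number=int(fields[1]),
            interface=fields[2] == "+",
            surface=fields[3] == "1",
            chain=fields[4],
            res_name=fields[5],
            res_num=int(fields[6]),
            sa=float(fields[7]),
            rsa=float(fields[8]),
        ))
    return records


sample = """\
# Target chain : I
ITF 3 - 0 I PHE 10 5.63 2.80
ITF 61 + 1 I HIS 68 113.49 62.10
ITF 62 + 1 I VAL 69 21.95 14.50
"""
records = parse_itf(sample)
interface_residues = [r.res_num for r in records if r.interface]
print(interface_residues)  # [68, 69]
```

Filtering on the `interface` flag reproduces the kind of residue list given on the final "chain I and resid ..." line of the example file.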

 

© PSIVER Copyright
PSIVER is maintained by Yoichi MURAKAMI @ Bioinformatics Project, NIBIO