

Help on HOMSTRAD


What's new

22 Jun 2005
The -sup.pdb file for the Amidase family has been updated to make it consistent with the .atm files.

20 May 2005
The 'Align' icon is now linked to a new version of the FUGUE alignment server.

5 May 2004
The lipase family alignment has been updated.
(The sequence of the entry 1bu8 has been modified.)

15 Jan 2003
Forty eight new families have been added.

5 Nov 2002
Eight new families have been added.

8 Oct 2002
Twelve new families have been added.

26 Sep 2002
New download page has been created.

3 Jul 2002
Eight new families have been added.

20 Jun 2002
Twelve new families have been added.

30 May 2002
Two new families have been added.
CBD_4 has been expanded and renamed CBM_20.
ghf13 has been expanded and renamed alpha-amylase_NC, and families representing both constituent domains have been added (alpha-amylase, alpha-amylase_C). Three related multi-domain amylase families have also been added (Cyclodex_gly_tran, isoamylase_NC, malt_amylase_NC).

21 May 2002
Seventeen new families have been added.

24 Apr 2002
Eight new families have been added.

9 Apr 2002
Five new families have been added.

26 Mar 2002
Ten new families have been added.

20 Mar 2002
The rubisco family was replaced by four separate families: RuBisCO_large, RuBisCO_large_N, RuBisCO_large_NC, RuBisCO_small.

7 Mar 2002
Eighteen new families have been added.

18 Jan 2002
New alignments, which include homologous Pfam family sequences, can be viewed for 317 families.

21 Dec 2001
Fifteen new families have been added.

29 Nov 2001
Nineteen new families have been added.

1 Nov 2001
Information pages updated. 'Beginner's guide to Homstrad' added.

21 Sept 2001
Two hundred and thirty one new families have been added.

17 Jul 2001
Twenty-seven new families have been added.

11 Jul 2001
Forty-two new families have been added.

11 Apr 2001
Two new families have been added.

4 Apr 2001
Multi-member families in HOMSTRADPLUS have been integrated into HOMSTRAD. Click
"show WITH homologous sequences"
to display the sequence-structure alignment.

26 Jul 2000
Twenty-four new families have been added.

31 May 2000
The FUGUE search engine and alignment server is available
(the alignment server is accessible from the page for each HOMSTRAD family).

16 Mar 2000
The PDB code search engine is available.

Single member families (HOMS) have been added.

Introduction

HOMSTRAD (HOMologous STRucture Alignment Database) provides aligned three-dimensional structures of homologous proteins. Strictly, the word homology means having a common evolutionary origin, but in practice we define a homologous family as a group of proteins with sufficiently high sequence identity. To define the families, we combine the classifications proposed by various databases, including SCOP, Pfam, PROSITE and SMART, with the results of sequence similarity searches by PSI-BLAST and FUGUE, and then make our own decisions. Our focus is to collect reasonable sets of protein sequences in which functionally and structurally important residues can be correctly aligned and highlighted. For example, even if a highly conserved local sequence motif is shared by a diverse group of proteins, it is sometimes difficult to align all the sequences on the basis of their structures. In such a case, we split the group into several smaller ones and call them families. On the other hand, some families defined in HOMSTRAD include protein pairs with fairly low overall sequence similarity that nevertheless present convincing structure-based alignments.

Most of the families have, on average, more than 30 percent sequence identity. Even when the lowest pairwise percentage identity within a family is less than 20, there are usually bridge pairs that connect two subgroups with an identity of at least 20 percent. For example, the lowest pairwise identity within the globin family is 12 percent (between Pagothenia bernacchii haemoglobin and Chironomus thummi thummi erythrocruorin), but all the members are single-linked with an identity of at least 21 percent. There are, however, families in which even the percentage identities for bridge pairs are less than 20. We keep these families in the database because there is convincing structural similarity to suggest evolutionary relationships (e.g., cytochrome P450).
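The idea of bridge pairs and single linkage can be illustrated with a short sketch. It is not the procedure used to build HOMSTRAD; the function name, the member labels and the identity values below are hypothetical, and the only thing taken from the text is the idea that a family is connected if every member can be reached through a chain of sufficiently similar pairs.

```python
from itertools import combinations


def is_single_linked(members, identity, threshold=20.0):
    """Return True if all members are connected through chains of pairs
    whose percentage identity is at least `threshold` (single linkage)."""
    parent = {m: m for m in members}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(members, 2):
        if identity[frozenset((a, b))] >= threshold:
            parent[find(a)] = find(b)  # merge the two subgroups

    return len({find(m) for m in members}) == 1


# Hypothetical family: A and C share only 12% identity,
# but B acts as a bridge (A-B 35%, B-C 21%).
members = ["A", "B", "C"]
identity = {
    frozenset(("A", "B")): 35.0,
    frozenset(("B", "C")): 21.0,
    frozenset(("A", "C")): 12.0,
}

print(is_single_linked(members, identity, threshold=20.0))
print(is_single_linked(members, identity, threshold=30.0))
```

With a 20 percent threshold the bridge B-C keeps the family connected; raising the threshold to 30 percent leaves C on its own.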

The central element of HOMSTRAD is a collection of carefully examined structure-based alignments organised at the level of homologous families. This requires substantial manual editing and is complementary to fully automated structure comparisons such as FSSP. One unique feature of HOMSTRAD is that it displays the alignments in a specially devised annotated form to help understand the conservation of various structural features. The analysis and annotation are carried out by the program JOY. See the software section for further details of this format. The combination of HOMSTRAD and JOY proved to be particularly useful in achieving accurate alignments for comparative modelling (Burk et al., 1999). This facility, again, contrasts with some other database resources such as SCOP (Murzin et al., 1995) and CATH (Orengo et al., 1997), which provide a hierarchical classification of protein structures.

For many of the families, extended alignments are also available. These include sequences from the corresponding Pfam family, which are added to the core alignment using ClustalW. Prosite motifs have also been added to these extended alignments.

The database was originally developed to help in the modelling of homologous protein structures and also in the study of divergent evolution of the structure of proteins. It originates from a small database of seven family alignments (Overington et al., 1990), which has been gradually updated and extended (Overington et al., 1993; Sali & Overington, 1994). Some of the earlier works using the database can be found in the reference section.


Beginner's guide to Homstrad

What is the Homstrad protein structures database for?

HOMSTRAD (HOMologous STRucture Alignment Database) provides a collection of structure-based alignments of homologous protein sequences. The database is compiled from structures present in the PDB (Protein Data Bank), and these have been grouped into around 1000 multi-member families at present.

This type of information is useful in two main areas:

We would also like to apply the database to genome wide analyses of protein families. Some initial work on microbial genomes has been carried out and the Drosophila genome will be our next project.


Determining the structure of a protein

Protein structures can be determined experimentally in a number of different ways including...

These procedures are time consuming and can be problematic. Ideally, bioinformatics approaches could be used, first to support experimental approaches (by giving you an idea of secondary structure, domain organization, potential active sites or structurally important residues) and ultimately to predict a protein's structure reliably from its sequence alone, something that is not possible at present. Advances are being made in structural prediction and a number of approaches are being taken (ab initio, threading – sequence to fold assignment, comparative modeling). Using sequence homology to known structures, i.e. comparative modeling, has been the most successful of these, but it depends on the availability of high quality sequence alignments, which Homstrad provides.


Important databases

There are already a variety of publicly available protein databases on the web, each with a different focus. We use data from a number of these during the construction and maintenance of Homstrad, so a brief introduction to some of them is given here.

Biological databases can range from providing a basic searchable depository of data to more informative setups (often manually curated) that provide information on the relationships between the member entries and supply varying levels of annotation.

Type of database                        Sequence database examples     Structure database examples

Primary depository                      Genbank, EMBL, DDBJ, TrEMBL    PDB
- Often high level of redundancy

Secondary database                      SwissProt                      -
- Some annotation added
- Redundancy reduced

Classification database                 ProtoMap                       SCOP, CATH
- Hierarchical

Family database                         Pfam                           Homstrad
- entries grouped and aligned
  according to sequence or structure

The PDB is the main depository for protein structures. It is the starting point for structural databases to build upon and it is updated weekly with the coordinates of every new model that is released. Each protein structure entry (which may contain details for more than one polypeptide chain) is assigned a four character ID starting with a number.
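As a small illustration of the ID format described above, the following sketch checks whether a string has the shape of a PDB ID (four characters, the first a digit). The function name is hypothetical, and the check says nothing about whether the entry actually exists in the PDB.

```python
import re

# One digit followed by three letters or digits, e.g. "5grt" or "3GRS".
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text):
    """Return True if `text` has the shape of a four-character PDB ID."""
    return PDB_ID.fullmatch(text) is not None


for candidate in ["5grt", "3GRS", "grs1", "1d7ya"]:
    print(candidate, looks_like_pdb_id(candidate))
```

Note that a chain-qualified name such as 1d7ya is deliberately rejected here, because the fifth character is a chain letter and not part of the PDB ID.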

Note that the crystallographers and other experimentalists who submit structures to the PDB sometimes work only with domains and may modify the sequence, for example to aid crystallization. This means that you need to map the PDB chain sequence back onto the original SwissProt entry to decide whether you are looking at a whole protein or just a domain, and to find out whether it has been modified. Such information can be found in the DBREF and SEQADV records of the PDB file and, failing that, there are other databases that can help, e.g. 3Dseq at the EBI.
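A minimal sketch of how the DBREF and SEQADV lines might be pulled out of a PDB file is given below. It relies only on the record name occupying the first six columns of each line; the function name and the example lines are made up for illustration (they are not column-exact copies of a real entry), and newer files that use DBREF1/DBREF2 records are not handled.

```python
def sequence_mapping_records(pdb_lines):
    """Collect the raw DBREF and SEQADV lines from an iterable of PDB lines."""
    records = {"DBREF": [], "SEQADV": []}
    for line in pdb_lines:
        name = line[:6].strip()  # the record name sits in columns 1-6
        if name in records:
            records[name].append(line.rstrip("\n"))
    return records


# Hypothetical lines for illustration only.
example = [
    "HEADER    OXIDOREDUCTASE",
    "DBREF  1ABC A    1   100  UNP    P00000   EXAMPLE_HUMAN    1    100",
    "SEQADV 1ABC MET A    0  UNP  P00000              EXPRESSION TAG",
    "ATOM      1  N   MET A   0",
]

found = sequence_mapping_records(example)
for name, lines in found.items():
    print(name, len(lines))
```

For a real file you would pass an open file object instead of the list, and then read the individual fields from each line according to the PDB format description.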

Both the PDB and its sequence equivalents (Genbank/EMBL/DDBJ) have to deal with large amounts of data that is donated by the scientific community. Because of this, there tends to be a lot of variation in the quality of annotation provided and a certain amount of redundancy.

For protein sequence, there is a secondary database, SwissProt, which is non-redundant and more consistently annotated. This is the best place to look for native protein sequences and associated annotation. Failing this, you will need to look either at TrEMBL (automatically compiled database of protein sequences not yet added to SwissProt) or at the primary Genbank/EMBL/DDBJ entries.

The next levels of information processing often involve manual inspection of the data and expert input. The way in which the data is organized and presented can vary greatly.

Some databases have a hierarchical structure and can classify their entries in different ways eg...

SCOP (Structural classification of proteins)...

CATH...

These two databases are, in part, manually compiled, so it may take a little while for them to be updated after structures are released to the PDB. There are fully automated structural classification databases (e.g. FSSP) that update regularly from new PDB structures, but these are less informative.

Databases can also contain entries that have been grouped according to sequence or structure similarity.

Pfam clusters homologous regions of sequences into families using sensitive Hidden Markov Model (HMM) based methods to indicate common sequence domains. It does not pay much attention to structural information in generating the alignments (often no coordinates for a protein are available) and the sequence domains do not necessarily correlate with clear structural domains, but it is well maintained and comprehensive. Pfam keeps a separate family for each domain; therefore, some families include the whole protein length, but many multi-domain proteins have different regions included in different families. For the main (Pfam-A) part of the database, the main alignments (seed alignments) are manually checked and corrected; the full alignments, however, are computer generated and are less reliable. Other motifs are held as Pfam-B families. These are automatically clustered sequences derived from the ProDom database, and they tend to be less well annotated and less useful.

Pfam is now linked into a 'hub' database called Interpro that shares annotation between a number of databases (Pfam, PRINTS, PROSITE, ProDom, SMART, SWISS_PROT + TrEMBL) and is a useful starting point when investigating proteins.

Homstrad is the structural equivalent of Pfam. Like other higher order databases, it relies heavily on data from the other depositories mentioned here. Primary protein structures are taken from the PDB, candidate families are routinely identified by searching Pfam, SCOP structural domain definitions are used and information on the native proteins is collected from SwissProt, Pfam and Interpro.


What is Homstrad?

Homstrad is a web accessible database that contains families of proteins of known structure that share sequence/structural similarity.

New PDB entries are automatically processed and added as 'single member families' on a weekly basis although it may take longer for them to be incorporated into a main family alignment since this requires significant manual input.

The main families are composed of representative members only. When the family is first made, all the potential new sequences are compared and, if any are more than 90% similar at the sequence level, they are grouped together and the structure with the best resolution is selected as the representative. These representatives are what you see on the web page, although you can view the full list of PDB chains considered by clicking on 'show related pdbs'. Note that Homstrad does not include theoretical models (these may be inaccurate) or structures containing only C-alpha coordinates (no environment can be assigned), and X-ray structures are preferred over NMR structures.
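The selection of representatives can be sketched as follows. This is only one possible reading of the rule described above, written as a simple greedy grouping; the actual Homstrad procedure is not documented here, and the function name, chain labels, resolutions and identity values are all hypothetical.

```python
def pick_representatives(chains, identity, cutoff=90.0):
    """Group chains that are more than `cutoff` percent identical and keep
    the best-resolution chain of each group as its representative.

    chains:   dict mapping chain name -> resolution in Angstrom (lower is better)
    identity: dict mapping frozenset({a, b}) -> percentage sequence identity
    Returns a dict mapping representative -> list of chains it represents.
    """
    groups = {}
    # Visit chains from best to worst resolution, so the first chain seen
    # in any group is automatically the one with the best resolution.
    for chain in sorted(chains, key=lambda c: chains[c]):
        for rep in groups:
            if identity.get(frozenset((chain, rep)), 0.0) > cutoff:
                groups[rep].append(chain)
                break
        else:
            groups[chain] = [chain]  # no close relative yet: new representative
    return groups


# Hypothetical chains: x1 and x2 are near-identical, y1 is a distinct homologue.
chains = {"x1": 2.5, "x2": 1.8, "y1": 2.0}
identity = {
    frozenset(("x1", "x2")): 99.0,
    frozenset(("x1", "y1")): 45.0,
    frozenset(("x2", "y1")): 46.0,
}

groups = pick_representatives(chains, identity)
print(groups)
```

Here x2 is chosen to represent x1 because it has the better resolution, while y1 stays as a separate representative.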

Each family has its own web page. On each page, some basic information is given in the top fields, then details for the individual PDB chain entries are listed, followed by some download links, including a PIR-formatted version of the sequence alignment, the alignment in ali format and also the malform output for the family (percentage sequence identities). Then there is a list of links to Pfam and other databases (links related to individual PDB chains - SwissProt, SCOP, CATH, FSSP, RCBI and EBI etc. - are available by clicking on each PDB code). Other related information is provided below this and, finally, a line-up of the sequences is displayed.

The line-ups and associated annotation are a distinguishing feature of the database. These are based not only on sequence similarity and type of amino acid, but also on the environment of each amino acid (its solvent accessibility, type of secondary structure, etc.). Annotation is provided using a program called Joy, which highlights a number of features including:

See the Joy home page and manual for additional information on Joy and its alignment format.

The protein structures themselves can be viewed, singly or with the whole family superimposed on one another, using Rasmol - see below for notes on setting this up. Once you have configured your web browser, click on the symbols next to each PDB code to see the individual structures or click on 'Rasmol' for the superposition.

In a number of cases, the Homstrad family page provides additional information (these families are part of what is known as Homstradplus)...


Searching Homstrad

Homstrad families can be searched with a keyword or with an amino acid sequence or multiple alignment.

Keyword searches can be done from the home page or the Homstrad search page. If you are interested in, for example, EGF proteins, type 'EGF' (make sure there are no spaces on either side) and a list of families containing that term in their description line will be returned. If you have a particular structure in mind and you know the PDB code, type that in and you will get a list of families that include either any of the chains detailed in that PDB entry or representatives of them (i.e. another PDB chain that shares more than 90% sequence identity). Searching for a PDB chain in this way may also give you access to single-member families. These family names start with 'hs' or 'hsd' followed by the PDB code and, in some instances, a chain code letter (a, b, c etc.) and perhaps a number to label the domain. They may not have been incorporated into a main family yet, either because the entry has not been fully processed by us or because it shows no sequence or structure similarity to any other PDB chains already held in Homstrad. Such entries may or may not have been manually inspected and annotated, but following the links to Pfam etc. can provide you with additional information.
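The naming pattern for single-member families can be taken apart with a short sketch like the one below. The regular expression is only an interpretation of the description above ('hs' or 'hsd', then a PDB code, then an optional chain letter and an optional domain number); the function name is hypothetical and real family names may include cases that this pattern does not cover.

```python
import re

# prefix 'hs' or 'hsd', a four-character PDB code starting with a digit,
# then an optional lower-case chain letter and an optional domain number.
SINGLE_MEMBER = re.compile(
    r"(?P<prefix>hsd?)(?P<pdb>[0-9][a-z0-9]{3})(?P<chain>[a-z])?(?P<domain>[0-9]+)?"
)


def parse_single_member_name(name):
    """Split a single-member family name into its parts, or return None."""
    match = SINGLE_MEMBER.fullmatch(name)
    return match.groupdict() if match else None


for name in ["hs1d7ya", "hsd2tmda3", "grs"]:
    print(name, parse_single_member_name(name))
```

The PDB code recovered in this way (for example 1d7y from hs1d7ya) is what you would type into the keyword search box.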

You can also browse the families arranged alphabetically or structurally (click on the browse button on the home page). The structural classes are broadly based on SCOP's structural classes (although there are some differences) and protein families are assigned to these groups manually.

The classes are...

If you want to compare an amino acid sequence to existing protein families in Homstrad, one option is to do a quick BLAST search, either by pasting your sequence (FASTA format) into the search form provided or by specifying a PDB code and chain identifier in the form. So the results for the PDB chain, 5grt, which has the sequence...

>5grt_  mol:protein length:461     Glutathione Reductase

VASYDYLVIGGGSGGLESAWRAAELGARAAVVESHKLGGTCVNVGCVPKK
VMWNTAVHSEFMHDHADYGFPSCEGKFNWRVIKEKRDAYVSRLNAIYQNN
LTKSHIEIIRGHAAFTSDPKPTIEVSGKKYTAPHILIATGGMPSTPHESQ
IPGASLGITSDGFFQLEELPGRSVIVGAGYIAVEMAGILSALGSKTSLMI
RHDKVLRSFDSMISTNCTEELENAGVEVLKFSQVKEVKKTLSGLEVSMVT
AVPGRLPVMTMIPDVDCLLWAIGRVPNTKDLSLNKLGIQTDDKGHIIVDE
FQNTNVKGIYAVGDVCGKALLTPVAIAAGRKLAHRLFEYKEDSKLDYNNI
PTVVFSHPPIGTVGLTEDEAIHKYGIENVKTYSTSFTPMYHAVTKRKTKC
VMKMVCANKEEKVVGIHMQGLGCDEMLQGFAVAVKMGATKADFDNTVAIH
PTSSEELVTLR

...are as follows (not all of the alignments are shown to save space) - note that hits to both the main Homstrad families and to single-member families are detailed:

BLAST interface to HOMSTRAD

5grt_

Searching families with alignment
Searching single-member families

Searching families with alignment

BLASTP 2.2.1 [Apr-13-2001]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Query= 5grt_  mol:protein length:461   Glutathione Reductase (461 letters)
Database: homstrad 3003 sequences; 627,733 total letters
Searching.......done                                            
                                                   Score     E
Sequences producing significant alignments:        (bits)  Value
3grs (grs)                                           922   0.0
1gera (grs)                                          460   e-131
2tpra (grs)                                          265   2e-72
1ndaa (grs)                                          261   4e-71
3lada (grs)                                          178   2e-46
1ebda (grs)                                          175   2e-45
1lpfa (grs)                                          171   3e-44
1lvl (grs)                                           141   4e-35
1ojt (grs)                                           134   5e-33
1trb (grs)                                            46   2e-06
1npx (grs)                                            40   9e-05
1chua (FAD_binding_2)                                 28   0.50
1chua (FAD_binding_2)                                 28   0.50

>3grs_grs
          Length = 461
Score =  922 bits (2382), Expect = 0.0
Identities = 459/461 (99%), Positives = 459/461 (99%)
Query: 1   VASYDYLVIGGGSGGLESAWRAAELGARAAVVESHKLGGTCVNVGCVPKKVMWNTAVHSE 60
           VASYDYLVIGGGSGGL SA RAAELGARAAVVESHKLGGTCVNVGCVPKKVMWNTAVHSE
Sbjct: 1   VASYDYLVIGGGSGGLASARRAAELGARAAVVESHKLGGTCVNVGCVPKKVMWNTAVHSE 60

Query: 61  FMHDHADYGFPSCEGKFNWRVIKEKRDAYVSRLNAIYQNNLTKSHIEIIRGHAAFTSDPK 120
           FMHDHADYGFPSCEGKFNWRVIKEKRDAYVSRLNAIYQNNLTKSHIEIIRGHAAFTSDPK
Sbjct: 61  FMHDHADYGFPSCEGKFNWRVIKEKRDAYVSRLNAIYQNNLTKSHIEIIRGHAAFTSDPK 120
.
.
.
.

>1gera_grs
          Length = 448
Score =  460 bits (1184), Expect = e-131
Query: 4   YDYLVIGGGSGGLESAWRAAELGARAAVVESHKLGGTCVNVGCVPKKVMWNTAVHSEFMH 63
           YDY+ IGGGSGG+ S  RAA  G + A++E+ +LGGTCVNVGCVPKKVMW+ A   E +H
Sbjct: 3   YDYIAIGGGSGGIASINRAAMYGQKCALIEAKELGGTCVNVGCVPKKVMWHAAQIREAIH 62

Query: 64  DHA-DYGFPSCEGKFNWRVIKEKRDAYVSRLNAIYQNNLTKSHIEIIRGHAAFTSDPKPT 122
            +  DYGF +   KFNW  +   R AY+ R++  Y+N L K+++++I+G A F  D K T
Sbjct: 63  MYGPDYGFDTTINKFNWETLIASRTAYIDRIHTSYENVLGKNNVDVIKGFARFV-DAK-T 120
.
.
.
.

>1chua_FAD_binding_2
          Length = 478
Score = 28.1 bits (61), Expect = 0.50
Identities = 10/22 (45%), Positives = 18/22 (81%)

Query: 294 GHIIVDEFQNTNVKGIYAVGDV 315
           G ++VD+   T+V+G+YA+G+V
Sbjct: 303 GGVMVDDHGRTDVEGLYAIGEV 324

Database: homstrad
    Posted date:  Sep 21, 2001  5:31 PM
Number of letters in database: 627,733
Number of sequences in database: 3003
Lambda     K      H
   0.318    0.134    0.395
Lambda     K      H
   0.267   0.0410    0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 596,375
Number of Sequences: 3003
Number of extensions: 25079
Number of successful extensions: 151
Number of sequences better than 1.0: 13
Number of HSP's better than 1.0 without gapping: 11
Number of HSP's successfully gapped in prelim test: 2
Number of HSP's that attempted gapping in prelim test: 86
Number of HSP's gapped (non-prelim): 17
length of query: 461
length of database: 627,733
effective HSP length: 81
effective length of query: 380
effective length of database: 384,490
effective search space: 146106200
effective search space used: 146106200
T: 11
A: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
S2: 59 (27.3 bits)

Searching single-member families

BLASTP 2.2.1 [Apr-13-2001]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Query= 5grt_  mol:protein length:461  Glutathione Reductase (461 letters)
Database: HOMS 2341 sequences; 410,612 total letters

Searching.....done
       
                                                                    Score     E
Sequences producing significant alignments:                         (bits)  Value
hs1d7ya crystal structure of nadh-dependent ferredoxin reductase..     32   0.015
hs1hyua crystal structure of intact ahpf                               29   0.15

>hs1d7ya crystal structure of nadh-dependent ferredoxin reductase,
           bpha4
          Length = 401
Score = 32.3 bits (72), Expect = 0.015
Identities = 53/223 (23%), Positives = 90/223 (39%), Gaps = 35/223 (15%)

Query: 106 IEIIRGHAAFTSDPKPTIEVSGKKYTAPH--ILIATGGMPS---TPHESQIPGASLGITS 160
           +E + G  A + DP+          T P+  +++ATG  P    T   + +P  +L
Sbjct: 70  VEWLLGVTAQSFDPQAHTVALSDGRTLPYGTLVLATGAAPRALPTLQGATMPVHTLRTLE 129
.
.
.
.

Database: HOMS
Posted date: Sep 30, 2001 11:30 AM
Number of letters in database: 410,612
Number of sequences in database: 2341
Lambda     K      H
   0.318    0.134    0.395
Lambda     K      H
   0.267   0.0410    0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 378,534
Number of Sequences: 2341
Number of extensions: 15326
Number of successful extensions: 51
Number of sequences better than 1.0: 4
Number of HSP's better than 1.0 without gapping: 2
Number of HSP's successfully gapped in prelim test: 2
Number of HSP's that attempted gapping in prelim test: 45
Number of HSP's gapped (non-prelim): 8
length of query: 461
length of database: 410,612
effective HSP length: 77
effective length of query: 384
effective length of database: 230,355
effective search space: 88456320
effective search space used: 88456320
T: 11
A: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
S2: 57 (26.6 bits)

This tells you that a representative of 5grt (3grs, which is 99% identical) is in the family grs along with several other homologous PDB chains, and you can click on the family to go to its home page. There you find that the family description is 'pyridine nucleotide-disulphide oxidoreductases class-I', that the family contains 11 structures sharing an average sequence identity of 30%, that there are 2 links to Pfam, and that Homstradplus information is available. The weak hit to the FAD_binding_2 family turns out to cover only a 22 amino acid motif and may be worth following up as a functionally important region. Two single-member families are detected. Although they are less similar to 5grt than the other members of the grs family (look at the E value: a smaller E value means higher similarity), both share a Pfam link with the grs family, so they may at some point be added to this family if the structures are not too dissimilar. To get to their home pages, either click on the hs1d7ya/hs1hyua links or type 1d7y or 1hyu (the PDB codes) in the Homstrad keyword search box.
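To see where the E values come from, the sketch below applies the standard BLAST relation between a bit score S' and the expected number of chance hits, E = (effective search space) x 2^(-S'), to the figures printed in the output above. The function names are hypothetical, and because BLAST rounds the scores it prints, the recomputed value may differ slightly from the reported one.

```python
def expected_hits(bit_score, search_space):
    """Expected number of chance hits scoring at least `bit_score`."""
    return search_space * 2.0 ** (-bit_score)


def parse_evalue(text):
    """Convert a BLAST E-value string to a float.

    BLAST prints very small values without a leading digit (e.g. 'e-131'),
    which float() cannot read directly.
    """
    return float("1" + text if text.startswith("e") else text)


# Weak hit to FAD_binding_2: 28.1 bits, effective search space 146106200.
print(round(expected_hits(28.1, 146106200), 2))

# Sorting the reported values puts the most significant hit first.
reported = ["0.50", "2e-72", "e-131", "0.0"]
print(sorted(reported, key=parse_evalue))
```

The first line printed is close to the 0.50 reported for the 1chua hit, which shows why a hit of only 28 bits is not significant in a database of this size.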

Fugue can be used to generate more accurate sequence/structure-based alignments between your sequence and the families identified in the Blast search. To do this, click 'align' in the top left-hand corner of the family page and submit your query sequence. If you want to search Homstrad using Fugue (and not just align sequences) to detect more distant homologs, click on 'Fugue' on the Homstrad home page, which takes you to the Fugue search page. Fugue operates using environment-specific substitution tables and structure-dependent gap penalties. The environment-specific substitution tables have been derived from selected Homstrad families and describe the degree of conservation for each type of amino acid in each environment; for example, a residue that is buried and in an alpha helix is more likely to be conserved than one in a surface loop. Then, for each Homstrad family, a scoring matrix (profile) has been calculated using these substitution tables. Fugue lines the new sequence up with the existing family according to this scoring matrix, which allows you to
a) identify distant homologies that may not be identified in other ways.
b) get some indication of the shape the new protein may adopt, functional sites etc., as a guide for bench experimental work.
To run the search, either upload a sequence file or paste your amino acid sequence (in FASTA or pure amino acid format) into the box provided. By default, other homologous sequences are collected using PSI-Blast and lined up with the corresponding Homstrad family and your query sequence. If you wish, you can submit your own multiple sequence alignment as the input (FASTA / NBRF / CLUSTAL / MSF format). Results are sent out by e-mail.
So for the 5grt example you get the following e-mail response...

########################################################################
#  FUGUE  v1.s.16 (JAN 2001)
# Search sequence(s) against fold library using environment-specific
# substitution tables and structure-dependent gap penalties.
#
# Fold library and substitution tables are based on the HOMSTRAD database.
# http://mizuguchilab.org/homstrad/
#
#  FUGUE server is available at:
#     http://mizuguchilab.org/fugue/
#     http://mizuguchilab.org/fugue/prfsearch.html
# Citation:   J. Shi, T. L. Blundell and K. Mizuguchi (manuscript in preparation)
# Size of fold library: 3157
# Probe sequence ID   : 5grt
# Probe sequence len  : 461
# Probe divergence    : 0.657
# Recommended cutoff  : ZSCORE >=    6.0  (CERTAIN   99% confidence)
# Other cutoff        : ZSCORE >=    5.0  (LIKELY    95% confidence)
# Other cutoff        : ZSCORE >=    4.7  (MARGINAL  90% confidence)
# Other cutoff        : ZSCORE >=    3.5  (GUESS     50% confidence)
# Other cutoff        : ZSCORE <     3.5  (UNCERTAIN)
#
# PLEN   : Profile length
# RAWS   : Raw alignment score
# RVN    : (Raw score)-(Raw score for NULL model)
# ZSCORE : Z-score normalized by sequence divergence
# PVZ    : P-value based on Z-score jumbling        (Currently Disabled)
# ZORI   : Original Z-score (before normalization)
# EVP    : E-value based on profile calibration     (Currently Disabled)
# EVF    : E-value based on library search          (Currently Disabled)
# AL     : Alignment algorithm used for Zscore/Alignment calculation
#           0 -- Global, 2 -- GloLocSeq (No sequence termini gap penalty)
#           3 -- GloLocPrf (No profile termini gap penalty)
#--------------------------------------------------------------------------
# Profile        PLEN  RAWS RVN  ZSCORE   PVZ     ZORI    EVP     EVF   AL
#------------------------------------------------------------------------
grs              558   610 1282  65.75 1.0E+03   67.06 1.0E+03 1.0E+03 00
hs1d7ya          401    20 404   15.47 1.0E+03   16.79 1.0E+03 1.0E+03 00
hsd2tmda3        233    -5 146   10.36 1.0E+03   11.67 1.0E+03 1.0E+03 22
FAD_binding_2    590  -477 226    8.72 1.0E+03   10.04 1.0E+03 1.0E+03 00
cox              661  -737 124    7.48 1.0E+03    8.80 1.0E+03 1.0E+03 00
hs1b3ma          385  -261 224    7.31 1.0E+03    8.63 1.0E+03 1.0E+03 00
AlaDh_PNT        397  -303 108    4.83 1.0E+03    6.14 1.0E+03 1.0E+03 00
hsd1cjca2        230   -79  79    4.63 1.0E+03    5.94 1.0E+03 1.0E+03 22
hsd1fcda1        186  -213  79    4.56 1.0E+03    5.88 1.0E+03 1.0E+03 02
hs1qlaa          655  -593  95    4.32 1.0E+03    5.64 1.0E+03 1.0E+03 00

If you go to the results website you get more information and access to the alignments...

FUGUE v1.s.16 (JAN 2001)

Search sequence(s) against fold library using environment-specific substitution tables and structure-dependent gap penalties.
Fold library and substitution tables are based on the HOMSTRAD database.
        http://mizuguchilab.org/homstrad/
FUGUE server is available at:
        http://mizuguchilab.org/fugue/
        http://mizuguchilab.org/fugue/prfsearch.html
Citation:     J. Shi, T. L. Blundell and K. Mizuguchi
             (manuscript in preparation)

Size of fold library: 3157
Probe sequence ID   : 5grt
Probe sequence len  : 461
Probe divergence    : 0.657
Recommended cutoff  : ZSCORE >=    6.0  (CERTAIN   99% confidence)
Other cutoff        : ZSCORE >=    5.0  (LIKELY    95% confidence)
Other cutoff        : ZSCORE >=    4.7  (MARGINAL  90% confidence)
Other cutoff        : ZSCORE >=    3.5  (GUESS     50% confidence)
Other cutoff        : ZSCORE <     3.5  (UNCERTAIN)

PLEN   : Profile length
RAWS   : Raw alignment score
RVN    : (Raw score)-(Raw score for NULL model)
ZSCORE : Z-score normalized by sequence divergence
ZORI   : Original Z-score (before normalization)
AL     : Alignment algorithm used for Zscore/Alignment calculation
          0 -- Global, 2 -- GloLocSeq (No sequence termini gap penalty)
          3 -- GloLocPrf (No profile termini gap penalty)
The sequence(s) you submitted is HERE (in original format).
The sequence(s) actually used by FUGUE is HERE (in PIR format).
Download all the results in compressed format HERE.  new!


View Ranking (Click on a profile hit will bring you to the corresponding HOMSTRAD family)

                                       
Profile Hit     PLEN  RAWS   RVN  ZSCORE   ZORI  AL
grs              558   610  1282   65.75  67.06  00  CERTAIN   Alignment
hs1d7ya          401    20   404   15.47  16.79  00  CERTAIN   Alignment
hsd2tmda3        233    -5   146   10.36  11.67  22  CERTAIN   Alignment
FAD_binding_2    590  -477   226    8.72  10.04  00  CERTAIN   Alignment
cox              661  -737   124    7.48   8.80  00  CERTAIN   Alignment
hs1b3ma          385  -261   224    7.31   8.63  00  CERTAIN   Alignment
AlaDh_PNT        397  -303   108    4.83   6.14  00  MARGINAL  Alignment
hsd1cjca2        230   -79    79    4.63   5.94  22  GUESS     Alignment
hsd1fcda1        186  -213    79    4.56   5.88  02  GUESS     Alignment
hs1qlaa          655  -593    95    4.32   5.64  00  GUESS     Alignment

View Alignments (Keys [aa,ma,mh,hh])

Hint: check 'ma' first if your query is a single sequence, otherwise start with 'aa'.

                                       
Profile Hit     HTML          POSTSCRIPT    TEXT (PIR FORMAT)
grs             aa ma mh hh   aa ma mh hh   aa ma mh hh         CERTAIN   Model  65.75
hs1d7ya         aa ma mh hh   aa ma mh hh   aa ma mh hh         CERTAIN   Model  15.47
hsd2tmda3       aa ma mh hh   aa ma mh hh   aa ma mh hh         CERTAIN   Model  10.36
FAD_binding_2   aa ma mh hh   aa ma mh hh   aa ma mh hh         CERTAIN   Model   8.72
cox             aa ma mh hh   aa ma mh hh   aa ma mh hh         CERTAIN   Model   7.48
hs1b3ma         aa ma mh hh   aa ma mh hh   aa ma mh hh         CERTAIN   Model   7.31
AlaDh_PNT       aa ma mh hh   aa ma mh hh   aa ma mh hh         MARGINAL  Model   4.83
hsd1cjca2       aa ma mh hh   aa ma mh hh   aa ma mh hh         GUESS     Model   4.63
hsd1fcda1       aa ma mh hh   aa ma mh hh   aa ma mh hh         GUESS     Model   4.56
hs1qlaa         aa ma mh hh   aa ma mh hh   aa ma mh hh         GUESS     Model   4.32

Keys
aa: query sequences (including PSI-BLAST homologues) aligned against all the representative structures from a HOMSTRAD family
ma: master sequence aligned against all the representative structures from a HOMSTRAD family
mh: master sequence aligned against a single structure of highest sequence identity from a HOMSTRAD family
hh: single sequence/structure pair with highest sequence identity in 'aa'
Note: If your query is a single sequence, the master sequence is equivalent to your query and all the other sequences (if any) are collected by PSI-BLAST. If your query is a sequence alignment, the master sequence is set to the first sequence in the alignment.
JOY Keys are described here.

In this instance, Fugue has identified a number of hits not found by the simple BLAST search. It also reports the confidence level of each hit in simple terms, as a translation of the Z-score. Generally it is best to consider only the 'CERTAIN' hits, although 'LIKELY' hits (none were classified as such in the current example) may also be relevant. To view the multiple alignments, go to the 'View Alignments' section and first click the HTML 'ma' buttons to see your sequence aligned with each of the Homstrad family alignments. Then click the 'aa' buttons, which show the alignment including all the additional homologues collected by PSI-BLAST. The 'aa' alignments are additionally annotated to highlight different types of amino acid using 'Taylor' notation. Fugue also creates a rough model for your query sequence based on the backbone coordinates of the most significant hit, although in this instance the structure of the query sequence has already been determined.
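
For readers who save the ranking as text, the advice to concentrate on the 'CERTAIN' hits can be sketched in a few lines of Python. This is only an illustration. The rows are transcribed from the example ranking above, and the split into eight whitespace-separated fields is an assumption about the layout. The names parse_ranking and select_hits are hypothetical helpers, not part of Fugue.

```python
# Rows transcribed from the example ranking. The eight-field layout
# (name, three integers, ZSCORE, ZORI, AL, confidence) is an assumption.
RANKING = """\
grs            558   610  1282  65.75  67.06  00  CERTAIN
hs1d7ya        401    20   404  15.47  16.79  00  CERTAIN
hsd2tmda3      233    -5   146  10.36  11.67  22  CERTAIN
FAD_binding_2  590  -477   226   8.72  10.04  00  CERTAIN
cox            661  -737   124   7.48   8.80  00  CERTAIN
hs1b3ma        385  -261   224   7.31   8.63  00  CERTAIN
AlaDh_PNT      397  -303   108   4.83   6.14  00  MARGINAL
hsd1cjca2      230   -79    79   4.63   5.94  22  GUESS
hsd1fcda1      186  -213    79   4.56   5.88  02  GUESS
hs1qlaa        655  -593    95   4.32   5.64  00  GUESS
"""


def parse_ranking(text):
    """Turn each eight-field row into a small dictionary (hypothetical helper)."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        name, _pen, _raws, _rvn, zscore, _zori, _al, confidence = fields
        hits.append(
            {"profile": name, "zscore": float(zscore), "confidence": confidence}
        )
    return hits


def select_hits(hits, accepted=("CERTAIN", "LIKELY")):
    """Return the profile names whose confidence label is in `accepted`."""
    return [hit["profile"] for hit in hits if hit["confidence"] in accepted]


hits = parse_ranking(RANKING)
print(select_hits(hits, accepted=("CERTAIN",)))
```

The filter uses the confidence label reported by the server rather than re-deriving it from the Z-score, because the numerical thresholds are not stated here.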

[index]

Examples...

There are a number of published examples of how Homstrad has been used including...

Nunez Miguel et al. (2001) 'Protein Fold Recognition and Comparative Modelling using Homstrad, Joy and Fugue' (This book chapter is in Press – [local link]).

Shirai et al. (2001) A novel superfamily of enzymes that catalyze the modification of guanidine groups. TIBS 26 (8) 465-468

Parker et al. (2001) A family of proteins related to Spätzle, the toll receptor ligand, are encoded in the Drosophila genome. Proteins.

[index]

A final word...

As with all higher-order protein databases, there are different ways to organize the data and different places to set cut-offs.

At present, the cut-off point at which we define a protein as being homologous to another protein is variable. Sometimes, whether a protein is added to a family depends as much on whether the structures look similar as on sequence similarity (although there must be some implied evolutionary relationship between the two proteins to justify the term 'homology').

We occasionally have problems deciding whether a protein should belong to one family or another if a continuous chain of homology has developed between them. Sometimes we decide that the families should be merged into one, and in other cases we leave them separate. At present our decisions tend to be guided by Pfam and SCOP, but they are ultimately arbitrary and are in no way the final word on protein grouping.

Finally, you will notice that membrane-spanning protein domains are significantly under-represented in Homstrad and other protein structure databases, despite their great importance to researchers (as key regulators of signaling pathways, drug targets, etc.). We can do nothing about this, but the situation should improve in the near future as experimental techniques improve.

[index]

References

Current reference

When using the current version of HOMSTRAD, please cite:

Mizuguchi, K., Deane, C.M., Blundell, T.L. & Overington, J.P., (1998). HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci., 7, 2469-2471.

Original work on the alignment database

J.P. Overington, M.S. Johnson, A. Sali, and T.L. Blundell, (1990), Tertiary structural constraints on protein evolutionary diversity; Templates, key residues and structure prediction, Proc. Roy. Soc. Lond., B 241, pp. 132-145.

J.P. Overington, M. Johnson, C. Topham, A. McLeod, A. Sali, Z.-Y. Zhu, L. Sibanda, T. Blundell, (1990), Applications of environment specific amino acid substitution tables to identification of key residues in protein tertiary structure, Curr. Sci., 59, pp. 867-874.

J.P. Overington, D. Donnelly, M.S. Johnson, A. Sali, and T.L. Blundell, (1992), Environment specific amino acid substitution tables: Tertiary templates and prediction of protein folds, Protein Science, 1, pp. 216-226.

A. Sali, and J.P. Overington, (1994), Derivation of rules for comparative protein modelling from a database of protein structure alignments, Prot. Sci., 3, pp. 1582-1596.

Early related work on the alignment database

A. Sali, J.P. Overington, M.S. Johnson, and T.L. Blundell, (1990), From comparisons of protein sequences and structures to protein modelling and design, Trends Biochem. Sci., 15, pp. 235-240. (Also published in: Proteins: Form and Function, (1990), ed. R. Bradshaw and M. Purton, Elsevier Trends Journals, Cambridge, pp. 163-171).

A.-M. Hoffren, M. Saloheimo, P. Thomas, J.P. Overington, M.S. Johnson, J.K.C. Knowles, and T.L. Blundell, (1991), Modelling the lignin peroxidase LIII of Phlebia radiata using a knowledge-based approach, J. Chim. Phys., 88, pp. 2659-2662.

M.S. Johnson, J.P. Overington, and T.L. Blundell, (1993), Alignment and searching for common protein folds using a data bank of structural templates, J. Mol. Biol., 231, pp. 735-752

J.P. Overington, Z.-Y. Zhu, A. Sali, M.S. Johnson, R. Sowdhamini, G.V. Louie, and T.L. Blundell, (1993), Molecular recognition in protein families: A database of aligned three-dimensional structures of related proteins, Biochem. Soc. Trans., 21, pp. 597-604.

C.M. Topham, A. McLeod, F. Eisenmenger, J.P. Overington, M.S. Johnson, and T.L. Blundell, (1993), Fragment ranking in modelling of protein structure: Conformationally constrained environmental amino acid substitution tables, J. Mol. Biol., 229, pp. 194-220.

A. Sali, and T.L. Blundell, (1993), Comparative protein modelling by satisfaction of spatial restraints, J. Mol. Biol., 234, pp. 779-815.

Recent applications and related work

Mizuguchi, K., Parker, J.S., Blundell, T.L. & Gay, N.J., (1998). Getting knotted: a model for the structure and activation of Spätzle. Trends Biochem. Sci., 23, 239-242.

Mizuguchi, K., Deane, C.M., Johnson, M.S., Blundell, T.L. & Overington, J.P., (1998). JOY: protein sequence-structure representation and analysis. Bioinformatics, 14, 617-623.

Sowdhamini, R., Burke, D.F., Huang, J.-F., Mizuguchi, K., Nagarajaram, H.A., Srinivasan, N., Steward, R.E. & Blundell, T.L., (1998). CAMPASS: A database of structurally aligned protein superfamilies. Structure, 6, 1087-1094.

Sowdhamini, R., Burke, D.F., Deane, C.M., Huang, J.-F., Mizuguchi, K., Nagarajaram, H.A., Overington, J.P., Srinivasan, N., Steward, R.E. & Blundell, T.L., (1998). Protein 3D structural databases: domains, structurally aligned homologues and superfamilies. Acta Cryst. D54, 1168-1177.

Marino-Buslje, C., Mizuguchi, K., Siddle, K. & Blundell, T.L., (1998). A third fibronectin type III domain in the extracellular region of the insulin receptor family. FEBS letters, 441, 331-6.

Burke, D., Deane, C.M., Nagarajaram, H.A., Campillo, N., Martin-Martinez, M., Mendes, J., Molina, F., Perry, J., Reddy, B.V.B., Soares, C.M., Steward, R.E., Williams, M., Carrondo, M.A., Blundell, T.L., Mizuguchi, K. (1999) An iterative structure-assisted approach to sequence alignment and comparative modeling. Proteins, suppl 3, 55-60.

Blundell, T.L., Mizuguchi, K. (2000) Structural genomics: an overview. Prog Biophys Mol Biol. 73, 289-95.

Shi, J., Blundell, T.L., Mizuguchi, K. (2001) FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol. 310, 243-57.

de Bakker, P.I., Bateman, A., Burke, D.F., Miguel, R.N., Mizuguchi, K., Shi, J., Shirai, H., Blundell, T.L. (2001) HOMSTRAD: adding sequence information to structure-based alignments of homologous protein families. Bioinformatics, 17, 748-9.

Nunez Miguel R., Shi, J., Mizuguchi, K. (2001) Protein Fold Recognition and Comparative Modelling using Homstrad, Joy and Fugue (This book chapter is in Press – [local link]).

Shirai, H., Blundell, T.L., Mizuguchi, K. (2001) A novel superfamily of enzymes that catalyze the modification of guanidine groups. TIBS, 26, 465-468.

Parker, J.S., Mizuguchi, K., Gay, N.J. (2001) A family of proteins related to Spätzle, the toll receptor ligand, are encoded in the Drosophila genome. Proteins, 45, 71-80.

[index]

Viewing structures with Rasmol

First install Rasmol on your system.

1. Viewing the aligned region of individual structures [rasmol icon]

By clicking this icon, you can display the entire asymmetric unit, with the regions that are included in the alignment shown as thick lines. You don't need a local copy of the PDB, but you do have to set up Rasmol as a helper application as described in the next section.

2. Viewing superposed structures

The superimposed structure for each family can be saved in a PDB file by clicking the link "superimposed coordinates", or visualized directly by clicking the link "RasMol". Configure your browser as follows:

Netscape on UNIX

  1. Edit -> Preferences -> Navigator -> Applications -> New
  2. Description: rasmolscript file
  3. File extension:
  4. MIME Type: application/x-rasmol
  5. Check the Application button, click Choose... and select your copy of Rasmol then add "-script %s" at the end.
    (e.g., on SGI it should look like
    xwsh -e /usr/local/bin/rasmol -script %s)

3. Viewing other links to PDB files

To launch Rasmol automatically when you click the link to a PDB file, configure your browser as follows:

Netscape on PC

  1. Edit -> Preferences -> Navigator -> Applications -> New Type
  2. Description of type: PDB file
  3. File extension: pdb
  4. MIME Type: chemical/x-pdb
  5. Application to use: click browse and select your copy of Rasmol then add %1 at the end
    (e.g., C:\Rasmol\Raswin.exe %1)

Netscape on UNIX

  1. Edit -> Preferences -> Navigator -> Applications -> New
  2. Description of type: PDB file
  3. File extension: pdb
  4. MIME Type: chemical/x-pdb
  5. Check the Application button, click Choose... and select your copy of Rasmol then add %s at the end
    (e.g., on SGI it should look like
    xwsh -e /usr/local/bin/rasmol %s)

Internet Explorer

  1. Run Windows Explorer
  2. View -> Folder Options... -> File Types -> New Types
  3. Description of type: PDB file
  4. Associated extension: pdb
  5. MIME Type: chemical/x-pdb
  6. Actions: -> New...
  7. Actions: open
  8. Browse... -> select your copy of Rasmol
  9. OK
[index]

Linking to HOMSTRAD

The best way to make stable links to HOMSTRAD alignments is to use the search engine:
http://mizuguchilab.org/cgi-bin/homstrad/homstrad.cgi?family=name
where name is a family name such as ATP-synt or adh. A PDB code search of the form
http://mizuguchilab.org/cgi-bin/homstrad/homstrad.cgi?pdb=1aab
is also available.
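
If you generate such links from a script, the two URL forms can be assembled as in the following minimal Python sketch. The base URL and the query parameters family and pdb are taken from the examples above. The function names family_link and pdb_link are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

# Search engine address as given in the examples above.
BASE_URL = "http://mizuguchilab.org/cgi-bin/homstrad/homstrad.cgi"


def family_link(name):
    """Build a link to a family by its name, e.g. 'ATP-synt' or 'adh'."""
    return BASE_URL + "?" + urlencode({"family": name})


def pdb_link(code):
    """Build a link that searches by PDB code, e.g. '1aab'."""
    return BASE_URL + "?" + urlencode({"pdb": code})


print(family_link("ATP-synt"))
print(pdb_link("1aab"))
```

Using urlencode rather than plain string concatenation keeps the link valid if a name ever contains a character that needs escaping.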
[index]

Contact us at:

homstrad@mizuguchilab.org


Acknowledgments

We thank Andrej Sali and Mark Johnson for the development of the original alignment database from which HOMSTRAD has evolved. JPO thanks Pfizer Ltd for their support of this project.


Copyright (C) 1997-2015 The HOMSTRAD authors homstrad@mizuguchilab.org

Created: 3 Sep 1999
Last modified: Tue Jun 30 13:39:45 JST 2015