

Notes for the use of command-line version

See the download information for obtaining the command-line version.

Available programs

Core programs

fugueseq
Search a database of structural profiles with a single sequence or a multiple sequence alignment
fugueali
Align a sequence or a multiple sequence alignment against a structural profile
fugueprf
Search a sequence database with a structural profile (experimental)
melody
Profile maker for FUGUE
chorus
Multiple sequence alignment version of fugueali (experimental)

Support programs

run_fugue
Perform automatic database search (first run PSI-BLAST to collect homologues of the query sequence and then run fugueseq)
homblast
Add homologues to a structure-based alignment by running PSI-BLAST
run_blast
PSI-BLAST wrapper, to add homologues to a sequence
blastmap
Combine PSI-BLAST alignments (called by homblast and not normally used on its own but can be useful)
joy
Define structural environments (available as a separate package; see http://mizuguchilab.org/joy/).

Basic features

Advanced features

Other notes


How to create your own profiles

1. Before you start

Default profiles for all HOMSTRAD families can be downloaded from ftp://mizuguchilab.org/homstrad/data/. For example, try looking at ftp://mizuguchilab.org/homstrad/data/PAS/PAS.fug. This profile can be reproduced by downloading PAS.tem and PAS.map from the same directory and typing
 melody -blast -t PAS.tem -plus PAS.map

Typing

 fugueali -seq input.fa -prf PAS.fug
should produce a sequence-structure alignment identical to that from the alignment server.

You can also download the default list of profiles from ftp://mizuguchilab.org/software/fugue/data/allprf.lst. Each line in this file specifies the location of a structural profile.

If PAS.fug and the other profiles are locally stored as in allprf.lst and if you have PSI-BLAST and seals installed locally, the following command will produce results more or less identical to those obtained from the web interface:

 run_fugue -seq input.fa -list allprf.lst

This will create the output file fugue.html. See "How to interpret the output page" for how to read this file.

If you want to create your own profiles, go through the following steps. (You could then add your own profiles to allprf.lst; see below.)

2. Prepare a structure or a structure-based alignment

If you have a single structure saved in the PDB file mystructure.pdb, simply type
 joy mystructure.pdb
If you want to use a particular chain (e.g., A) of a particular structure which is already in PDB (e.g., 1abc), you don't have to prepare a PDB file. You can simply type
 joy 1abcA
(For this to work, you have to have a local copy of PDB and set the environment variable JOY_PDBDIR.)

If you have prepared a structural alignment, save it in myfamily.ali and type

 joy myfamily.ali
You need the PDB files for all the structures in the alignment (see here for more information). At this stage, do not include sequence-only entries in the .ali file. Make sure JOY has finished normally and created the file myfamily.tem. As an example, look at the PAS.ali file in the above ftp directory, from which PAS.tem can be created. (You need to download all the .psa, .hdb, .sst and .cof files.)

3. Collect homologous sequences

This step is optional but recommended. The easiest way to run PSI-BLAST and produce a multiple sequence alignment is to use the command homblast:
 homblast -seq myfamily.ali
This will create the file myfamily.map. Note that this command may take a long time if you have many structures in your .ali file and they are large proteins and/or have many homologues.

(Note) This command will create a directory named blast in which PSI-BLAST is run (up to five iterations with the default inclusion cut-off). Once the .map file has been created properly, you can delete the whole directory. In fact, homblast will not run if a directory named blast already exists, so you need to delete this directory before rerunning the command.
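
The clean-up described in the note can be scripted. The following is a minimal Python sketch, not part of the FUGUE package: the helper name clear_blast_dir is hypothetical, and it assumes homblast was run in the directory passed as workdir. Call it only once the .map file has been created properly.

```python
import shutil
from pathlib import Path


def clear_blast_dir(workdir=".", name="blast"):
    """Remove a leftover homblast working directory, if one exists.

    Returns True if a directory was removed and False if there was none.
    The directory name "blast" is the one mentioned in the note above.
    """
    target = Path(workdir) / name
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False
```

Calling clear_blast_dir() before rerunning homblast avoids the case where the command refuses to start because the directory is still present.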

4. Run MELODY

Type
 melody -t myfamily.tem [ -blast -plus myfamily.map ]

This will create the file myfamily.fug, which is a structural profile that FUGUE requires.

To enrich the profile with sequence information, use the option -plus myfamily.map, where the .map file was produced in step 3 above. If this file was created by homblast (as above), you should also add the option -blast. This assumes that the quality of the alignment in the .map file is not very high and several filters are applied accordingly.

Alternatively, you can specify your own multiple sequence alignment saved in a PIR file. In this case, you may omit the -blast option.
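
Steps 2 to 4 can be strung together in a script. The sketch below (Python, not part of FUGUE; the function name profile_commands is an assumption made for illustration) only builds the command lines quoted in this section and does not run them. It assumes the starting point is a structural alignment saved as <name>.ali and that homologues, if used, come from homblast.

```python
def profile_commands(name, use_homologues=True):
    """Return the command lines for steps 2-4 as lists of arguments.

    `name` is the family name without extension (e.g. "myfamily").
    Nothing is executed here; the caller decides how to run each command.
    """
    # Step 2: JOY creates <name>.tem from the structural alignment.
    cmds = [["joy", f"{name}.ali"]]
    if use_homologues:
        # Step 3 (optional): homblast creates <name>.map.
        cmds.append(["homblast", "-seq", f"{name}.ali"])
        # Step 4: -blast is added because the .map file comes from homblast.
        cmds.append(["melody", "-blast", "-t", f"{name}.tem",
                     "-plus", f"{name}.map"])
    else:
        # Step 4 without sequence enrichment.
        cmds.append(["melody", "-t", f"{name}.tem"])
    return cmds


for cmd in profile_commands("myfamily"):
    print(" ".join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) so that the script stops if JOY or homblast does not finish normally.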


How to align a sequence (or sequence alignment) against your own structure or structural alignment

1. Prepare your structure

Run JOY and MELODY as described above.

2. Prepare your sequence

2.1. Save your sequence in a file

If you already have an alignment (e.g., by clustalw) you can use it (in clustalw, pir or msf format).

2.2. Collect homologues

If you don't have an alignment, this step is strongly recommended. Type
 run_blast -seq myseq.fa
This will create a multiple alignment in the file myseq.inp.

3. Run fugueali

Type
 fugueali -seq myseq.inp -prf myfamily.fug [ -joy -blast ]
For more information, type
 fugueali -t

How to perform a search against a library including your own profiles

1. Prepare your own profiles as described above

2. Create the file myprofiles.lst

It should look like:
/my_directory/myprofileA.fug
/my_directory/myprofileB.fug
/my_directory/myprofileC.fug
A default list of profiles can be downloaded from ftp://mizuguchilab.org/software/fugue/data/allprf.lst. You can modify this and add your own profiles.
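
If your profiles are spread over several directories, the list file can be generated rather than typed by hand. This is a minimal Python sketch; the function name write_profile_list is hypothetical, and it assumes the list file needs nothing more than one profile path per line, as in the example above.

```python
from pathlib import Path


def write_profile_list(profile_dirs, list_path):
    """Write the absolute path of every .fug file found, one per line.

    Duplicate paths are written only once. Returns the list of paths.
    """
    paths = []
    for directory in profile_dirs:
        for fug in sorted(Path(directory).glob("*.fug")):
            full = str(fug.resolve())
            if full not in paths:
                paths.append(full)
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return paths
```

For example, write_profile_list(["/my_directory"], "myprofiles.lst") would list every .fug file in /my_directory.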

3. Run fugue

Type
 run_fugue -seq myseq.fa -list myprofiles.lst
or if you already have an alignment, type
 fugueseq -seq myali.aln -list myprofiles.lst

How to refine your sequence-structure alignment

(This involves the use of the programs fugueprf and chorus, which are included in the FUGUE package as experimental versions and may produce some odd results. However, the basic idea described here should work generally.)

1. Create a family profile (.fug)

As above, starting from a reliable structure-based alignment, you can create a structural profile (mystr.fug).

2. Select homologous sequences to add

The easiest way to add homologous sequences is to run PSI-BLAST through the homblast command mentioned above. However, the resulting sequence-structure alignment (mystr.map) is based on the PSI-BLAST alignment and may not be optimal. You could instead select homologous sequences using FUGUE itself. If you have a small number of candidate sequences that are known members of this family, you can examine them by running fugueseq individually:
 fugueseq -seq candidate1.fa -prf mystr.fug
 fugueseq -seq candidate2.fa -prf mystr.fug
...
Examine the Z-scores and alignment quality. Alternatively, you can save all these candidate sequences in a single fasta file (candidates.fa) and type
 fugueprf -prf mystr.fug -seq candidates.fa
This will be a slow operation and may cause some problems when some of the candidate sequences include domains not belonging to this family. The program will report a list of Z-scores and save the top hits in the file hit.seq. Either way, save the selected homologous sequences (unaligned) in a fasta file (homseq.fa).

3. Add the unaligned sequences to the structural profile

Type
 chorus -prf mystr.fug -seq homseq.fa -keeporder [ -plus ] -o homseq_mystr.ali
This will add the sequences in the file homseq.fa one by one to the original structure-based alignment. If the -plus option is specified, the combined structure-sequence profile is updated every time a new sequence is added. Without this option, each sequence is compared against the original structural profile.

4. Add the pre-aligned sequences to the structural profile

Alternatively, if the homologous sequences are similar to each other, you can align them first (using any sequence alignment program or by hand), save the alignment in aligned_homseq.aln and type
 fugueali -prf mystr.fug -seq aligned_homseq.aln -o homseq_mystr.ali

5. Update the profile

Now the original structure-only profile can be refined using the sequence-structure alignment. Type
 melody -t mystr.tem -plus homseq_mystr.ali
This will update the file mystr.fug. The whole process may be repeated.
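
One round of the refinement (step 3 or step 4, followed by step 5) can be written down as a pair of commands. The Python sketch below is an illustration only; the function name refinement_commands and its parameters are assumptions, and the output name homseq_<name>.ali simply follows the example file names used in this section. It builds the command lines without running them.

```python
def refinement_commands(name, seqs, prealigned=False, update_each=False):
    """Return [alignment command, melody command] for one refinement round.

    name        -- profile name without extension (e.g. "mystr")
    seqs        -- homseq.fa (unaligned) or an alignment file (pre-aligned)
    prealigned  -- True to use fugueali (step 4) instead of chorus (step 3)
    update_each -- True to pass -plus to chorus
    """
    out = f"homseq_{name}.ali"
    if prealigned:
        align = ["fugueali", "-prf", f"{name}.fug", "-seq", seqs, "-o", out]
    else:
        align = ["chorus", "-prf", f"{name}.fug", "-seq", seqs, "-keeporder"]
        if update_each:
            align.append("-plus")
        align += ["-o", out]
    # Step 5: update the profile from the new sequence-structure alignment.
    update = ["melody", "-t", f"{name}.tem", "-plus", out]
    return [align, update]
```

Because chorus and fugueprf are experimental, it is worth inspecting homseq_<name>.ali before running the melody command.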

What's the difference between FUGUESEQ and FUGUEALI?

fugueseq is a program for database searching and homology recognition; given a query sequence (or alignment), it scans a profile library and detects homologues.

fugueali is a program for producing sequence-structure alignments; given a sequence (or alignment) and a structural profile, it produces an optimal alignment between the two.

Homology recognition and sequence-structure alignment are related but distinct operations, and we have decided that they are best carried out by two separate programs. In fact, recognition and alignment are performed separately even within fugueseq (see the note "Automatic selection of alignment algorithms").

The main reason for using different alignment modes for recognition and alignment is that parameters and algorithms that are good for recognition may not be suitable for producing a good alignment, and vice versa. When recognizing homology between a sequence and a structure, a match of one or more short fragments may produce a significant signal even though other regions may be completely misaligned (fugue never uses a local alignment algorithm). This is not a problem for homology detection.

However, when we produce an alignment, we want to choose the parameters and algorithms that maximize the number of correctly aligned residue pairs, and the optimal choice there could be different. In our past experiments, we observed that when there is a significant difference in length between the sequence and the structure, the global-local (2 or 3) algorithm often produces a better alignment than the global (0) algorithm. However, homology detection performance does not follow the same pattern.

Thus, by default, fugueseq uses different rules to decide which alignment mode to use for homology detection (z-score) and alignment.

fugueseq includes other features not present in fugueali, and thus fugueali may not produce the same alignment as fugueseq does. For example, fugueali keeps all the input sequences intact, while fugueseq, unless run with the '-keepseq' option, may remove some sequences according to PID, as well as some gap-rich columns in the input sequence alignment.

Another important difference is that fugueseq modifies gap penalties to emphasise the structural core, where structural conservation is more likely to be detected, while fugueali does not. Because of this, the overall alignment quality of fugueseq is usually not as good as that of fugueali, although the differences may sometimes be small.

In summary, the alignment produced by fugueseq (with default options) is a compromise between homology detection performance and alignment quality. In order to optimize the alignment when interesting homology is detected, fugueali should be used to re-do the alignment.


Automatic selection of alignment algorithms

Database searching (fugueseq)

  1. The default alignment method for fugueseq is global (0) but an alignment algorithm can be specified by the -A option:
    0 Global -- Modified Needleman & Wunsch
    1 Local -- Smith & Waterman
    2 GloLocSeq -- Global algorithm but hangout in the sequence NOT penalized
    3 GloLocPrf -- Global algorithm but hangout in the profile NOT penalized
    4 LocLoc -- Global algorithm but all hangout NOT penalized
    9 AUTOMATIC -- select method automatically
    The automatic selection uses the following definition:
    seq_len/prf_len >= 1.5      Method = 2 (GloLocSeq)
    seq_len/prf_len <  0.6667   Method = 3 (GloLocPrf)
    Otherwise                   Method = 0 (Global)
    See the further note on the alignment algorithm below.
  2. fugueseq first makes an alignment based on the initial choice of alignment algorithm (see above) and then calculates a z-score (Zold).
  3. The program then selects an alignment algorithm automatically (see above). If the automatic selection would be identical to the initial choice, do nothing.
  4. Otherwise, the following operation is performed.

    4-1) If the initial choice is global (default)

    1. If Zold >= 8.0 or Zold <= 2.0, do nothing.
      This is for computational efficiency only.
    2. For the interesting intermediate Z-score range, further examination is carried out:
      First re-calculate a z-score (Znew).
      If Znew >= 8.0 && seq_len > 15 && prf_len > 15
      or
      Znew >= 7.0 && seq_len > 250 && prf_len > 250
      or
      Znew < Zold
      then accept the new z-score. The first two conditions are based on our observation that global-local algorithms tend to produce more false positives than global does, particularly when the length of the query and/or profile is short. We therefore accept this choice only when the z-score is sufficiently high and both the query and profile lengths are long enough (these parameters were determined empirically). The third condition is to make the z-score estimation more conservative. If none of these conditions are satisfied, accept the original one (based on global).

    4-2) If the initial choice is a non-global algorithm

    1. If Zold >= 8.0 && seq_len > 15 && prf_len > 15
      or
      Zold >= 7.0 && seq_len > 250 && prf_len > 250
      or
      Zold < 2.0
      then accept this (the reasoning behind this is similar to that described above).
    2. Otherwise, re-calculate a z-score (Znew) using the automatically selected alignment algorithm.
    3. If Znew >= 8.0 or Znew < Zold, accept the new one.
  5. fugueseq finally prints out alignments.

    This step is independent of the Z-score calculation. By default, the program uses an automatically selected algorithm for output alignments. However, if a particular algorithm is specified by the -A option, the automatic selection is disabled and the specified algorithm is used instead.

  6. The last two columns of the standard output from fugueseq indicate the final choices of the alignment algorithms: the one for the z-score calculation and the one for the alignment, respectively.
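
The z-score rules in items 2 to 4 can be summarised as a small decision function. This is a Python sketch written from the description above, not code taken from fugueseq; the names choose_zscore, passes_length_check and the zscore callback are hypothetical. It assumes that initial is a concrete method (0 to 4), that auto is the method picked by the automatic rule in item 1, and that the original z-score is kept whenever no rule accepts the new one.

```python
def passes_length_check(z, seq_len, prf_len):
    """High z-score combined with a sufficiently long query and profile."""
    return (z >= 8.0 and seq_len > 15 and prf_len > 15) or (
        z >= 7.0 and seq_len > 250 and prf_len > 250
    )


def choose_zscore(initial, auto, seq_len, prf_len, zscore):
    """Return (method, z) used for the reported z-score.

    zscore is a callable mapping a method number to its z-score, so that
    Znew is computed only when the rules actually need it.
    """
    z_old = zscore(initial)
    if auto == initial:
        return initial, z_old  # item 3: nothing to do

    if initial == 0:  # 4-1: initial choice is global
        if z_old >= 8.0 or z_old <= 2.0:
            return initial, z_old  # skipped for computational efficiency
        z_new = zscore(auto)
        if passes_length_check(z_new, seq_len, prf_len) or z_new < z_old:
            return auto, z_new
        return initial, z_old

    # 4-2: initial choice is a non-global algorithm
    if passes_length_check(z_old, seq_len, prf_len) or z_old < 2.0:
        return initial, z_old
    z_new = zscore(auto)
    if z_new >= 8.0 or z_new < z_old:
        return auto, z_new
    return initial, z_old  # assumption: otherwise keep the original
```

For instance, with a 300-residue query, a 100-residue profile, a global z-score of 5.0 and a GloLocSeq z-score of 9.0, choose_zscore(0, 2, 300, 100, {0: 5.0, 2: 9.0}.get) selects method 2, because the new z-score is at least 8.0 and both lengths exceed 15.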

Alignment (fugueali)

As far as the automatic selection of alignment algorithms is concerned, fugueali behaves the same way as the alignment output part of fugueseq. (But there are some other differences between the two programs. For more details, see the section "What's the difference between fugueseq and fugueali?".) The default alignment method for fugueali is automatic (9) but an alignment algorithm can be specified by the -A option (see above).
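
The length-ratio rule behind the automatic mode (9) is simple enough to write down directly. The following is a minimal Python sketch with the hypothetical function name auto_select; it assumes seq_len and prf_len are positive numbers and uses the thresholds quoted in the definition above.

```python
# Method numbers and names as listed for the -A option.
METHODS = {
    0: "Global",
    1: "Local",
    2: "GloLocSeq",
    3: "GloLocPrf",
    4: "LocLoc",
}


def auto_select(seq_len, prf_len):
    """Pick an alignment method from the sequence/profile length ratio."""
    ratio = seq_len / prf_len
    if ratio >= 1.5:
        return 2  # sequence much longer: hangout in the sequence not penalized
    if ratio < 0.6667:
        return 3  # profile much longer: hangout in the profile not penalized
    return 0      # comparable lengths: global


print(METHODS[auto_select(450, 100)])
```

Note that the rule never returns 1 or 4, which matches the statement that these algorithms are used only when requested explicitly with the -A option.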

Further note on the alignment algorithm

The gap penalties and z-score thresholds are optimized for algorithms 0, 2 and 3. Algorithms 1 and 4 were abandoned because of their high false-positive rates and should not be used. The options were left in fugue to allow expert users to experiment with their own gap penalties supplied via command-line options. Unless specified with the "-A" option, algorithms 1 and 4 will never be used by fugue. Note that some features, such as output in CASP format and building rough models, may not work properly with algorithm 1, since this algorithm may not produce a full-length alignment.
Last update: 15 Apr 2015