Instructions for bZIP Coiled-Coil Scoring Form
Usage
Example:
If the form is left blank, clicking "submit" scores a segment of the
human C/EBPepsilon homodimer, using the default sequences and registers,
and shows the resulting sample output.
Required input:
Two aligned protein sequences and their assigned register.
Each sequence should be a string of one-letter amino acid codes.
The register should be a string of the characters a through g. Note that
this program has only been tested on coiled coils with an uninterrupted
register.
Sequence and register length:
- Sequences must contain between 28 and 60 amino acids.
- Registers may be entered as a single character, which specifies
the uninterrupted register assignment for the entire alignment. They
may also be entered as a string of heptad positions with length 28-60.
- If a single character is entered for the register, then the scored
alignment has length equal to that of the shorter sequence. Otherwise,
the alignment length equals the minimum length among the two
sequences and register string.
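The length rules above can be sketched in Python. This is only an illustration, not the program's own code: the function names are hypothetical, and the sketch assumes that a single-character register gives the heptad position of the first aligned residue. It does not enforce the 28-60 length limits.

```python
HEPTAD = "abcdefg"

def clean(text):
    """Remove all whitespace and one optional trailing asterisk."""
    text = "".join(text.split())
    if text.endswith("*"):
        text = text[:-1]
    return text

def prepare_alignment(seq1, seq2, register):
    """Return (seq1, seq2, register) truncated to the scored alignment length."""
    seq1 = clean(seq1).upper()
    seq2 = clean(seq2).upper()
    register = clean(register).lower()
    if len(register) == 1:
        # Assumption: the single character is the position of residue 1,
        # and the register continues uninterrupted from there.
        length = min(len(seq1), len(seq2))
        start = HEPTAD.index(register)
        register = "".join(HEPTAD[(start + i) % 7] for i in range(length))
    else:
        length = min(len(seq1), len(seq2), len(register))
    return seq1[:length], seq2[:length], register[:length]
```

For example, prepare_alignment("ACDEF GHIKL*", "acdefghi", "f") truncates both sequences to 8 residues and expands the register to "fgabcdef".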
Additional fine print:
- Both sequences and registers may end with a final asterisk (*).
- Amino acids and registers can be input in upper or lower case.
- Extra whitespace is allowed, but nonsense characters are not.
- If the sequences or registers contain invalid characters, an error will
be generated.
Errors
The program will terminate with an error message in the following situations:
- A sequence contains any of the letters {B,J,O,U,X,Z}.
- The register contains characters other than {a-g}.
- A sequence or register string is the wrong length.
- The register string is not a continuous heptad repeat.
However, it will not check if your input is a valid coiled-coil sequence!
To detect coiled-coil motifs in a sequence, please use the PairCoil or
MultiCoil programs.
If all output scores are 0, there is an error in the input.
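To check input before submitting, the error conditions listed above can be approximated with a short Python sketch. The function name and messages are hypothetical and the program's actual checks may differ. The sketch assumes whitespace and any trailing asterisk have already been removed.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 26 letters minus B, J, O, U, X, Z
HEPTAD = "abcdefg"
MIN_LEN, MAX_LEN = 28, 60

def check_input(seq1, seq2, register):
    """Return a list of error messages; an empty list means no problem was found."""
    errors = []
    for name, seq in (("sequence1", seq1), ("sequence2", seq2)):
        seq = seq.upper()
        bad = sorted({ch for ch in seq if ch not in VALID_RESIDUES})
        if bad:
            errors.append(f"{name} contains invalid characters: {''.join(bad)}")
        if not MIN_LEN <= len(seq) <= MAX_LEN:
            errors.append(f"{name} has length {len(seq)}, expected {MIN_LEN}-{MAX_LEN}")
    register = register.lower()
    if any(ch not in HEPTAD for ch in register):
        errors.append("register contains characters other than a-g")
    elif len(register) != 1:
        # A single character is always acceptable; longer strings are checked.
        if not MIN_LEN <= len(register) <= MAX_LEN:
            errors.append(f"register has length {len(register)}, expected {MIN_LEN}-{MAX_LEN}")
        for prev, cur in zip(register, register[1:]):
            if HEPTAD[(HEPTAD.index(prev) + 1) % 7] != cur:
                errors.append("register is not a continuous heptad repeat")
                break
    return errors
```

An empty result only means the input is well formed. As noted above, it says nothing about whether the sequences actually form a coiled coil.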
Interpreting the output
For error checking, the original input is displayed, as well as the
truncated sequences and registers used for scoring.
Scores are computed using four classification models:
- The base-optimized weights were proposed in our paper
(FKS). These weights are obtained by training a Support Vector Machine on
the base dataset described in FKS.
- Weights from FKS+human bZIPs were obtained similarly. In this case,
the training set includes both the base dataset and the human bZIP
interactions tested in FKS.
- Simple electrostatic weights count favorable and unfavorable
electrostatic interactions.
- Coupling energy weights apply the coupling energies that have
been measured for some residue interactions.
Higher scores correspond to more likely interactions.
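As a rough illustration of what counting electrostatic interactions could look like, here is a Python sketch. It is not the program's scoring code and the weights are invented for illustration. It assumes the commonly described pairing between a g position on one helix and the e position five residues later on the other helix, treats K and R as positive and D and E as negative, and adds 1 for each opposite-charge pair and subtracts 1 for each like-charge pair.

```python
POSITIVE = set("KR")
NEGATIVE = set("DE")

def charge(residue):
    """Return +1, -1 or 0 for a one-letter amino acid code (assumed charge classes)."""
    if residue in POSITIVE:
        return 1
    if residue in NEGATIVE:
        return -1
    return 0

def electrostatic_score(seq1, seq2, register):
    """Count g-to-e' pairs: +1 for opposite charges, -1 for like charges."""
    score = 0
    n = min(len(seq1), len(seq2), len(register))
    for i in range(n - 5):
        if register[i] == "g" and register[i + 5] == "e":
            # Check both directions: g on helix 1 with e on helix 2, and the reverse.
            for first, second in ((seq1, seq2), (seq2, seq1)):
                score -= charge(first[i]) * charge(second[i + 5])
    return score
```

Under these assumptions a pair with no charged residue at either position contributes nothing, so the score depends only on the g and e positions of the alignment.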
Actual scores computed using the above weights are shown for the
sequence1+sequence2 interaction.
Scores for each sequence may vary according to its amino-acid composition.
For reference, the distribution of scores for sequence1 paired with each
bZIP sequence in a dataset of 8 genomes is therefore also shown, and
likewise for sequence2.
The percentile is computed as the number of pairings that score worse than
the sequence1+sequence2 score, divided by the total number of pairings across all genomes.
Pairings are taken using either sequence1 or sequence2.
(Note that sequence1+sequence2 is scored using the given alignment, whereas
each sequence is individually aligned optimally with every other bZIP and then
scored using that new alignment.)
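The percentile calculation described above reduces to a simple ratio. The following Python sketch uses a hypothetical function name and made-up scores. It assumes that "worse" means strictly lower, since higher scores correspond to more likely interactions.

```python
def percentile(reference_score, background_scores):
    """Fraction of background pairings that score strictly lower than the reference."""
    if not background_scores:
        raise ValueError("at least one background pairing is required")
    worse = sum(1 for score in background_scores if score < reference_score)
    return worse / len(background_scores)

# Made-up example: two of the four background pairings score below 2.0.
example = percentile(2.0, [1.0, 3.0, 0.5, 2.0])
```

The result is a fraction between 0 and 1; multiply by 100 to express it as a percentage.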
Cross-genomic bZIP dataset
Sequence1 is aligned with bZIP coiled-coil sequences from the following genomes:
Genome | Number of sequences
Anopheles gambiae (mosquito) | 23
Arabidopsis thaliana | 77
Caenorhabditis elegans (worm) | 25
Drosophila melanogaster (fly) | 29
Danio rerio (zebrafish) | 51
Fugu rubripes (fugufish) | 54
Homo sapiens (human) | 53
Saccharomyces cerevisiae (yeast) | 16
The mouse and rat genomes were omitted because of near identity to
the human genome.