Instructions for bZIP Coiled-Coil Scoring Form

Usage

Example: If the form is left blank, clicking "submit" shows sample output for a segment of the human C/EBPepsilon homodimer, using the default sequences and registers.

Required input: Two aligned protein sequences and their assigned register. Each sequence should be a string of one-letter amino acid abbreviations. Registers should be strings of the characters a through g. Note that this program has only been tested on coiled coils with an uninterrupted register.

Sequence and register length:

Sequences must contain between 28 and 60 amino acids.
Registers may be entered as a single character, which specifies the uninterrupted register assignment for the entire alignment. They may also be entered as a string of heptad positions with length 28-60.
If a single character is entered for the register, then the scored alignment has length equal to that of the shorter sequence. Otherwise, the alignment length equals the minimum length among the two sequences and register string.
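
The length rules above can be sketched in a few lines of Python. This is only an illustration, not the form's actual code: the function names are hypothetical, and it assumes that a single-character register gives the heptad position of the first aligned residue and that longer inputs are truncated from the end.

```python
HEPTAD = "abcdefg"

def expand_register(start_char, length):
    """Build an uninterrupted register string of the given length.

    Assumption: start_char is the heptad position of the first residue.
    """
    start = HEPTAD.index(start_char.lower())
    return "".join(HEPTAD[(start + i) % 7] for i in range(length))

def truncate_alignment(seq1, seq2, register):
    """Return (seq1, seq2, register) cut to the scored alignment length."""
    if len(register) == 1:
        # Single character: length is that of the shorter sequence.
        n = min(len(seq1), len(seq2))
        register = expand_register(register, n)
    else:
        # Full register string: length is the minimum of all three inputs.
        n = min(len(seq1), len(seq2), len(register))
    # Assumption: extra characters are dropped from the end.
    return seq1[:n], seq2[:n], register[:n]
```

For example, two sequences of 30 and 28 residues entered with the register "f" would be scored over 28 positions, with a register beginning fgabcde.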

Additional fine print:

Both sequences and registers can be terminated by a final asterisk * .
Amino acids and registers can be input in upper or lower case.
Extra whitespace is allowed, but nonsense characters are not.
If the sequences or registers contain invalid characters, an error will be generated.

Errors

The program will terminate with a message under the following situations:

The sequences contain any of the letters {B,J,O,U,X,Z}.
The register contains characters other than {a-g}.
A sequence or register string is the wrong length.
The register string is not a continuous heptad repeat.
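
As a rough guide to what these checks involve, here is a minimal Python sketch that combines the input rules and error conditions described above. It is not the program's own validation code: the function names and error messages are hypothetical, and the order in which the checks run is an assumption.

```python
# The 20 standard amino acids, i.e. the alphabet without B, J, O, U, X, Z.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
HEPTAD = "abcdefg"
MIN_LEN, MAX_LEN = 28, 60

def clean(text):
    """Remove all whitespace and one optional trailing asterisk."""
    text = "".join(text.split())
    if text.endswith("*"):
        text = text[:-1]
    return text

def check_sequence(raw):
    """Return the cleaned, upper-case sequence or raise ValueError."""
    seq = clean(raw).upper()
    bad = sorted(set(seq) - VALID_RESIDUES)
    if bad:
        raise ValueError("invalid sequence characters: " + "".join(bad))
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        raise ValueError("sequence length must be between 28 and 60")
    return seq

def check_register(raw):
    """Return the cleaned, lower-case register or raise ValueError."""
    reg = clean(raw).lower()
    bad = sorted(set(reg) - set(HEPTAD))
    if bad:
        raise ValueError("invalid register characters: " + "".join(bad))
    if len(reg) == 1:
        return reg  # a single character is always acceptable
    if not MIN_LEN <= len(reg) <= MAX_LEN:
        raise ValueError("register length must be 1 or between 28 and 60")
    # Each position must be followed by the next heptad position (g wraps to a).
    for prev, cur in zip(reg, reg[1:]):
        if HEPTAD[(HEPTAD.index(prev) + 1) % 7] != cur:
            raise ValueError("register is not a continuous heptad repeat")
    return reg
```

Running checks like these locally before submitting can help separate formatting problems from genuine scoring results.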

However, it will not check if your input is a valid coiled-coil sequence! To detect coiled-coil motifs in a sequence, please use the PairCoil or MultiCoil programs.

If all output scores are 0, there is an error in the input.

Interpreting the output

For error checking, the original input is displayed, as well as the truncated sequences and registers used for scoring.

Scores are computed using four classification models:

The base-optimized weights were proposed in our paper (FKS). These weights are obtained by training a Support Vector Machine on the base dataset described in FKS.
Weights from FKS+human bZIPs were obtained similarly. In this case, the training set includes both the base dataset and the human bZIP interactions tested in FKS.
Simple electrostatic weights count favorable and unfavorable electrostatic interactions.
Coupling energy weights apply the coupling energies that have been measured for some residue interactions.
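
To give a feel for the simplest of these models, the following Python sketch counts charge pairs between the two helices. It is an illustration under stated assumptions, not the weights used by the program: it assumes that only g-to-e' contacts are considered (position g on one helix paired with position e five residues later on the other), that K and R are positive and D and E negative, and that each attractive pair scores +1 and each repulsive pair scores -1. The function names are hypothetical.

```python
POSITIVE = set("KR")  # assumed positively charged residues
NEGATIVE = set("DE")  # assumed negatively charged residues

def charge(residue):
    """Return +1, -1, or 0 for a one-letter amino acid code."""
    if residue in POSITIVE:
        return 1
    if residue in NEGATIVE:
        return -1
    return 0

def electrostatic_count(seq1, seq2, register):
    """Count favorable minus unfavorable g-to-e' charge pairs.

    Expects upper-case sequences and a lower-case register,
    all of the same length.
    """
    score = 0
    for i, pos in enumerate(register):
        j = i + 5  # the e position of the following heptad
        if pos != "g" or j >= len(register):
            continue
        # Check both directions: g on helix 1 with e' on helix 2, and vice versa.
        for g_res, e_res in ((seq1[i], seq2[j]), (seq2[i], seq1[j])):
            # Opposite charges give a product of -1 (favorable, adds 1);
            # like charges give +1 (unfavorable, subtracts 1).
            score -= charge(g_res) * charge(e_res)
    return score
```

Under these assumptions a higher count indicates more attractive contacts, which is consistent with higher scores corresponding to more likely interactions.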

Higher scores correspond to more likely interactions.

Actual scores computed using the above weights are shown for the sequence1+sequence2 interaction.

Scores for a sequence may vary with its amino-acid composition. For reference, the output therefore also shows the distribution of scores for sequence1 paired with each bZIP sequence in a dataset of 8 genomes, and likewise for sequence2.

The percentile is computed as the number of pairings that score worse than the sequence1+sequence2 score, divided by the total number of pairings across all genomes. Pairings are taken using either sequence1 or sequence2. (Note that sequence1+sequence2 is scored using the given alignment, whereas each sequence is individually aligned optimally with every other bZIP and then scored using that new alignment.)
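
The percentile calculation can be written out as a short Python sketch. The function name is hypothetical, and two details are assumptions: "worse" is taken to mean a strictly lower score, and the ratio is reported on a 0 to 100 scale.

```python
def percentile(pair_score, reference_scores):
    """Percentage of reference pairings that score worse than pair_score.

    reference_scores holds the scores of all pairings of sequence1 or
    sequence2 with the bZIP sequences in the cross-genomic dataset.
    """
    if not reference_scores:
        raise ValueError("no reference pairings to compare against")
    # Assumption: ties do not count as scoring worse.
    worse = sum(1 for s in reference_scores if s < pair_score)
    return 100.0 * worse / len(reference_scores)
```

For instance, a pair score of 2.0 compared against reference scores of 0, 1, 2, and 3 beats two of the four pairings, giving a percentile of 50.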

Cross-genomic bZIP dataset

Sequence1 and sequence2 are each aligned with bZIP coiled-coil sequences from the following genomes:

Genome	Number of sequences
Anopheles gambiae (mosquito)	23
Arabidopsis thaliana	77
Caenorhabditis elegans (worm)	25
Drosophila melanogaster (fly)	29
Danio rerio (zebrafish)	51
Fugu rubripes (fugufish)	54
Homo sapiens (human)	53
Saccharomyces cerevisiae (yeast)	16

The mouse and rat genomes were omitted because of near identity to the human genome.