Predicting DNA Recognition by C2H2 Zinc Finger Proteins by Support Vector Machine
Using the Online Scoring Form:
For an input zinc
finger protein and an input DNA sequence, the online program
locates the zinc fingers in your protein sequence and outputs the ten
top-scoring DNA regions (i.e., those predicted to be the best
binding regions for the identified zinc fingers). You can choose
either the linear or the polynomial pre-trained SVM model. Note that
the program assumes that all fingers bind consecutive bases. If
only a subset of the fingers in a protein is known to bind,
you may want to input just those fingers.
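The scanning step can be pictured as a sliding window over the DNA. The sketch below is a minimal illustration under stated assumptions, not the server's code: it assumes each finger contacts a 3-bp subsite, that finger 1 (N-terminal) binds the 3'-most triplet of the window, as is conventional for C2H2 arrays, and that a window's score is the sum of per-finger scores. `score_finger` is a hypothetical stand-in for the pre-trained SVM, and only the given strand is scanned.

```python
import heapq
from typing import Callable, List, Tuple

# Hypothetical per-finger scorer standing in for the pre-trained SVM:
# it scores one finger sequence against one 3-bp subsite.
FingerScorer = Callable[[str, str], float]


def scan_dna(fingers: List[str], dna: str, score_finger: FingerScorer,
             top_k: int = 10) -> List[Tuple[float, int, str]]:
    """Return up to top_k (score, start, window) tuples, best first.

    Assumes every finger binds, each finger contacts 3 consecutive bases,
    and finger 1 (N-terminal) binds the 3'-most triplet of the window.
    """
    width = 3 * len(fingers)
    results = []
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        total = 0.0
        for i, finger in enumerate(fingers):
            # Finger i binds the i-th triplet counted from the 3' end.
            end = width - 3 * i
            total += score_finger(finger, window[end - 3:end])
        results.append((total, start, window))
    # Keep only the highest-scoring windows.
    return heapq.nlargest(top_k, results)
```

Because every finger is forced into a consecutive triplet, removing a non-binding finger from the input changes the window width and shifts the alignment of all remaining fingers.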
Sequences are entered using the standard one-letter amino acid and
nucleotide codes. Any non-standard symbols, spaces, or special characters
are ignored and will not be used for scoring, so please check your
sequence as echoed in the output page. Note that the protein may bind to
either the primary or the complementary DNA strand; the strand is highlighted in the output window.
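The input handling described above might look roughly like the following sketch. It assumes input is upper-cased before filtering and that only A, C, G, T and the 20 standard amino acids count as standard symbols; both are assumptions about the server, not documented behavior, and the function names are hypothetical. Because dropped characters are removed rather than masked, positions in the cleaned sequence can differ from those in the typed input, which is one reason to check the echoed sequence.

```python
from typing import Iterator, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard one-letter codes
NUCLEOTIDES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str, alphabet: str) -> str:
    """Upper-case seq and drop every character not in alphabet."""
    return "".join(ch for ch in seq.upper() if ch in alphabet)


def reverse_complement(dna: str) -> str:
    """Complementary strand, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def both_strands(dna: str) -> Iterator[Tuple[str, str]]:
    """Yield (strand label, sequence) for the primary and complementary strands."""
    dna = clean(dna, NUCLEOTIDES)
    yield "+", dna
    yield "-", reverse_complement(dna)
```

Scanning both yielded sequences with the same scorer, and reporting which strand each top window came from, mirrors the strand annotation in the output.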
You may choose the "Calculate p-values" option to compute a p-value
for each score (i.e., the probability of obtaining that score by chance
alone). The p-values are computed by generating 1000 random sequences of the
same length as the binding region and determining the fraction of them
that score at least as high as the original score. To take into account the
length of the input DNA region, an E-value is approximated as
the p-value multiplied by the number of windows scored in the DNA sequence.
Choosing the p-value option can increase calculation time dramatically,
especially with the polynomial kernel (up to
several minutes). Please be patient. It is a good idea to
start without the p-value calculation and check
whether the binding regions and scores are worth evaluating before moving
on to the advanced options.
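The p-value and E-value estimates can be sketched as a simple Monte Carlo test. This is an illustration under stated assumptions, not the server's implementation: `score_window` is a hypothetical scorer for a window of the given length, each random sequence is drawn position by position from the background probabilities, and the p-value is the raw fraction of random sequences scoring at least as high as the observed score. With this estimator the p-value can be exactly 0; some implementations use (hits + 1)/(n + 1) instead. The helper `n_windows` is also hypothetical, and whether the server counts windows on one or both strands is not stated.

```python
import random
from typing import Callable, Sequence


def empirical_p_value(observed: float, length: int,
                      score_window: Callable[[str], float],
                      background: Sequence[float] = (0.25, 0.25, 0.25, 0.25),
                      n_samples: int = 1000, seed: int = 0) -> float:
    """Fraction of random sequences (weights for A, C, G, T) scoring >= observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        seq = "".join(rng.choices("ACGT", weights=background, k=length))
        if score_window(seq) >= observed:
            hits += 1
    return hits / n_samples


def e_value(p_value: float, n_windows: int) -> float:
    """Approximate E-value: p-value times the number of windows scored."""
    return p_value * n_windows


def n_windows(dna_length: int, window_width: int) -> int:
    """Number of windows of a given width on one strand."""
    return max(0, dna_length - window_width + 1)
```

For example, a 100-bp sequence scanned with a three-finger (9-bp) window has `n_windows(100, 9) == 92` windows on one strand, so a p-value of 0.01 corresponds to an E-value of about 0.92. The per-sample cost of calling the scorer is also why this option is slow with the polynomial kernel.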
You may choose different background nucleotide probabilities
for generating the randomized DNA sequences. By default,
a uniform distribution (25% for each of the four
nucleotides) is used. Alternatively, you can specify any
distribution (e.g., the nucleotide composition of the corresponding
genome) or choose the option that computes and uses the nucleotide distribution of your input DNA.
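The three background options could be represented as follows. This sketch assumes the input-derived distribution is computed from the A/C/G/T counts of the input (other characters ignored) and falls back to uniform for an input with no valid bases; the fallback and all helper names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict

BASES = "ACGT"


def uniform_background() -> Dict[str, float]:
    """Default: 25% for each nucleotide."""
    return {b: 0.25 for b in BASES}


def background_from_sequence(dna: str) -> Dict[str, float]:
    """Nucleotide frequencies of the input DNA (non-ACGT characters ignored)."""
    counts = Counter(ch for ch in dna.upper() if ch in BASES)
    total = sum(counts.values())
    if total == 0:
        return uniform_background()  # assumed fallback for an empty input
    return {b: counts[b] / total for b in BASES}


def random_dna(length: int, background: Dict[str, float],
               rng: random.Random) -> str:
    """Draw each position independently from the background distribution."""
    weights = [background[b] for b in BASES]
    return "".join(rng.choices(BASES, weights=weights, k=length))
```

A user-specified genome composition is just another dictionary of the same shape, for example a hypothetical AT-rich background with 0.3 for A and T and 0.2 for C and G.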
Pre-trained model files:
If you would like to test
our pre-trained SVM models using external programs such as SVM_light, you can download
the pre-trained model files for the linear and polynomial kernels.
Please consult the conversion table for amino acid - base
Experimental Database download:
We have also made available for download the database of experimental data collected from
25 individual manuscripts published between 1990 and 2005 and from the Protein Data Bank. This archive is password-protected; you can request the password by contacting us at email@example.com.
Each line in the database represents one experiment and includes the following fields:
source - data origin; dna - DNA sequence; zf - number of zinc fingers in the protein; f1-fN - sequences of the corresponding zinc finger regions;
ex - type of example: + for binding, - for non-binding, Kd for an experimentally measured dissociation constant, and > for comparative examples, in which binding of sequence A is compared to the subsequently listed sequence B. Please consult the list of sources for all individual references.
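A reader for the database could look like the sketch below. The page does not specify the file's delimiter or exact column layout, so this assumes tab-separated columns in the order listed above (source, dna, zf, then zf finger sequences, then ex), with any trailing columns, such as a measured Kd value, kept uninterpreted in `extra`. The class and function names are hypothetical. `comparative_pairs` follows the stated convention that a ">" record (sequence A) is compared with the record listed immediately after it (sequence B).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Experiment:
    source: str                # data origin
    dna: str                   # DNA sequence
    fingers: List[str]         # f1..fN zinc finger region sequences
    ex: str                    # "+", "-", "Kd" or ">"
    extra: List[str] = field(default_factory=list)  # uninterpreted trailing columns


def parse_line(line: str, sep: str = "\t") -> Experiment:
    """Parse one line (assumed layout: source, dna, zf, f1..fN, ex, ...)."""
    cols = line.rstrip("\n").split(sep)
    if len(cols) < 3:
        raise ValueError("expected at least source, dna and zf columns")
    source, dna, zf = cols[0], cols[1], int(cols[2])
    fingers = cols[3:3 + zf]
    if len(fingers) != zf:
        raise ValueError(f"expected {zf} finger sequences, got {len(fingers)}")
    rest = cols[3 + zf:]
    if not rest:
        raise ValueError("missing 'ex' column")
    return Experiment(source, dna, fingers, rest[0], rest[1:])


def comparative_pairs(records: List[Experiment]) -> List[Tuple[Experiment, Experiment]]:
    """Pair each '>' record (sequence A) with the record listed right after it (B)."""
    return [(a, b) for a, b in zip(records, records[1:]) if a.ex == ">"]
```

Checking the finger count against zf catches misaligned lines early, which matters because every later column is located relative to it.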
If you use this program, please cite:
Anton Persikov, Robert Osada and Mona Singh (2008)
"Predicting DNA recognition by Cys2His2 zinc finger proteins".
Bioinformatics, 2009 Jan 1; 25(1): 22-29.
To send feedback, comments, or suggestions, please email us: firstname.lastname@example.org