For an input zinc finger protein and an input DNA sequence, the online program locates the zinc fingers in your protein sequence and outputs the ten top-scoring DNA regions (i.e., those predicted to be the best binding regions for the fingers found). You can select either the linear or the polynomial pre-trained SVM model. Note that the program assumes that all fingers bind consecutive bases. If it is known that only a subset of the fingers in a protein bind, then you may want to input just those fingers.
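The scan over the DNA sequence can be pictured with the following minimal Python sketch. It is not the code behind the online program: the function name `top_windows`, the `window` length parameter, and the `score` callback (a stand-in for the pre-trained SVM model) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def top_windows(
    dna: str,
    window: int,
    score: Callable[[str], float],
    k: int = 10,
) -> List[Tuple[float, int, str]]:
    """Score every window of `window` consecutive bases and return the k best.

    `score` is a hypothetical stand-in for the SVM model; each result is a
    (score, start position, site) tuple, best score first.
    """
    hits = [
        (score(dna[i:i + window]), i, dna[i:i + window])
        for i in range(len(dna) - window + 1)
    ]
    # sorted() is stable, so equal scores keep their left-to-right order.
    return sorted(hits, key=lambda hit: hit[0], reverse=True)[:k]


if __name__ == "__main__":
    # Toy scoring function (counts G bases); it only demonstrates the scan.
    for hit in top_windows("AAGGGTT", window=3, score=lambda s: s.count("G"), k=2):
        print(hit)
```

Passing the scoring function as an argument keeps the scan independent of the model, so the same loop would serve either the linear or the polynomial model.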

Sequences are input in the standard one-letter amino acid and nucleotide codes. Any non-standard symbols, spaces, or special characters are ignored and will not be used for scoring. Please check your original sequence on the output page. Please note that the protein may bind to either the primary or the complementary DNA strand; this will be highlighted in the output window.
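As a rough illustration of the input handling described above, the sketch below filters a sequence down to the standard one-letter codes and builds the complementary strand. The names `clean` and `reverse_complement` are hypothetical, and converting lowercase letters to uppercase is an assumption; the program itself may treat case differently.

```python
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str, alphabet: set) -> str:
    """Drop every character that is not in the given one-letter alphabet.

    Assumption: lowercase input is converted to uppercase first.
    """
    return "".join(ch for ch in seq.upper() if ch in alphabet)


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, read in the 5'-to-3' direction."""
    return dna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = clean("aac g-N1", DNA_ALPHABET)
    print(dna, reverse_complement(dna))
```

Scoring both `dna` and `reverse_complement(dna)` is one way to cover binding on either strand.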

You may choose the "Calculate p-values" option to compute a p-value for each score (i.e., the probability of obtaining the score by chance alone). The p-values are computed by generating 1000 random sequences of the same length as the binding region and evaluating how many of these would be scored as high as the original score. To take into account the length of the input DNA region, an E-value is approximated as the p-value * (number of windows scored in the DNA sequence).
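The p-value and E-value estimates can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's implementation: `empirical_p_value`, `e_value`, and the `score` callback (standing in for the SVM model) are hypothetical names, and a random score equal to the observed score is counted as being "as high".

```python
import random
from typing import Callable, Dict, Optional


def empirical_p_value(
    observed: float,
    length: int,
    score: Callable[[str], float],
    n_random: int = 1000,
    background: Optional[Dict[str, float]] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Fraction of random sites of `length` bases scoring >= `observed`."""
    rng = rng or random.Random()
    bases = "ACGT"
    # Uniform 25% background unless a distribution is supplied.
    weights = [background[b] for b in bases] if background else [0.25] * 4
    hits = 0
    for _ in range(n_random):
        site = "".join(rng.choices(bases, weights=weights, k=length))
        if score(site) >= observed:
            hits += 1
    return hits / n_random


def e_value(p_value: float, n_windows: int) -> float:
    """E-value approximated as p-value * number of windows scored."""
    return p_value * n_windows


if __name__ == "__main__":
    toy_score = lambda s: s.count("G")  # toy stand-in for the SVM score
    p = empirical_p_value(3, 9, toy_score, rng=random.Random(0))
    print(p, e_value(p, 200))
```

With 1000 random sequences the smallest non-zero p-value this estimate can report is 0.001, which is worth keeping in mind when reading very low values.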

Choosing the p-value option can dramatically increase calculation time (up to several minutes), especially when the polynomial kernel is used. Please be patient. It is always a good idea to start without the p-value calculation and to check whether the binding regions and scores are worth evaluating before moving on to the advanced options.

You may choose different background nucleotide probabilities for generating randomized DNA sequences. By default, the uniform 25% distribution for all four nucleotides is used. Alternatively, you can specify any distribution (e.g., the nucleotide distribution in the corresponding genome) or choose the option that computes and uses the distribution in your input DNA sequence.
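For the last option, the background distribution is simply the relative frequency of each base in the input DNA. A minimal sketch is given below; the function name `background_from_sequence` is hypothetical, and ignoring non-ACGT characters is an assumption.

```python
from collections import Counter
from typing import Dict


def background_from_sequence(dna: str) -> Dict[str, float]:
    """Relative frequency of A, C, G and T in the input DNA sequence.

    Assumption: characters other than A, C, G, T are ignored.
    """
    counts = Counter(base for base in dna.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard nucleotides")
    return {base: counts[base] / total for base in "ACGT"}


if __name__ == "__main__":
    print(background_from_sequence("AACG"))
```

The resulting dictionary can be used as sampling weights when generating the randomized sequences.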

If you would like to test our pre-trained SVM models using external programs, such as SVM_light, you can download pre-trained model files for Linear and Polynomial SVMs.

Please consult the conversion table for amino acid - base interacting pairs.

We have also made available for download the database of experimental data collected from 25 individual manuscripts published in 1990 - 2005 and from the Protein Data Bank. This archive is password-protected. You can request the password by contacting us: persikov@princeton.edu. Each line in the database represents one experiment including fields:

Anton Persikov, Robert Osada and Mona Singh (2008) "Predicting DNA recognition by Cys2His2 zinc finger proteins". Bioinformatics, 2009 Jan 1; 25(1): 22-29.

To give feedback or to send your comments or suggestions please email us: persikov@princeton.edu
