CompMoby

CompMoby combines evolutionary conservation information with the existing MobyDick algorithm to detect over-represented motifs in the upstream or 3' UTR sequences of co-expressed genes.

For examples, download the zip file containing all sample input files for upstream promoter or 3' UTR sequences and use them for a test run.
Explanation of the results of the upstream promoter example can be found here.
It is recommended that you RepeatMask all your input sequences before running CompMoby.
1. Please input an upstream or 3' UTR sequence file of the genes in the reference species in FASTA format (maximum number of sequences: 200) (sample input file). Reference Species File:
2. Please input an upstream or 3' UTR sequence file of the orthologous genes in the orthologous species in FASTA format (maximum number of sequences: 200) (sample input file). Orthologous Species File:
3. Please input either the text file of aligned non-coding regions for your subset of genes produced by the alignment wrapper script (Alignment extraction script) (sample input file from alignment extraction script), or a FASTA format file containing pairwise alignments from an algorithm of your choice (sample input file in FASTA format). Pairwise alignment of non-coding regions:
4. Please input a file containing a background set of upstream or 3' UTR sequences in the reference species in FASTA format for p-value calculation (sample input file). For files with more than 1000 sequences, please gzip your file to reduce uploading time. (Maximum number of sequences: 500. Maximum length: 2000 bp per sequence.) Background Sequence File:
5. Type of sequence to analyze: Upstream promoter sequences or 3' UTR
6. Length of upstream sequence (promoter option) or downstream sequence (3' UTR option) to analyze. For the 3' UTR option, enter 0 to analyze the entire length of each given 3' UTR.
7. Filter out motifs (e.g., short AT-rich repeats) that occur more than this number of times (the appropriate value depends on the length of the sequences; 200 was used in references 1 and 2):
8. Clustering threshold parameter (0.55 used in references 1 and 2).
9. Bonferroni-corrected -log10 p-value cutoff for enriched motifs. -log10:
10. Please provide a current email address. A link to your results will be emailed to this address when the CompMoby analysis is completed. Email address:

Retype email:

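The sequence limits stated in steps 1, 2, and 4 can be checked locally before uploading. The following is a minimal sketch, not part of CompMoby itself: the function names (`read_fasta_lengths`, `check_limits`, `gzip_file`) are hypothetical helpers, and the parser assumes plain FASTA text in which each record starts with a `>` header line.

```python
import gzip
import shutil


def read_fasta_lengths(lines):
    """Return a list of (header, sequence_length) pairs from FASTA lines."""
    records = []
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, length))
            header, length = line[1:], 0
        elif header is not None:
            length += len(line)  # sequence may span several lines
    if header is not None:
        records.append((header, length))
    return records


def check_limits(records, max_sequences, max_length=None):
    """Return a list of human-readable problems; empty if within limits."""
    problems = []
    if len(records) > max_sequences:
        problems.append(
            f"{len(records)} sequences exceeds the limit of {max_sequences}"
        )
    if max_length is not None:
        for header, length in records:
            if length > max_length:
                problems.append(
                    f"{header}: {length} bp exceeds the limit of {max_length} bp"
                )
    return problems


def gzip_file(src_path, dst_path):
    """Compress src_path into dst_path (hypothetical helper for large uploads)."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Example with in-memory lines; a real file would be read with open(path).
example = [">geneA", "ACGT", "AC", ">geneB", "ACGTACGT"]
records = read_fasta_lengths(example)
print(check_limits(records, max_sequences=200))                    # reference/ortholog files
print(check_limits(records, max_sequences=500, max_length=2000))   # background file
```

The limits passed in the example (200 sequences for the reference and orthologous files; 500 sequences and 2000 bp for the background file) are the ones stated on this page.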

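To help choose a value for the cutoff in step 9, the sketch below shows how a raw p-value relates to a Bonferroni-corrected -log10 score. It uses the standard Bonferroni correction (raw p-value multiplied by the number of tests, capped at 1). How CompMoby counts the number of tests is not stated on this page, so `n_tests` is an assumption supplied by the user, and the function names are hypothetical.

```python
import math


def neg_log10_bonferroni(p_value, n_tests):
    """Return -log10 of the Bonferroni-corrected p-value.

    Assumes the standard correction: min(1, p_value * n_tests).
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in the interval (0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    corrected = min(1.0, p_value * n_tests)
    return -math.log10(corrected)


def passes_cutoff(p_value, n_tests, cutoff):
    """True if the corrected -log10 p-value meets or exceeds the cutoff."""
    return neg_log10_bonferroni(p_value, n_tests) >= cutoff


# A raw p-value of 1e-8 tested against 1000 candidate motifs (illustrative numbers)
score = neg_log10_bonferroni(1e-8, 1000)
print(round(score, 3), passes_cutoff(1e-8, 1000, cutoff=4))
```

Under this convention a larger cutoff is stricter: raising it by 1 requires the corrected p-value to be ten times smaller.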
References:

    1. Chaivorapol, C., Melton, C., Wei, G., Yeh, R., Ramalho-Santos, M., Blelloch, R., and Li, H. CompMoby: Comparative MobyDick for detection of cis-regulatory motifs. BMC Bioinformatics (2008).

    2. Grskovic, M.*, Chaivorapol, C.*, Gaspar-Maia, A.*, Li, H., and Ramalho-Santos, M. Systematic identification of cis-regulatory sequences active in mouse and human embryonic stem cells. PLoS Genetics (2007).

    3. Bussemaker, H.J., Li, H., and Siggia, E.D. Regulatory element detection using a probabilistic segmentation model. ISMB (2000).

    4. Bussemaker, H.J., Li, H., and Siggia, E.D. Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis. PNAS (2000).

(Last updated: November 11, 2008)