Input File Format for compareExp.cgi

Input File Format:

The input file must be a tab-delimited text file containing exactly two columns.

The first line is a header that labels the two columns. It can also be used as a comment line.

The first column contains the gene identifiers (Gene ID). The second column contains the expression ("log ratio") values.
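The layout described above can be checked locally before uploading. The following is a minimal sketch in Python, not the parser used by compareExp.cgi; the function name parse_expression_file is hypothetical, and it assumes that the header line is always present and that blank lines can be ignored.

```python
import io


def parse_expression_file(handle):
    """Parse a two-column, tab-delimited expression file.

    Assumption: the first line is a header/comment line and is skipped.
    Returns a list of (gene_id, log_ratio) tuples.
    """
    rows = []
    lines = iter(handle)
    next(lines, None)  # skip the header line
    for line_no, line in enumerate(lines, start=2):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated columns, "
                f"got {len(fields)}"
            )
        gene_id, value = fields
        # float() accepts values written without a leading zero, e.g. ".695"
        rows.append((gene_id.strip(), float(value)))
    return rows


# A few lines in the same layout as the sample files on this page.
sample = "Name\tEXP\nAC3.5\t3.631\nAC3.8\t2.325\nAH6.6\t.695\nB0001.3\t-.323\n"
rows = parse_expression_file(io.StringIO(sample))
print(rows)
```

A file that is separated by spaces instead of tabs, or that has extra columns, raises a ValueError naming the offending line.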

Multiple gene ID types are supported for Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. Most common gene IDs can be used, including Ensembl Gene ID, EMBL, UniGene and HUGO IDs. Try your files first to see whether they work, and let us know if your gene ID type is not recognized.

An example input file for Caenorhabditis elegans. (Download)

Name    EXP
AC3.5   3.631
AC3.8   2.325
AH6.6   .695
B0001.3 -.323
B0024.1 1.641
B0024.4 1.475
C06H5.1 3.09
......

An example input file for Drosophila melanogaster. (Download)

Name    EXP
FBgn0026761     .204
FBgn0020497     .844
FBgn0010222     -1.08
FBgn0001404     .484
FBgn0036365     -.347
FBgn0024923     .655
FBgn0024891     -1.813
......

An example input file for Saccharomyces cerevisiae. (Download)

Name    EXP
YAL001C .07
YAL008W .083
YAL010C -.046
YAL012W .004
YAL026C .179
YAL028AW        .269
YAL029C .42
......

An example input file for Homo sapiens. (Download)

In addition to the original support for NCBI GeneID (Entrez Gene) as the gene name column, five other kinds of accession for human genes are now supported: Ensembl Gene Accession, EMBL, UniGene, HUGO and Reference Sequence. Additional sample files are available for the new input formats: Ensembl Gene Accession, EMBL, UniGene, HUGO and Reference Sequence.

Name(NCBI GeneID)    EXP
3689    1.728
3485    .79
4528    .399
2627    1.919
7869    1.021
54453   1.978
......

Data Files for Debugging

These debug files are not real expression data. They were generated from the sample files above for testing the CGI. The first column (gene identifiers) was replaced with the corresponding orthologous gene names from another organism, and duplicated orthologs were removed. An anti-correlated data file was also generated by multiplying each expression value by -1 (with awk, for example: awk -F "\t" '{print $1"\t"(-$2)}').
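The awk one-liner above applies the negation to every line, so the second field of the header line is also treated as a number. As an illustration only, the same transformation can be sketched in Python so that the header passes through unchanged; the function name negate_expression_lines is hypothetical, and the sketch assumes the first line is always a header.

```python
def negate_expression_lines(lines):
    """Return the lines with every expression value multiplied by -1.

    Assumption: the first line is a header and is copied unchanged.
    Blank lines are also copied unchanged.
    """
    out = []
    for i, line in enumerate(lines):
        line = line.rstrip("\r\n")
        if i == 0 or not line.strip():
            out.append(line)
            continue
        gene_id, value = line.split("\t")
        out.append(f"{gene_id}\t{-float(value)}")
    return out


sample_lines = ["Name\tEXP", "AC3.5\t3.631", "B0001.3\t-.323"]
negated_lines = negate_expression_lines(sample_lines)
print("\n".join(negated_lines))
```

Number formatting may differ slightly from the awk output (for example, a leading zero is added to values such as .323), which does not change the numeric values.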

For each of these files, the comparison result (Pearson correlation) against the original expression data will be 1 for the debug file and -1 for the anti-correlated debug file.
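The reason the result is exactly 1 or -1 is that the Pearson correlation of a data set with itself is 1, and multiplying every value by -1 flips its sign. A minimal sketch of that check in Python follows. It is a textbook implementation of the Pearson coefficient, not the code used by the CGI, and the function name pearson is chosen here for illustration. The values are taken from the Caenorhabditis elegans sample above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


values = [3.631, 2.325, 0.695, -0.323, 1.641, 1.475, 3.09]
negated = [-v for v in values]

r_same = pearson(values, values)      # close to 1, up to rounding
r_anti = pearson(values, negated)     # close to -1, up to rounding
print(r_same, r_anti)
```

Because of floating-point rounding, the printed values may differ from 1 and -1 in the last decimal places.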

File for debug: DM_CE_12566_median_DEBUG_1.data; Original SMD SUID (type): 12566 (median). Choose the organism: Drosophila melanogaster.
Anti-correlated file for debug: DM_CE_12566_median_DEBUG_2.data; Original SMD SUID (type): 12566 (median). Choose the organism: Drosophila melanogaster.
File for debug: CE_SC_103_mean_DEBUG_1.data; Original SMD SUID (type): 103 (mean). Choose the organism: Caenorhabditis elegans.
Anti-correlated file for debug: CE_SC_103_mean_DEBUG_2.data; Original SMD SUID (type): 103 (mean). Choose the organism: Caenorhabditis elegans.
File for debug: DM_HS_15772_debug_median.data.new; Original SMD SUID (type): 15772 (mean). Choose the organism: Drosophila melanogaster, and use NCBI HomoloGene as the orthologous DB.