■Introduction

Genomic sequence data are now available for several plant species. Although protein-coding sequences in these genomes have been annotated on the basis of cDNA sequence information, homology with known proteins, and computational gene prediction, the biological and biochemical functions of many of the annotated proteins remain poorly predicted. We hypothesize that evolutionarily conserved amino acid sequences (motifs) in homologous proteins reflect the biochemical functions of the annotated proteins. To provide hints for elucidating these functions, we have established a genome-wide database of similarity clustering based on the distribution patterns of such motifs in three datasets. The first set (ver. 1.0) comprises 83,508 annotated proteome sequences from three species: Arabidopsis thaliana, rice (Oryza sativa), and a red alga (Cyanidioschyzon merolae). The second set (ver. 2.0) comprises 168,869 annotated proteome sequences from seven species: A. thaliana, rice, the red alga, sorghum, moss, a green alga (Ostreococcus tauri), and yeast (Saccharomyces cerevisiae). The third set (ver. 3.0) comprises 250,687 annotated proteome sequences from ten species: A. thaliana, rice, the red alga, sorghum, moss, the green alga, yeast, grape (Vitis vinifera), a lycophyte (Selaginella moellendorffii), and another green alga (Chlamydomonas reinhardtii). The database is named SALAD (Surveyed conserved motif ALignment diagram and the Associating Dendrogram).

■Purpose

Genome-wide comparative analysis of annotated protein sequences in plants.

■Features

What is a motif?

■Current dataset

Version and species information

Details of ver. 3.0

Details of ver. 2.0

Details of ver. 1.0

Dataset (ver. 3.0)

Ref: LSDB (http://lifesciencedb.jp), JGI (http://www.jgi.doe.gov/)

・RAP-DB
(http://rapdb.dna.affrc.go.jp/)

・JGI
(http://genome.jgi-psf.org/)

・TAIR
(http://www.arabidopsis.org)

・Cyanidioschyzon merolae Genome Project
(http://merolae.biol.s.u-tokyo.ac.jp/)

・SGD Saccharomyces Genome Database
(http://www.yeastgenome.org/)

・Vitis project
(http://www.genoscope.cns.fr/externe/GenomeBrowser/Vitis/)

Dataset (ver. 2.0)

*In SALAD database ver. 2.0, many monocot-specific motifs can be extracted as motifs common to the two monocot species (rice and sorghum). In contrast, some dicot-specific motifs may not be extracted, because the ver. 2.0 dataset contains only one dicot species (A. thaliana).

・RAP-DB
(http://rapdb.dna.affrc.go.jp/)

・JGI
(http://genome.jgi-psf.org/)

・TAIR
(http://www.arabidopsis.org)

・Cyanidioschyzon merolae Genome Project
(http://merolae.biol.s.u-tokyo.ac.jp/)

・SGD Saccharomyces Genome Database
(http://www.yeastgenome.org/)

Dataset (ver. 1.0)

・RAP-DB
(http://rapdb.dna.affrc.go.jp/)

・MIPS
(http://mips.gsf.de/proj/plant/jsf/athal/index.jsp)

・Cyanidioschyzon merolae Genome Project
(http://merolae.biol.s.u-tokyo.ac.jp/)

■Instructions and directions for use

First, select an annotation group in this database by keyword search, annotation ID, Pfam domain, or BLAST similarity search. You can then open a window of SALAD data. All SALAD data presented in a window were created for one annotation group. In the diagram, a colored motif indicates the position of a consensus sequence pattern that occurs repeatedly in the selected annotation group. Because motifs are discovered separately for each group, the motifs shown for the same annotation may differ between annotation groups. To avoid misinterpretation and to gain further insight, we recommend consulting several annotation groups that contain your target annotation. The SALAD database also provides an interactive SALAD analysis function; please try it.
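Why the same annotation can show different motifs in different groups can be illustrated with a toy example. The sketch below is not the motif discovery method used by SALAD; as a simplified stand-in, it treats a "motif" as a fixed-length substring (k-mer) shared by every sequence in a group. The function name `shared_kmers`, the parameter `k`, and the sequences are all hypothetical.

```python
from collections import Counter


def shared_kmers(sequences, k=4, min_count=None):
    """Return the k-mers present in at least min_count sequences.

    Toy stand-in for motif discovery (an assumption, not SALAD's method).
    By default a k-mer must occur in every sequence of the group.
    """
    if min_count is None:
        min_count = len(sequences)
    counts = Counter()
    for seq in sequences:
        # Count each k-mer once per sequence, however often it repeats.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {kmer for kmer, n in counts.items() if n >= min_count}


# Hypothetical sequences: the same target protein placed in two groups.
target = "MKWLLCGDPAHR"
group_a = [target, "TTWLLCGSSS", "AAWLLCGQQ"]
group_b = [target, "GGGDPAHRYY", "VVDPAHRLL"]

print(sorted(shared_kmers(group_a)))  # k-mers conserved within group A
print(sorted(shared_kmers(group_b)))  # k-mers conserved within group B
```

In this toy data the target sequence shares one region with the members of group A and a different region with the members of group B, so the conserved patterns reported for it depend on which group is examined.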

Our algorithm for clustering annotations is based on the presence and similarity scores of the extracted conserved motifs. For this reason, the clustering is not well suited to discussing phylogenetic evolution. If you are interested in phylogenetic relationships among the target annotations, please use the SALAD database function for constructing neighbor-joining (NJ) phylogenetic trees from the sequence alignments of conserved motifs. You can select multiple motifs at once to build the trees.
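The idea of clustering by motif presence can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring used by SALAD: similarity is taken to be the Jaccard index over sets of motif IDs, and clusters are merged by average linkage until no pair reaches a threshold. The names `jaccard`, `cluster_by_motifs`, the `threshold` parameter, and the protein and motif IDs are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of motif IDs."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_motifs(motifs_by_protein, threshold=0.5):
    """Average-linkage agglomerative clustering on motif presence.

    Repeatedly merges the two most similar clusters and stops when the
    best pair scores below `threshold` (an assumed, illustrative cutoff).
    """
    clusters = [[name] for name in sorted(motifs_by_protein)]

    def linkage(c1, c2):
        # Mean pairwise similarity between members of the two clusters.
        scores = [jaccard(motifs_by_protein[p], motifs_by_protein[q])
                  for p in c1 for q in c2]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical proteins described only by the motifs they contain.
proteins = {
    "protA": {"m1", "m2", "m3"},
    "protB": {"m1", "m2"},
    "protC": {"m7", "m8"},
    "protD": {"m7", "m8", "m9"},
}
print(cluster_by_motifs(proteins))
```

Because such a grouping depends only on which motifs are shared, it says nothing about the order of divergence, which is why a tree built from aligned motif sequences is the better tool for phylogenetic questions.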

The similarity scores were calculated with empirically defined parameters, and disturbed clusterings have rarely been reported.
