
Q1. What is a motif?
Q2. How can I print SALAD output or use it in presentation software?
Q3. What is MEME?
Q4. What is WebLogo?
Q5. What is SALAD clustering?
Q6. What are the values of AU and BP?
Q7. How can I manipulate SVG more conveniently?
Q8. What is the criterion for the order of motif numbers in an annotation group?





Q1. What is a motif?
In the SALAD database, we define a motif as a short amino acid sequence that is evolutionarily conserved within a species or between species. To find such motifs in annotated protein sequences, we extracted motifs with lengths ranging from 8 to 50 using the MEME software (http://meme.sdsc.edu/meme/intro.html). As the dataset, we used non-redundant proteome annotation datasets derived from the whole-genome data of the plant species.
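As a rough illustration of the length range above, the following Python sketch builds a MEME command line that restricts motif widths to 8 through 50. It is only a sketch under assumptions: the input file name and output directory are hypothetical, the options shown (-protein, -minw, -maxw, -oc) are standard MEME command-line options, and the exact settings used for SALAD are not stated here.

```python
def build_meme_command(fasta_path, out_dir, min_width=8, max_width=50):
    """Return the argument list for a protein MEME run with bounded motif width."""
    return [
        "meme", fasta_path,
        "-protein",                 # treat the input as protein sequences
        "-minw", str(min_width),    # shortest motif width to consider
        "-maxw", str(max_width),    # longest motif width to consider
        "-oc", out_dir,             # output directory (overwritten if it exists)
    ]

# Hypothetical file and directory names, for illustration only.
cmd = build_meme_command("annotation_group.fasta", "meme_out")
print(" ".join(cmd))
```

If MEME is installed, the list could be passed to subprocess.run(cmd, check=True) to start the analysis.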

There are several conserved domain databases, such as Pfam, that focus on the functions of the extracted amino acid domains. We instead focused on the local distribution of motifs within annotations in order to evaluate the similarity among annotations. To connect these motifs with known domain information such as Pfam domains, we provide, in a separate window of the SALAD database, a diagram showing the positions of Pfam domains in each annotation for every SALAD annotation group.

We have not yet assigned unified IDs to the motifs, because in the present version of the SALAD database motifs were extracted separately for each annotation group. We will improve this in Ver. 2.




Q2. How can I print SALAD output or use it in presentation software?
Capturing a screenshot (the quick way)
<Windows>
1. To capture a screenshot of the entire screen to the clipboard, press the [PrintScreen] key.
   To capture a screenshot of the active window or dialog box only (for example, the WordPad application window or the File Open dialog box), press [ALT] + [PrintScreen].

2. Paste the captured image into graphics software (for example, Paint or Photoshop).
   (Paste: [Ctrl] + [v])

3. Edit the image in the graphics software, and then copy it into any presentation software (for example, PowerPoint).

<Mac>
1. To capture a screenshot of the entire screen, press [apple/command] + [shift] + [3].
   To capture a selected area, press [apple/command] + [shift] + [4] and drag over it. On Mac OS X, pressing [space] after this shortcut lets you click a single window or dialog box to capture it.

2. Mac OS 8/Mac OS 9: a picture file is created on the startup disk.
   Mac OS X: a picture file named "pictureX.png" is created on the desktop.

3. Edit the image in graphics software (for example, Preview or Photoshop), and then copy it into any presentation software (for example, Keynote or PowerPoint).


Using an SVG editor (for example, Inkscape or Illustrator)
How to use Inkscape (Open Source)

1. Download and install Inkscape
   Go to the download page and choose the download package for your operating system (OS).

2. Download the SVG data from SALAD
   There are several ways to get the SVG data.
   a) Press SVG download button (Win + Mac)
   b) Click the left mouse button on the SVG data (Win)
   c) Click the mouse button on the SVG data + press [Control] (Mac)


3. Print the SVG data using Inkscape
   Please refer to the Inkscape manual for operating instructions.


*With the screenshot method, you can print an image of the SVG data that includes any changes you made to the motif positions on the web page. Once the SVG data is downloaded, however, those changes may be lost.




Q3. What is MEME?
MEME (Multiple EM for Motif Elicitation) is a tool for discovering consensus sequences (motifs) in a group of related DNA or protein sequences, based on the expectation maximization (EM) method.

MEME represents ungapped consensus sequences as position-dependent letter-probability matrices, which describe the probability of each possible letter at each position in the pattern. Individual MEME motifs do not contain gaps. Patterns with variable-length gaps are split by MEME into two or more separate motifs.
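To make the idea of a position-dependent letter-probability matrix concrete, here is a minimal Python sketch that builds one by simple counting over a set of ungapped sites of equal length. The example sites are made up, and the counting is only an illustration: MEME itself estimates these matrices with expectation maximization, which this sketch does not reproduce.

```python
from collections import Counter

def letter_probability_matrix(sites, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Build a position-dependent letter-probability matrix from ungapped sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length (no gaps)")
    matrix = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        # One row per position: probability of each letter at that position.
        matrix.append({aa: counts[aa] / len(sites) for aa in alphabet})
    return matrix

# Made-up example sites, for illustration only.
sites = ["LKDLSNNR", "LKELSNNK", "LRDLSGNR", "LKDLSNNR"]
pwm = letter_probability_matrix(sites)
print(pwm[1]["K"])  # fraction of sites with K at the second position
```

Each row of the matrix sums to 1, so a row can be read directly as the letter distribution at that motif position.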

MEME (http://meme.sdsc.edu/meme/intro.html)

citations:
Timothy L. Bailey, Nadya Williams, Chris Misleh, and Wilfred W. Li, "MEME: discovering and analyzing DNA and protein sequence motifs", Nucleic Acids Research, Vol. 34, pp. W369-W373, 2006.

Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.



Q4. What is WebLogo?
WebLogo is a sequence logo generator.

Sequence logos are a graphical representation of an amino acid or nucleic acid multiple sequence alignment.
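In a sequence logo, the height of the letter stack at each position reflects the information content of that alignment column. The following Python sketch computes this value in bits as log2 of the alphabet size minus the Shannon entropy of the column. It is a simplified illustration with made-up sequences: the small-sample correction described by Schneider and Stephens is omitted, and this is not WebLogo's own code.

```python
import math
from collections import Counter

def column_information(column, alphabet_size=20):
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy

# Made-up aligned protein sequences, for illustration only.
aligned = ["LKDLSNNR", "LKELSNNK", "LRDLSGNR", "LKDLSNNR"]
for pos in range(len(aligned[0])):
    column = [seq[pos] for seq in aligned]
    print(pos + 1, round(column_information(column), 2))
```

A fully conserved column reaches the maximum of log2(20) bits for proteins, while a column with evenly mixed residues scores lower and is drawn as a shorter stack.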

WebLogo (http://weblogo.berkeley.edu/)

citations:
Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: A sequence logo generator. Genome Research 14:1188-1190

Schneider TD, Stephens RM. 1990. Sequence Logos: A New Way to Display Consensus Sequences. Nucleic Acids Res. 18:6097-6100



Q5. What is SALAD clustering?

Clustering is the process of grouping closely related objects. Distances that express the relationship between objects are used to join the objects in a dendrogram, such as a phylogenetic tree.

In the SALAD database, we provide not only motif diagrams but also clustering data for each annotation group. To compare annotations that have distinct patterns of several motifs, we developed an algorithm that clusters the annotations in a given annotation group using distance scores based on both the presence of the extracted conserved motifs and, where a motif is present, its sequence similarity. All possible pairwise motif sequence similarities were scored with an amino acid substitution matrix and used to obtain the distance scores. With this algorithm, we could classify the annotations in a group into bootstrapped clades with distinct motif patterns in a dendrogram, so that proteins with more similar motif patterns fall into the same clade, as shown below. Each node of the dendrogram has an AU or BP value, which indicates the degree of confidence in that node. We have verified that when all annotations have very similar motif patterns, our algorithm often produces a dendrogram very similar to the corresponding NJ tree.



*The SALAD dendrogram is no more than an arrangement of groups of annotations. Because it differs from a typical phylogenetic tree, you need to interpret the classification of annotations yourself. (Please try the SALAD function that creates a Neighbor-Joining tree from alignments of motif sequences.)
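The idea of a distance score that combines motif presence with motif similarity can be sketched in Python as follows. This is a hypothetical scoring scheme written only for illustration, not the formula used by SALAD: the toy substitution score, the normalization, and the penalty of 1 for a motif found in only one annotation are all assumptions, and the example sequences are made up.

```python
def toy_score(a, b):
    """Toy substitution score: +2 for identical residues, -1 otherwise.
    A real analysis would use an amino acid substitution matrix (assumption)."""
    return sum(2 if x == y else -1 for x, y in zip(a, b))

def motif_distance(p, q):
    """Distance between two annotations given as {motif_id: site_sequence}."""
    motif_ids = set(p) | set(q)
    if not motif_ids:
        return 0.0
    total = 0.0
    for m in motif_ids:
        if m in p and m in q:
            # Motif present in both: scale similarity to the range 0..1.
            best = max(toy_score(p[m], p[m]), toy_score(q[m], q[m]))
            similarity = max(toy_score(p[m], q[m]), 0) / best
            total += 1.0 - similarity
        else:
            total += 1.0  # motif present in only one annotation
    return total / len(motif_ids)

# Made-up annotations and motif sites, for illustration only.
annotations = {
    "protA": {1: "LKDLSNNR", 2: "GHWTCEVY"},
    "protB": {1: "LKELSNNK", 2: "GHWTCEVY"},
    "protC": {1: "LRDLSGNR"},
}
for x in annotations:
    print(x, [round(motif_distance(annotations[x], annotations[y]), 4) for y in annotations])
```

In this example protA and protB share both motifs and end up closer to each other than either is to protC, which lacks motif 2. A matrix of such distances is the kind of input a hierarchical clustering step would take.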

Clustering was calculated using pvclust in the R software (http://www.r-project.org/).
pvclust (http://www.is.titech.ac.jp/~shimo/prog/pvclust/) was created by

Ryota Suzuki(a, b) and Hidetoshi Shimodaira(a)

a) Department of Mathematical and Computing Sciences, Tokyo Institute of Technology

b) Ef-prime, Inc.



Q6. What are the values of AU and BP?
AU = Approximately Unbiased p-value
BP = Bootstrap Probability value

AU and BP values are indices for evaluating how reliable each node of the dendrogram is.

The Bootstrap Probability value (BP) is the percentage of bootstrap replicates in which a particular node appears when bootstrap analysis is performed on the distance score data underlying a given dendrogram.

The Approximately Unbiased p-value (AU), which is calculated by the multiscale bootstrap method, is more accurate because it corrects the bias of the bootstrap probability value that is caused by using a constant sample size.
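The bootstrap probability can be illustrated with a small Python sketch. It resamples the columns of a made-up numeric profile with replacement and counts how often two items are each other's nearest neighbour, which is used here as a simple stand-in for "this two-member cluster appears in the dendrogram". The profiles, the Euclidean distance, and the nearest-neighbour criterion are assumptions for illustration. This is not pvclust's implementation, and the AU value, which needs bootstrap replicates of several different sample sizes, is not computed.

```python
import math
import random

def nearest(name, data, cols):
    """Name of the item closest to `name`, using only the resampled columns."""
    def dist(other):
        return math.sqrt(sum((data[name][c] - data[other][c]) ** 2 for c in cols))
    return min((o for o in data if o != name), key=dist)

def bootstrap_probability(data, a, b, n_boot=1000, seed=0):
    """Fraction of bootstrap replicates in which a and b are mutual nearest neighbours."""
    rng = random.Random(seed)
    n_cols = len(data[a])
    hits = 0
    for _ in range(n_boot):
        # Resample column indices with replacement (same sample size each time).
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        if nearest(a, data, cols) == b and nearest(b, data, cols) == a:
            hits += 1
    return hits / n_boot

# Made-up profiles, for illustration only.
profiles = {
    "protA": [1, 2, 3, 4],
    "protB": [1, 2, 3, 5],
    "protC": [9, 9, 9, 9],
    "protD": [9, 8, 9, 9],
}
print(bootstrap_probability(profiles, "protA", "protB"))
```

Multiplying the returned fraction by 100 gives the percentage form used for BP values.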

Multiscale bootstrap method

H. Shimodaira (2002). An approximately unbiased test of phylogenetic tree selection, Systematic Biology, 51, 492-508.

H. Shimodaira (2004). Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling, Annals of Statistics, 32, 2616-2641.



Q7. How can I manipulate SVG more conveniently?
The movement interval can be adjusted with the button at the center of the arrow keys. This makes it easier to navigate when the SVG data is large. Please try it.

Moreover, in Internet Explorer (Windows) only, you can manipulate the SVG with the mouse.
To scroll the SVG image, hold [Alt] and drag with the mouse.




Q8. What is the criterion for the order of motif numbers in an annotation group?
The significance of each motif is reported by MEME as an E-value, and the motif number in the SALAD data depends on this E-value. The statistical significance of a motif is computed based on:
1) the log likelihood ratio,
2) the width of the motif,
3) the number of occurrences,
4) the 0-order portion of the background model,
5) the size of the training set, and
6) the type of model (oops, zoops, or anr, which determines the number of possible different motifs of the given width and number of occurrences).

The E-value threshold was decided empirically. Therefore, you should be able to find almost all motifs, with the exception of specific motifs composed of particular conserved amino acids, such as LRR.
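As a minimal sketch of how motif numbers could follow from E-values, the Python example below sorts motifs by ascending E-value and numbers them from 1. The ascending order (most significant motif first) is an assumption, since the text only states that the number depends on the E-value, and the consensus strings and E-values are made up.

```python
def number_motifs(motifs):
    """Sort motifs by ascending E-value and assign motif numbers starting at 1."""
    ranked = sorted(motifs, key=lambda m: m["evalue"])
    return [dict(m, number=i) for i, m in enumerate(ranked, start=1)]

# Made-up motifs and E-values, for illustration only.
motifs = [
    {"consensus": "GHWTCEVY", "evalue": 3.2e-15},
    {"consensus": "LKDLSNNR", "evalue": 1.4e-40},
    {"consensus": "PAVQWDTS", "evalue": 7.9e-08},
]
for m in number_motifs(motifs):
    print(m["number"], m["consensus"], m["evalue"])
```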