Help for data analysis in WebArrayDB




1  Work Flow

2  Group Assignment

Currently, WebArrayDB is dedicated to differential analysis. Samples (arrays or specific channels) must be assigned to groups so that the groups can be compared. At the end of the analysis, the genes that are differentially expressed among groups are reported.

Tips:

3  Cross-platform Probe Alignment

3.1  Match probes

One notable feature of WebArrayDB is cross-platform probe alignment. Probes from different platforms can be aligned by any ID that has been pre-defined as a reference ID, such as gene symbol, GenBank ID or RefSeq ID. After alignment, a data matrix is built in which each column comes from one channel of an array and each row holds the values for one aligned probe/gene. Currently, only probes present in all platforms are kept in the data matrix for further analysis.
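The alignment described above can be sketched as an inner join on a reference ID. The following Python sketch is only an illustration, not WebArrayDB's implementation: the function name build_data_matrix, the channel names and the RefSeq-like IDs are hypothetical, and it assumes one probe per reference ID in each channel (replicates are covered in section 3.2).

```python
def build_data_matrix(channels):
    """Align channels on a shared reference ID.

    channels maps a channel name to a dict {reference_id: intensity}.
    Returns (row_ids, column_names, matrix). Only IDs present in every
    channel are kept, mirroring the "present in all platforms" rule.
    """
    column_names = list(channels)
    shared_ids = set.intersection(*(set(c) for c in channels.values()))
    row_ids = sorted(shared_ids)
    matrix = [[channels[name][rid] for name in column_names] for rid in row_ids]
    return row_ids, column_names, matrix


# Hypothetical example data: two channels from two different platforms.
channels = {
    "platformA_array1_Cy5": {"NM_0001": 8.1, "NM_0002": 7.4, "NM_0003": 9.0},
    "platformB_array1_Cy3": {"NM_0002": 6.9, "NM_0003": 8.8, "NM_0004": 5.2},
}
row_ids, column_names, matrix = build_data_matrix(channels)
print(row_ids)  # IDs shared by both channels
print(matrix)   # one row per shared ID, one column per channel
```

IDs that occur in only one channel (NM_0001 and NM_0004 here) are dropped from the matrix.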

There are six ways to match probes in WebArrayDB:
1) quick alignment,
2) match by “idx” or “unique_id”,
3) match by shared “Reference IDs”,
4) match by “user-specified columns”,
5) match by probe-mapping files,
6) “automatic” match.

3.1.1  Quick alignment

If all involved platforms have the same number of probes, and all probes come from the same printing material (sharing the same sequences) and are printed in the same order, users can choose the quick alignment option. WebArrayDB then skips the matching step and treats these platforms as a single platform. This can save considerable time, since regular alignment can be very time-consuming.

3.1.2  Match by “idx” or “unique_id”

The “idx” column contains the printing order (or logical position) of each probe, so probes are matched by printing order if “idx” is chosen. If “unique_id” is chosen, probes are matched by the “id” or “unique_id” column of the probe file.

3.1.3  Match by shared Reference IDs

To use “reference IDs” for alignment, users must provide reference IDs in the probe files when defining platforms. Probes are considered “identical” if they share the same reference IDs in WebArrayDB. When running an analysis, users can choose one of these reference columns, e.g. RefSeq IDs, to align probes from different platforms. Probes in different platforms that have the same RefSeq ID are then treated as the same probe. In the final data matrix, these probes are matched in one or more rows, depending on the method chosen to match replicates.

“Identical” probes detected by reference IDs are assigned the same “cross-platform ID”, an integer that is unique in the database. Cross-platform IDs conform to the transitivity and evolution principles and are also used for the “automatic” match.
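The transitivity property can be illustrated with a small union-find structure: if probe a1 is identical to b7, and b7 is identical to c3, all three end up with the same integer ID. This is a sketch under assumptions, not WebArrayDB's actual code. The class name CrossPlatformIds, its methods and the probe names are hypothetical, and the evolution principle is not modelled here.

```python
class CrossPlatformIds:
    """Groups probes that are linked directly or transitively."""

    def __init__(self):
        self._parent = {}

    def _find(self, probe):
        self._parent.setdefault(probe, probe)
        root = probe
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited probe straight at the root.
        while self._parent[probe] != root:
            next_probe = self._parent[probe]
            self._parent[probe] = root
            probe = next_probe
        return root

    def link(self, probe_a, probe_b):
        """Record that two probes share a reference ID."""
        self._parent[self._find(probe_a)] = self._find(probe_b)

    def ids(self):
        """Return {probe: integer ID}; linked probes share one ID."""
        numbering, result = {}, {}
        for probe in list(self._parent):
            root = self._find(probe)
            numbering.setdefault(root, len(numbering) + 1)
            result[probe] = numbering[root]
        return result


# Hypothetical probes, written as (platform, probe name).
groups = CrossPlatformIds()
groups.link(("A", "a1"), ("B", "b7"))   # same RefSeq ID on A and B
groups.link(("B", "b7"), ("C", "c3"))   # same gene symbol on B and C
groups.link(("A", "a2"), ("B", "b9"))   # an unrelated pair
ids = groups.ids()
print(ids[("A", "a1")] == ids[("C", "c3")])  # linked through b7
```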

3.1.4  Match by “user-specified columns”

This method is similar to matching by shared “Reference IDs”, but the columns used for matching need not be reference columns. In principle, any column in the probe files can be used. In particular, users can select a different column for each involved platform, and the probes are matched as if these columns contained reference IDs from the same database. This makes the method very flexible.

3.1.5  Match by probe-mapping files

“Identical” probes detected by probe-mapping files are assigned the same “mapping ID”, an integer that is unique in the database. Like cross-platform IDs, mapping IDs conform to the transitivity and evolution principles. They are used for the “automatic” match as well.

3.1.6  “automatic” match

Users can also choose the “automatic” option. WebArrayDB then uses existing alignments made from probe-mapping files, or aligns probes by all available reference columns, including gene_symbol, unique_id and other reference IDs, but the “idx” column in the probe file is not used.

WebArrayDB tries mapping IDs first. If no matching probes are found, it tries cross-platform IDs next. If there is still no match, the “idx” column is used.
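The fallback order in the paragraph above can be sketched as a loop over candidate keys that stops at the first key producing any match. This is an illustration under assumptions only: the function name automatic_match, the dictionary keys ("name", "mapping_id", "cross_platform_id", "idx") and the example probes are hypothetical, and the sketch decides per pair of platforms, which the text does not specify.

```python
def automatic_match(platform_a, platform_b,
                    keys=("mapping_id", "cross_platform_id", "idx")):
    """Return (key_used, pairs) for the first key that yields a match.

    Each platform is a list of probe dicts. A missing or None value
    means the probe has no ID of that kind.
    """
    for key in keys:
        # Index platform B by the current key.
        by_value = {}
        for probe in platform_b:
            value = probe.get(key)
            if value is not None:
                by_value.setdefault(value, []).append(probe["name"])
        pairs = [
            (probe["name"], name_b)
            for probe in platform_a
            if probe.get(key) is not None
            for name_b in by_value.get(probe[key], [])
        ]
        if pairs:
            return key, pairs
    return None, []


# Hypothetical probes: no mapping IDs exist, so cross-platform IDs are used.
platform_a = [
    {"name": "a1", "cross_platform_id": 17, "idx": 1},
    {"name": "a2", "cross_platform_id": 42, "idx": 2},
]
platform_b = [
    {"name": "b1", "cross_platform_id": 42, "idx": 1},
    {"name": "b2", "cross_platform_id": 99, "idx": 2},
]
key_used, pairs = automatic_match(platform_a, platform_b)
print(key_used, pairs)
```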

3.2  Match replicates

When a probe has replicates in one or more of the involved platforms, probe alignment among platforms can become complex because of the many-to-many relationships, including cases where there are duplicate spots on the array. WebArrayDB provides six options to deal with such multiple alignments: "median", "mean", "log mean", "shortest", "longest" and "cartesian product". For example, suppose two platforms are aligned by RefSeq ID, and one gene is represented by two probes (A1 and A2) in platform A and by three probes (B1, B2 and B3) in platform B. With "median", "mean" or "log mean", the median, mean or log mean of the probes sharing the same RefSeq ID is used to represent that gene. With "shortest", there are two matches: A1 vs B1 and A2 vs B2. With "longest", one more match is made in addition to those of "shortest": A3 vs B3, where "A3" is the log mean of A1 and A2. With "cartesian product", each probe for that RefSeq ID in platform A is matched to every probe for the same RefSeq ID in platform B, giving 6 (2 x 3) matches in this example.
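The six options can be sketched in Python using the A1/A2 versus B1/B2/B3 example. This is a hedged illustration, not the WebArrayDB implementation. The function names summarize and pair_replicates are hypothetical. "log mean" is assumed here to be the mean taken on the log2 scale and transformed back, and for "longest" the handling of a longer platform A is an assumption made by symmetry with the documented case.

```python
import math
from itertools import product
from statistics import mean, median


def summarize(values, method):
    """Collapse replicate intensities (positive, raw scale) to one value."""
    if method == "median":
        return median(values)
    if method == "mean":
        return mean(values)
    if method == "log mean":
        # Assumption: average the log2 values, then transform back.
        return 2 ** mean([math.log2(v) for v in values])
    raise ValueError("unknown summary method: " + method)


def pair_replicates(probes_a, probes_b, method):
    """Return the list of (probe_a, probe_b) matches for one gene."""
    if method == "cartesian product":
        return list(product(probes_a, probes_b))
    pairs = list(zip(probes_a, probes_b))  # stops at the shorter list
    if method == "shortest":
        return pairs
    if method == "longest":
        n = len(pairs)
        summary_a = "logmean(" + ",".join(probes_a) + ")"
        summary_b = "logmean(" + ",".join(probes_b) + ")"
        # Leftover probes are matched to a summary of the other platform.
        pairs += [(summary_a, b) for b in probes_b[n:]]
        pairs += [(a, summary_b) for a in probes_a[n:]]
        return pairs
    raise ValueError("unknown pairing method: " + method)


probes_a, probes_b = ["A1", "A2"], ["B1", "B2", "B3"]
for option in ("shortest", "longest", "cartesian product"):
    print(option, pair_replicates(probes_a, probes_b, option))
print(summarize([2.0, 8.0], "log mean"))
```

In this sketch the label logmean(A1,A2) plays the role of "A3" in the text.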

4  Data Normalization

Normalization is a step that minimizes systematic noise before differential analysis and is strongly recommended. Users are encouraged to read the details of each normalization option before deciding which one to use. In general, normalization has four steps.
  1. Background correction
  2. Within-array normalization
  3. Between-array normalization (within a platform)
  4. Cross-platform normalization
The first three steps are done before probe alignment, and the last step is done after cross-platform probe alignment.

Background correction, within-array normalization and between-array normalization (within a platform) are also among the functions provided in WebArray.

Cross-platform normalization means data normalization for arrays from different platforms. All between-array normalization methods are available for cross-platform normalization, and three further cross-platform methods are implemented in WebArrayDB as well. For QD normalization, the parameter “number of bin” has to be set. It must be a power of 2 (2, 4, 8, ...); the default is 8.
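The constraint on the bin parameter can be checked with a few lines of Python. This is only a sketch of the rule stated above (a power of 2, starting at 2, with a default of 8). The function name check_number_of_bins is hypothetical and is not part of WebArrayDB.

```python
def check_number_of_bins(bins=8):
    """Return bins if it is 2, 4, 8, ... and raise ValueError otherwise."""
    # A positive integer is a power of two when it has exactly one bit set,
    # which is equivalent to bins & (bins - 1) being zero.
    if not isinstance(bins, int) or bins < 2 or (bins & (bins - 1)) != 0:
        raise ValueError("number of bins must be a power of 2 (2, 4, 8, ...)")
    return bins


print(check_number_of_bins())    # uses the default
print(check_number_of_bins(16))  # accepted
try:
    check_number_of_bins(6)      # rejected: not a power of 2
except ValueError as error:
    print(error)
```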

For homologous platforms (e.g. successive versions of a user-spotted slide with only a few probes changed), all between-array normalization methods might be used. In such cases, between-array normalization within a platform is unnecessary.

5  Differential Analysis

5.1  Algorithms

WebArrayDB offers several algorithms for differential analysis: Student's t-test, eBayes-moderated t-test, SAM, ANOVA/ANCOVA and non-parametric tests.

5.2  Blocked or paired data

If the data cannot be treated as blocked/paired, WebArrayDB ignores this option and runs a regular analysis based on intensity.

The meaning of blocked/paired data depends on the selected algorithm.

5.3  ANOVA model

ANOVA/ANCOVA can be used to investigate the effects of multiple factors/variables. Here, variables differ from factors in the type of their values: the values of a variable are numbers (integer or float), while the values of a factor are strings, even if they consist of digits. Using factors/variables defined by the user or found in the database, a user can define the linear model for ANOVA/ANCOVA, or help WebArrayDB to define it.
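The distinction between factors and variables depends on value type, not on what the values look like. A minimal Python sketch of that rule follows. The function name classify_column and the example columns are hypothetical, and how WebArrayDB itself reads the types is not described here.

```python
def classify_column(values):
    """Return "variable" for numeric values and "factor" for strings."""
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_number:
        return "variable"
    if all(isinstance(v, str) for v in values):
        return "factor"
    raise TypeError("a column must be all numbers or all strings")


print(classify_column([10, 20.5, 30]))    # numbers: a variable
print(classify_column(["1", "2", "3"]))   # digit strings: still a factor
print(classify_column(["Cy3", "Cy5"]))    # ordinary labels: a factor
```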

Note that the model becomes a mixed-effect model if any random-effect factor/variable is used. Mixed-effect models can be very time-consuming to compute.

Factors/variables fall into three categories:
1) The factor “group”: this is the basic factor that WebArrayDB aims to investigate. When the user defines two or more groups, a factor “group” is defined automatically.
2) Factors in the database: information stored in the database can be used as factors. Typically, these are platform, sample, dye, array and individual (sample individual).
3) User-defined factors/variables: if “ANOVA” is chosen as the algorithm, a table allows users to define factors/variables.

Based on the experiment design, there are four options to build a model for ANOVA/ANCOVA in WebArrayDB.

5.3.1  Use “group” only

In this case, “group” is considered the only factor that affects the intensity data. This option suits simple experiment designs.

5.3.2  Try to use factors in database

WebArrayDB will attempt to use the “group” factor and as many other factors in the database as possible to build a model for ANOVA. When users select some of the database factors, WebArrayDB tries to use them to build the model; any factor that is not suitable is removed automatically.

5.3.3  User-defined factors/variables

WebArrayDB will use the “group” factor and all user-defined factors/variables to build the ANOVA model. This option offers a flexible way to run the analysis when the information in the database is insufficient for a specific experiment design.

5.3.4  User-defined model

There are several significant features/advantages in user-defined model:

5.4  Contrasts

This field describes the comparisons that users want to make between “groups”. For example, “group2 - group1” means that the user wants to compare group2 with group1. In the analysis results, M then represents the log base 2 ratio of group2/group1. Multiple comparisons can be separated by “,” or “;”. By default, if the Contrast box is left empty, all other groups are compared to the first one, i.e. “group2 - group1; group3 - group1; group4 - group1” if there are four groups in total.
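The separator and default rules can be sketched as follows. This is an illustrative Python sketch, not WebArrayDB's parser. The function names default_contrasts and read_contrasts are hypothetical.

```python
import re


def default_contrasts(groups):
    """Compare every other group to the first one."""
    first = groups[0]
    return [f"{group} - {first}" for group in groups[1:]]


def read_contrasts(text, groups):
    """Split the Contrast box on "," or ";" and fall back to the default."""
    parts = [part.strip() for part in re.split(r"[;,]", text)]
    parts = [part for part in parts if part]
    return parts or default_contrasts(groups)


groups = ["group1", "group2", "group3", "group4"]
print(read_contrasts("", groups))                       # empty box: defaults
print(read_contrasts("group2 - group1, group4 - group3", groups))
```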

Generally, a comparison is defined by group names separated by “+” and/or “-”. Do not repeat a group name within one comparison. These limitations are lifted if you use LIMMA-based analysis, which allows more flexible comparisons built with pairs of parentheses “()”, “/” and numbers. For example, experienced users can try something like “(group4 - group3) - (group2 - group3)” or “group3 - (group2 + group1)/2”.
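A contrast expression of this kind is a linear combination of group means, so it can be reduced to one weight per group. The Python sketch below shows that reduction using the standard ast module. It is an illustration only and is not how LIMMA or WebArrayDB parse contrasts. The function name contrast_weights is hypothetical, and the sketch assumes group names are valid Python identifiers.

```python
import ast


def contrast_weights(expression):
    """Return {group_name: weight} for a linear contrast expression."""
    return _weights(ast.parse(expression, mode="eval").body)


def _is_number(node):
    return (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool))


def _scale(weights, factor):
    return {group: weight * factor for group, weight in weights.items()}


def _weights(node):
    if isinstance(node, ast.Name):  # a bare group name
        return {node.id: 1.0}
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return _scale(_weights(node.operand), -1.0)
    if isinstance(node, ast.BinOp):
        if isinstance(node.op, (ast.Add, ast.Sub)):
            sign = 1.0 if isinstance(node.op, ast.Add) else -1.0
            total = dict(_weights(node.left))
            for group, weight in _weights(node.right).items():
                total[group] = total.get(group, 0.0) + sign * weight
            return total
        if isinstance(node.op, ast.Div) and _is_number(node.right):
            return _scale(_weights(node.left), 1.0 / node.right.value)
        if isinstance(node.op, ast.Mult) and _is_number(node.left):
            return _scale(_weights(node.right), float(node.left.value))
    raise ValueError("unsupported contrast expression")


print(contrast_weights("group2 - group1"))
print(contrast_weights("group3 - (group2 + group1)/2"))
print(contrast_weights("(group4 - group3) - (group2 - group3)"))
```

In the last expression the two group3 terms cancel, so group3 receives a weight of zero and the contrast reduces to group4 minus group2.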

6  Other analysis tools

Other analysis methods that have already been introduced into WebArrayDB are hierarchical clustering, heatmap, Correspondence Analysis, Between Group Analysis and Genome/CGH plotting.

All probes are used if no (successful) differential analysis has been done. Otherwise, users can choose probes by p-value for these analyses.



