Help for data analysis in WebArrayDB

1 Work Flow
2 Group Assignment
3 Cross-platform Probe Alignment
- 3.1 Match probes
- 3.2 Match replicates
4 Data Normalization
5 Differential Analysis
6 Other analysis tools
7 Miscellaneous options
8 FAQs
- 8.1 How to deal with technical replicates?
  - 8.1.1 Using ANOVA
  - 8.1.2 Using LIMMA

1 Work Flow

1) Search data

Each channel of a searched array is presented as an item in the array lists, in the format: “array_name - Channel_No. (Platform_name::database_name)”

2) Define groups

Select single channels to form groups according to your experimental design.

3) Set options for analysis

Background correction
Within-array normalization
Between-array normalization (within a platform)
Align cross-platform data
Between-array normalization (across platforms)
Differential analysis

2 Group Assignment

Currently WebArrayDB is dedicated to differential analysis. Samples (arrays or specific channels) need to be assigned to groups so that the groups can be compared. The analysis ends by reporting the genes that are differentially expressed among groups.

Tips:

The eBayes-moderated t-test (implemented in LIMMA) may need a reference group in its design matrix. If some groups are not used in any comparison, the first of them may be used as the reference; otherwise WebArrayDB will use “group1” as the reference.
Usually users need to define more than one group. However, ANOVA/ANCOVA can use user-defined factors or models for analysis; in such cases the “group” factor might not be necessary, and the user can define only one group.

3 Cross-platform Probe Alignment

3.1 Match probes

One great feature of WebArrayDB is its capability for cross-platform probe alignment. Probes from different platforms can be aligned by any IDs that were pre-defined as reference IDs, such as gene symbols, GenBank IDs or RefSeq IDs. After alignment, a data matrix is built in which each column comes from one channel of an array and all values in a row belong to the same aligned probe/gene. Currently only probes present in all platforms are kept in the data matrix for further analysis.
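
The rule that only probes present in all platforms are kept can be pictured with a minimal Python sketch. This is not WebArrayDB code: the function name, the dictionary layout (reference ID mapped to a list of channel values) and the example IDs are assumptions made only for illustration.

```python
def align_platforms(platforms):
    """Build a data matrix from reference IDs shared by every platform.

    `platforms` is a hypothetical structure: a list with one dict per
    platform, each mapping a reference ID to its list of channel values.
    """
    # Keep only the IDs that occur in all platforms.
    shared = set(platforms[0])
    for platform in platforms[1:]:
        shared &= set(platform)

    # One row per shared ID; columns are the channels of all platforms.
    matrix = {}
    for ref_id in sorted(shared):
        row = []
        for platform in platforms:
            row.extend(platform[ref_id])
        matrix[ref_id] = row
    return matrix


# Made-up example: platform A has two channels, platform B has one.
platform_a = {"NM_0001": [8.1, 7.9], "NM_0002": [5.0, 5.2]}
platform_b = {"NM_0002": [6.1], "NM_0003": [9.3]}
matrix = align_platforms([platform_a, platform_b])
print(matrix)
```

Only "NM_0002" survives here, because it is the only ID found in both platforms.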

There are six ways to match probes in WebArrayDB:
1) quick alignment,
2) match by “idx” or “unique_id”,
3) match by shared “Reference IDs”,
4) match by “user-specified columns”,
5) match by probe-mapping files,
6) “automatic” match.

3.1.1 Quick alignment

When all involved platforms have the same number of probes, all probes come from the same printing material with the same sequences, and the probes are printed in the same order, users can choose the quick alignment option. WebArrayDB then skips the matching step and treats these platforms as a single platform. This can save a lot of time, since regular alignment can be very time-consuming.

3.1.2 Match by “idx” or “unique_id”

The “idx” column contains the printing order (or logical positions) of probes, so probes are matched by printing order if “idx” is chosen. “unique_id” refers to the “id” or “unique_id” column in the probe file.

3.1.3 Match by shared Reference IDs

To use “reference IDs” for alignment, users must provide reference IDs in the probe files when defining platforms. The steps are:

First, select or define one or more “reference column names” on WebArrayDB’s cross-platform infrastructure interface (“Browse/Add reference column names”).
Then use them as column names in the probe files, and supply reference IDs in these columns. The reference IDs can be gene symbols or gene IDs (e.g. GenBank IDs, RefSeq IDs and UniGene IDs) from public databases. Users can also define their own reference IDs for the purpose of alignment.
Upload the probe files.
Create new platforms or update existing platforms with these probe files.
On the interface for data analysis, select a reference column as the method for probe matching.

Probes are considered “identical” if they share the same reference IDs in WebArrayDB. When doing an analysis, users can choose to use one of these reference columns to align probes from different platforms, e.g., RefSeq IDs. That means probes in different platforms with the same RefSeq ID are considered as the same probe. In the eventual data matrix, these probes will be matched in one or more rows, depending on the method to match replicates.

“Identical” probes detected by reference IDs are assigned the same “cross-platform ID”, an integer that is unique in the database. Cross-platform IDs conform to the transitivity and evolution principles described below and are used for “automatic” match.

Transitivity: Cross-platform IDs are transitive. For example, consider probes “A”, “B” and “C”: if A and B share a common reference ID “ref1”, A and B are identical; if B and C also share reference ID “ref2”, B and C are identical too. A, B and C are then all “identical” and share the same cross-platform ID.
Evolution: Cross-platform IDs also evolve. For example, consider probes “A”, “B”, “C” and “D”, and assume that A and B share the cross-platform ID “xpf1” because of a common reference ID “ref1”, while C and D share the cross-platform ID “xpf2” because of “ref2”. If we now add a new platform in which a new probe “E” is associated with both reference IDs “ref1” and “ref2”, then A, B, C, D and E are all considered identical according to the transitivity principle described above. E and all probes with cross-platform ID “xpf2” (C and D) will then be assigned the cross-platform ID “xpf1”.
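
The two principles can be illustrated with a small Python sketch. It is only a model of the behavior described above, not the way WebArrayDB stores its IDs: the class name, the integer numbering and the rule that a merged class keeps the smallest existing ID (which mirrors the “xpf1” example) are assumptions.

```python
class CrossPlatformIds:
    """Toy bookkeeping for cross-platform IDs (hypothetical helper)."""

    def __init__(self):
        self.id_of_probe = {}  # probe name -> cross-platform ID
        self.id_of_ref = {}    # reference ID -> cross-platform ID
        self.next_id = 1

    def add_probe(self, probe, reference_ids):
        # Collect every cross-platform ID already linked to this probe.
        linked = {self.id_of_ref[r] for r in reference_ids if r in self.id_of_ref}
        if probe in self.id_of_probe:
            linked.add(self.id_of_probe[probe])

        if linked:
            keep = min(linked)  # assumption: the oldest (smallest) ID survives
        else:
            keep = self.next_id
            self.next_id += 1

        # Evolution: relabel everything that carried one of the merged IDs.
        for table in (self.id_of_probe, self.id_of_ref):
            for key, value in table.items():
                if value in linked:
                    table[key] = keep

        self.id_of_probe[probe] = keep
        for r in reference_ids:
            self.id_of_ref[r] = keep


ids = CrossPlatformIds()
ids.add_probe("A", ["ref1"])
ids.add_probe("B", ["ref1"])
ids.add_probe("C", ["ref2"])
ids.add_probe("D", ["ref2"])
before = dict(ids.id_of_probe)   # A and B share one ID, C and D another

ids.add_probe("E", ["ref1", "ref2"])
after = dict(ids.id_of_probe)    # E links both classes, so all five merge
print(before, after)
```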

3.1.4 Match by “user-specified columns”

This method is similar to matching by shared “Reference IDs”, but the columns used for matching are not necessarily reference columns. In principle, any column in the probe files can be used for matching. In particular, users can select a different column for each involved platform, and probes are matched as if these columns contained reference IDs from the same database. This makes the method very flexible.

3.1.5 Match by probe-mapping files

Steps to use probe-mapping files:

upload the probe-mapping file
map probes across platforms by the probe-mapping file on the WebArrayDB’s cross-platform infrastructure interface
on the interface for data analysis, select “probe-mapping file” as the probe-matching method and select the uploaded file in the following line.

“Identical” probes detected by probe-mapping files are assigned the same “mapping ID”, an integer that is unique in the database. Like cross-platform IDs, mapping IDs conform to the transitivity and evolution principles. They are also used for “automatic” match.

3.1.6 “automatic” match

Users can also choose the “automatic” option, in which WebArrayDB uses existing alignments from probe-mapping files or aligns probes by all available reference columns, including gene_symbol, unique_id and other reference IDs, but the “idx” column in the probe file won’t be used.

WebArrayDB tries mapping IDs first; if no matching probes are found, it tries cross-platform IDs. If there is still no match, the column “idx” will be used.

3.2 Match replicates

When a probe has replicates in one or more of the involved platforms, the probe alignment among platforms can be complex because of the many-to-many relationship, including cases in which there are duplicate spots on the array. WebArrayDB provides six options, "median", "mean", "log mean", "shortest", "longest" and "cartesian product", to deal with such multiple alignments. For example, suppose two platforms are aligned by RefSeq ID, and one gene is represented by two probes (A1 and A2) in platform A and by three probes (B1, B2 and B3) in platform B. With the options "median", "mean" and "log mean", the median, mean or log mean value of the probes sharing the RefSeq ID is used to represent that gene. With the option "shortest", there are two matches: A1 vs B1, and A2 vs B2. The option "longest" makes one more match in addition to those of "shortest": A3 vs B3, where "A3" is the log mean value of A1 and A2. With the option "cartesian product", each probe for the RefSeq ID in platform A is matched to every probe for the same RefSeq ID in platform B, resulting in 6 (2 x 3) matches in total for this example.
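
The pairing options "shortest", "longest" and "cartesian product" can be sketched in Python as follows. This is an illustration under assumptions, not the WebArrayDB implementation: the function name is hypothetical, and the padding value for "longest" is computed by a caller-supplied summary function, with the arithmetic mean used only as a stand-in because the exact definition of "log mean" is not given here.

```python
from itertools import product
from statistics import mean


def match_replicates(a, b, method, summary=mean):
    """Pair the replicate values of one gene from platform A and platform B."""
    if method == "shortest":
        # zip() stops at the end of the shorter list.
        return list(zip(a, b))
    if method == "longest":
        a, b = list(a), list(b)
        # Pad the shorter side with a summary of its own replicates.
        short = a if len(a) < len(b) else b
        fill = summary(short)
        short.extend([fill] * abs(len(a) - len(b)))
        return list(zip(a, b))
    if method == "cartesian product":
        return list(product(a, b))
    raise ValueError(f"unsupported method: {method}")


probes_a = [1.0, 3.0]            # A1, A2 (made-up values)
probes_b = [10.0, 20.0, 30.0]    # B1, B2, B3 (made-up values)

shortest = match_replicates(probes_a, probes_b, "shortest")
longest = match_replicates(probes_a, probes_b, "longest")
cartesian = match_replicates(probes_a, probes_b, "cartesian product")
print(len(shortest), len(longest), len(cartesian))
```

In this example the extra pair produced by "longest" plays the role of "A3 vs B3" in the text above.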

4 Data Normalization

Normalization is a step that minimizes systematic noise before differential analysis, and it is strongly recommended. Users are encouraged to read the details of each normalization option before deciding which one to use. In general, there are four steps of normalization.

Background correction
Within-array normalization
Between-array normalization (within a platform)
Cross-platform normalization

The first three steps are done before probe alignment, and the last step is done after cross-platform probe alignment.

Background correction, within-array normalization and between-array normalization (within a platform) are also among the functions provided in WebArray.

Cross-platform normalization means data normalization for arrays from different platforms. All between-array normalization methods are available for cross-platform normalization. In addition, three further cross-platform methods are implemented in WebArrayDB:

Median Rank Scores (MRS) [9];
Quantile Discretization (QD) [4];
Gene Quantile (GQ) - a moderated version of MRS with an additional step that standardizes the gene-wise median values of each set to the median values of the reference set.

For QD normalization, the parameter “number of bins” has to be set. It must be a power of 2 (2, 4, 8, …). The default is 8.
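
To give a feel for what a “number of bins” parameter does, here is a generic equal-frequency binning sketch in Python. It is not taken from WebArrayDB and the actual QD procedure may label or center the bins differently; the function name, the 0-based bin labels and the tie handling (ties are split by input position) are all assumptions.

```python
def quantile_discretize(values, n_bins=8):
    """Replace each value by the index of its equal-frequency bin (0-based)."""
    # The number of bins must be a power of 2 (2, 4, 8, ...).
    if n_bins < 2 or n_bins & (n_bins - 1):
        raise ValueError("n_bins must be a power of 2")

    # Rank the values, then cut the ranks into n_bins equally sized bins.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins


intensities = [5, 1, 9, 3, 7, 2, 8, 6]   # made-up values for one array
binned = quantile_discretize(intensities, n_bins=4)
print(binned)
```

After such a transformation every array uses the same small set of values, which is what makes arrays from different platforms comparable in this kind of approach.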

For homologous platforms (e.g. different development versions of a user-spotted slide with a few probes changed), all between-array normalization methods may be used. In such cases, between-array normalization within a platform is unnecessary.

5 Differential Analysis

5.1 Algorithms

WebArrayDB offers several algorithms for differential analysis: Student’s t-test, eBayes-moderated t-test, SAM, ANOVA/ANCOVA and non-parametric tests.

t-test
Student’s t-test is one of the most widely used statistical methods for differential analysis.
eBayes-moderated t-test (LIMMA)
The t-test is moderated by empirical Bayes methods, as implemented in LIMMA [7, 8]. For two-color array data, users can analyze the data based on either ratio or intensity. However, if the data or the experimental design is not suitable for a ratio-based analysis, for example when single-color array data are included in the analysis, the statistical analysis is done based on intensity.
SAM
SAM stands for “Significance Analysis of Microarrays”. This algorithm was developed by Tusher et al. [10].
ANOVA / ANCOVA (fixed-effect model and mixed-effect model)
Mixed-effect model ANOVA plays a very important role in microarray data analysis [1]. This method can deal with multiple factors. The model for ANOVA usually looks like
E = µ + G + P + A + D + S + ε (1)

in which E is the observed log-transformed intensity value, µ is the theoretical “real” log-transformed intensity value, G is the group factor, which produces the effects of interest (e.g. treatment effects), P, A, D and S represent the effects of platform, array, dye and sample respectively, and ε represents the Gaussian random error with an expected value of 0. Depending on the conditions, more or fewer factors can be used in the model. In WebArrayDB, depending on the data provided by users, platform, array, dye and sample may be considered as factors, with array treated as a random-effect factor.
non-parametric tests
When using a non-parametric test, Friedman rank sum test, Kruskal-Wallis rank sum test or Wilcoxon rank sum test will be chosen automatically based on the data offered. Specifically, WebArrayDB performs a generalized Friedman rank sum test with replicated blocked data or, as special cases, a Kruskal-Wallis rank sum test on data following a one-way layout or a Wilcoxon rank sum test following a one-way layout with only two groups.

5.2 Blocked or paired data

If the data cannot be treated as blocked/paired, WebArrayDB will ignore this option and do a regular analysis based on intensity.

Blocked/paired data has a different meaning depending on the selected algorithm.

LIMMA
For two-color data, the default behavior of LIMMA is to use the ratio of the two channels within an array for statistical analysis. This means data are paired by array. In all other cases, LIMMA will use intensity data for analysis even if you choose to use paired data.
t-test, SAM, ANOVA
For the t-test, SAM and ANOVA, data can be paired only when exactly two groups are defined. In such cases, WebArrayDB tries to produce a ratio between the two groups according to the selected factor. If this succeeds, ratios are used for analysis: the t-test becomes a paired t-test, and ANOVA uses a different, ratio-based model.
non-parametric tests
When there are just 2 groups, the Wilcoxon rank sum test is used. If the data in the two groups have intrinsic connections (e.g. they come from the same array, use the same dye, or come from the same sample individual), they may be treated as paired data, and the test is then done in a paired way.
When there are more than 2 groups and the data among these groups have intrinsic connections, the data can be treated as blocked and the Friedman rank sum test is used; otherwise, the Kruskal-Wallis rank sum test is used.
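
The selection rule for non-parametric tests described above can be summarized as a small decision function. This Python sketch only restates the rule; the function name and the boolean flag `connected` (meaning that the data have intrinsic connections and can be paired or blocked) are hypothetical.

```python
def choose_nonparametric_test(n_groups, connected):
    """Name the non-parametric test for a given design (illustrative only)."""
    if n_groups < 2:
        raise ValueError("at least two groups are needed")
    if n_groups == 2:
        # Two groups: Wilcoxon, done in a paired way if data are connected.
        if connected:
            return "Wilcoxon rank sum test (paired)"
        return "Wilcoxon rank sum test"
    # More than two groups: blocked data -> Friedman, otherwise Kruskal-Wallis.
    if connected:
        return "Friedman rank sum test"
    return "Kruskal-Wallis rank sum test"


for design in [(2, False), (2, True), (4, True), (4, False)]:
    print(design, "->", choose_nonparametric_test(*design))
```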

5.3 ANOVA model

ANOVA/ANCOVA can be used to investigate the effects of multiple factors/variables. Here, variables differ from factors in the type of their values: the values of a variable are numbers (integer or float), while the values of a factor are strings, even if they consist of digits. Using factors/variables defined by the user or found in the database, a user can define, or help WebArrayDB to define, the linear model for ANOVA/ANCOVA.

Note that the model will be a mixed-effect model if any random-effect factor/variable is used. Mixed-effect models can be very time-consuming to compute.

Factors/variables fall into three categories:

The factor “group”: This is the basic factor WebArrayDB aims to investigate. When the user defines two or more groups, a factor “group” is defined automatically.
Factors in the database: Information stored in databases can be used as factors. Typically, these are platform, sample, dye, array, individual (sample individual).
User-defined factors/variables: There is a table to allow users to define factors/variables if “ANOVA” is chosen as the algorithm.

Based on experiment designs, there are four options to build a model for ANOVA/ANCOVA in WebArrayDB.

5.3.1 Use “group” only

In this case, “group” is considered the only factor that affects the intensity data. This option is good for simple experimental designs.

5.3.2 Try to use factors in database

WebArrayDB will attempt to use the “group” factor and as many other factors from the database as possible to build a model for ANOVA. Currently the following factors will be tried:

Fixed-effect factors: platform, sample, dye
Random-effect factors: array, (sample) individual

When users select some of these factors, WebArrayDB tries to use them to build a model for ANOVA; any factors that are not suitable are removed automatically.

5.3.3 User-defined factors/variables

WebArrayDB will use the “group” factor and all user-defined factors/variables to build the ANOVA model. This option offers a flexible way of analysis when the information in the database is insufficient for a specific experimental design.

5.3.4 User-defined model

A user-defined model has several significant features/advantages:

All factors/variables described above can be used.
The type of a factor is determined by the model rather than by its definition; that is, you may use a random-effect factor as a fixed-effect one, or vice versa.
Analysis can also be done with a proper user-defined model even if there is just one group.
Interactions between factors/variables can be studied.

5.4 Contrasts

This describes the comparisons users want to make between “groups”. For example, “group2 - group1” means the user wants to compare group2 with group1. In the analysis results, M represents the log base 2 ratio of group2/group1. Multiple comparisons can be separated by “,” or “;”. By default, if the Contrast box is left empty, all other groups are compared to the first one, i.e. “group2 - group1; group3 - group1; group4 - group1” if there are four groups in total.

Generally, a comparison is defined by group names separated by “+” and/or “-”. Do not repeat a group name within one comparison. These limitations are lifted for LIMMA-based analysis, which allows more flexible comparisons using pairs of parentheses “()”, “/” and numbers; e.g., experienced users can try something like “(group4 - group3) - (group2 - group3)”, or “group3 - (group2 + group1)/2”.
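
The contrast syntax can be made concrete with a short Python sketch. It covers only the simple “+”/“-” form and the default contrasts; the LIMMA-style expressions with parentheses, “/” and numbers are deliberately rejected. All function names are hypothetical and the code is an illustration of the rules above, not the parser WebArrayDB uses.

```python
import re


def default_contrasts(groups):
    """Compare every other group with the first one (empty Contrast box)."""
    first = groups[0]
    return [f"{g} - {first}" for g in groups[1:]]


def split_contrasts(text):
    """Split several comparisons separated by ',' or ';'."""
    return [part.strip() for part in re.split(r"[,;]", text) if part.strip()]


def parse_simple_contrast(contrast):
    """Return {group name: +1 or -1} for a comparison using only '+' and '-'."""
    if not re.fullmatch(r"\s*[+-]?\s*\w+(?:\s*[+-]\s*\w+)*\s*", contrast):
        raise ValueError(f"not a simple contrast: {contrast!r}")
    weights = {}
    for sign, name in re.findall(r"([+-]?)\s*(\w+)", contrast):
        if name in weights:
            raise ValueError(f"group name repeated: {name}")
        weights[name] = -1 if sign == "-" else 1
    return weights


defaults = default_contrasts(["group1", "group2", "group3", "group4"])
parts = split_contrasts("group2 - group1; group3 - group1")
weights = parse_simple_contrast("group2 - group1")
print(defaults, parts, weights)
```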

6 Other analysis tools

Several other analysis methods have already been introduced into WebArrayDB, e.g., hierarchical clustering, heatmap, correspondence analysis, between group analysis, and genome or comparative genomic hybridization (CGH) plotting. These analyses share a common option:

An option - filter (Define a filter to screen differentially expressed probes)
Based on the result of differential analysis, a filter can be defined to screen probes of interest. All probes are used if no (successful) differential analysis has been done. Otherwise users can select probes by p-value for these analyses. The filter is optional, but it might be necessary for analyses that cannot handle too many probes, e.g., clustering.

6.1 Cluster data

This analysis produces a clustering chart. Depending on the users’ requirements, WebArrayDB can cluster groups or data channels. A successful clustering requires at least three groups or data channels.

6.2 Heatmap

This analysis produces a heat map with a two-dimensional clustering. The groups (or data channels) are clustered in the horizontal direction, giving a cluster chart similar to that of “Cluster data” above. Probes are clustered in the vertical direction.

6.3 Correspondence Analysis (COA)

COA finds outliers of probes. It requires at least two groups. Please refer to [5, 6].

6.4 Between Group Analysis (BGA)

BGA finds outliers of probes. It requires at least three groups. Please refer to [2, 3].

6.5 Plot genome

This function is designed for plotting intensity values or ratios along with locations of probes on the genome. Generally, the plotting is based on two-channel data - the first two groups, in which “group1” is used as the green channel (G, or the input channel), and “group2” as the red channel (R, or the output channel).

6.5.1 Options

The main options are listed below.

Plot genome segment
Usually a genome is too big to fit in a single chart. A feasible way is to plot a series of charts, each showing data for only one section of the genome. The section can be a
Weaker spots to remove
In a microarray hybridization experiment, the probes for genes that are not expressed yield lower intensity values. If the user does not want these probes to be used in the analysis, they can be removed by setting a positive value for “Percent of probes to be removed due to weaker signal”.
Significant level
The significant level is a threshold for the p values resulting from differential analysis (see section 5). Probes with significant p values are marked with small squares on the charts if the option “label probes with significant p values on charts” is chosen. This value is also used to determine transposons in transposon analysis (see section 6.6).
Replace pooled intensity value
For various reasons, the intensity values of adjacent probes may vary drastically. Most of this variation arises from noise. Such noise can be effectively reduced by using the median (or mean) of the intensity values of adjacent probes. This option lets the user decide how many adjacent probes should be used. The value “1” means that the original value is used.
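
The idea behind “Replace pooled intensity value” can be sketched in Python as a sliding window over probes ordered by genome position. The windowing details are assumptions made for this illustration (a window centered on each probe, truncated at the ends of the list, with an even count rounded up to the next odd size); the function name is hypothetical and WebArrayDB may pool differently.

```python
from statistics import median


def pool_adjacent(values, n_adjacent=1, summary=median):
    """Replace each intensity by a summary of its neighbouring probes.

    `values` must already be ordered by position on the genome.
    """
    if n_adjacent < 1:
        raise ValueError("n_adjacent must be at least 1")
    half = n_adjacent // 2
    pooled = []
    for i in range(len(values)):
        # Centred window, cut off at the start and end of the list.
        window = values[max(0, i - half): i + half + 1]
        pooled.append(summary(window))
    return pooled


intensities = [1.0, 9.0, 2.0, 3.0, 2.5]        # made-up values, one noisy spike
unchanged = pool_adjacent(intensities, 1)      # "1" keeps the original values
smoothed = pool_adjacent(intensities, 3)       # median of up to 3 neighbours
print(unchanged, smoothed)
```

Passing `summary=statistics.mean` instead of the default would give the mean-based variant mentioned above.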

6.5.2 Outputs

For each plotting unit, i.e., a genome segment (see “Plot genome segment” in section 6.5.1), three charts are plotted (see Figure 1 and Table 1):

Ratios of intensity values (note that the ratios have been log base 2 transformed)
Intensity values on the positive strand
Intensity values on the negative strand

Figure 1: Genome charts

Table 1: Legends for genome plotting

Legend	Annotation

•	Spots indicate the log base 2 transformed ratios for each probe. Colors are used to indicate probe orientations and significance of p values.
	red •	positive strand, with significant p values
	pink •	positive strand, without significant p values
	blue •	negative strand, with significant p values
	light blue •	negative strand, without significant p values

▵, ▿	Triangles indicate locations of genes. Upwards for start, downwards for end. The name (or number) between a pair of triangles specifies a gene using “gene_symbol” or a part of it.
	red ▵, ▿	positive strand
	blue ▵, ▿	negative strand

▫	Squares specify locations with significant p values from differential analysis. Colors are used to indicate probe orientations and regulated direction: output (group2) - input (group1).
	red ▫	positive strand, down regulated
	pink ▫	positive strand, up regulated
	blue ▫	negative strand, down regulated
	light blue ▫	negative strand, up regulated

—	Purple horizontal bars indicate adjacent probes associated with significant p values.

|	Purple vertical bars indicate locations of potential transposons.

Curve	Intensity values (after log base 2 transformation) of probes along the genome. Colors are used to distinguish the input and output data.
	brown curve	the input data (group1)
	green curve	the output data (group2)

6.6 Transposon analysis

The purpose of “transposon analysis” is to identify the locations of transposons on the genome. This analysis can be carried out only when the nucleic acids for hybridization were amplified with primers on transposons. In addition, the probe file for the microarray platform involved must contain the information necessary for genome plotting.

6.6.1 Options

Transposon analysis is done on the basis of genome plotting (see section 6.5). While all options in section 6.5 are applicable for transposon analysis, two additional options are used as well:

Threshold of A value
Threshold for oligos to be considered as candidate transposon inserts, expressed as a proportion of oligos above the specified value. The default is 0.5, equivalent to oligos being above the 70th percentile of all oligo intensities.
A value slope
Ratio of adjacent oligo intensities used to calculate the peak oligo in each region. The peak oligo is taken as the location of the transposon in that region. The default is 1.2, meaning that an adjacent oligo must have a signal at least 1.2-fold lower than that of the candidate transposon location.

6.6.2 Outputs

Transposon analysis outputs “genome plotting” (see Figure 1) with vertical bars indicating locations of transposons (see Table 1).

Five additional TAB-delimited files are created as well:

The first file, whose name ends with “transposon_table_all_probes.txt”, is the biggest file for transposon analysis. It contains all probes that appear in the result file from differential analysis.
The second file, whose name ends with “transposon_table.txt”, contains only the first two probes for each transposon. It is a subset of the first file.
The third file, whose name ends with “transposon_table_all_genes.txt”, contains the first occurrence of every gene_symbol in the second file. If a gene_symbol is not found in the second file, its first occurrence in the first file is used instead. It therefore contains all genes, whether or not they have a transposon. This file is a subset of the first file.
The fourth file, whose name ends with “transposon_table_by_gene.txt”, contains the first occurrence of every gene_symbol in the second file. It is a subset of the second file and also a subset of the third file.
The fifth file, whose name contains “transposon_table_p_X.XX.txt”, contains the rows with significant p values (from differential analysis). It is a subset of the second file.

6.7 Bacterium CGH analysis

6.7.1 Options

The user can set three break points to divide the ratios into four intervals. Ratios in different intervals are plotted in different colors: RED, BLUE, GRAY, and GREEN.
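
How three break points map a ratio to one of the four colors can be shown with a short Python sketch. The order of the colors follows Table 2; the function name, the example break points and the treatment of a ratio that lies exactly on a break point (it is counted in the higher interval) are assumptions.

```python
from bisect import bisect_right

# Colors from the lowest interval to the highest, as listed in Table 2.
COLORS = ("red", "blue", "gray", "green")


def ratio_color(ratio, break_points):
    """Return the plotting color for a log2 ratio given three break points."""
    points = list(break_points)
    if len(points) != 3 or points != sorted(points):
        raise ValueError("exactly three ascending break points are required")
    # bisect_right counts how many break points are <= ratio.
    return COLORS[bisect_right(points, ratio)]


breaks = [-1.0, 0.0, 1.0]   # made-up break points
colors = [ratio_color(r, breaks) for r in (-2.0, -0.5, 0.5, 2.0)]
print(colors)
```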

6.7.2 Outputs

Bacterium CGH analysis outputs a slightly different “genome plotting”: only ratios are plotted (see Figure 2). The differences in the legends are listed in Table 2.

Figure 2: Genome charts

Table 2: Legends for bacterium CGH plotting

Legend	Annotation

•	Spots indicate the log base 2 transformed ratios for each probe. Colors are used to indicate the interval, defined by the break points, in which a ratio falls.
	red •	ratios below the first (lowest) break point
	blue •	ratios between the first two break points
	gray •	ratios between the last two break points
	green •	ratios above the last (highest) break point

An additional TAB-delimited file, whose name ends with “bacCGH_table.txt”, is produced with a gene-wise (by “gene_symbol”) summary of the number of probes and the intensity values (log base 2 transformed).

7 Miscellaneous options

Some global options can be defined in this section.

EPS chart format
Normally WebArrayDB outputs charts in Portable Network Graphics (PNG) format, which is good for web browsers. However, if the user wants charts of higher quality, this option forces WebArrayDB to produce charts in PostScript (PS) format, which are finally converted to figures in EPS format and in PNG format. This may produce very large picture files and can be very slow when the results are first browsed. Select this only when you need high-quality pictures for publication!
Genomic information to use
By default, the probe information used in charts and output files is collected from the database. If more than one microarray platform is involved, probe information is collected from the first platform used in group1. However, if the user wants to use different annotations, a probe file can be provided here. The annotations in this file replace those from the database. Make sure that the grid information in this file is the same as that in the database!
Probe annotation output
Usually output files include the columns “idx”, “block_row”, “block_col”, “row”, “col”, “unique_id”, “chromosome”, “probe_start”, “probe_end”, “probe_strand”, “gene_symbol”, “gene_title”, and “gene_strand” from the probe file. If the option is set to a value other than “nowhere”, all other columns are included as well, either in separate files or in the data/result tables.

8 FAQs

8.1 How to deal with technical replicates?

Here is an example of using technical replicates. Assume a project involves three biological replicates of two conditions (i.e. 6 samples in total), which are compared on two-channel arrays, condition 1 versus condition 2. Every sample is also dye-swapped, so we end up with six arrays with two channels each, and each sample has two technical replicates:

samples for condition 1: A1, A1, A2, A2, A3, A3

samples for condition 2: B1, B1, B2, B2, B3, B3

We will explain how to use ANOVA and the eBayes-moderated t-test (in LIMMA) to compare the samples between the two conditions.

8.1.1 Using ANOVA

Create two groups with samples in order:

group1:	A1, A1, A2, A2, A3, A3
group2:	B1, B1, B2, B2, B3, B3

Choose the third option for ANOVA model - Use following user-defined factors as well as "Group", and define a factor with the following values:

Factor name: facsamp

Factor type: random

Data type: string

Factor values: A1, A1, A2, A2, A3, A3, B1, B1, B2, B2, B3, B3

8.1.2 Using LIMMA

Create six groups:

group1:	A1, A1
group2:	A2, A2
group3:	A3, A3
group4:	B1, B1
group5:	B2, B2
group6:	B3, B3

The comparison to make is:
(group4 + group5 + group6 - group1 - group2 - group3)/3

References

[1]: Gary A Churchill. Fundamentals of experimental design for cDNA microarrays. Nat Genet, 32 Suppl 2:490–495, Dec 2002.
[2]: Aedín C Culhane, Guy Perrière, Elizabeth C Considine, Thomas G Cotter, and Desmond G Higgins. Between-group analysis of microarray data. Bioinformatics, 18(12):1600–1608, Dec 2002.
[3]: Aedín C Culhane, Guy Perrière, and Desmond G Higgins. Cross-platform comparison and visualisation of gene expression data using co-inertia analysis. BMC Bioinformatics, 4:59, Nov 2003.
[4]: H. Liu, F. Hussain, C. L. Tan, and M. Dash. Discretization: An enabling technique. Data Mining and Knowledge Discovery, 6:393–423, 2002.
[5]: G. Perrière, J. R. Lobry, and J. Thioulouse. Correspondence discriminant analysis: a multivariate method for comparing classes of protein and nucleic acid sequences. Comput Appl Biosci, 12(6):519–524, Dec 1996.
[6]: Guy Perrière and Jean Thioulouse. Use of correspondence discriminant analysis to predict the subcellular location of bacterial proteins. Comput Methods Programs Biomed, 70(2):99–105, Feb 2003.
[7]: G. K. Smyth. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3:Iss. 1, Article 3, 2004.
[8]: G. K. Smyth, J. Michaud, and H. Scott. The use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics, 21(9):2067–2075, 2005.
[9]: Jörn Tödling and Rainer Spang. Assessment of five microarray experiments on gene expression profiling of breast cancer. 2003.
[10]: V. G. Tusher, R. Tibshirani, and G. Chu. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A, 98(9):5116–5121, Apr 2001.

This document was translated from LaTeX by HeVeA.

