
DNA Methylation Microarrays and Services

Signal Intensity (Raw) Data

Signal intensity data is extracted from the scanned images of each array using NimbleScan™, NimbleGen’s data extraction software. Signal intensities for each probe are saved in pair files (.txt), the raw data format for DNA Methylation experiments.

Scaled Log2-Ratio Data

Each feature on the array has a corresponding scaled log2-ratio. This is the log2 of the ratio of the signals from the experimental and test samples that were co-hybridized to the array. The log2-ratio is computed and then scaled to center the ratio data around zero. Scaling is performed by subtracting the bi-weight mean of the log2-ratio values for all features on the array from each log2-ratio value. View log2-ratio data files (.gff) using SignalMap.
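The scaling step can be sketched in Python as follows. This is only an illustration under stated assumptions: the one-step Tukey bi-weight estimator shown here (tuning constant c = 5, small epsilon to avoid division by zero) is a common formulation and is not confirmed to match NimbleScan's internal calculation, and the function names are hypothetical.

```python
import math
from statistics import median


def tukey_biweight_mean(values, c=5.0, eps=1e-4):
    """One-step Tukey bi-weight mean (assumed formulation, not NimbleScan's code)."""
    values = list(values)
    center = median(values)
    mad = median([abs(v - center) for v in values])  # median absolute deviation
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - center) / (c * mad + eps)
        if abs(u) < 1.0:  # points far from the median get zero weight
            w = (1.0 - u * u) ** 2
            weighted_sum += w * v
            weight_total += w
    return weighted_sum / weight_total if weight_total else center


def scaled_log2_ratios(experimental, test):
    """log2(experimental / test) per feature, centered on the bi-weight mean."""
    ratios = [math.log2(e / t) for e, t in zip(experimental, test)]
    center = tukey_biweight_mean(ratios)
    return [r - center for r in ratios]
```

After scaling, a feature whose log2-ratio equals the array-wide bi-weight mean has a value of zero, so positive values indicate above-typical enrichment on that array.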

P-Value Data

From the scaled log2-ratio data, a fixed-length window (750bp) is placed around each consecutive probe and the one-sided Kolmogorov-Smirnov (KS) test is applied to determine whether the probes are drawn from a significantly more positive distribution of intensity log-ratios than those in the rest of the array. The resulting score for each probe is the -log10 p-value from the windowed KS test around that probe. View p-value data files (.gff) using SignalMap.
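As a rough sketch of the windowed test described above, the Python below scores each probe with a one-sided two-sample KS statistic. Several things here are assumptions rather than documented NimbleScan behavior: the window is centered on the probe (375bp on each side), all probes are on one chromosome, "the rest of the array" means every probe outside the window, and the p-value uses the standard large-sample approximation p ≈ exp(-2·D²·nm/(n+m)). Function names are hypothetical.

```python
import math
from bisect import bisect_right


def ks_neglog10_p(window, rest):
    """-log10 p-value that `window` is drawn from a more positive distribution
    than `rest`, using the asymptotic one-sided Smirnov approximation."""
    n, m = len(window), len(rest)
    if n == 0 or m == 0:
        return 0.0
    w_sorted, r_sorted = sorted(window), sorted(rest)
    d_plus = 0.0
    # If the window is shifted upward, its empirical CDF lies below the rest's.
    for x in w_sorted + r_sorted:
        cdf_window = bisect_right(w_sorted, x) / n
        cdf_rest = bisect_right(r_sorted, x) / m
        d_plus = max(d_plus, cdf_rest - cdf_window)
    # -log10(exp(-2 * D^2 * nm / (n + m))), computed without underflow
    return 2.0 * d_plus ** 2 * n * m / (n + m) / math.log(10)


def windowed_ks_scores(positions, ratios, window_bp=750):
    """Score every probe from the probes inside a window centered on it."""
    half = window_bp / 2
    scores = []
    for center in positions:
        inside = [r for p, r in zip(positions, ratios) if abs(p - center) <= half]
        outside = [r for p, r in zip(positions, ratios) if abs(p - center) > half]
        scores.append(ks_neglog10_p(inside, outside))
    return scores
```

This version rescans the whole array for every probe, which keeps the logic easy to follow but is slow for real designs; a production implementation would use a sliding window and an exact or better-calibrated p-value.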

Peak Data

Using NimbleScan, peak data files (.gff) are generated from the p-value data files. NimbleScan detects peaks by searching for at least 2 probes above a p-value minimum cutoff (-log10) of 2. Peaks within 500bp of each other are merged. View peak data files graphically using SignalMap, or in table format using spreadsheet software.
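The peak detection rules above can be illustrated with a short Python sketch. It is not NimbleScan's implementation. It assumes that the qualifying probes must be consecutive, that a score equal to the cutoff counts as passing, that probes are sorted by position on a single chromosome, and that the distance between peaks is measured between probe start positions. The peak score is taken as the average -log10 p-value of all probes between the peak's first and last probe.

```python
def call_peaks(positions, scores, cutoff=2.0, min_probes=2, merge_bp=500):
    """Return peaks as dicts with start, end and average score (illustrative only)."""
    # 1. Find runs of consecutive probes at or above the cutoff.
    runs = []  # (first_index, last_index)
    start = None
    for i, score in enumerate(scores):
        if score >= cutoff:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_probes:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(scores) - start >= min_probes:
        runs.append((start, len(scores) - 1))

    # 2. Merge runs that lie within merge_bp of the previous run.
    merged = []
    for first, last in runs:
        if merged and positions[first] - positions[merged[-1][1]] <= merge_bp:
            merged[-1] = (merged[-1][0], last)
        else:
            merged.append((first, last))

    # 3. Summarize each peak.
    peaks = []
    for first, last in merged:
        members = scores[first:last + 1]
        peaks.append({
            "start": positions[first],
            "end": positions[last],
            "score": sum(members) / len(members),
        })
    return peaks
```

For example, two runs of qualifying probes that end and start 200bp apart are reported as a single peak, while an isolated single probe above the cutoff is not reported at all.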

Viewing Peak Data Graphically

Position the mouse pointer over each peak to display additional information (see table below):

Field   Description
Score   The peak score, which is the average of the -log10 p-values of the probes within that peak.
Pos     Genomic coordinates of the peak.

Viewing Peak Data in a Table Format

You can also open peak data files (.gff) in a spreadsheet program to view data in a table format. See the table above for a detailed description of peak data.
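If you prefer to work with the peak data programmatically rather than in a spreadsheet, a minimal Python sketch such as the one below can read the file. It relies only on the general GFF layout of nine tab-separated columns (sequence, source, type, start, end, score, strand, phase, attributes); the exact values NimbleScan writes into the source, type, and attributes columns are not described here, so treat the column interpretation as an assumption. The function names are hypothetical.

```python
GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def parse_gff(lines):
    """Parse GFF text lines into dicts, skipping comments and malformed rows."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(GFF_COLUMNS):
            continue  # not a nine-column GFF row
        record = dict(zip(GFF_COLUMNS, fields))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        # GFF uses "." for a missing score
        record["score"] = None if record["score"] == "." else float(record["score"])
        records.append(record)
    return records


def peaks_by_score(records):
    """Sort peaks from highest to lowest score; unscored rows go last."""
    return sorted(
        records,
        key=lambda r: r["score"] if r["score"] is not None else float("-inf"),
        reverse=True,
    )
```

With a file on disk, `parse_gff(open(path))` returns one record per peak, which can then be sorted or filtered in the same way as the spreadsheet view.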

Summary Reports

For each annotated gene, NimbleScan searches for peaks that appear in a specified promoter region around the transcription start site (TSS). The region searched is design-specific; for most mammalian designs, the search region spans from 5kb upstream to 1kb downstream of the TSS.

You can view the summary reports using spreadsheet software, such as Microsoft Excel:

  • Report All Peaks – Lists all peaks and maps them to promoter regions. Each row in the report lists a peak-transcript pair. For each transcript, if more than one peak lies within the promoter region, there will be multiple rows for that transcript.
  • Report Nearest Peak – Lists all peaks and maps them to promoter regions. Each row in the report lists a peak-transcript pair. For each transcript, if more than one peak lies within the promoter region, only the peak nearest to the TSS is reported.
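The difference between the two reports can be sketched in Python as follows. This is a simplified illustration, not NimbleScan's code: it assumes the typical mammalian search region (5kb upstream to 1kb downstream of the TSS, oriented by strand), treats a peak as lying within the promoter region if it overlaps it, and measures distance from the peak center to the TSS. The dictionary keys and function names are hypothetical.

```python
def promoter_region(tss, strand, upstream=5000, downstream=1000):
    """Return (low, high) genomic bounds of the promoter search region."""
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream  # upstream is toward higher coordinates


def report_all_peaks(peaks, transcripts):
    """One row per peak-transcript pair where the peak overlaps the promoter."""
    rows = []
    for transcript in transcripts:
        low, high = promoter_region(transcript["tss"], transcript["strand"])
        for peak in peaks:
            same_chrom = peak["chrom"] == transcript["chrom"]
            if same_chrom and peak["start"] <= high and peak["end"] >= low:
                center = (peak["start"] + peak["end"]) / 2
                rows.append({
                    "transcript": transcript["name"],
                    "peak_id": peak["id"],
                    "distance": abs(center - transcript["tss"]),
                })
    return rows


def report_nearest_peak(peaks, transcripts):
    """Keep only the row with the smallest distance to the TSS per transcript."""
    nearest = {}
    for row in report_all_peaks(peaks, transcripts):
        best = nearest.get(row["transcript"])
        if best is None or row["distance"] < best["distance"]:
            nearest[row["transcript"]] = row
    return list(nearest.values())
```

A transcript with two peaks in its promoter region therefore contributes two rows to the all-peaks report and one row to the nearest-peak report.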

To effectively analyze peak data, you should sort the data in summary reports according to peak score, gene name, chromosome, distance to TSS, etc. To sort data in Microsoft Excel, highlight row 1 and select Data -> Filter -> Auto Filter. You can then sort individual columns by ascending/descending values, top 10 values, or individual values.

The table below identifies the fields on the summary reports (.xls):

Field                     Description
PEAK_ID                   An ID for each peak.
CHROMOSOME                Chromosome associated with the peak.
PEAK_START                First base of the peak on the chromosome.
PEAK_END                  Last base of the peak on the chromosome.
PEAK_SCORE                The peak score, which is the average of the -log10 p-values of the probes within that peak.
FEATURE_TRACK             The annotation track against which peaks were mapped; it is the transcription start site for summary reports.
FEATURE_STRAND            Strand of the transcript.
FEATURE_START             First base of the feature on the chromosome.
                          Note: For the transcription start site, feature size is 1; therefore, start and end positions are the same.
FEATURE_END               Last base of the feature on the chromosome.
                          Note: For the transcription start site, feature size is 1; therefore, start and end positions are the same.
FEATURE_TO_PEAK_DISTANCE  Center-to-center distance of peak to feature.
Name                      Gene symbol of the transcript.
Accession                 GenBank accession number of the transcript.
description               Full gene name of the transcript.
ncbi_gene_id              NCBI Entrez GeneID of the transcript.
synonyms                  Other alias symbol(s) of the transcript.
Parent                    The internal identification number of the transcript from which this transcription start site is generated.
PEAK_ATTR                 Attribute field from the peak GFF file.

Custom Designs

If your array design is customized, some of the files described above may not be provided. For instance, annotation files (.gff) may not be readily available for less common genomes, which will result in no promoter reports being generated. In addition, the gene description file (.ngd) is available only for certain designs, since these files were replaced by annotation files (.gff) in newer designs. Also, if a positions file (.pos) is not available (because genomic coordinates were not provided for a custom design), no ratio files (.gff), peak data files (.gff), or promoter reports (.xls) are generated.

3rd Party Software Options

There are many third-party packages into which you can import NimbleGen ChIP-based DNA methylation data for analysis. Five 3rd party packages are listed below:

Elucidating the function and transcriptional network of a large gene list can be cumbersome. Using the Database for Annotation, Visualization and Integrated Discovery (DAVID), you can functionally annotate your DNA Methylation data using one or more lists of genes that are CpG methylated in their promoter regions. A guide to using the DAVID website is available for download.

 
