ChIP-chip Data Guide
Chromatin immunoprecipitation (ChIP) is a powerful tool for analyzing DNA/protein interactions and histone modification within a chromosomal context. It is now possible to analyze transcriptional regulation on a genome-wide scale by coupling ChIP with microarrays (ChIP-chip).
Signal Intensity (Raw) Data
Signal intensity data is extracted from the scanned images of each array using NimbleScan, NimbleGen’s data extraction software. Signal intensities for each probe are saved in pair files (.txt), the raw data format for ChIP-chip experiments.
Scaled Log2-Ratio Data
Each feature on the array has a corresponding scaled log2-ratio. This is the ratio of the input signals for the experimental and test samples that were co-hybridized to the array. The log2-ratio is computed and scaled to center the ratio data around zero. Scaling is performed by subtracting the bi-weight mean for the log2-ratio values for all features on the array from each log2-ratio value. View log2-ratio data files (.gff) using SignalMap.
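The centering step described above can be sketched in Python. This is only an illustration, not NimbleScan's implementation: the guide does not say which bi-weight estimator or tuning constants are used, so the one-step Tukey bi-weight below (tuning constant `c = 5`, small `eps`) is an assumption, as are the function and parameter names.

```python
import math
from statistics import median


def tukey_biweight_mean(values, c=5.0, eps=1e-4):
    """One-step Tukey bi-weight estimate of the center of `values`.

    Assumption: c and eps are illustrative defaults, not NimbleScan settings.
    """
    values = list(values)
    m = median(values)
    mad = median(abs(v - m) for v in values)  # median absolute deviation
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Values far from the median get zero weight, so outliers are ignored.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


def scaled_log2_ratios(numerator, denominator):
    """Log2-ratio per feature, shifted so the bi-weight mean sits at zero."""
    raw = [math.log2(n / d) for n, d in zip(numerator, denominator)]
    center = tukey_biweight_mean(raw)
    return [x - center for x in raw]
```

Because the bi-weight mean down-weights extreme values, a few strongly enriched probes shift the center far less than they would shift an arithmetic mean.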
Peak Data
Using NimbleScan, peak data files (.gff) are generated from the scaled log2-ratio data. NimbleScan detects peaks by searching for 4 or more probes whose signals are above the specified cutoff values, ranging from 90% to 15%, using a 500 bp sliding window. The cutoff values are a percentage of a hypothetical maximum, defined as the mean plus 6 standard deviations. The ratio data is then randomized 20 times to evaluate the probability of “false positives.” Each peak is then assigned a false discovery rate (FDR) score based on the randomization. In general, use these guidelines when reviewing FDR scores:
- The lower the FDR score, the more likely the peak corresponds to a protein binding site.
- For most data sets, peaks with FDR score ≤ 0.05 very often represent the highest-confidence protein binding site(s).
- Peaks with FDR score between 0.05 and 0.2 are also indicative of a binding site.
- Peaks with FDR score > 0.2 are generally not considered high-confidence binding sites.
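The peak-detection rule described above (4 or more probes above a percentage cutoff within a 500 bp window) can be sketched as follows. This is a simplified illustration under stated assumptions, not NimbleScan's algorithm: the guide does not specify how windows are merged, whether the standard deviation is the population or sample form, or how the randomization is carried out, so the choices below (population standard deviation, greedy non-overlapping windows, no randomization step) and the name `find_peaks` are hypothetical.

```python
from statistics import mean, pstdev


def find_peaks(probes, cutoff_pct, window_bp=500, min_probes=4):
    """Return (start, end) spans holding enough probes above the cutoff.

    probes: list of (position, scaled_log2_ratio) for one chromosome,
            sorted by position.
    cutoff_pct: percentage of the hypothetical maximum (e.g. 90 down to 15).
    """
    ratios = [ratio for _, ratio in probes]
    # Hypothetical maximum as described in the text: mean + 6 standard deviations.
    hypothetical_max = mean(ratios) + 6 * pstdev(ratios)
    cutoff = hypothetical_max * cutoff_pct / 100.0

    hits = [pos for pos, ratio in probes if ratio > cutoff]
    peaks = []
    i = 0
    while i < len(hits):
        j = i
        # Extend the window while the next hit lies within window_bp of the first.
        while j + 1 < len(hits) and hits[j + 1] - hits[i] <= window_bp:
            j += 1
        if j - i + 1 >= min_probes:
            peaks.append((hits[i], hits[j]))
            i = j + 1  # assumption: reported windows do not overlap
        else:
            i += 1
    return peaks
```

Running the function once per cutoff value, from 90 down to 15, mimics the stepwise search: strong peaks appear at high cutoffs, weaker ones only at lower cutoffs.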
Viewing Peak Data Graphically
Open peak data files (.gff) in SignalMap. The peaks are color-coded and separated into 4 tiers for quick identification.
- Red: 1st-tier peaks (highest probability of a peak); FDR score ≤ 0.05
- Orange: 2nd-tier peaks; 0.05 < FDR score ≤ 0.1
- Yellow: 3rd-tier peaks; 0.1 < FDR score ≤ 0.2
- Grey: 4th-tier peaks (lowest probability of a peak); FDR score > 0.2
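The tier boundaries listed above amount to a simple threshold lookup. The helper below is a hypothetical illustration (the name `peak_tier` is not part of SignalMap or NimbleScan) that can be handy when filtering peaks outside SignalMap.

```python
def peak_tier(fdr):
    """Map an FDR score to its (tier, color) as listed in the guide."""
    if fdr <= 0.05:
        return 1, "red"
    if fdr <= 0.1:
        return 2, "orange"
    if fdr <= 0.2:
        return 3, "yellow"
    return 4, "grey"
```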
Position the mouse pointer over each peak to display additional information (see table below).
| Field | Description |
| Score | The peak score, which is the log2-ratio of the 4th-highest probe in the peak. |
| Pos | Genomic coordinates of the peak. |
| Attr | Attributes specific for the peak. |
| Color | The peak color. |
| Cutoff_p | The percentage cutoff value (varies from 90% to 15%) when this peak is detected. |
| Cumul_peaks | Cumulative number of peaks up to that value of FDR. |
| Fdr | The FDR score. |
| Attr_2, attr_1, attr_0, etc. | Additional information, such as the settings of the peak finding algorithm. |
Viewing Peak Data in a Table Format
You can also open peak data files (.gff) in a spreadsheet program, such as Microsoft Excel, to view data in a table format. The peaks are sorted by FDR score with the most significant peaks listed first. See the table above for a detailed description of peak data.
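As an alternative to a spreadsheet, peak data files can be read with a short script. The sketch below assumes the files follow the usual nine-column, tab-delimited GFF layout with `key=value` pairs separated by semicolons in the last column, and that the FDR is stored under an attribute named `fdr`; check both assumptions against your own files. The function names are hypothetical.

```python
def read_peak_gff(lines):
    """Parse GFF lines into one dictionary per peak."""
    peaks = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF record
        attributes = {}
        for item in cols[8].split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                attributes[key.strip().lower()] = value.strip()
        peaks.append({
            "chromosome": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            # GFF uses "." for a missing score.
            "score": float(cols[5]) if cols[5] != "." else None,
            "attributes": attributes,
        })
    return peaks


def sort_by_fdr(peaks):
    """Most significant peaks first; peaks without an fdr attribute go last."""
    return sorted(peaks, key=lambda p: float(p["attributes"].get("fdr", "inf")))
```

For example, `sort_by_fdr(read_peak_gff(open("peaks.gff")))` would list the lowest-FDR peaks first, where `peaks.gff` stands in for your own file name.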
Promoter Reports
For each annotated gene, NimbleScan searches for peaks that appear in a specified promoter region around the transcription start site (TSS). The region searched is design-specific; for most mammalian designs, the search region spans from 5kb upstream to 1kb downstream of the TSS.
You can view the promoter reports using spreadsheet software, such as Microsoft Excel:
- Report_All_Peaks – Lists all peaks with an FDR ≤ 0.2 and maps them to promoter regions. Each row in the report lists a peak-transcript pair. For each transcript, if more than one peak lies within the promoter region, there will be multiple rows for that transcript.
- Report_Nearest_Peaks – Lists all peaks with an FDR ≤ 0.2 and maps them to promoter regions. Each row in the report lists a peak-transcript pair. For each transcript, if more than one peak lies within the promoter region, only the peak nearest to the TSS is reported.
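The difference between the two reports can be illustrated with a small sketch. It assumes the mammalian default region (5 kb upstream to 1 kb downstream of the TSS, oriented by strand), treats a peak as belonging to the promoter if it overlaps that region, and measures distance from the peak center to the TSS. The overlap rule and all function names are assumptions for illustration; the guide does not state exactly how NimbleScan tests membership.

```python
def promoter_region(tss, strand, upstream=5000, downstream=1000):
    """Genomic (low, high) bounds of the search region around a TSS."""
    if strand == "+":
        return tss - upstream, tss + downstream
    # On the minus strand, upstream lies at higher coordinates.
    return tss - downstream, tss + upstream


def peaks_in_promoter(tss, strand, peaks, max_fdr=0.2):
    """All peaks with FDR <= max_fdr that overlap the promoter region."""
    low, high = promoter_region(tss, strand)
    hits = []
    for peak in peaks:
        overlaps = peak["start"] <= high and peak["end"] >= low
        if peak["fdr"] <= max_fdr and overlaps:
            center = (peak["start"] + peak["end"]) / 2
            hits.append({**peak, "distance": center - tss})
    return hits


def nearest_peak(tss, strand, peaks, max_fdr=0.2):
    """The single qualifying peak whose center is closest to the TSS."""
    hits = peaks_in_promoter(tss, strand, peaks, max_fdr)
    return min(hits, key=lambda h: abs(h["distance"])) if hits else None
```

`peaks_in_promoter` corresponds to one transcript's rows in Report_All_Peaks, and `nearest_peak` to its single row in Report_Nearest_Peaks.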
To effectively analyze peak data, you should sort the data in promoter reports according to FDR, peak score, gene name, chromosome, distance to TSS, etc. To sort data in Microsoft Excel, highlight row 1 and choose Data -> Filter -> Auto Filter. You can then sort individual columns by ascending/descending values, top 10 values, or individual values.
The table below identifies the fields on the promoter reports (.xls):
| Field | Description |
| PEAK_ID | An ID for each peak. |
| CHROMOSOME | Chromosome associated with the peak. |
| PEAK_START | First base of the peak on the chromosome. |
| PEAK_END | Last base of the peak on the chromosome. |
| PEAK_SCORE | The log2-ratio of the fourth highest probe in the peak. |
| PEAK_FDR | FDR value of the peak. |
| FEATURE_TRACK | The annotation track against which peaks were mapped; it is the transcription start site for promoter reports. |
| FEATURE_STRAND | Strand of the transcript. |
| FEATURE_START | First base of the feature on the chromosome. Note: For the transcription start site, feature size is 1; therefore, start and end positions are the same. |
| FEATURE_END | Last base of the feature on the chromosome. Note: For the transcription start site, feature size is 1; therefore, start and end positions are the same. |
| FEATURE_TO_PEAK_DISTANCE | Center-to-center distance of peak to feature. |
| Name | Gene symbol of the transcript. |
| accession | GenBank accession number of the transcript. |
| description | Full gene name of the transcript. |
| ncbi_gene_id | NCBI Entrez GeneID of the transcript. |
| synonyms | Other alias symbol(s) of the transcript. |
| Parent | The internal identification number of the transcript from which this transcription start site is generated. |
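Sorting a promoter report can also be scripted using the field names in the table above. The sketch assumes the report has been saved or exported as tab-delimited text with those field names in the first row; if your report is a binary Excel workbook, export it first. The function name and the sort order (FDR ascending, then peak score descending, then distance to the TSS) are illustrative choices.

```python
import csv


def sort_promoter_report(lines):
    """Sort report rows by FDR, then by peak score, then by TSS distance.

    lines: an open file or any iterable of tab-delimited text lines.
    """
    rows = list(csv.DictReader(lines, delimiter="\t"))
    rows.sort(
        key=lambda r: (
            float(r["PEAK_FDR"]),                       # most significant first
            -float(r["PEAK_SCORE"]),                    # then strongest peak
            abs(float(r["FEATURE_TO_PEAK_DISTANCE"])),  # then closest to TSS
        )
    )
    return rows
```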
Custom Designs
If your array design is customized, some of the files described above may not be provided. For instance, annotation files (.gff) may not be readily available for less common genomes, which will result in no promoter reports being generated. In addition, the gene description file (.ngd) is available only for certain designs, since these files were replaced by annotation files (.gff) in newer designs. Also, if a positions file (.pos) is not available (because genomic coordinates were not provided for a custom design), no ratio files (.gff), peak data files (.gff), or promoter reports (.xls) are generated.
3rd Party Software Options
Many third-party packages can import and analyze NimbleGen ChIP-chip data. Five are listed below:
- M-peak (Nature 2005, 436:876-880)
http://www.stat.ucla.edu/~zmdl/mpeak/
- TAMALPAIS Server (Genome Res. 2006, 16:595)
http://chipanalysis.genomecenter.ucdavis.edu/cgi-bin/tamalpais.cgi
- ACME (in R) (Methods Enzymol. 2006, 411:270-282)
http://bioconductor.org/packages/2.1/bioc/html/ACME.html
- ChIPOTle (Genome Biol. 2005, 6:R97)
http://www.bio.unc.edu/faculty/lieb/labpages/ChIPOTle/home.htm
https://sourceforge.net/projects/chipotle-perl/
- Model Based Analysis of 2 color Arrays (MA2C)
http://liulab.dfci.harvard.edu/MA2C/MA2C.htm
Motifs and sequences for qPCR design can be identified from your ChIP-chip data using the Cis-regulatory Element Annotation System (CEAS). This site accepts peak GFF files for current builds of human (hg18) and mouse (mm8). A guide to using the CEAS website is available for download.
Elucidating the function and transcriptional network of a large gene list can be cumbersome. Using the Database for Annotation, Visualization and Integrated Discovery (DAVID), you can functionally annotate your ChIP-chip data from a list of genes whose promoter regions were bound by the factor of interest. A guide to using the DAVID website is available for download.
Literature
Brochures & Datasheets
- NimbleGen ChIP-chip & DNA Methylation Microarrays
Brochure (PDF Format 6MB)
- NimbleGen ChIP-chip 4x72K Array Delivery
Datasheet (PDF Format 256KB)
- NimbleGen ChIP-chip 2.1M Whole-Genome Tiling and Deluxe Promoter Array Delivery
Datasheet (PDF Format 419KB)
User Guides
- NimbleGen Arrays User’s Guide: ChIP-chip Analysis (Version 6.2)*
User’s Guide (PDF Format 1.5MB)
* DO NOT reference this User's Guide if you are performing sample labeling using the discontinued version of the Dual-Color DNA Labeling Kit (Catalog Number: 05223547001). The labeling protocols in this User's Guide WILL NOT WORK with the discontinued kit. To access the labeling protocols for discontinued DNA Labeling Kits, go to the DNA Labeling Kits Product Notice page and download the previous version of the User's Guide.
- Guide to Identifying Motifs from Your ChIP-chip Data
CEAS User’s Guide (PDF Format 108KB)
- Guide to Functionally Annotating Your ChIP-chip Data
DAVID User’s Guide (PDF Format 274KB)
Downloads
- 385K HG18 Tiled Promoters
List of HG18 genes whose promoters are tiled on 385K Promoter Arrays (Excel Format 13.2MB)
- 385K MM8 Tiled Promoters
List of MM8 genes whose promoters are tiled on 385K Promoter Arrays (Excel Format 11.3MB)
Application Notes & Whitepapers
- Frequent switching of Polycomb repressive marks and DNA hypermethylation in the PC3 prostate cancer cell line
Application Note (PDF Format 914KB)
- Using Chromatin Immunoprecipitations (ChIP) to Determine Protein Binding Sites on DNA
Whitepaper (PDF Format 436KB)
