The annotation package for the NimbleGen Sequence Capture 2.1M Human Exome Array includes 6 files that provide visualization and annotation of the array design. Download the complete 6-file set!
Note: The GFF files can be opened using SignalMap software from NimbleGen. The BED file can be displayed within the UCSC genome browser as a custom annotation track. XLS files can be opened by Microsoft Excel software.
- 2.1M_Human_Exome.gff = There are two tracks in this .gff file. The primary_target_region track displays the exon targets, and the capture_target track displays the exon targets that are actually covered by the probes. The capture target and primary target regions sometimes do not align perfectly. This happens when either 1) the exon is shorter than 200bp and the target region was extended out to at least 200bp, or 2) no probes were designed to that particular region of the exon target due to repetitive sequence. Note that this is the same design file as 080904_ccds_exome_rebalfocus_HX1.gff that is part of the standard deliverable for a 2.1M Human Exome array.
- 2.1M_Human_Exome.bed = This file contains the same information as the 2.1M_Human_Exome.gff file above, but in BED format for display within the UCSC Genome Browser. Note that it describes the same design as 080904_ccds_exome_rebalfocus_HX1.gff, which is part of the standard deliverable for a 2.1M Human Exome array.
- 2.1M_coding_exons_annotation.gff = There is a single track in this .gff file. Each vertical bar represents 1 human protein coding exon target. A gray bar means that there are no covered bases for that exon target. A black bar means that there is at least 1 base of coverage. When using SignalMap software, move the cursor over each exon to display the CCDS ID.
- 2.1M_miRNA_annotation.gff = There is a single track in this .gff file. Each vertical bar represents 1 human miRNA exon target. A gray bar means that there are no covered bases for that exon target. A black bar means that there is at least 1 base of coverage. When using SignalMap software, move the cursor over each miRNA to display the miRNA registry identifier.
- 2.1M_coding_exons_annotation.xls and 2.1M_miRNA_annotation.xls = These are Excel spreadsheets that list the genes and miRNAs that were targeted by the array design. The columns are:
- CCDS ID = The identifier number for a particular human protein coding gene.
- miRNA REGISTRY = The miRBASE identifier for a particular human miRNA gene.
- GENE SYMBOL = The alphanumeric identifier for a particular human protein coding gene.
- DESCRIPTION = The full name for a particular human protein coding gene.
- REFSEQ = The RefSeq identifier for a particular human protein coding gene.
- UCSC GENE ID = The UCSC identifier for a particular human protein coding gene.
- ENSEMBL = The ENSEMBL identifier for a particular human protein coding gene.
- CHROMOSOME = Identifies on what chromosome a particular human protein coding gene or human miRNA gene resides.
- STRAND = The strand (+ or -) from which a particular human protein coding gene or human miRNA gene is expressed.
- START = Coordinates where a particular human protein coding gene or human miRNA gene starts.
- END = Coordinates where a particular human protein coding gene or human miRNA gene ends.
- EXON COUNT = A raw count of how many exons comprise a particular human protein coding gene.
- ARRAY COVERAGE = Percentage of target bases from a particular human protein coding gene or human miRNA gene covered by probes designed on the NimbleGen Sequence Capture array.
- ARRAY COVERAGE W 100BP EXTENSION = Percentage of target bases from a particular human protein coding gene or human miRNA gene covered by probes designed on the NimbleGen Sequence Capture array, PLUS 100bp of padding on both sides of each probe. This is a better estimate of the final sequencing coverage, because ~ 100bp flanking sequences at both ends of each probe are typically captured and sequenced.
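The difference between the two coverage columns can be illustrated with a small calculation. The Python sketch below is not the method NimbleGen used to populate the spreadsheets; it is a minimal illustration that assumes targets and probes are given as half-open (start, end) intervals on a single chromosome. The function names and the example coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_bases(targets, probes):
    """Count target bases that fall inside at least one probe interval."""
    total = 0
    merged_probes = merge_intervals(probes)
    for t_start, t_end in merge_intervals(targets):
        for p_start, p_end in merged_probes:
            overlap = min(t_end, p_end) - max(t_start, p_start)
            if overlap > 0:
                total += overlap
    return total


def coverage_percent(targets, probes, padding=0):
    """Percentage of target bases covered by probes.

    padding extends every probe by that many bases on both sides,
    mimicking the 100bp extension column (assumed interpretation).
    """
    padded = [(max(0, s - padding), e + padding) for s, e in probes]
    target_len = sum(e - s for s, e in merge_intervals(targets))
    if target_len == 0:
        return 0.0
    return 100.0 * covered_bases(targets, padded) / target_len


# Hypothetical example: one 400-base target with one 100-base probe.
targets = [(1000, 1400)]
probes = [(1150, 1250)]
print(coverage_percent(targets, probes))               # 25.0
print(coverage_percent(targets, probes, padding=100))  # 75.0
```

In this made-up case, the probe alone covers a quarter of the target, while the padded probe covers three quarters of it, which is why the extension column is usually the higher of the two values.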