Sequence Capture Human Exome 2.1M Array

Format: 2.1M
Source: UCSC
Build: HG18 or HG19
Recommended Storage: Store arrays desiccated at room temperature.
Description | # of Probes | Capture Target Size | Catalog Number | Pack Size | Workflow | Ordering*
NimbleGen Sequence Capture Human Exome 2.1M Array | 2.1M | Up to 50Mb | 05451957001 | 1 Slide | Delivery | Buy Online
 | | | 05547792001 | 5 Slides | Delivery |
 | | | N/A | Dataset | Service | Service Options
* Availability of products varies from country to country.


Advantages

  • High Performance: Capture up to 50Mb total regions on a single 2.1M array and up to 5Mb on a single 385K array with high coverage and specificity.
  • Design Expertise: Ensure the highest level of specificity and sensitivity with an empirically tested and validated capture design algorithm.
  • Embedded Quality Controls: NimbleGen Sequence Capture arrays incorporate built-in control probes to ensure system performance.
  • Maximum Flexibility: Tailor the array design to capture your genomic regions or thousands of exons in parallel.
  • Substantial Savings: Save time and cost compared to PCR-based methods.

Applications

NimbleGen Sequence Capture arrays are suitable for targeted sequencing across a wide range of target sizes, from small regions of about 250 kb (Figure 1) to regions totaling about 30 Mb (Table 1). All human designs use the empirically optimized Sequence Capture design algorithm to ensure highly uniform capture. For example, a 250 kb contiguous region, representing a typical GWAS locus, is captured with high specificity and uniformity (Figure 1). Note that small repetitive regions where no probes were selected can still be covered by sequencing, owing to efficient capture from neighboring probes and to the long reads of the Genome Sequencer FLX Titanium Series (red boxes in Figure 1).

Table 1

Capture of Large Contiguous Regions using 2.1M Arrays
Experiment | A | B
Total Reads (millions) | 1.2 | 1.3
Total Bases | 347 Mb | 380 Mb
% Reads Mapped Uniquely | 87.6 | 86.7
% Bases Mapped Uniquely | 93.1 | 92.6
% Mapped Reads on Target | 79.1 | 70.8
Average/Median Coverage | 10.3/9 | 10.1/8

Table 1. The ENCODE pilot regions (~30 Mb) were captured using 2.1M arrays and sequenced. The target regions consist of ~50 individual contigs of ~500 kb each.
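
One rough way to read the on-target percentages in Table 1 is to compare them with the fraction of reads that would fall on the target by chance. The sketch below does that arithmetic. The haploid genome size of 3.1 Gb and the function name `fold_enrichment` are assumptions made for illustration and do not come from this page.

```python
def fold_enrichment(on_target_fraction: float,
                    target_bp: float,
                    genome_bp: float = 3.1e9) -> float:
    """Ratio of the observed on-target fraction to the fraction expected by chance.

    genome_bp defaults to an assumed human genome size of ~3.1 Gb.
    """
    if not 0.0 <= on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must be between 0 and 1")
    expected_by_chance = target_bp / genome_bp
    return on_target_fraction / expected_by_chance


# "% Mapped Reads on Target" values from Table 1, with a ~30 Mb target.
for label, percent_on_target in (("A", 79.1), ("B", 70.8)):
    fold = fold_enrichment(percent_on_target / 100, target_bp=30e6)
    print(f"Experiment {label}: ~{fold:.0f}-fold enrichment over random sampling")
```

Under these assumptions the two experiments work out to roughly 82-fold and 73-fold enrichment.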


Figure 1

Figure 1. High-Performance Targeted Resequencing in a 250kb Target Region.


Roche offers a seamless workflow combining NimbleGen Sequence Capture Arrays and the high-throughput sequencing of the Genome Sequencer FLX System from 454 Life Sciences. This complete solution of kits, arrays and instruments is specifically designed to optimize the workflow, reduce processing time, minimize costs, and enhance data quality. Furthermore, the GS Reference Mapper software from 454 Life Sciences enables researchers to easily identify variants such as SNPs and indels from the final data output without complicated bioinformatics infrastructure (Table 2).

Table 2

454 Optimized Sequence Capture: Resequencing of HapMap Research Sample
Experiment | 250 kb - 1 | 1 Mb - 1
Total Reads | 70,190 | 140,374
Total Bases | 27,646,394 | 55,453,593
On-Target Reads | 75.2% | 87.3%
Median Coverage | 85 | 49
Target Bases with 1+ Coverage | 98.6% | 96.9%
Target Bases with 10+ Coverage | 97.3% | 92.8%
Known SNP Detection Rate | 97.4% | 96.5%

Table 2. Sequence Capture performance on a 250 kb contiguous region and a 1 Mb contiguous region in the human genome. Data shown are from 1 of 4 independent experiments for each region. A HapMap sample was used in the study, and SNP calls were generated by the GS Reference Mapper software.
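
The figures in Table 2 can be related to one another with simple arithmetic: total bases times the on-target fraction, divided by the target size, gives an approximate mean depth. The sketch below is a sanity check only. It assumes that the on-target read percentage also applies to bases, and the function names are illustrative.

```python
def mean_read_length(total_bases: int, total_reads: int) -> float:
    """Average read length in bases."""
    return total_bases / total_reads


def expected_coverage(total_bases: int, on_target_fraction: float,
                      target_bp: int) -> float:
    """Approximate mean depth over the target.

    Assumes the on-target *read* fraction also holds for *bases*.
    """
    return total_bases * on_target_fraction / target_bp


# Values copied from Table 2.
experiments = {
    "250 kb - 1": dict(reads=70_190, bases=27_646_394,
                       on_target=0.752, target_bp=250_000),
    "1 Mb - 1": dict(reads=140_374, bases=55_453_593,
                     on_target=0.873, target_bp=1_000_000),
}

for name, e in experiments.items():
    length = mean_read_length(e["bases"], e["reads"])
    depth = expected_coverage(e["bases"], e["on_target"], e["target_bp"])
    print(f"{name}: mean read length ~{length:.0f} bases, expected depth ~{depth:.0f}x")
```

The estimates come out at roughly 83x and 48x, close to the reported median coverage of 85 and 49.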


Figure 2 shows an example of discovering causative mutations: the mouse Kit locus (~200 kb) was captured and sequenced from 5 non-complementing Kit mutants. These alleles include one known allele, W-41J, and four unknown alleles, W-20J, W-39J, W-40J and W-73J. The known mutation in W-41J was confirmed in this experiment, and the data analysis identified a non-synonymous coding mutation for each of the 4 unknown alleles (D’Ascenzo et al, Mamm. Genome, 2009, 20:424–436).

Figure 2

Figure 2. Mutation discovery in the mouse Kit locus using Sequence Capture and 454 Sequencing.

Protocol

Roche NimbleGen offers two capture methods: SeqCap EZ Library, a solution-based method, and Sequence Capture Arrays, an array-based method.

SeqCap EZ Library and Sequence Capture Array Protocol

Sequence Capture Protocols

  1. Genomic DNA: A SeqCap EZ Oligo pool or an array is made against target regions in the genome.
  2. Library Preparation: A standard shotgun sequencing library is made from genomic DNA.
  3. Hybridization: The sequencing library is hybridized to the SeqCap EZ Oligo pool or to the Sequence Capture array.

Steps 4 and 5 are different for each protocol:

SeqCap EZ Library, biotinylated DNA oligos in solution:

  4. Bead Capture: Streptavidin beads are used to pull down the complex of capture oligos and genomic DNA fragments.
  5. Washing: Unbound fragments are removed by washing.

Sequence Capture, capture probes synthesized on array:

  4. Washing: Unbound fragments are removed by washing.
  5. Target Fragment Elution: The enriched fragment pool is eluted and recovered from the array.

Steps 6 to 8 are common to both protocols:

  6. Amplification: The enriched fragment pool is amplified by PCR.
  7. Enrichment QC: The success of enrichment is measured by qPCR at control loci.
  8. Sequencing-Ready DNA: The end product is a sequencing library enriched for target regions, ready for high-throughput sequencing.
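
This page does not give the formula used in the Enrichment QC step. A common convention for qPCR is to estimate fold enrichment at a control locus as E raised to the difference in Ct between the pre-capture and post-capture libraries, where E is the amplification efficiency (2 for perfect doubling). The sketch below follows that convention. The function name, the locus names and the Ct values are all hypothetical.

```python
def qpcr_fold_enrichment(ct_pre_capture: float,
                         ct_post_capture: float,
                         efficiency: float = 2.0) -> float:
    """Fold enrichment at one control locus, assuming the E ** delta-Ct convention.

    A lower Ct after capture means more template, hence a positive delta
    and a fold change above 1.
    """
    delta_ct = ct_pre_capture - ct_post_capture
    return efficiency ** delta_ct


# Hypothetical Ct values (pre-capture, post-capture) for illustration only.
control_loci = {
    "control_locus_1": (28.0, 20.0),
    "control_locus_2": (27.5, 19.0),
}

for locus, (ct_pre, ct_post) in control_loci.items():
    fold = qpcr_fold_enrichment(ct_pre, ct_post)
    print(f"{locus}: ~{fold:.0f}-fold enrichment")
```

Consult the kit documentation for the control assays and acceptance thresholds that actually apply.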

For more information on how to get trained and set up with Sequence Capture Arrays, visit our Quick Guide page.

Annotation Files

The annotation package for the NimbleGen Sequence Capture 2.1M Human Exome Array includes 6 files that provide visualization and annotation of the array design.

Download the complete 6-file set!

Note: The GFF files can be opened using SignalMap software from NimbleGen. The BED file can be displayed within the UCSC genome browser as a custom annotation track. XLS files can be opened by Microsoft Excel software.
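
Besides loading the BED file into a genome browser, it can be summarized with a short script. The sketch below sums interval lengths per track using the standard BED convention of 0-based, half-open coordinates in the first three columns. It assumes that the file separates its tracks with `track name=...` lines. The sample coordinates are made up, and the actual file layout should be checked before relying on this.

```python
import io
import shlex
from collections import defaultdict


def bed_track_sizes(handle) -> dict:
    """Sum interval lengths (end - start) per track in a BED stream."""
    sizes = defaultdict(int)
    track = "default"  # used if intervals appear before any track line
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith(("#", "browser")):
            continue
        if line.startswith("track"):
            # shlex handles quoted values such as name="capture_target"
            for token in shlex.split(line)[1:]:
                if token.startswith("name="):
                    track = token[len("name="):]
            continue
        chrom, start, end = line.split()[:3]
        sizes[track] += int(end) - int(start)
    return dict(sizes)


# Made-up example data, not taken from the real design file.
sample = io.StringIO(
    'track name="primary_target_region"\n'
    "chr1\t100\t300\n"
    'track name="capture_target"\n'
    "chr1\t120\t300\n"
)
print(bed_track_sizes(sample))
```

For the real file, pass an open file handle in place of the `io.StringIO` sample.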

  • 2.1M_Human_Exome.gff = There are two tracks in this .gff file. The primary_target_region track displays the exon targets, and the capture_target track displays exon targets that are actually covered by the probes. Where the capture target and primary target regions do not align perfectly, either 1) the exon is shorter than 200bp and the target region was extended out to at least 200bp, or 2) no probes were designed to that particular region of the exon target due to repetitive sequence. Note that this is the same design file as 080904_ccds_exome_rebalfocus_HX1.gff that is part of the standard deliverable for a 2.1M Human Exome array.
  • 2.1M_Human_Exome.bed = This file contains the same information as the 2.1M_Human_Exome.gff file above, but in BED format for display within the UCSC Genome Browser. Note that this is the same design file as 080904_ccds_exome_rebalfocus_HX1.gff that is part of the standard deliverable for a 2.1M Human Exome array.
  • 2.1M_coding_exons_annotation.gff = There is a single track in this .gff file. Each vertical bar represents 1 human protein coding exon target. A gray bar means that there are no covered bases for that exon target. A black bar means that there is at least 1 base of coverage. When using SignalMap software, move the cursor over each exon to display the CCDS ID.
  • 2.1M_miRNA_annotation.gff = There is a single track in this .gff file. Each vertical bar represents 1 human miRNA exon target. A gray bar means that there are no covered bases for that exon target. A black bar means that there is at least 1 base of coverage. When using SignalMap software, move the cursor over each miRNA to display the miRNA registry identifier.
  • 2.1M_coding_exons_annotation.xls and 2.1M_miRNA_annotation.xls = These are Excel spreadsheets that list the genes and miRNAs that were targeted by the array design. The columns are:
    • CCDS ID = The identifier number for a particular human protein coding gene.
    • miRNA REGISTRY = The miRBase identifier for a particular human miRNA gene.
    • GENE SYMBOL = The alphanumeric identifier for a particular human protein coding gene.
    • DESCRIPTION = The full name for a particular human protein coding gene.
    • REFSEQ = The RefSeq identifier for a particular human protein coding gene.
    • UCSC GENE ID = The UCSC identifier for a particular human protein coding gene.
    • ENSEMBL = The Ensembl identifier for a particular human protein coding gene.
    • CHROMOSOME = The chromosome on which a particular human protein coding gene or human miRNA gene resides.
    • STRAND = The strand (+ or -) from which a particular human protein coding gene or human miRNA gene is expressed.
    • START = Coordinates where a particular human protein coding gene or human miRNA gene starts.
    • END = Coordinates where a particular human protein coding gene or human miRNA gene ends.
    • EXON COUNT = A raw count of how many exons comprise a particular human protein coding gene.
    • ARRAY COVERAGE = Percentage of target bases from a particular human protein coding gene or human miRNA gene covered by probes designed on the NimbleGen Sequence Capture array.
    • ARRAY COVERAGE W 100BP EXTENSION = Percentage of target bases from a particular human protein coding gene or human miRNA gene covered by probes designed on the NimbleGen Sequence Capture array, PLUS 100bp of padding on both sides of each probe. This is a better estimate of the final sequencing coverage, because ~ 100bp flanking sequences at both ends of each probe are typically captured and sequenced.
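
The two coverage columns can be illustrated with interval arithmetic: the percentage of target bases overlapped by probes, with and without 100bp of padding on each side of every probe. The sketch below is not the vendor's calculation. It assumes half-open coordinates on a single chromosome, and the target and probe positions are invented for the example.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def percent_coverage(targets, probes, pad=0):
    """Percentage of target bases overlapped by probes extended by `pad` bases."""
    padded = merge((max(0, s - pad), e + pad) for s, e in probes)
    merged_targets = merge(targets)
    target_bases = sum(e - s for s, e in merged_targets)
    covered = 0
    for t_start, t_end in merged_targets:
        for p_start, p_end in padded:
            overlap = min(t_end, p_end) - max(t_start, p_start)
            if overlap > 0:
                covered += overlap
    return 100.0 * covered / target_bases


# Invented example: one 600bp target and two 75bp probes.
targets = [(1000, 1600)]
probes = [(1100, 1175), (1250, 1325)]

print(f"ARRAY COVERAGE: {percent_coverage(targets, probes):.1f}%")
print(f"ARRAY COVERAGE W 100BP EXTENSION: {percent_coverage(targets, probes, pad=100):.1f}%")
```

In this toy case the coverage rises from 25% to about 70.8% once the padding is applied, which shows why the extended figure is the better estimate of final sequencing coverage.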

Reagents

NimbleGen Sequence Capture Array Hybridization and Wash Kits contain the components to perform hybridization and wash steps in sequence capture protocols using NimbleGen Sequence Capture Arrays.

Description | Catalog Number | Pack Size | Kit Capacity | Compatible Applications | Ordering*
Sequence Capture Array Hybridization and Wash Kit | 05853257001 | 1 Kit | 8 Arrays | Sequence Capture Arrays | Buy Online
* Availability of products varies from country to country.

For life science research only. Not for use in diagnostic procedures.