PMC:3475479 / 6171-17129
Annotations
{"target":"https://pubannotation.org/docs/sourcedb/PMC/sourceid/3475479","sourcedb":"PMC","sourceid":"3475479","source_url":"https://www.ncbi.nlm.nih.gov/pmc/3475479","text":"Application of NGS to Genome Research\n\nNovel whole genome de novo assembly\nMore than 11,000 sequencing projects, including targeted projects, were reported on the Genome Online Database (GOLD, http://www.genomesonline.org) in early 2012. Now, more than 3,000 genome projects have been completed on the diverse genome species, and more than 90% of completed projects were bacterial genome sequencing. The greatest bacterial genome sequencing was performed with 454 pyrosequencing because of the available largest long read sequencing, useful for de novo assembly of novel genome sequencing. The official depth of the deep sequencing strategy of 454 pyrosequencing technology for whole bacterial genome sequencing for de novo assembly in novel genome sequencing is at least 15-20× in depth of the estimated genome size [9-13]. However, Li et al. [3] reported that 6-10× sequencing in qualified runs with 500-bp reads would be enough for de novo assemblies from 1,480 prokaryote genomes with \u003e98% genome coverage, \u003c100 contigs with N50, and size \u003e100 kb. Recently, prokaryote whole genome sequencing using 101 bp paired-end read data from Illumina/Solexa systems was used for de novo assembly and resequencing. For example, a Bacillus subtilis subspecies genome sequence was generated by using the short read sequence from Illumina/Solexa and assembled with the Velvet program [14]. In this case, the genome assembly was completed, based on the reference genome for ordering the numerous contigs derived from de novo assembly. 
Even though numerous contigs assembled with Illumina/Solexa data were produced in the eukaryotic genome, a few drafts for the assembled genome sequence were reported, except for the giant panda genome [15], which was covered with assembled contigs (2.25 Gb), covering approximately 94% of the expected whole genome. Another example was the woodland strawberry genome (240 Mb) [16] that was sequenced to 39× depth of the genome, assembled de novo, and anchored to the linkage map of seven pseudochromosomes.\nThe genome sequence could be associated with the predicted genes with transcriptome sequence data. An ideal method for cost-effective novel genome sequencing using NGS is de novo assembly with diverse shotgun fragment end sequencing data of multiplat systems (Fig. 1). The first strategy of novel genome DNA sequencing is sequencing the genomic DNA for contig and scaffold construction after randomly sheared shotgun single read-end or paired-end read DNA sequencing using Roche/454 or Illumina/Solexa with information on how to assemble with the NGS data using variable assembly software. Recently, a catfish genome was sequenced with multiplatform Roche/454 and Illumina/Solexa technology and assembled with an effective combination of low coverage depth of 18× Roche/454 and 70× Illumina/Solexa data using 3 assembly softwares - Newbler software to the 454 reads, Velvet assembler to the Illumina read, and MIRA assembler for final assembly of contigs and singletons derived from initial assembled data - resulting in 193 contigs with an N50 value of 13,123 bp [2]. In an additional multiplatform data assembly of a 40-Mb eukaryotic genome of the fungus Sordaria macrospra, a combination sequence of 85-fold coverage of Illumina/Solexa and 10-fold coverage by Roche/454 sequencing was assembled to a 40-Mb draft version (N50 of 117 kb) with the Velvet assembler as a reference of a model organism for fungal morphogenesis [17]. 
In the recent effective assembly methods reported, combinations of the multiplatform sequence are shown as successful novel genome assembly using variable assembly strategy pipelines. Comparing the pipeline of assembly strategy, we suggest an effective integrated pipeline in which data are filtered to remove low-quality and short-read initial assemblies using variable software and then compared to contigs, hybrid contigs using MIRA assembler, and finally contig orders using SSPACE software (http://www.baseclear.com/dna-sequencing/data-analysis/) [18] for scaffold construction through de novo assembly of novel genome sequencing (Fig. 2). According to the comparison of several ways of de novo assembly, we suggest using both DNA sequences from multiplatform NGS with at least 2× and 30× depth sequences of genome coverage using Roche/454 and Illumina/Solexa, respectively, and doing hybrid assembly for cost-effective novel genome sequencing.\n\nSNP discovery and genotyping with resequencing\nResequencing of genomic regions or target genes of interest in a phenotype is the first step in the detection of DNA variations associated with the gene regulation. The discovery of single-nucleotide polymorphisms (SNPs) including insertion/deletions (indels), with high-throughput data is useful to study genetic variation, comparative genomics, linkage map, and genomic selection for breeding value with DNA variation. Many geneticists for biological and genome studies of microbial, plant, animal, and human genomes have effectively used NGS whole-genome resequencing data to use in variable research fields, such as bacterial evolution [19], genomewide analysis of mutagenesis of Escherichia coli strains [20], comparative genomics of Streptococcus suis of swine pathogen [21], genomic variation effects on phenotype and gene regulation in mouse [22], evolution of plant [23], and comparison of genetic variations on the targeted enrichment [24]. 
The platforms of resequencing projects have used Illumina/Solexa of short read lengths to align with the reference sequence to discover DNA variations between compared related species' sequences. Because of rare occurrence of SNPs in most species, it is important to identify high-accuracy data to discover DNA variations according to coverage depth using MAQ (http://maq.sourceforge.net/maq-man.shtml) [25] and CLC software (http://www.clcbio.com). The public protocol of covering depth to discover SNPs and indels on the heterogeneous genome requires at least 30× of the reference genome, while about 10× depth of coverage is enough for DNA variation study of homogeneous genomes. Of course, high coverage of depth provides high-quality data in SNP detection on the reference mapping (Fig. 3). However, short read lengths of 35 bp or 100 bp show enough to map on the reference sequence using the MAQ software and CLC software in the genome, including short repeated block regions. But, geneticists still require long-read sequencing data to distinguish repeated block regions, like paralogous regions derived from gene duplication. MAQ software provides a consensus sequence of the genotype sequenced of short read lengths with aligned raw reads to the reference sequence. CLC software checks accuracy by counting reads of DNA variations of each position. Recently, a novel application of pattern recognition for accurate DNA variations was discovered in the complexity of the genomic region using high-throughput data in a Caucasian population [26]. They used three independent datasets with Sanger sequencing and Affymetrix and Illumina microarrays to validate SNPs and indels of a clinical target region, FKBP5. 
Therefore, it is necessary for multiplatform systems to validate DNA variations in the specific complexity of the genome region.\n\nExpression profiling\nGene expression profiling is a measurement of the regulation of a transcriptome from the whole genome in the field of molecular biology. A conventional method to measure the relative activity of target genes is DNA microarray technology, which estimates expressed genes with the signals of hybridization of target genes (cDNA from mRNA) on the synthesized oligonucleotides [27]. The technology is still used for functional genomics in the wide era, including medicine, clinic, plant, and agricultural biotechnology [28-30]. In addition, microarray technology is also used in the comparative study of proteomics and expression, measuring the level of extracellular matrix protein [30]. Since NGS technology was developed in 2005, the transcriptome of novel whole genomes could be identified with massive parallel mRNA sequencing using Roche/454 and Illumina/Solexa [31-36]. The Roche/454 system is more useful for gaining novel gene discovery of novel species' genomes for long read sequencing [37, 38]. Otherwise, Illumina/Solexa is being used to profile the expression of known genes with mapping short read sequences to the known reference genes [39, 40]. In that case, rare expressed genes and novel genes could be identified with high-throughput expressed sequence tag sequences using Illumina/Solexa. Also, it is useful to find significant tissue-specific expression biases with comparison of transcript data [22]. 
Now, the hybrid mRNA sequence from Roche/454 and Illumina/Solexa is more powerful for finding novel genes through de novo assembly in any whole-genome species.\nThe hybrid sequence data of 20× and 50× coverage of the estimated transcriptome sequence from Roche/454 and Illumina/Solexa, respectively, is effective in creating novel expressed reference sequences, while short-read Illumina/Solexa data are cost-efficient on expression quantification information for comparing exposed samples and natural phenotype samples through mapping to the reference genes (Fig. 4). Only and average 30× coverage of transcriptome depth of short-read sequences of Illumina/Solexa is enough to check expression quantification, compared to reference expressed sequence tag sequences. The expressed information could be different, depending on the software using CAP3, MIRA, Newbler, SeqMan, and CLC. Therefore, the results should be compared according to variable program options to define robust expression profiling [41]. To date, a powerful tool of ChIP-on-chip is used for understanding gene transcription regulation. Thus, two-channel microarray technology of a combination of chromatin immunoprecipitation could be used for genomewide mapping of binding sites of DNA-interacting proteins [29]. In any NGS application, the transcriptome expression information would be more useful than complete genome information research with the lowest sequencing budget for biologists to better understand gene regulation of related genetic phenotypes with the in silico method. Of in silico methods, conserved miRNA and novel miRNA discovery is available on the massive miRNAnome data in any species. Specially, the target genes of miRNA discovered could be robust information to approach genome biology studies. Transcriptome assembly is smaller than genome assembly and thus should be more computationally tractable but is often harder, as individual contigs can often have highly variable read coverages. 
Comparing single assemblers, Newbler 2.5 performed the best on our trial dataset, but other assemblers were closely comparable. Combining different optimal assemblies from different programs, however, gives a more credible final product, and this strategy is recommended [41].","divisions":[{"label":"Title","span":{"begin":0,"end":37}},{"label":"Section","span":{"begin":39,"end":4411}},{"label":"Title","span":{"begin":39,"end":74}},{"label":"Section","span":{"begin":4413,"end":7256}},{"label":"Title","span":{"begin":4413,"end":4459}},{"label":"Title","span":{"begin":7258,"end":7278}}],"tracks":[{"project":"2_test","denotations":[{"id":"23105922-22260654-44845636","span":{"begin":818,"end":819},"obj":"22260654"},{"id":"23105922-22053731-44845636","span":{"begin":818,"end":819},"obj":"22053731"},{"id":"23105922-21075933-44845636","span":{"begin":818,"end":819},"obj":"21075933"},{"id":"23105922-21478339-44845636","span":{"begin":818,"end":819},"obj":"21478339"},{"id":"23105922-21994929-44845636","span":{"begin":818,"end":819},"obj":"21994929"},{"id":"23105922-22192914-44845637","span":{"begin":845,"end":846},"obj":"22192914"},{"id":"23105922-21183663-44845638","span":{"begin":1375,"end":1377},"obj":"21183663"},{"id":"23105922-20010809-44845639","span":{"begin":1726,"end":1728},"obj":"20010809"},{"id":"23105922-21186353-44845640","span":{"begin":1901,"end":1903},"obj":"21186353"},{"id":"23105922-22192763-44845641","span":{"begin":3096,"end":3097},"obj":"22192763"},{"id":"23105922-20386741-44845642","span":{"begin":3457,"end":3459},"obj":"20386741"},{"id":"23105922-21149342-44845643","span":{"begin":4015,"end":4017},"obj":"21149342"},{"id":"23105922-22081229-44845644","span":{"begin":5101,"end":5103},"obj":"22081229"},{"id":"23105922-22080106-44845645","span":{"begin":5170,"end":5172},"obj":"22080106"},{"id":"23105922-22026465-44845646","span":{"begin":5237,"end":5239},"obj":"22026465"},{"id":"23105922-21921910-44845647","span":{"begin":5311,"end":5313},"obj":"21921910"},
{"id":"23105922-21878562-44845648","span":{"begin":5336,"end":5338},"obj":"21878562"},{"id":"23105922-21876176-44845649","span":{"begin":5406,"end":5408},"obj":"21876176"},{"id":"23105922-18714091-44845650","span":{"begin":5815,"end":5817},"obj":"18714091"},{"id":"23105922-21917492-44845651","span":{"begin":6959,"end":6961},"obj":"21917492"},{"id":"23105922-1579459-44845652","span":{"begin":7653,"end":7655},"obj":"1579459"},{"id":"23105922-22280841-44845653","span":{"begin":7795,"end":7797},"obj":"22280841"},{"id":"23105922-22276688-44845653","span":{"begin":7795,"end":7797},"obj":"22276688"},{"id":"23105922-21573704-44845653","span":{"begin":7795,"end":7797},"obj":"21573704"},{"id":"23105922-21573704-44845654","span":{"begin":7959,"end":7961},"obj":"21573704"},{"id":"23105922-20359325-44845655","span":{"begin":8144,"end":8146},"obj":"20359325"},{"id":"23105922-20435767-44845655","span":{"begin":8144,"end":8146},"obj":"20435767"},{"id":"23105922-20614011-44845655","span":{"begin":8144,"end":8146},"obj":"20614011"},{"id":"23105922-21619941-44845655","span":{"begin":8144,"end":8146},"obj":"21619941"},{"id":"23105922-21976711-44845655","span":{"begin":8144,"end":8146},"obj":"21976711"},{"id":"23105922-20615900-44845655","span":{"begin":8144,"end":8146},"obj":"20615900"},{"id":"23105922-21771864-44845656","span":{"begin":8273,"end":8275},"obj":"21771864"},{"id":"23105922-21749684-44845657","span":{"begin":8277,"end":8279},"obj":"21749684"},{"id":"23105922-20700453-44845658","span":{"begin":8428,"end":8430},"obj":"20700453"},{"id":"23105922-21176179-44845659","span":{"begin":8432,"end":8434},"obj":"21176179"},{"id":"23105922-21921910-44845660","span":{"begin":8694,"end":8696},"obj":"21921910"},{"id":"23105922-20950480-44845661","span":{"begin":9700,"end":9702},"obj":"20950480"},{"id":"23105922-22276688-44845662","span":{"begin":9976,"end":9978},"obj":"22276688"},{"id":"23105922-20950480-44845663","span":{"begin":10954,"end":10956},"obj":"20950480"}],"attributes":[{"subj"
:"23105922-22260654-44845636","pred":"source","obj":"2_test"},{"subj":"23105922-22053731-44845636","pred":"source","obj":"2_test"},{"subj":"23105922-21075933-44845636","pred":"source","obj":"2_test"},{"subj":"23105922-21478339-44845636","pred":"source","obj":"2_test"},{"subj":"23105922-21994929-44845636","pred":"source","obj":"2_test"},{"subj":"23105922-22192914-44845637","pred":"source","obj":"2_test"},{"subj":"23105922-21183663-44845638","pred":"source","obj":"2_test"},{"subj":"23105922-20010809-44845639","pred":"source","obj":"2_test"},{"subj":"23105922-21186353-44845640","pred":"source","obj":"2_test"},{"subj":"23105922-22192763-44845641","pred":"source","obj":"2_test"},{"subj":"23105922-20386741-44845642","pred":"source","obj":"2_test"},{"subj":"23105922-21149342-44845643","pred":"source","obj":"2_test"},{"subj":"23105922-22081229-44845644","pred":"source","obj":"2_test"},{"subj":"23105922-22080106-44845645","pred":"source","obj":"2_test"},{"subj":"23105922-22026465-44845646","pred":"source","obj":"2_test"},{"subj":"23105922-21921910-44845647","pred":"source","obj":"2_test"},{"subj":"23105922-21878562-44845648","pred":"source","obj":"2_test"},{"subj":"23105922-21876176-44845649","pred":"source","obj":"2_test"},{"subj":"23105922-18714091-44845650","pred":"source","obj":"2_test"},{"subj":"23105922-21917492-44845651","pred":"source","obj":"2_test"},{"subj":"23105922-1579459-44845652","pred":"source","obj":"2_test"},{"subj":"23105922-22280841-44845653","pred":"source","obj":"2_test"},{"subj":"23105922-22276688-44845653","pred":"source","obj":"2_test"},{"subj":"23105922-21573704-44845653","pred":"source","obj":"2_test"},{"subj":"23105922-21573704-44845654","pred":"source","obj":"2_test"},{"subj":"23105922-20359325-44845655","pred":"source","obj":"2_test"},{"subj":"23105922-20435767-44845655","pred":"source","obj":"2_test"},{"subj":"23105922-20614011-44845655","pred":"source","obj":"2_test"},{"subj":"23105922-21619941-44845655","pred":"source","obj":"2_test"},{"subj"
:"23105922-21976711-44845655","pred":"source","obj":"2_test"},{"subj":"23105922-20615900-44845655","pred":"source","obj":"2_test"},{"subj":"23105922-21771864-44845656","pred":"source","obj":"2_test"},{"subj":"23105922-21749684-44845657","pred":"source","obj":"2_test"},{"subj":"23105922-20700453-44845658","pred":"source","obj":"2_test"},{"subj":"23105922-21176179-44845659","pred":"source","obj":"2_test"},{"subj":"23105922-21921910-44845660","pred":"source","obj":"2_test"},{"subj":"23105922-20950480-44845661","pred":"source","obj":"2_test"},{"subj":"23105922-22276688-44845662","pred":"source","obj":"2_test"},{"subj":"23105922-20950480-44845663","pred":"source","obj":"2_test"}]}],"config":{"attribute types":[{"pred":"source","value type":"selection","values":[{"id":"2_test","color":"#ec93cc","default":true}]}]}}