Transcriptome assembly and advanced data analysis

A custom Perl program was used to remove low-quality sequences from the raw sequence data. The high-quality reads were then assembled using Velvet (version 1.1.07; k-mer length = 57) and Oases to construct unique consensus sequences [33,34]. The trimmed Solexa transcriptome reads were mapped onto the unique consensus sequences using SOAP2 [35], and a custom Perl script was used to process the mapping results and generate the gene expression profile.

Nt and Nr: Unigenes were compared with the NCBI non-redundant nucleotide database (Nt, as of March 4th, 2012) and non-redundant protein database (Nr, as of March 4th, 2012) using BLASTn and BLASTx, respectively, with an E-value cutoff of 1e-5 in both searches [36].

Swiss-Prot: Unigenes were identified by sequence similarity searches against the Swiss-Prot database (downloaded from the European Bioinformatics Institute, as of March 4th, 2012) using BLAST at E-values < 1e-10 [37].

COG: Unigenes were assigned functional annotations by sequence similarity searches against the COG protein database using BLAST at E-values < 1e-10 [38,39]. A custom Perl script was used to assign a functional class to each unigene.

KEGG: Unigenes were first compared with the Kyoto Encyclopedia of Genes and Genomes database (KEGG, release 58) using BLASTx at E-values < 1e-10 [40]. A Perl script was then used to retrieve KEGG Orthology (KO) information from the BLAST results and establish pathway associations between the unigenes and the database.

InterPro and Gene Ontology: InterPro domains were annotated using InterProScan (release 27.0), and the functional assignments were mapped onto the GO structure [41]. WEGO was used to perform GO classifications and construct the GO tree [42].
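The custom Perl quality-filtering program is not published, and the paper does not state its exact criteria. The sketch below (in Python rather than Perl) shows one common form such a filter takes; the Phred+33 encoding, mean-quality threshold of 20, and minimum length of 50 bp are illustrative assumptions, not the paper's settings.

```python
# Illustrative low-quality read filter. Assumptions (not from the paper):
# Phred+33 quality encoding, mean quality >= 20, read length >= 50 bp.

def mean_quality(qual_line, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual_line) / len(qual_line)

def filter_reads(records, min_mean_q=20, min_len=50):
    """Yield (header, seq, qual) tuples that pass the quality filter."""
    for header, seq, qual in records:
        if len(seq) >= min_len and mean_quality(qual) >= min_mean_q:
            yield header, seq, qual

# Example: one high-quality read and one low-quality read.
reads = [
    ("@read1", "ACGT" * 15, "I" * 60),  # 'I' = Phred 40 throughout
    ("@read2", "ACGT" * 15, "#" * 60),  # '#' = Phred 2 throughout
]
kept = list(filter_reads(reads))
print([header for header, _, _ in kept])  # only @read1 survives
```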
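The Perl script that turns the SOAP2 mapping results into an expression profile is likewise custom and the expression metric is not specified. One common length- and depth-normalized choice is RPKM (reads per kilobase of transcript per million mapped reads); the sketch below computes it from per-unigene read counts and lengths, purely as an assumed example, not the paper's method.

```python
# Sketch of an expression profile from mapping counts, using RPKM as an
# ASSUMED metric (the paper does not name one). Inputs are per-unigene
# mapped-read counts and unigene lengths in bp.

def rpkm(counts, lengths):
    """RPKM = reads * 1e9 / (unigene length in bp * total mapped reads)."""
    total_mapped = sum(counts.values())
    return {u: counts[u] * 1e9 / (lengths[u] * total_mapped) for u in counts}

# Hypothetical example: two unigenes, 1000 mapped reads in total.
counts = {"unigene1": 900, "unigene2": 100}
lengths = {"unigene1": 3000, "unigene2": 500}
profile = rpkm(counts, lengths)
print(profile)
```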
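Each of the annotation steps above reduces to keeping, per unigene, the best database hit under an E-value cutoff. As a minimal sketch, the function below parses BLAST tabular output (`-outfmt 6`, where column 1 is the query, column 2 the subject, and column 11 the E-value) and applies the paper's cutoffs; the example rows are invented for illustration.

```python
# Sketch: best BLAST hit per unigene under an E-value cutoff, from tabular
# (-outfmt 6) output. The cutoff mirrors the paper's 1e-5 (Nt/Nr) or
# 1e-10 (Swiss-Prot/COG/KEGG) thresholds; the input rows are hypothetical.

def best_hits(blast_lines, evalue_cutoff=1e-5):
    """Return {query_id: (subject_id, evalue)} keeping the lowest E-value."""
    hits = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > evalue_cutoff:
            continue  # fails the significance cutoff
        if query not in hits or evalue < hits[query][1]:
            hits[query] = (subject, evalue)
    return hits

rows = [
    "unigene1\tgi|111\t98.0\t200\t2\t0\t1\t200\t1\t200\t1e-50\t380",
    "unigene1\tgi|222\t90.0\t180\t8\t1\t1\t180\t5\t184\t1e-20\t200",
    "unigene2\tgi|333\t80.0\t100\t15\t2\t1\t100\t1\t100\t0.01\t60",
]
annotations = best_hits(rows)
print(annotations)  # unigene2's hit is dropped by the cutoff
```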