3.1. Array Design and Sequence Annotation
Cross-species microarray hybridization is a common approach for studying gene expression profiles in poorly annotated species [10,34,35]. Davey et al. [34], for example, used the commercial Arabidopsis array (ATH-1 Affymetrix GeneChip®) and rice array (Rice Affymetrix GeneChip®) to examine patterns of gene expression in banana under abiotic stress. They identified 2910 differentially expressed transcripts from Musa spp. in response to drought and found several transcripts that co-localized to known rice QTLs previously identified in drought experiments. The cross-species approach was applied because a complete genome sequence and detailed, publicly available resources exist for these model species. However, some reports caution that results obtained with cross-species arrays may not reflect true expression in the species under study, owing to differences in transcript homology between the species [35,36,37].
In this study, we used a published RNAseq dataset, generated by sequencing transcripts from mesocarp tissue at different stages of development (weeks after anthesis, WAA), as the basis for designing the probes of a custom microarray. This microarray can be used to study gene expression profiles in mesocarp tissue during development. The oil palm custom mesocarp microarray consists of 95,382 probes derived from the mesocarp transcripts and 1325 standard Agilent control probes, which monitor the efficiency of the hybridization process. Each mesocarp transcript is represented on the array by three distinct probes, each 60 bp in length, and the individual probes are positioned randomly on the 105K array. Three probes per transcript were used to increase confidence in the results by reducing bias from signals caused by non-specific binding or by partial degradation of particular transcripts.
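The section does not state how the signals from the three probes of a transcript are combined. As a minimal sketch, assuming a per-transcript median (one common way to limit the influence of a single probe affected by non-specific binding or transcript degradation), the summarization could look as follows. The function name, the contig identifiers and the signal values are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from statistics import median


def summarize_by_transcript(probe_signals):
    """Collapse probe-level signals to one value per transcript.

    probe_signals: iterable of (transcript_id, signal) pairs, with
    (in this design) three probes per transcript.
    Returns a dict mapping transcript_id to the median probe signal.
    The median is an assumption made for this sketch, not the
    summarization method stated in the text.
    """
    grouped = defaultdict(list)
    for transcript_id, signal in probe_signals:
        grouped[transcript_id].append(signal)
    return {tid: median(values) for tid, values in grouped.items()}


# Hypothetical probe-level intensities for two transcripts.
example_signals = [
    ("contig_0001", 812.0),
    ("contig_0001", 790.0),
    ("contig_0001", 2950.0),  # one probe inflated, e.g. by non-specific binding
    ("contig_0002", 120.0),
    ("contig_0002", 131.0),
    ("contig_0002", 98.0),
]

summary = summarize_by_transcript(example_signals)
```

With three probes, the median simply discards the most extreme value on either side, so a single aberrant probe does not drive the transcript-level estimate.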
A total of 31,804 unique contigs (assembled from approximately 3 million sequenced reads from mesocarp tissue), each assigned an individual identifier, were used for microarray probe design. Of these 31,804 contigs, 49.3% were annotated based on homology to sequences in the UniProt database (Table 1), and 8.1% also had acceptable Kyoto Encyclopedia of Genes and Genomes (KEGG) ID matches. However, when the transcripts were further classified by putative function, only 1.9% could be assigned uniquely to defined pathways using the KEGG database. Similar figures have been reported elsewhere; in transcriptome sequencing of microalgae, for example, only about 3.9% of the annotated sequences had Gene Ontology (GO) matches and 1.7% were assigned Enzyme Commission (EC) numbers [38]. These results show that annotation of oil palm transcripts is incomplete if based solely on the KEGG database. Using the KEGG classifications, however, we observed that the largest group (18% of the KEGG-classified transcripts) was putatively involved in secondary metabolite biosynthesis (Figure 1), followed by transcripts involved in ribosome biology (8%) and in glycolysis, pyruvate metabolism and the citrate cycle (7% in each of the three categories). Only 2% of the classified transcripts were annotated to each of fatty acid (FA) biosynthesis and starch and sucrose metabolism. Unique unannotated transcripts (>50% of the contigs, data not shown) were also included in this custom array to provide better coverage of the genes expressed in oil palm mesocarp tissue.
Table 1. Summary of oil palm transcript annotation.
Figure 1. Classification of transcripts into different pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.
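To give a sense of scale, the percentages quoted above can be converted into approximate contig counts. The sketch below assumes that every percentage is taken relative to all 31,804 contigs, which the text does not state explicitly for the 1.9% figure. Because the reported percentages are rounded, the counts are approximate, and the exact values are those of Table 1. The labels and the helper name are illustrative.

```python
TOTAL_CONTIGS = 31_804  # unique contigs used for probe design (from the text)

# Rounded percentages as reported in the text.
reported_percentages = {
    "UniProt homology": 49.3,
    "KEGG ID match": 8.1,
    "unique KEGG pathway": 1.9,
}


def approximate_count(total, percent):
    """Convert a rounded percentage of `total` into an approximate count."""
    return round(total * percent / 100)


# Assumption: each percentage refers to the full set of contigs.
approx_counts = {
    label: approximate_count(TOTAL_CONTIGS, pct)
    for label, pct in reported_percentages.items()
}

# Contigs without a UniProt-based annotation (the ">50%" mentioned in the text).
approx_unannotated = TOTAL_CONTIGS - approx_counts["UniProt homology"]

for label, count in approx_counts.items():
    print(f"{label}: ~{count} contigs")
print(f"no UniProt annotation: ~{approx_unannotated} contigs")
```

Under these assumptions, roughly 15,700 contigs carry a UniProt-based annotation and about 600 map uniquely to a KEGG pathway, which illustrates how small the pathway-classified fraction is.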