3. Results and Discussion 3.1. Array Design and Sequence Annotation Cross species microarray hybridization is a common approach used to study gene expression profiles of poorly annotated species [10,34,35]. Davey et al. [34] for example utilized the commercial Arabidopsis array (ATH-1 Affymetrix GeneChip®) and Rice array (Rice Affymetrix GeneChip®) to examine patterns of gene expression in banana under abiotic stress. They identified 2910 differentially expressed transcripts from Musa spp. in response to drought and found several transcripts that co‑localized to known rice QTLs that were previously identified through drought experiments. The cross species approach was applied due to the availability of a complete genome sequence and detailed publically-available resources. However, some reports caution that results obtained using cross species arrays may not reflect the true expression within the species under study, due to differences in transcripts homology between the species [35,36,37]. In this study, we utilized a published RNAseq dataset generated from transcript sequencing of mesocarp tissues at different stages of development (WAA) as the basis to develop the probes for a custom microarray. This microarray could be used to study the gene expression profiles in the mesocarp tissue during development. The oil palm custom mesocarp microarray consists of 95,382 probes derived from the mesocarp transcripts and 1325 standard Agilent control probes, to monitor the efficiencies of the hybridization processes. In this array, each of the mesocarp transcripts was represented by three distinct probes, each 60 bp in length. The individual probes were positioned randomly on the 105K array. Three probes per transcript were used to increase confidence in outcomes by avoiding bias towards signals produced due to non-specific binding and partial degradation of particular transcripts. A total of 31,804 unique contigs (assembled from approximately 3 million sequenced reads from mesocarp tissue), each denoted with an individual identifier, were used for microarray probe design. From the 31,804 contigs used, 49.3% of the transcripts were annotated based on homology to those in the Uniprot database (Table 1). A total of 8.1% of the transcripts utilized also had acceptable Kyoto Encyclopedia of Genes and Genomes (KEGG) ID matches. However, when further classified based on their putative functions in different pathways, only 1.9% could be classified uniquely into defined pathways using the KEGG database. Similar figures were observed in transcriptome sequencing of microalgae, as an example, in which only about 3.9% of the annotated sequences had GO matches and 1.7% were assigned Enzyme Commission (EC) numbers [38]. The results show the incompleteness of oil palm annotation if based solely on the KEGG database. Using the KEGG classifications however, we observed that the largest group (of 18% of the KEGG classified transcripts) were putatively involved in secondary metabolite biosynthesis (Figure 1) followed by those involved in ribosome biology (8%) glycolysis, pyruvate and citrate cycles (7% in each of the three categories). Only 2% of classified transcripts were annotated as being involved in fatty acid (FA) biosynthesis, starch and sucrose metabolism, for each class. In this custom array, unique un-annotated transcripts (>50% of the contigs, data not shown) were also included to provide better coverage of the expressed genes found in oil palm mesocarp tissue. microarrays-03-00263-t001_Table 1 Table 1 Oil palm transcripts annotation summary. Figure 1 Classification of transcripts in different pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. 3.2. Expression Comparisons of Fatty Acid (FA), Triacyl Glyceride (TAG) Biosynthesis, Citric Acid Cycle (TCA) and Glycolysis Genes between Custom Oil Palm Mesocarp, Arabidopsis and Rice Microarrays at 16 Week After Anthesis (WAA) In the custom oil palm mesocarp microarray, probes designed to detect FA and TAG biosynthesis genes during mesocarp development were included. To compare the detection efficiencies of FA and TAG biosynthesis genes on the different microarray platforms, we compared the expression levels through signals produced from microarray hybridization in mesocarp tissue at 16 WAA, at which point mesocarp oil accumulation in this tissue is entering the exponential phase [39]. In the Arabidopsis and rice microarrays, the expression signals produced at 16 WAA were similar across the selected FA and TAG genes (Figure 2A), however up to 3-fold variation was observed in the oil palm microarray hybridization signals. The statistical test (Mann-Whitney test) showed that the oil palm custom array produced significantly higher (p-value < 0.01) probe signal intensity in all the FA and TAG probes when compared to signals produced using Arabidopsis microarray. Comparing the rice microarray and oil palm custom array, we also found a similar situation to the Arabidopsis-oil palm array comparison. The Mann-Whitney test also showed significant differences overall in the probe signal levels of transcript that code for FA and TAG genes using the oil palm microarray as compared to the rice microarray platforms, at p-value < 0.05. This was with the exception of the probe that coded for Carboxyltransferase α-subunit of heteromeric acetyl-CoA carboxylase (ACC CT-α), where no significant differences were seen between rice and oil palm arrays. (p-value, 0.66). Figure 2 Expression signal comparison of selected fatty acid (FA) and triacylglycerols (TAGs). (A) tricarboxylic acid cycle (TCA) and glycolysis (B) genes at 16 Weeks After Anthesis (WAA) using different microarray platforms, Rice, Arabidopsis and Oil Palm. Signals were compared using a Mann-Whitney test at p-value < 0.05. FAD 2, Oleate desaturase; PK, Pyruvate kinase; LAC, Long-chain acyl-CoA synthetase; FAT A, Acyl-ACP thioesterase A; KAS I, Ketoacyl-ACP synthase I; KAS III, Ketoacyl-ACP synthase III; KAS II, Ketoacyl-ACP synthase II; SAD, Stearoyl-ACP desaturase; FAT B, Acyl-ACP thioesterase B; CPT, Diacylglycerol cholinephosphotransferase; DGAT 2, Acyl-CoA: diacylglycerol acyltransferase 2; LPCAT, 1-acyl glycerol-3-phosphocholine acyltransferase; KAR, Ketoacyl-ACP reductase; ACC CT-α, Carboxyltransferase α-subunit of acetyl-CoA carboxylase; HAD, hydroxyacyl-ACP dehydrase; WRI1, EAR, Enoyl-ACP reductase; PDAT, Phospholipid:diacylglycerol acyltransferase; GPAT, glycerol-3-phosphate acyltransferase; MAT, Malonyl-CoA:ACP malonyltransferase; LPAAT, Lyso PA acyltransferase; DGAT 1, Acyl-CoA:diacylglycerol acyltransferase 1; PDH (DHLAT), Dihydrolipoamide acetyltransferase; SCS, Succinyl coenzyme A synthetase; PGK, Phosphoglycerate kinase; IDH, Isocitrate dehydrogenase; PFK-1, Phosphofructokinase 1; CS, Citrate synthase; PGI, phosphoglucose isomerase; ALDOA, Fructose-bisphosphate aldolase; TPI, Triosephosphate isomerase; ENO 1, enolase 1; ME, Malic Enzyme; ACLY, ATP Citrate Lyase; GAPDH, Glyceraldehyde-3-phosphate dehydrogenase; MDH, Malate dehydrogenase; FH, Fumarase; HK, Hexokinase. Similar results were also observed when comparing the glycolysis and TCA pathway genes (Figure 2B). Most of the genes exhibit higher expression signals in the oil palm microarray dataset and show significant differences in their expression (p-value < 0.01) when compared to the Arabidopsis array, with the exception of the transcript that coded for fumarase (p-value, 0.1689). In the Arabidopsis microarray, only the transcript coding for enolase 1 had significantly higher expression signal (p-value, 0.0004) compared to oil palm microarray. Comparisons between rice and oil palm arrays yielded similar results, with most of the genes showing significantly higher expression signal using the oil palm array, with the exception of the transcripts for hexokinase, fumarase, glyceraldehydes-3-phosphate dehydrogenase (GAPDH), and ATP citrate lyase (p-value > 0.05). Probe signal comparisons identified several genes with expression signals with up to 3-fold difference between Arabidopsis and the oil palm microarray, similarly for the rice microarray. The exceptions were aconitase and succinyl-CoA synthetase alpha subunit; these two transcriptsexhibited much higher signal in the rice microarray compared to the Arabidopsis microarray. For genes in FA, TAG, TCA and glycolysis pathways (Figure 2), the 16 biological samples used in the microarray showed consistent differences between replicates. This argues for a lack of dynamic detection range in the Arabidopsis and rice microarrays, compared to the oil palm microarray even for genes in conserved metabolic pathways—A reflection of the lack of homology between the Arabidopsis and rice array probes used here compared with oil palm FA, TAG, glycolysis and TCA genes. 3.3. FA and TAG Biosynthesis Profiling in Oil Palm Mesocarp Development (12 WAA–22 WAA) Using the Oil Palm Mesocarp Microarray To study the robustness of the oil palm mesocarp microarray, we further compared the expression profile of the selected FA and TAG biosynthesis genes throughout mesocarp development with published datasets derived from transcriptome sequencing at similar development stages. From the expression profiles of the selected genes (Figure 3), the majority of probes exhibited an increase in expression signal throughout mesocarp development. This is in concordance with Bourgis et al. [4] where they reported almost all the transcripts related to FA biosynthesis continue to increase until the end of oil accumulation. From our correlation study, the FA and TAGs genes show reasonable correlation (Pearson correlation) with R2 = 0.569 and p-value < 0.01 (Figure 4). Individually, 56% of these genes assessed with the custom oil palm microarray show high concordance to the published dataset, with R2 > 0.9 and p-value < 0.05 (Table 2). Nonetheless, there are a few genes that exhibited differences in expression profiles derived using this microarray compared to the published data [4,6] such as LACS, SAD, MAT, PDH(DHLAT), CPT, GPAT, PDAT, DGAT 1, WRI1 and LPAAT (Figure 3). The differences could be a reflection of how the respective experiments were carried out and also differences in the technology used to assess the transcripts (hybridisation versus library sequencing). However, when we compared the WRI1 expression profile from the oil palm microarray here to Bourgis et al. [4] and to Tranbarger et al. [6] we observed similarity between these three different studies (Figure 5). We observed that the expression pattern of WRI1 in Bourgis et al. [4] (Figure 5A) and our microarray (Figure 5C) are similar at initial stages (12 WAA to 16 WAA) but show different trends of expression at the end of the maturation stage. However, the expression trend of WRI1 in our microarray data is similar to that reported by Tranbarger et al. [6] (Figure 5B) with an expression peak at 120 DAP (~17 WAA). The expression of WRI1 then declines as the mesocarp develops towards the maturation stage. microarrays-03-00263-t002_Table 2 Table 2 Pearson correlation of expression changes between microarray and published data [4]. * Significant at p-value < 0.05. Figure 3 Expression change comparisons of selected FA genes between microarray and Bourgis et al. [4] throughout mesocarp development. Figure 4 Coefficient of correlation of transcriptome sequencing [4] and microarray data for FA genes at 16 WAA. R2 = 0.569, p-value < 0.01. Figure 5 Expression comparisons of WRI1 between Bourgis et al. [4] (A), Tranbarger et al. [6] (B) and oil palm microarray (C). Differences in sampling methods at different maturation stages and potentially differences in the genetic backgrounds used maycontribute to the observed differences in expression profiles of these genes. In this study, mesocarp samples of each maturation stage were harvested from all 16 selected palms, compared to the published studies where mesocarp samples of different maturation stages were harvested from different palms. We also found that the expression of specific genes can be genotype dependent (data not shown). In the design of this microarray, we also included paralog sequences from our mesocarp transcriptome sequencing that appeared in the mesocarp tissue, focusing on the genes involved in the FA biosynthesis pathway as reported in Dussert et al. [5]. From the comparison of paralogs of several FA genes (ACC, KAS I, KAS III and SAD), we observed that the level of expression of these paralogs is different and that these correspond to different enzymes (Figure 6). The paralogs corresponding to KAS I are different in expression level at the initial stages (12–14 WAA) but similar in their expression level from 16 WAA until maturation. In the case of KAS III, the paralog 18883 was expressed at higher levels than 26680 at the initial stages of mesocarp development (12–14 WAA) but the relative expression levels switched for these two paralogs at later stages of oil palm mesocarp development (18–22 WAA). However, in some cases paralogs showed only low levels of expression throughout mesocarp development as can be observed in the case of ACC and SAD. From the expression pattern of different paralogs (although within the same tissue, mesocarp) it can be suggested that these paralogs may play different roles in the FA biosynthesis pathway [5]. The ability to design microarray probes to detect paralogs, based on the extensive transcript sequencing potentially allows the detection of functionally different transcripts. Overall, it was observed that the expression levels of individual genes appeared more consistent using the oil palm microarray than reported using the 454 sequencing-based publications [4,6], despite the fact that 454 sequencing is known to be sensitive [40]. However, microarray experiments are based on the hybridization efficiencies of the probes and this will affect the level of signal produced. The probe efficiencies may be affected by sequence differences between transcript and probe sequence. Furthermore, the position of differences in probe sequences may also affect hybridization efficiencies and eventually signal intensity [41]. While being a disadvantage of microarray approaches compared to transcriptome sequencing, the use of siblings in this experiment combined with the 3 × 60 bp probe design should minimize the effect. The development of a custom oil palm mesocarp microarray is also justified by the increased range of response observed when genes involved in lipid biosynthesis were compared with the Arabidopsis microarray. While Arabidopsis is a highly characterized model plant with a full genome sequence that has been highly annotated, the current study suggests that it may have limited use as a cross-species microarray platform for oil palm without validation and cross-species sequence comparisons, even for metabolic pathways that are functionally conserved. Figure 6 Expression of paralogs of four FA genes throughout mesocarp development. 3.4. Validation by Quantitative Real-time PCR (qPCR) To validate the expression patterns observed using the Oil Palm mesocarp microarray platform, qPCR was performed to compare the expression level of selected candidates. qPCR serves as a validation tool due to its greater precision and better dynamic range compared to microarray [42]. A total of 12 candidates were randomly selected from the oil palm microarray experiment based on three categories of expression: high, medium and low. The selected genes included those involved in various biological pathways, including some transcripts with unknown function. Primer sets were designed based on published transcriptome sequencing data. The efficiencies of the primer sets were determined and the specificity of the primer sets were analyzed based on dissociation curves (Supplementary Data—qPCR primer efficiency). The qPCR results were normalized using in-house validated oil palm reference genes [32] and show good correlation (Pearson correlation) with the expression profiles in the microarray experiment (R2 = 0.729, p-value < 0.01). The expression profiles generally agreed with the results obtained from oil palm microarray hybridization (Figure 7) with increasing and decreasing trends over the time points in a similar pattern to the microarray expression profile over the same time points. In some cases, the signal magnitude of the microarray experiment was different from the qPCR results, with qPCR expression showing more dramatic changes between time points. This possibly reflects the greater dynamic range of detection [42,43] of qPCR and microarray platforms, which have been reported to generally underestimate the relative changes of expression level in samples tested [44]. However, the overall correlation observed between microarray and qPCR analysis indicates the utility of the oil palm custom mesocarp array to identify expression patterns and differential gene expression between samples.