Association of seed transcriptome with embryo morphology in developing Arabidopsis seeds

Using the raw intensity data generated by AtGenExpress for a global gene expression atlas throughout the Arabidopsis life cycle [35], we performed a detailed analysis of gene expression pertaining to seed storage reserve accumulation during eight stages of seed development, ranging from the globular embryo stage to the mature embryo stage (Table 1). Of the nearly 24,000 genes represented on the Affymetrix GeneChip ATH1 genome array, we estimated that 12,353 genes (~54%) were expressed in at least one of the eight developmental stages. Requiring expression at only one stage accounts for genes that might be transiently expressed at a single stage during seed development. The relatively high log2 intensity value of 6.0 was chosen as the threshold to focus on genes with at least a modest level of expression. The global transcriptional activity in the developing Arabidopsis seed is higher than in the leaf, lower than in the flower, and comparable to that in the apex, root or stem (data not shown).

Table 1 Arabidopsis developing seed samples used for AtGenExpress microarray experiments. The descriptions of the samples used for microarray experiments in AtGenExpress [35] were obtained from TAIR, from which the raw data files were retrieved [71].

The developmental stages of these seed samples range from four to 12 days after pollination, encompassing the accumulation phase of both oils and storage proteins [1]. To examine the overall transcriptome changes across the eight seed developmental stages, we performed a principal component analysis (PCA) in the 'Sample' space, and the results indicated that the global transcriptional program changes constantly during seed maturation (Figure 1).
In the PCA, the first principal component (i.e., developmental stage) was estimated to explain ~83% of the variance in the seed transcriptome, indicating that embryogenesis is the predominant cause of the substantial variation observed in the transcript population. The differences in global gene expression patterns among the eight developmental stages were corroborated by a global association test [51], which showed that the seed transcriptome varied across the eight stages in a statistically significant manner (P < 0.0001). The presence of silique tissue in the young seed samples (S3 to S5; Table 1) may have affected the global transcript profiles of the earlier developmental stages, but this minor effect cannot be separated from that of seed development under the experimental design in [35].

Additionally, Figure 1 shows that each stage has a distinct transcriptome signature that generally corresponds to the seed developmental stage defined by embryo morphology. For instance, the globular embryo stage (with three replicates) grouped tightly, the two samples from the bilateral stage clustered together but separately from the other stages, and, in general, samples from the expanded cotyledon stage and the mature embryo stage also clustered according to their respective morphological stages. The transcriptome signature of one expanded cotyledon sample (marked with an asterisk in Figure 1), however, was closer to the two samples of the mature embryo stage than to the other samples of the expanded cotyledon stage as defined by embryo morphology. This result suggests that staging seed development by the embryo's morphological shape alone may not necessarily reflect the transcriptome state of the seed, which is attributable to the fact that molecular events, such as gene expression, occur prior to morphological changes.
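The proportion of variance attributed to the first principal component can be illustrated with a minimal sketch. It assumes samples are rows and genes are columns, and that genes are centred but not scaled (the default behaviour of R's prcomp). The function names and the toy matrix are hypothetical and are not taken from the study; power iteration is used here only to keep the example free of external libraries and is not the algorithm prcomp uses internally.

```python
import math


def center_columns(matrix):
    """Subtract each column (gene) mean; rows are samples."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def gram_matrix(centered):
    """Sample-by-sample matrix of inner products (X X^T).

    Its non-zero eigenvalues equal those of the gene-by-gene
    scatter matrix (X^T X), but it is much smaller when there are
    far more genes than samples.
    """
    n = len(centered)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) for k in range(n)]
        for i in range(n)
    ]


def leading_eigenvalue(sym, iterations=1000):
    """Largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(sym)
    # Start from a non-constant vector: the constant vector lies in the
    # null space of a Gram matrix built from column-centred data.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(sym[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            return 0.0
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the converged unit vector.
    return sum(
        vec[i] * sum(sym[i][k] * vec[k] for k in range(n)) for i in range(n)
    )


def pc1_variance_fraction(matrix):
    """Fraction of total variance carried by the first principal component."""
    gram = gram_matrix(center_columns(matrix))
    total = sum(gram[i][i] for i in range(len(gram)))
    if total == 0.0:
        return 0.0
    return leading_eigenvalue(gram) / total


# Hypothetical log2 intensities: 4 samples (rows) x 3 genes (columns).
toy_expression = [
    [6.0, 9.0, 7.0],
    [7.0, 8.0, 7.2],
    [8.0, 7.0, 6.9],
    [9.0, 6.0, 7.1],
]
print(round(pc1_variance_fraction(toy_expression), 3))
```

In this toy matrix two genes change monotonically across samples and the third only fluctuates slightly, so nearly all of the variance falls on the first component, which is the kind of pattern a dominant developmental trend would produce.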
Consistent with the highly dynamic landscape of global gene expression, our analysis of individual genes using the method in [52] indicated that nearly all of the genes expressed in developing Arabidopsis seeds are differentially transcribed under a stringent false discovery rate (FDR) threshold of 0.01 (data not shown). This lack of stably expressed genes with adequate transcript abundance highlights the challenge of identifying reference genes that can be used for normalization when quantifying mRNAs in developing seeds [53]. In summary, this analysis demonstrates that the transcriptional program is subject to constant alteration during seed development, as many other studies have shown, suggesting that seed development is tightly regulated at the transcriptional level.

Figure 1 The transcriptome dynamics during Arabidopsis seed development. The normalized, log2-transformed expression data for the 24 samples were subjected to principal component analysis (PCA) using the R prcomp function [83]. PC1 and PC2 are the first two principal components in the dataset. Different symbols and colours, shown at the bottom of the figure, were used for the different seed developmental stages to show the relationship between molecular and morphological phenotypes. As in Table 1, the samples are as follows: S3: C globular stage; S4: D bilateral stage; S5: D bilateral stage; S6: E expanded cotyledon stage; S7: E expanded cotyledon stage; S8: E expanded cotyledon stage; S9: F mature embryo stage; S10: F mature embryo stage.
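The FDR threshold of 0.01 applied in the per-gene analysis can be illustrated with a short sketch. The text does not state which FDR procedure the method in [52] uses, so the Benjamini-Hochberg adjustment below is an assumption chosen because it is widely used; the function names and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone (each is the minimum over all larger ranks).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


def differentially_expressed(p_values, fdr=0.01):
    """Indices of genes whose adjusted p-value is at or below the FDR."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q <= fdr]


# Hypothetical per-gene p-values from a test across developmental stages.
toy_p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
print(differentially_expressed(toy_p_values, fdr=0.01))
```

Under this kind of adjustment, a gene is reported as differentially transcribed only if its adjusted p-value does not exceed the chosen FDR, so a threshold of 0.01 is considerably stricter than the conventional 0.05.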