PMC:3126783 / 9843-36909 JSONTXT

    2_test

    {"project":"2_test","denotations":[{"id":"21635767-15806101-10069407","span":{"begin":242,"end":244},"obj":"15806101"},{"id":"21635767-15806101-10069408","span":{"begin":1297,"end":1299},"obj":"15806101"},{"id":"21635767-17986450-10069409","span":{"begin":1370,"end":1372},"obj":"17986450"},{"id":"21635767-19136270-10069410","span":{"begin":1538,"end":1539},"obj":"19136270"},{"id":"21635767-14693814-10069411","span":{"begin":2220,"end":2222},"obj":"14693814"},{"id":"21635767-15806101-10069412","span":{"begin":2624,"end":2626},"obj":"15806101"},{"id":"21635767-15513985-10069413","span":{"begin":3810,"end":3812},"obj":"15513985"},{"id":"21635767-20522329-10069414","span":{"begin":4215,"end":4217},"obj":"20522329"},{"id":"21635767-19797678-10069415","span":{"begin":6312,"end":6314},"obj":"19797678"},{"id":"21635767-16078370-10069416","span":{"begin":9041,"end":9043},"obj":"16078370"},{"id":"21635767-9623998-10069417","span":{"begin":9619,"end":9621},"obj":"9623998"},{"id":"21635767-16646834-10069417","span":{"begin":9619,"end":9621},"obj":"16646834"},{"id":"21635767-16254242-10069417","span":{"begin":9619,"end":9621},"obj":"16254242"},{"id":"21635767-19213908-10069418","span":{"begin":11000,"end":11002},"obj":"19213908"},{"id":"21635767-19797678-10069419","span":{"begin":11148,"end":11150},"obj":"19797678"},{"id":"21635767-9623998-10069420","span":{"begin":11334,"end":11336},"obj":"9623998"},{"id":"21635767-16646834-10069421","span":{"begin":11569,"end":11571},"obj":"16646834"},{"id":"21635767-16254242-10069422","span":{"begin":11641,"end":11643},"obj":"16254242"},{"id":"21635767-19797678-10069423","span":{"begin":11820,"end":11822},"obj":"19797678"},{"id":"21635767-19797678-10069424","span":{"begin":12111,"end":12113},"obj":"19797678"},{"id":"21635767-17251202-10069425","span":{"begin":12114,"end":12116},"obj":"17251202"},{"id":"21635767-18689444-10069426","span":{"begin":12455,"end":12457},"obj":"18689444"},{"id":"21635767-12805597-10069427","span":{"begin":12458,
"end":12460},"obj":"12805597"},{"id":"21635767-19889879-10069428","span":{"begin":12800,"end":12802},"obj":"19889879"},{"id":"21635767-9548986-10069429","span":{"begin":13488,"end":13490},"obj":"9548986"},{"id":"21635767-12023882-10069430","span":{"begin":13557,"end":13559},"obj":"12023882"},{"id":"21635767-10444089-10069431","span":{"begin":13614,"end":13616},"obj":"10444089"},{"id":"21635767-18703491-10069432","span":{"begin":13806,"end":13808},"obj":"18703491"},{"id":"21635767-7640528-10069433","span":{"begin":15271,"end":15273},"obj":"7640528"},{"id":"21635767-7640528-10069434","span":{"begin":15347,"end":15349},"obj":"7640528"},{"id":"21635767-16798944-10069435","span":{"begin":15700,"end":15702},"obj":"16798944"},{"id":"21635767-12231682-10069436","span":{"begin":16077,"end":16079},"obj":"12231682"},{"id":"21635767-12231845-10069437","span":{"begin":16080,"end":16082},"obj":"12231845"},{"id":"21635767-12805597-10069438","span":{"begin":17560,"end":17562},"obj":"12805597"},{"id":"21635767-19797678-10069439","span":{"begin":18818,"end":18820},"obj":"19797678"},{"id":"21635767-19155348-10069440","span":{"begin":19528,"end":19530},"obj":"19155348"},{"id":"21635767-10570159-10069441","span":{"begin":19550,"end":19552},"obj":"10570159"},{"id":"21635767-11573014-10069442","span":{"begin":19763,"end":19765},"obj":"11573014"},{"id":"21635767-16107256-10069443","span":{"begin":19766,"end":19768},"obj":"16107256"},{"id":"21635767-19010711-10069444","span":{"begin":19769,"end":19771},"obj":"19010711"},{"id":"21635767-16492731-10069445","span":{"begin":19992,"end":19994},"obj":"16492731"},{"id":"21635767-17986450-10069446","span":{"begin":20113,"end":20115},"obj":"17986450"},{"id":"21635767-19207209-10069447","span":{"begin":20862,"end":20864},"obj":"19207209"},{"id":"21635767-15500472-10069447","span":{"begin":20862,"end":20864},"obj":"15500472"},{"id":"21635767-19594710-10069447","span":{"begin":20862,"end":20864},"obj":"19594710"},{"id":"21635767-10760247-10069448","spa
n":{"begin":20907,"end":20909},"obj":"10760247"},{"id":"21635767-18495751-10069449","span":{"begin":21861,"end":21863},"obj":"18495751"},{"id":"21635767-19594710-10069450","span":{"begin":21973,"end":21975},"obj":"19594710"},{"id":"21635767-16888323-10069451","span":{"begin":21976,"end":21978},"obj":"16888323"},{"id":"21635767-16888323-10069452","span":{"begin":22195,"end":22197},"obj":"16888323"},{"id":"21635767-16236422-10069453","span":{"begin":22198,"end":22200},"obj":"16236422"},{"id":"21635767-12176838-10069454","span":{"begin":22229,"end":22231},"obj":"12176838"},{"id":"21635767-18296465-10069455","span":{"begin":22247,"end":22249},"obj":"18296465"},{"id":"21635767-19594710-10069456","span":{"begin":22514,"end":22516},"obj":"19594710"},{"id":"21635767-16888323-10069457","span":{"begin":22517,"end":22519},"obj":"16888323"},{"id":"21635767-18006571-10069458","span":{"begin":22546,"end":22548},"obj":"18006571"},{"id":"21635767-19594710-10069459","span":{"begin":23990,"end":23992},"obj":"19594710"},{"id":"21635767-9477573-10069460","span":{"begin":24263,"end":24265},"obj":"9477573"},{"id":"21635767-16284311-10069461","span":{"begin":24359,"end":24361},"obj":"16284311"},{"id":"21635767-18463635-10069462","span":{"begin":24835,"end":24837},"obj":"18463635"},{"id":"21635767-16888323-10069463","span":{"begin":26956,"end":26958},"obj":"16888323"}],"text":"Results and Discussion\n\nAssociation of seed transcriptome with embryo morphology in developing Arabidopsis seeds\nUsing the raw intensity data generated by AtGenExpress for a global gene expression atlas throughout the Arabidopsis life cycle [35], we performed a detailed analysis of gene expression pertaining to seed storage reserve accumulation during the eight stages of seed development, ranging from globular embryo to mature embryo stages (Table 1). 
Of the nearly 24,000 genes represented on the Affymetrix GeneChip ATH1 genome array, we estimated that approximately 12,353 genes (or ~54%) were expressed in at least one of the eight development stages. Our analysis took into account the fact that certain genes might be transiently expressed at only one stage during seed development. The relatively high log2 intensity value of 6.0 was chosen as the threshold to focus on the genes with at least a modest level of expression. The global transcriptional activity in the developing Arabidopsis seed is higher than in the leaf, lower than in the flower, and comparable to that in the apex, root or stem (data not shown).\nTable 1 Arabidopsis developing seed samples used for AtGenExpress microarray experiments. The descriptions of samples used for microarray experiments in AtGenExpress [35] were obtained from the TAIR where the raw data files were retrieved [71]. The development stages for these seed samples range from four to 12 days after pollination, encompassing the accumulation phase of both oils and storage proteins [1]. To examine the overall transcriptome changes across the eight seed development stages, we performed a principal component analysis (PCA) in the 'Sample' space, and the results indicated that the global transcriptional program changes constantly during seed maturation (Figure 1). In PCA, the first principal component (i.e., development stage) was estimated to explain ~83% of variance in the seed transcriptome, indicating that embryogenesis is the predominant cause for the substantial variation observed in the transcript population. The differences in the global gene expression patterns among the eight developing stages were cross validated by a global association test [51], showing that the seed transcriptome varied across the eight developmental stages in a statistically significant manner (P \u003c 0.0001). 
The presence of siliques in the young seeds (S3 to S5; Table 1) may have had an effect on global transcript profiles in the seeds of earlier development stages, but its minor effect cannot be dissected from that of seed development under the experimental design in [35]. Additionally, Figure 1 shows that each stage has a distinct transcriptome signature that generally corresponds to its seed development stage defined by the embryo's morphology. For instance, as shown in Figure 1, the globular embryo stage (with three replicates) grouped tightly, the two samples from the bilateral stage clustered together but separately from other stages, and in general, samples from the expanded cotyledon stage and the mature embryo stage also clustered corresponding to their morphological stages, respectively. The transcriptome signature for one expanded cotyledon stage (with an asterisk in Figure 1), however, was closer to the two samples of the mature cotyledon stage, rather than the expanded cotyledon stage defined by embryo morphology. This result suggests that staging of seed development based on the embryo's morphological shape alone may not necessarily reflect the transcriptome state in the seed, which is attributable to the fact that molecular events, such as gene expression, occur prior to morphological changes. Consistent with the highly dynamic landscape in global gene expression, our analysis on individual genes using the method in [52] indicated that nearly all the genes expressed in developing Arabidopsis seeds are differentially transcribed under a stringent false discovery rate (FDR) threshold of 0.01 (data not shown). This lack of stably expressed genes with adequate transcript abundance brings into focus the challenge of determining reference genes that can be used for normalization in quantifying mRNAs in developing seeds [53]. 
In summary, this analysis demonstrates that the transcriptional program is subject to constant alterations during seed development, as many other studies have shown, suggesting its tight regulation at the transcriptional level.\nFigure 1 The transcriptome dynamics during Arabidopsis seed development. The normalized, log2-transformed expression data for the 24 samples were subjected to principal component analysis (PCA) using the R prcomp function [83]. PC1 and PC2 are the first two principal components in the dataset. Different symbols and colours shown at the bottom of the figure were used for different seed developmental stages to show the relationship between molecular and morphological phenotypes. As in Table 1, the different samples are as follows: S3: C globular stage; S4: D bilateral stage; S5: D bilateral stage; S6: E expanded cotyledon stage; S7: E expanded cotyledon stage; S8: E expanded cotyledon stage; S9: F mature embryo stage; S10: F mature embryo stage.\n\nConstruction of gene coexpression networks in the Arabidopsis seed transcriptome\nTo infer the gene coexpression network in the transcriptome of developing Arabidopsis seeds, we focused on the 12,353 genes with moderate or high expression levels. The Pearson-based correlation coefficient was used as a measure of expression coherence, and we applied a correlation threshold of 0.90 and retained over 1.7 million correlated gene pairs representing 11,698 distinct genes. The resulting coexpression networks encompassed approximately 95% of seed-expressed genes, indicating that the majority of expressed genes in Arabidopsis seeds act in a concerted manner. 
We chose such a stringent correlation threshold considering the relatively small sample size in the analysis so that gene pairs in the coexpression network are statistically significant (P = 0.0005 using Fisher's Z transformation), meaning the probability of randomly obtaining a correlation coefficient of ≥ 0.90 in this seed transcriptome dataset is small. The frequency distribution of the number of connections is shown in Figure 2. Nayak et al. [40] used the absolute correlation (|r|) to construct a gene coexpression network in human immortalized B cells, but we believe that positive and negative correlations in gene expression may indicate different biological interactions (synergistic or antagonistic), and therefore we only included gene pairs with positive correlation coefficients above the threshold for the coexpression analysis. Nevertheless, gene pairs consistently expressed in a negatively correlated manner can also be of great interest to biologists.\nFigure 2 Summary of the gene coexpression network in developing Arabidopsis seeds. Distribution of the number of genes in different bins of edge numbers in the coexpression network of seed-expressed genes. The edge numbers were divided into different ranges and the frequency of nodes in each bin was found to summarize the coexpression network. The bin categories are as follows: 1; 2 - 49; 50 - 99; 100 - 149; 150 - 199; 200 - 249; 250 - 299; and ≥300. We also used a complementary clustering approach to identify gene clusters with similar expression profiles during seed maturation (Figure 3). We found six clusters could sufficiently represent the distinct patterns inherent in this seed transcriptome dataset, with some clusters being the \"mirror images\" of others. The first two clusters included the majority of genes related to the accumulation of seed storage reserves, which will be described in more detail below. 
It is important to point out that the method for identifying coexpression networks is computationally similar to various clustering approaches, using correlation coefficient (r) as the similarity measure, or alternately 1 - |r| as the distance measure. An important difference exists, however, in the parameters used in the two processes: the number of clusters is often specified in clustering although certain assessment can be performed beforehand, whereas the correlation threshold is chosen in the coexpression network analysis. We believe our approach of coexpression network identification, coupled with clustering, is advantageous for identification of genes in the same coexpression cluster with visible expression patterns during seed maturation, enabling easier biological interpretation and various complementary analyses.\nFigure 3 Fuzzy clustering of the expression data along seed development series. The six clusters showing the expression patterns during Arabidopsis seed development. The gene expression values were standardized to have a mean value of zero and a standard deviation of one for each gene profile. The transformed expressions were then clustered using the fuzzy c-means (FCM) clustering algorithm implemented in the Bioconductor Mfuzz package [89]. Based on preliminary analysis, we found six clusters can well represent different expression patterns inherent in the dataset, and another FCM parameter m = 1.75. A membership value in the range of 0-1 was assigned in clustering and the cluster cores consisting of genes with membership values \u003e = 0.90 were coloured pink. Several parameters can be used to describe a biological network, including the clustering coefficient and scale-free topology criterion. The scale-free topology criterion ranges from zero to one for typical biological networks under investigation [54-56]. 
The clustering coefficient and scale-free topology criterion were 0.73 and 0.68, respectively, in this Arabidopsis seed coexpression network (Table 2), indicating topological similarity to other biological networks. As shown in Figure 2, the network is comprised of many genes with few links (e.g., most genes have two to 100 putative coexpression partners) but relatively few genes with many connections, which is consistent with the power-law distribution widely present in biological networks. In the coexpression network, each gene has a median of 71 edges. It is notable that a relatively large number of genes have ≥300 edges (Figure 2), which is at least partly due to this larger range containing all remaining numbers of connections. We observed the edge numbers for genes in different Gene Ontology (GO) 'Biological Process' categories and did not find any association between the number of coexpression partners and obvious functional significance (data not shown); TF gene LEC1 and a ribosomal protein S18 gene (RPS18), for instance, were found to connect with 38 and 178 coexpression partners, respectively. This indicates that, while the number of edges for a node may suggest the functional significance of the gene, the centrality (or location) of a node in the network can be more important. This aspect has been well described in social network analysis [57].\nTable 2 Network characteristics in the Arabidopsis seed coexpression network. Network properties were analysed according to the methods in [40]. a The clustering coefficient measures the \"small-world\" property in the network, which is the likelihood that two genes connected to a common gene are also connected to each other [54]. b The scale-free topology criterion is used to measure the topological similarity of a network to other biological networks. Its value ranges from 0 to 1, with 1 representing networks that are most like other biological networks [55]. 
c Gamma is the measurement of power-law distribution in a network [56], which consists of many genes with relatively few connections and a few genes (hubs) with many connections; a gamma \u003c 3 indicates that a network exhibits such a distribution [40].\n\nGenes encoding fatty acid biosynthetic genes and seed storage reserve associated proteins are located in different subnetworks\nWhile the entire coexpression network is useful for network topology analysis, isolation of a subnetwork (or cluster) makes it more accessible to biologists [40,58]. More importantly, a subnetwork in the large coexpression network is often more biologically relevant in a pathway context. Hence, we extracted subnetworks from this gene coexpression network for genes relevant to the accumulation of seed storage reserves (Figure 4). Of the 48 genes known to encode enzymes involved in FA biosynthesis [17,59], we identified 44 (or ~92%) genes represented on the ATH1 array, and all of them were found in one subnetwork (Figure 4A). This subnetwork cluster consists of 1854 genes (Additional File 1), which is in general agreement with an interactive correlation network generated genome-wide in Arabidopsis using a heuristic clustering algorithm [41]. Such a gene list can be used to identify interactors of genes in FA synthesis in developing seeds. Consistent with the coexpression subnetwork analysis, the majority of genes involved in FA biosynthesis were associated with Cluster 1 (Figure 3). Their expression levels increased steadily from the globular embryo stage, generally reached the peak at the expanded cotyledon stage, and dramatically declined subsequently throughout late seed maturation (Figure 4B). Such a unified expression pattern for most FA biosynthetic genes supports earlier studies showing that FA supply can be a limiting factor for triacylglycerol (TAG) accumulation in developing embryos of Brassica napus [60], olive (Olea europaea L.) 
and oil palm (Elaeis guineensis Jacq.)[61], as well as Cuphea lanceolata and other oil species [62]. Recent studies of metabolic flux in developing embryos of B. napus, however, indicated that TAG assembly was more limiting than FA biosynthesis in regulating the flow of carbon into TAG [63]. The majority of genes encoding oilbody oleosins and SSPs were found in another subnetwork with a distinct expression pattern (Figure 4C). The subnetwork encompassing genes encoding oleosins and SSPs is comprised of 1392 genes (Additional File 2). Genes encoding oleosins and SSPs were in Cluster 2 (Figure 3), and their expression profiles were strikingly similar. These genes were virtually unexpressed at the globular stage, increased rapidly (\u003e1000-fold in many cases) from the globular stage to the bilateral stage, and remained at the elevated expression level throughout the remaining stages of seed maturation (Figure 4D). Transcripts for OLEOSIN and SSP genes are most abundant in the seed transcriptome late during seed development. In contrast, most genes in the TAG assembly pathway were found in different subnetworks, exhibiting various expression profiles during seed development (Figure 5). DIACYLGLYCEROL ACYLTRANSFERASE 1 (DGAT1), FATTY ACID DESATURASE 2 (FAD2), FATTY ACID ELONGASE 1 (FAE1) and STEAROYL DESATURASE (SAD) genes were identified in this subnetwork, albeit expressed at substantially lower levels compared to genes encoding oleosins and SSPs (Additional File 3). DGAT catalyzes the acyl-CoA-dependent acylation of sn-1,2-diacylglycerol to produce TAG and CoA [64]. FAD2 catalyzes the introduction of a second double bond into acyl groups in phospholipids whereas SAD catalyzes the formation of monounsaturated FA in the plastid [65]. FAE1 catalyzes the oleoyl-CoA elongation in the endoplasmic reticulum [65]. Our analysis determined that AT1G48300, which was named DGAT3, is the putative gene encoding a cytosolic DGAT in Arabidopsis. 
The amino acid sequence of AT1G48300 has a significantly high degree of similarity (expect value \u003c 1 × 10⁻²¹) to the soluble DGAT in peanut (Arachis hypogaea), where the cytosolic DGAT gene in plants was first discovered [66]. Notably, DGAT3 exhibited a similar expression pattern with DGAT1, but expressed higher during late seed maturation. In earlier studies, quantification of DGAT activity during seed maturation in B. napus indicated that enzyme activity was maximal during the rapid phase of oil accumulation with a substantial decrease in activity occurring as oil levels reached a plateau [67,68]. Assuming DGAT activity shows a similar profile during seed development in Arabidopsis, this suggests that DGAT may be down-regulated post-transcriptionally and/or post-translationally during the latter stages of seed development.\nFigure 4 Subnetwork and temporal expression profiles for genes involved in seed storage reserve accumulation in developing Arabidopsis seeds. A is the subnetwork for genes including those in fatty acid (FA) biosynthesis, and B depicts the expression profiles of FA biosynthetic genes identified in the analysis. C is another subnetwork including genes encoding oleosins and seed storage proteins (SSP), and D depicts the expression profiles of genes encoding oleosin and SSP. In B and D, the expression values, AGI identifiers of the genes depicted are listed in Additional File 3, and the log2 expression values were standardised by subtracting the value at the first S3 stage for each gene. Dashed red/blue lines indicate 2-fold up- or down-regulation, respectively.\nFigure 5 Expression profiles of genes including homologs in the triacylglycerol assembly pathway. The dashed line at 6.0 is often used as the cutoff for present (expressed; above the line) or absent (unexpressed; below the line). All expression data were transformed to the log2 scale for plotting the profiles. 
Genes and homologs in the triacylglycerol (TAG) assembly pathway were identified based on an early survey of Arabidopsis genes involved in acyl lipid metabolism [59], and their AGI identifiers are listed in Additional File 3. Refer to [64] for their roles in TAG assembly. The abbreviations of these genes and their encoded enzymes (EC numbers) are as follows: GPAT: sn-glycerol-3-phosphate acyltransferase (EC 2.3.1.15); LPAAT: lysophosphatidic acid acyltransferase (EC 2.3.1.51); PAP: PA phosphatase (EC 3.1.3.4), including LIPIN (PAP1) and LPP (PAP2); AAPT: Aminoalcoholphosphotransferase (EC 2.7.8.1 and EC 2.7.8.2); CPT: cytidine diphosphate (CDP)-choline:1,2-diacylglycerol cholinephosphotransferase (EC 2.7.8.2); LPCAT: lysophosphatidylcholine acyltransferase (EC 2.3.1.23); PLA2: Phospholipase A2 (EC 3.1.1.4); PDAT: phospholipid:diacylglycerol acyltransferase (EC 2.3.1.158); LCAT: lecithin:cholesterol acyltransferase (EC 2.3.1.43), these three shown here are PDAT homologs; DGAT: Diacylglycerol acyltransferase (EC 2.3.1.20). In summary, our new results suggest that genes acting in a biological process (FA biosynthesis) can be indicated by their presence in the same coexpression network cluster, but genes involved in the same pathway (TAG assembly) may not necessarily exhibit expression coherence. As a result, computational approaches using coexpression networks to predict gene function, such as in [40], will undoubtedly have limitations.\n\nPutative regulatory elements underlying seed storage reserve accumulation\nTo gain insight into possible relationships in gene coexpression and regulation, we first identified the expression patterns for several TFs known to regulate the accumulation of seed storage reserves (Figure 6). 
AGL15 (AGAMOUS-LIKE 15), GL2 (GLABRA2), LEC1, L1L, and WRI1 exhibited similar expression patterns with most genes encoding proteins involved in FA biosynthesis (Figure 6A) whereas ABI3, EEL, and FUS3 all have similar expression profiles with genes encoding oleosins and SSPs (Figure 6B). Two repressors of seed maturation genes, ASIL1 (ARABIDOPSIS 6B-INTERACTING PROTEIN 1-LIKE 1) [69] and PICKLE (PKL) [70], were modestly expressed and exhibited a stable expression pattern throughout seed maturation (Figure 6C). Surprisingly, LEC2, a TF gene known to regulate oil accumulation in leaves and somatic embryogenesis [10,14,16], was barely detectable in these developing seeds. Although this result requires verification with other molecular methods, it was previously reported that LEC2 might be only expressed during early embryo morphogenesis [15]. Additionally, based on phenotype descriptions of LEC1, LEC2 mutants in the Arabidopsis Information Resource (TAIR) [71], the accumulation of storage compounds in the mature lec2 mutant seeds is only slightly defective when compared to lec1 mutant seeds. Therefore, the role of LEC2 as a regulator in the synthesis of seed storage reserves during late stages of zygotic embryo development might not be as important as currently thought. Likewise, ABI4 was also essentially unexpressed in these seed samples. The expression similarity between genes encoding TFs and their target genes is suggestive of regulatory relationships. Both LEC1 and WRI1 were clustered with most FA biosynthetic genes, while ABI5 was clustered with the majority of LATE-EMBRYOGENESIS ABUNDANT (LEA) genes (Figure 3 Cluster 3). LEC1 and WRI1 are known to regulate many FA biosynthetic genes [25-27], and ABI5 regulates a subset of LEAs [72].\nFigure 6 Expression profiles of several well-characterized transcription factor genes. The dashed line at 6.0 on the y-axis is often used as the cutoff for present (expressed; above the line) or absent (unexpressed; below the line). 
All expression data were transformed to the log2 scale for plotting the profiles. Refer to Additional File 3 for their AGI identifiers and full name of each transcription factor gene. The gene abbreviations are as follows: AGL5: AGAMOUS-LIKE 5; GL2: GLABRA 2; LEC1: LEAFY COTYLEDON 1; L1L: LEAFY COTYLEDON 1 LIKE; WRI1: WRINKLED 1; ABI3: ABSCISIC ACID-INSENSITIVE 3; EEL: ENHANCED EM (EMBRYO MORPHOGENESIS) LEVEL; FUS3: FUSCA 3; ABI5: ABSCISIC ACID-INSENSITIVE 5; PKL: PICKLE; ASIL1: ARABIDOPSIS 6B-INTERACTING PROTEIN 1-LIKE 1. To computationally identify cis-acting regulatory elements, the upstream promoter sequences for the genes involved in storage reserve biosynthesis were extracted from the RSAT server [73]. We included some 5'-UTR sequences as certain TF binding sites can be located within this region of a gene [27,74]. On average, the G-C content in the promoter sequences of the gene set was found to be \u003c35%, which is consistent with the compositional bias of nucleotides towards A-T enrichment observed in plant promoter regions [74,75]. Two software tools, TFBS [76] and fdrMotif [77], were used to search for putative TF-binding sites on both strands. Both tools depend on TF-binding profiles (Position Weight Matrix, or PWM) derived from experimentally determined binding sites for the prediction; thus, we compiled 118 PWMs from the literature [27,74] and the JASPAR database [78] (Additional File 4). In the JASPAR database, we only considered the binding profiles for plant-specific TFs because of their potential critical roles in regulating the accumulation of storage reserves during seed development, a unique physiological process in higher plants.\nWe predicted a total of 1770 binding motifs in the promoter regions of genes involved in FA biosynthesis, TAG assembly and genes encoding oleosins and seed storage proteins (Additional File 5). Each TF can have more than one putative binding site in each gene. 
As our approach of using two predictive tools already filtered out a large number of potentially false predictions, the remaining number of putative motifs was relatively small, making it difficult to perform statistical analysis of motif enrichment. Therefore, we used a simple approach to determine overrepresentation of a TF binding motif in the gene set, and defined the number of the motifs for a particular TF as overrepresented if it is greater than the sum of the average plus the standard deviation of all predicted motifs in a gene set. Sequence logos are used to show the degree of conservation, indicated by the height of each nucleotide, at each position (Table 3). For the Aw-box motif interacting with WRI1, which possesses a sequence pattern of [CnTnG](n)7[CG] (where n is any nucleotide), we predicted binding sites in 26 of 44 FA genes identified, seven more than reported recently in [27]. The highly conserved CCAAT motifs for LEC1 (and L1L) binding are significantly enriched in promoters of all FA biosynthetic genes identified. Motifs that interact with TF genes known to regulate light-induced genes, such as Zinc-finger proteins DOF1 (MNB1A) and DOF2 [79], as well as GATA TFs and SORLIP 5 (Sequences Over-Represented in Light-Induced Promoter 5) [80], are overrepresented in the promoters of FA biosynthetic genes. DOF1 (AT1G51700), however, was expressed only at the early globular embryo stage. DOF2 (AT4G38000) exhibited a similar expression profile during seed development as for FA biosynthetic genes (data not shown). ARR (Arabidopsis Response Regulator) genes encode ARR7 and ARR15, which have been shown to regulate the interaction of cytokinin and auxin in root stem-cell specification during early embryogenesis [81]. We found no binding matrices for these two regulators, but the binding matrix for ARR10 is present in our compiled matrix set and ARR10 motifs are overrepresented in the promoter regions of FA biosynthetic genes. 
We also found no binding matrices for AGL 5 or GL2; binding profiles for AGL 3 and AGL 15 were present in our analysis but no enriched motifs were identified in the promoter sequences of these FA biosynthetic genes.\nTable 3 Overrepresented motifs identified in promoters of genes involved in fatty acid synthesis, and oleosin and seed storage protein accumulation. Refer to Additional File 4 for each matrix identifier (ID). The abbreviations of transcription factors (TF), promoter element, and pathway names are as follows: WRI1: WRINKLED 1; DOF: DNA binding with One Finger; SORLIP: Sequences Over-Represented in Light-Induced Promoter; CBF: CCAAT binding factor; bZIP: basic leucine zipper; FA: fatty acid; SSP: seed storage protein. For the genes and isoforms in the TAG assembly pathway, no overrepresented motifs have been found. Our goal was to identify putative promoter elements that can be used for experimental studies (Additional File 5). Interestingly, promoter motifs for B3 domain TFs, such as ABI3, FUS3 and LEC2, were found to be overrepresented in promoters of genes encoding oleosins and SSPs. Motifs for bZIP factors (e.g., bZIP67) also appeared to be overrepresented in the promoter regions of these genes, but there were no binding matrices for bZIP ABI5 or EEL.\nOur approach of computational promoter analysis was limited by the availability of experimentally determined TF-binding sites for deriving binding profiles of additional TFs. We compiled a list of 118 binding matrices for this analysis, but if binding profiles for other TFs can be generated from a reasonable number of known binding sites, we could identify more TFs that possibly regulate the accumulation of seed storage reserves. In addition, we only considered upstream sequences of 1000 bp plus 200 bp 5'-UTR for each gene, because the majority of cis-acting regulatory elements are located in this region [74]. 
Other genomic regions including the 3'-UTR, or even introns, however, can also harbour TF binding sites.\n"}