PMC:4979053

Assessing Agreement between miRNA Microarray Platforms

Abstract

Over the last few years, miRNA microarray platforms have provided great insights into the biological mechanisms underlying the onset and development of several diseases. However, only a few studies have evaluated the concordance between different microarray platforms using methods that took into account measurement error in the data. In this work, we propose the use of a modified version of the Bland–Altman plot to assess agreement between microarray platforms. To this aim, two samples, one renal tumor cell line and a pool of 20 different human normal tissues, were profiled using three different miRNA platforms (Affymetrix, Agilent, Illumina) on triplicate arrays. Intra-platform reliability was assessed by calculating pair-wise concordance correlation coefficients (CCC) between technical replicates and overall concordance correlation coefficient (OCCC) with bootstrap percentile confidence intervals, which revealed moderate-to-good repeatability of all platforms for both samples. Modified Bland–Altman analysis revealed good patterns of concordance for Agilent and Illumina, whereas Affymetrix showed poor-to-moderate agreement for both samples considered. The proposed method is useful to assess agreement between array platforms by modifying the original Bland–Altman plot to let it account for measurement error and bias correction and can be used to assess patterns of concordance between kinds of arrays other than miRNA microarrays.

1. Introduction

MiRNAs are small non-coding RNA molecules that have been shown to play a critical role in tumorigenesis [1,2,3,4] and in several other pathologies [5,6,7,8]. In order to measure miRNA intensity levels, several methods, such as RT-qPCR, high-throughput sequencing and microarrays, have been developed and have enabled researchers to profile a large number of miRNAs simultaneously across different experimental conditions [9].
MiRNA microarrays, in particular, since their first appearance in 2004 [10], have known a considerable expansion in life sciences and are now routinely used in biomolecular research. In the last few years, miRNA microarrays have been compared with next generation sequencing technologies to study their performances [11,12]. The main interest lies in the potentialities and advantages offered by these new platforms in terms, for instance, of new miRNAs discovery [13]. However, since results from these comparison studies appear to be contrasting, miRNA microarrays still remain an effective and useful technology, whose characteristics need to be properly assessed, both in terms of within-platform reliability and of between-platforms agreement. To date, only a few studies have attempted to evaluate within-platform [14] and between-platform [15,16] reliability in miRNA microarrays. Results were reported mainly as correlation coefficients (both Pearson and Spearman) for evaluating both intra-platform and inter-platform performance, calculated on a subset of miRNA that depended on detection calls concordant between platforms (i.e., miRNA that were called “detected/present” on all platforms considered for the analysis). Additionally, Sato and colleagues assessed between-platform comparability also in terms of miRNAs that were commonly differentially expressed between samples for all platforms [15], whereas Yauk et al. [16] evaluated within-platform reproducibility via Lin’s concordance correlation coefficient [17]. In this work, a different approach based on the Bland–Altman method to assess between-platform agreement is proposed. In particular, the proposed method is applied to assess agreement between three different miRNA microarray platforms (Affymetrix, Agilent, Illumina). 
Additionally, advice against the use of Pearson/Spearman correlation coefficients is provided, and use of concordance correlation coefficients (pairwise and overall) is suggested as a better measure to evaluate within-platform reliability.

2. Experimental Section

2.1. Samples

The two samples involved in the study are a renal tumor cell line named A498 (ATCC, Manassas, VA, USA) [18] and a pool of twenty different human normal tissues (namely, hREF), obtained from the First Choice© Human Total RNA Survey Panel (Ambion Inc, Austin, TX, USA). RNA material was analyzed in different laboratories, as follows: Affymetrix processing took place at the Biomedical Technologies Institute of the University of Milan (Segrate, Italy); Illumina and Agilent processing were performed at the Department of Experimental Oncology of the National Cancer Institute (Milan, Italy). For both samples, three technical replicates for each platform were performed, leading to a total of 18 arrays, six for each different microarray platform.

2.2. Sample Preparation and Hybridization

The cell lines were cultured at cell confluence according to the corresponding ATCC datasheets. Total RNA samples were extracted using the miRNeasy kit (Qiagen Spa, Hilden, Germany) and quantified by ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). Using an Agilent 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA), RNA integrity was assessed on the basis of the RIN (RNA integrity number) factor, and the presence of low molecular weight RNA (5S) was verified. Labeling of total RNA samples was performed using the FlashTag™ Biotin RNA labeling Kit (Genisphere Inc., Hatfield, PA, USA) starting from 1 μg of total RNA. Briefly, the tailing reaction is followed by ligation of the biotinylated signal molecule to the target RNA sample. The labeling reaction is based on Genisphere proprietary 3DNA dendrimer signal amplification technology.
Prior to array hybridization, total RNA labeling was verified using the enzyme-linked oligosorbent assay (ELOSA). After that, biotin-labeled samples were hybridized onto the arrays at 48 °C, then washed and stained using the Affymetrix Fluidics Station 450. Arrays were scanned with the GeneChip© Scanner 3000 7G to acquire fluorescent images of each array and analyzed by use of GeneChip Operating Software (GCOS, version 1.2).

2.3. Data Pre-Processing

2.3.1. Affymetrix GeneChip© miRNA Array

Intensity .CEL files were obtained from the scan images and imported to the Affymetrix© miRNA QC Tool software (Version 1.0.33.0) to quantify the signal value. Quality control (QC) was assessed by plotting the average intensity of the oligo spike-in and background probe sets (included in the control target content) across all of the arrays. According to Genisphere, oligo spike-in 2, 23, 29, 31 and 36 probe sets should present a value of more than 1000 intensity units to accept array quality. The miRNA arrays were detected using the Affymetrix detection algorithm, based on the non-parametric Wilcoxon rank-sum test, applied independently on each array and probe/probe set; a p-value greater than 0.06 stands for “not detected above background” [19]. For data normalization, the “default” method was used, obtaining log2 expression values (expression values data matrix) from the raw data (intensity values data matrix). Briefly, this method involved the following three steps: grouping the background probe intensities based on GC content, where the median intensity of each bin was the correction value for each probe with the same GC content; a quantile normalization and, finally, a median polish summarization. To obtain a single intensity value for each miRNA mapped on the log2 array, intensity measures for replicated spots were averaged.

2.3.2. Agilent Human miRNA Microarray (V1)

Images were scanned using the Agilent Feature Extraction (AFE) software (version 11.0.1.1), obtaining the total gene signal (TGS) for all miRNAs on the array. Negative values were transformed by adding, for each array separately, the absolute value of the minimum TGS intensity on the array as extracted by the AFE + 2 before log2 transformation [20]. Data extracted from AFE were imported in the R environment [21] and processed using the AgiMicroRna package, available in Bioconductor [22,23].

2.3.3. Illumina HumanMI_V2

Raw data were processed using the proprietary BeadStudio software (Version 3.3.8). No background subtraction was performed. Probe-level data were summarized to obtain miRNA-level data, and then, a log2 transformation was applied.

2.3.4. miRNA Selection and Normalization

The Affymetrix platform contained information on 7815 miRNAs, of which 847 (10.83%) were human, whereas the Agilent platform contained 961 miRNAs (851 human, 88.55%), and on the Illumina array 1145 miRNAs were detected, of which 858 (74.93%) were human miRNAs. Human miRNAs common to all platforms were selected according to their name and confirmed by a search on miRBase (Release 18, November 2011). Unlike other published works, miRNAs were not filtered on a detection basis, because such an approach could possibly introduce a bias in the results. In fact, some of the miRNAs that are filtered out because they are “switched-off” could show patterns of within- and/or between-platform disagreement in another experiment, where they are “turned-on”. This could possibly lead to an over-estimate of the level of reliability. Considering only human miRNAs should circumvent this issue and, at the same time, provide relevant information, since human miRNAs are commonly those that are of major interest in biomolecular investigation. Moreover, no data normalization was performed.
Almost all works that focused on comparing microarray platforms normalized their data (for instance, [15,16,24]), but this is a non-trivial issue that has to be carefully evaluated. As a matter of fact, to date, normalization for miRNA microarrays has been largely debated, with results that have been somewhat discordant [25,26,27,28,29], so that no “gold-standard” method exists. Additionally, normalizing data in the context of assessing platform agreement poses other relevant problems. If data on two different platforms are normalized and then compared, then there is no way to discriminate between the effects of platform and normalization on the results of concordance/agreement/reproducibility assessment. A high level of between-platform agreement, due not to the platforms themselves, but to the normalization used, might be found. On the other hand, the same normalization on different platforms could highlight patterns of discordance that cannot be ascribed to the platforms. Nonetheless, comparing un-normalized data carries the risk of finding poor concordance, because of incidental batch effects occurring in the experiment, which may lead to an underestimate of the “true” agreement between platforms. In this paper, we have chosen to use non-normalized data, so that we could assess the performance of different platforms “per se”. For the sake of comparison, data were also normalized with the quantile and loess algorithms, and results were compared to those obtained on non-normalized data.

2.4. Statistical Analysis

2.4.1. Intra-Platform Reliability

To assess the reliability of the three miRNA microarray platforms, pair-wise concordance correlation coefficients [17] were computed for all possible pairs of technical replicates for all platforms, within each sample.
The CCC $\rho_c$ between two series of $n$ measurements $x$ and $y$ is defined as:

$$\rho_c = \frac{2\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2} = \frac{2\rho\,\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2} \tag{1}$$

where

$$\rho = \frac{\sigma_{xy}}{\sigma_x\sigma_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

is the Pearson correlation coefficient between $x$ and $y$, $\mu_x = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\mu_y = \frac{1}{n}\sum_{i=1}^{n} y_i$ are the sample means of $x$ and $y$ and $\sigma_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ and $\sigma_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ are the sample variances of $x$ and $y$. Unlike the correlation coefficients, which can only give information about the existence of a linear relationship between two measurement methods, the CCC provides information on both precision (best-fit line) and accuracy (how far the best-fit line deviates from the concordance line) and is thus a better measure to assess platform reliability [30].

Additionally, the pairwise CCCs were combined within each sample and platform into an overall measure of reliability, the overall concordance correlation coefficient (OCCC) [31], a weighted mean of pairwise CCCs, which is defined as follows:

$$\rho_c^{o} = \frac{\sum_{j=1}^{J-1}\sum_{k=j+1}^{J} \xi_{jk}\,\rho_{c}^{jk}}{\sum_{j=1}^{J-1}\sum_{k=j+1}^{J} \xi_{jk}} \tag{2}$$

where $\rho_{c}^{jk}$ is the standard Lin’s CCC between the j-th and k-th replicate measurement series (in this study, these are the replicate arrays), and $\xi_{jk}$ are the weights, specific for each paired comparison:

$$\xi_{jk} = \sigma_j^2 + \sigma_k^2 + (\mu_j - \mu_k)^2 \tag{3}$$

Confidence intervals for the OCCC were computed using the bootstrap [32]. Specifically, 1000 bootstrap samples were extracted, and for each of these samples, sample means, variances, covariances, CCC and OCCC were computed. Then, using the empirical distribution of the bootstrap, 95% percentile confidence intervals for the OCCC were estimated.

To evaluate whether pairs of technical replicates are actually in agreement, the non-inferiority approach proposed by Liao and colleagues [33] for gene expression microarrays was followed.
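As an illustration of Equations (1)–(3) and of the bootstrap percentile interval, the following is a minimal Python sketch using only the standard library. It is not the code used in the study: the function names (`ccc`, `occc`, `occc_bootstrap_ci`), the resampling of miRNAs with replacement, and the way percentile indices are picked are assumptions made for the example.

```python
import random
from statistics import mean, variance


def ccc(x, y):
    """Lin's concordance correlation coefficient, as in Equation (1)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    # Sample covariance with the same n - 1 denominator as the variances.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return 2 * sxy / (variance(x) + variance(y) + (mx - my) ** 2)


def occc(replicates):
    """Overall CCC, Equations (2)-(3): weighted mean of pairwise CCCs."""
    num = den = 0.0
    for j in range(len(replicates) - 1):
        for k in range(j + 1, len(replicates)):
            xj, xk = replicates[j], replicates[k]
            weight = variance(xj) + variance(xk) + (mean(xj) - mean(xk)) ** 2
            num += weight * ccc(xj, xk)
            den += weight
    return num / den


def occc_bootstrap_ci(replicates, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the OCCC.

    Assumption: rows (miRNAs) are resampled with replacement, keeping the
    replicate arrays aligned; the source does not spell out this detail.
    """
    rng = random.Random(seed)
    n = len(replicates[0])
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(occc([[r[i] for i in idx] for r in replicates]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper
```

With three replicate arrays stored as equal-length lists of log2 intensities, `occc([rep1, rep2, rep3])` gives the point estimate and `occc_bootstrap_ci([rep1, rep2, rep3])` a 95% percentile interval.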
This approach consists of defining a threshold, or lower-bound, $\rho_c^{(CL)}$ reflecting the minimal value that the CCC should assume to conclude that two methods agree and then testing the following hypothesis:

$$H_0: \rho_c \le \rho_c^{(CL)} \quad \text{vs.} \quad H_1: \rho_c > \rho_c^{(CL)} \tag{4}$$

This can be done using the confidence intervals for both CCC and OCCC, interpreting the results as follows: if the lower confidence bound falls below $\rho_c^{(CL)}$, then the null hypothesis cannot be rejected and the two replicates cannot be said to be in agreement; otherwise, the two replicates are in agreement. To determine the value of $\rho_{CL}$, the authors define the minimum thresholds of precision and accuracy, and then, since the CCC can be seen as a product of a precision and accuracy term, $\rho_c^{(CL)}$ is computed as the product of these two thresholds. In their example, they propose a threshold of 0.90, yet in this paper, we have chosen to use the value of 0.96, according to the following formula:

$$\rho_c^{(CL)} = \frac{2\rho_{CL}}{v_{CL} + v_{CL}^{-1} + u_{CL}^2} = \frac{2 \times 0.98}{0.9 + 0.9^{-1} + 0.15^2} = 0.9638 \approx 0.96 \tag{5}$$

where $v = \sigma_1/\sigma_2$ represents the scale shift between the two measurement series and $u = (\mu_1 - \mu_2)/\sqrt{\sigma_1\sigma_2}$ is the location shift relative to the scale. The reason for the choice of these values is subjective, but in this case, there has been the attempt to be conservative: a higher value for $\rho_{CL}$ means a relationship between technical replicates as linear as possible, though leaving space for small departures due to ineffective probes or small experimental effects. On the other hand, increasing $v_{CL}$ to 0.9 is due to the fact that miRNA measurements are assumed to be less variable than gene expression, so that technical replicates may show very similar patterns of variability. Only $u_{CL}$ is unchanged, because the value proposed in [33] appeared reasonable also for miRNA microarrays.

2.4.2. Between-Platform Agreement

In the microarray literature, concordance between platforms has been often studied using the correlation coefficient.
Not only is this the wrong approach, but additionally, correlation coefficients are computed assuming that intensity/expression values do not suffer from any measurement error, thus leading to possible underestimates of the real level of correlation between platforms [34]. Here, agreement between platforms was evaluated using a modified version of the Bland–Altman approach. Such a modification, suggested by Liao et al. [35], allows one not only to assess whether two methods of measurement are concordant, but also to provide information on the possible sources of disagreement. In a nutshell (greater details can be read in the original paper), the method involved the estimation, for each platform pair and separately for each sample, of a measurement error model, i.e., a model where also the independent variable(s) X were assumed to be affected by uncertainty, of the form:

$$Y_i = a_0 + b_0 X_i^0 + \epsilon_i \tag{6}$$

$$X_i = X_i^0 + \delta_i \tag{7}$$

where $(X_i^0, Y_i^0),\ i = 1, \ldots, n$ were the unobserved true values of the two measurement methods to be compared, i.e., miRNA intensities on the two platforms, and $\epsilon_i$ and $\delta_i$ were the i.i.d. error components of the model, which followed a normal distribution with the mean equal to 0 and variances equal to $\sigma_\epsilon^2$ and $\sigma_\delta^2$, respectively. To estimate this model, the ratio $\lambda$ of the error variances of Y and X had to be known, possibly by means of replication or, when replication is not feasible, by setting it equal to 1, thus assuming equal error variances for both methods. In this study, both strategies were evaluated, using the technical replicates to estimate $\lambda$ by fitting a linear model with the factor “replicate” as the covariate. The estimated residual variance was then used as the sample error variance for the platform.
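A model of the form of Equations (6) and (7) with a known error-variance ratio λ can be fitted by Deming regression. The sketch below is one way to do this in plain Python and is offered only as an illustration: the source does not state which estimator was used, so the closed-form slope, the moment estimator of σδ and the function name `fit_measurement_error_model` are assumptions of this example.

```python
from math import sqrt
from statistics import mean


def fit_measurement_error_model(x, y, lam=1.0):
    """Deming-type fit of Y = a0 + b0 * X0 + eps, X = X0 + delta.

    lam is the assumed ratio var(eps) / var(delta), i.e. the error variance
    of Y over that of X (lam = 1 means equal error variances).
    Returns (a0, b0, sigma_delta).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

    # Closed-form Deming slope for a known variance ratio.
    d = syy - lam * sxx
    b0 = (d + sqrt(d * d + 4 * lam * sxy ** 2)) / (2 * sxy)
    a0 = my - b0 * mx

    # Moment estimator (an assumption of this sketch):
    # var(Y - b0 * X) = (lam + b0^2) * sigma_delta^2.
    resid_var = (syy - 2 * b0 * sxy + b0 ** 2 * sxx) * (n - 1) / (n - 2)
    sigma_delta = sqrt(max(resid_var, 0.0) / (lam + b0 ** 2))
    return a0, b0, sigma_delta
```

Calling `fit_measurement_error_model(x, y, lam=1.0)` corresponds to the equal-error-variance setting; passing a value of `lam` estimated from the technical replicates corresponds to the second strategy described above.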
Once the parameters of the model were estimated, assuming that $Y - X \sim N\left(a_0 + (b_0 - 1)X^0,\ (1 + \lambda)\sigma_\delta^2\right)$, modified versions of the agreement interval for $Y - X$ proposed by Bland and Altman [36,37] were estimated according to the bias (fixed or proportional) needed to correct for when comparing two platforms, as follows:

(a) No bias ($a_0 = 0$, $b_0 = 1$):

$$\Delta = \left(-t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta,\ +t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta\right) \tag{8}$$

(b) Fixed bias ($a_0 \neq 0$, $b_0 = 1$):

$$\Delta = \left(a_0 - t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta,\ a_0 + t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta\right) \tag{9}$$

(c) Proportional bias ($a_0 = 0$, $b_0 \neq 1$):

$$\Delta = \left((b_0 - 1)X_i - t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta,\ (b_0 - 1)X_i + t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta\right) \tag{10}$$

(d) Fixed and proportional bias ($a_0 \neq 0$, $b_0 \neq 1$):

$$\Delta = \left(a_0 + (b_0 - 1)X_i - t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta,\ a_0 + (b_0 - 1)X_i + t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta\right) \tag{11}$$

where $X_i$ were the actual measured values of the method. Including only the parameter $a_0$ in the agreement interval meant that the two methods differ only by a “fixed” shift that did not depend on the value of $X_i$ (thus, fixed bias). Including only (or also) $b_0$, on the other hand, meant that the differences between the two methods increased proportionally with the increase of the value of measurement $X_i$ according to the value of the parameter $b_0$ itself (thus, proportional bias).

Finally, let n be the number of subjects and 0ially relevant when Affymetrix and Illumina were compared (see Tables S8 and S12 in the Supplementary Material). The fact that differences between normalized and un-normalized data were more relevant when λ was assumed to be one could be due to the effect of the normalization procedure on the error variance ratio: in particular, when normalization is performed, there is a use of information carried by the data that can lead to a reduction in the residual error variance and in the ratio λ and, eventually, to more stable results and, thus, more concordant measurements. On the other hand, differences between platforms are already taken into account when λ is estimated, so that a normalization procedure could only limitedly improve results.
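The agreement intervals of Equations (8)–(11) share one form, a centre a0 + (b0 − 1)Xi plus or minus a half-width, so they can be sketched with a single function in which a0 = 0 and/or b0 = 1 recover the simpler cases. The Python below is an illustrative sketch, not the authors' implementation; the function names are hypothetical, and because the standard library has no Student-t quantile, the critical value defaults to the normal quantile (an approximation that is close only for large n) unless a t quantile is passed explicitly as `crit`.

```python
from math import sqrt
from statistics import NormalDist


def agreement_interval(x_i, a0, b0, lam, sigma_delta, alpha=0.05, crit=None):
    """Agreement interval for Y - X at measured value x_i, Equations (8)-(11).

    Set a0 = 0 and/or b0 = 1 to drop the fixed and/or proportional bias term.
    crit should be the t quantile t_{1 - alpha/2, n - 1}; if omitted, the
    normal quantile is used as a large-sample approximation (assumption).
    """
    if crit is None:
        crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = crit * sqrt(1 + lam) * sigma_delta
    centre = a0 + (b0 - 1) * x_i
    return centre - half_width, centre + half_width


def proportion_in_agreement(x, y, a0, b0, lam, sigma_delta, alpha=0.05, crit=None):
    """Fraction of paired measurements whose difference lies in the interval."""
    inside = 0
    for x_i, y_i in zip(x, y):
        lower, upper = agreement_interval(x_i, a0, b0, lam, sigma_delta, alpha, crit)
        if lower <= y_i - x_i <= upper:
            inside += 1
    return inside / len(x)
```

The value returned by `proportion_in_agreement` is the kind of quantity that would be compared with a pre-specified threshold, such as the 95% threshold discussed in the Conclusions, to decide whether two platforms agree.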
Table 7. miRNA in agreement between arrays. Number (n) and proportion (%) of miRNAs lying in the different agreement intervals, estimated according to the measurement error model parameters estimated by setting λ = 1 and by estimating it via random effects models. Confidence intervals for the proportions were computed using the Clopper–Pearson exact method [39]. †: the platform pair is in agreement. Results for model parameter and λ estimation on normalized data are available in the Supplementary Material, from Tables S5 to S7 for quantile normalization and from S9 to S11 for loess normalization.

4. Conclusions

Studies evaluating high-throughput platform reliability by means of correlation coefficients, both Pearson and Spearman [15], are often seen in the literature. Many of these studies claim that high correlation coefficients imply high reliability. This misuse of the correlation coefficients has already been outlined in the microarray literature [30]; however, several studies still use these measures to assess the repeatability or reproducibility of high-throughput platforms. In this study, both the issue of within-platform reliability and between-platform agreement of three miRNA microarray platforms were discussed. Results have highlighted that Agilent and Illumina were the platforms showing the best pattern of both reliability and agreement, whereas Affymetrix appeared to have some technical problems. In terms of reliability, the OCCC for line A498 was notably lower than that for line hREF for all platforms and for all sets of miRNAs considered (common human miRNAs, human miRNAs array-wise, all miRNAs). Overall, Illumina (OCCC from 0.989 to 0.994) showed better performances with respect to Agilent (from 0.975 to 0.994) and, in particular, Affymetrix (from 0.927 to 0.993), and this was better seen for line A498.
Since this line suffered from technical issues on Affymetrix and, partially, also on Agilent, the related OCCC possibly underestimated the “true” degree of reproducibility of the platforms, which could be better appreciated with line hREF, where it ranged from 0.992 (CI 95%: 0.990–0.993) to 0.993 (CI 95%: 0.993–0.994), with values almost identical to those of Agilent and Illumina, thus suggesting also in this case a good performance of the array. These results, however, only referred to the “repeatability” of the assay, meaning that technical replicates were performed under “conditions where independent test results are obtained with the same method on identical test items in the same laboratory by the same operator using the same equipment within short intervals of time”. Conversely, reproducibility implies “conditions where test results are obtained with the same method on identical test items in different laboratories by different operators using different equipment”, as defined by the ISO [40]. This specification is relevant, since studies claiming to have evaluated the reproducibility of arrays actually evaluated only their repeatability [15], whereas reproducibility assessment necessarily requires the involvement of multiple laboratories, as in the MAQCProject [41]. Assessing agreement between different methods of measurement is a task that has rarely been addressed in the microarray literature, where the focus has always been more on the evaluation of a linear relationship between platforms or with “gold-standard” assays, such as qPCR, mainly via computation of correlation coefficients. Such an approach has often been biased by the selection of miRNAs used to do the computations, in that often, only miRNAs concordantly detectable between platforms [15,16] were chosen, possibly leading to an overestimate of the real level of correlation between arrays. 
To avoid this issue, human miRNAs common (i.e., matched by name) on all of the platforms considered for the experiment were used. By setting a “strong” threshold at 95%, equivalent to 772 miRNAs, it was found that Agilent and Illumina arrays were concordant according to the agreement interval derived from the estimation of the measurement error model, irrespective of the method for choosing the value of λ. The choice of this threshold was subjective and should depend on the issue at hand; the present choice was based on the fact that it is likely that a few miRNAs exist that do not agree because of unwanted technical issues not attributable to the platform itself, but also to the need to reduce the number of false positives (i.e., falsely concordant platforms). This last point is crucial in the field of microarrays, since the comparability of results from different platforms as a tool for validating a laboratory’s own results has gained much relevance in omic research. As a matter of fact, the basic assumption is that the two platforms are “linked” in that what they convey about the profile of intensity/expression for a sample is similar, the net of the different measurement and the analytic scale of the platform itself. There is, however, a possible confounding effect in these results that was not possible to control, i.e., the site where the arrays were processed. As described in the Experimental Section, Affymetrix arrays were processed at a laboratory (say LAB1), Agilent and Illumina at another one (say LAB2), so that when the comparison between Affymetrix and the other platforms is done, different laboratories and different platforms are compared in a completely confounded way. In this perspective, it is not clear to what extent the laboratory effect encompasses the platform effect, i.e., how much the differences between Affymetrix and the other platforms were due to different sites of processing and how much to actual assay differences. 
Similarly, it could not be excluded that the high degree of concordance between Illumina and Agilent could be sharpened by this confounding effect. One alternative could have been to have all of the arrays processed at a single site, so that the shared laboratory effect would cancel out, yet this was not possible, for the facilities involved in the experiment did not have all of the instrumentation needed for performing the experiment with all of the arrays. Thus, though being confident in the validity of our results, we cannot discriminate between “true” agreement, i.e., concordance due to real similarity between platforms, and “technical” agreement, i.e., concordance or discordance related to the lab where the platform was processed. Notably, the measurement error model considered here was just one of the possible models that could be fitted. The scatterplots in Figure 7 show the fitted lines for each of the comparisons when λ=1. It is easy to note that the linearity of the relationship between discordant platforms can be questioned, so that the identified differences could be due to a non-linearity in it or to a lack-of-fit of the regression line, which can be accounted for by considering different functional forms for the x variable, both in a linear and in a non-linear context.

Figure 7. Measurement error model fit. The red line represents fitted values of the measurement error model and green crosses represent miRNAs not in agreement after bias correction. For simplicity, we represent the model fit for λ=1 separately for line A498 (A) and hREF (B).

To conclude, these results show that Agilent and Illumina were the most concordant platforms showing good patterns of agreement, whereas Affymetrix-related comparisons showed poor agreement for both lines.
For line A498, this could be explained by technical issues on one replicate, whereas for line hREF, this could possibly be due to a non-linear relationship between the arrays, whose biological or technical source we were not able to ascertain. This suggests considering different functional forms to achieve a better characterization of the relationship. The power of the proposed method is that it can be used to assess agreement in various contexts and, though supposing a simple linear relation exists between arrays, allows one to estimate it in terms of model parameters corrected for the presence of measurement error, an issue that is often neglected in microarray studies.
