PMC:4979053

Assessing Agreement between miRNA Microarray Platforms

Abstract

Over the last few years, miRNA microarray platforms have provided great insights into the biological mechanisms underlying the onset and development of several diseases. However, only a few studies have evaluated the concordance between different microarray platforms using methods that take into account measurement error in the data. In this work, we propose the use of a modified version of the Bland–Altman plot to assess agreement between microarray platforms. To this aim, two samples, one renal tumor cell line and a pool of 20 different human normal tissues, were profiled using three different miRNA platforms (Affymetrix, Agilent, Illumina) on triplicate arrays. Intra-platform reliability was assessed by calculating pair-wise concordance correlation coefficients (CCC) between technical replicates and overall concordance correlation coefficient (OCCC) with bootstrap percentile confidence intervals, which revealed moderate-to-good repeatability of all platforms for both samples. Modified Bland–Altman analysis revealed good patterns of concordance for Agilent and Illumina, whereas Affymetrix showed poor-to-moderate agreement for both samples considered. The proposed method is useful to assess agreement between array platforms by modifying the original Bland–Altman plot to let it account for measurement error and bias correction and can be used to assess patterns of concordance between kinds of arrays other than miRNA microarrays.

1. Introduction

MiRNAs are small non-coding RNA molecules that have been shown to play a critical role in tumorigenesis [1,2,3,4] and in several other pathologies [5,6,7,8]. In order to measure miRNA intensity levels, several methods, such as RT-qPCR, high-throughput sequencing and microarrays, have been developed and have enabled researchers to profile a large number of miRNAs simultaneously across different experimental conditions [9].
MiRNA microarrays, in particular, have spread considerably in the life sciences since their first appearance in 2004 [10] and are now routinely used in biomolecular research. In the last few years, miRNA microarrays have been compared with next generation sequencing technologies to assess their relative performance [11,12]. The main interest lies in the potential and advantages offered by these new platforms, for instance, in the discovery of new miRNAs [13]. However, since the results of these comparison studies are conflicting, miRNA microarrays still remain an effective and useful technology, whose characteristics need to be properly assessed, both in terms of within-platform reliability and of between-platform agreement. To date, only a few studies have attempted to evaluate within-platform [14] and between-platform [15,16] reliability in miRNA microarrays. Results were reported mainly as correlation coefficients (both Pearson and Spearman) for evaluating both intra-platform and inter-platform performance, calculated on a subset of miRNAs that depended on detection calls concordant between platforms (i.e., miRNAs that were called “detected/present” on all platforms considered for the analysis). Additionally, Sato and colleagues assessed between-platform comparability also in terms of miRNAs that were commonly differentially expressed between samples for all platforms [15], whereas Yauk et al. [16] evaluated within-platform reproducibility via Lin’s concordance correlation coefficient [17]. In this work, a different approach based on the Bland–Altman method to assess between-platform agreement is proposed. In particular, the proposed method is applied to assess agreement between three different miRNA microarray platforms (Affymetrix, Agilent, Illumina).
Additionally, advice against the use of Pearson/Spearman correlation coefficients is provided, and the use of concordance correlation coefficients (pairwise and overall) is suggested as a better measure to evaluate within-platform reliability.

2. Experimental Section

2.1. Samples

The two samples involved in the study are a renal tumor cell line named A498 (ATCC, Manassas, VA, USA) [18] and a pool of twenty different human normal tissues (namely, hREF), obtained from the First Choice© Human Total RNA Survey Panel (Ambion Inc, Austin, TX, USA). RNA material was analyzed in different laboratories, as follows: Affymetrix processing took place at the Biomedical Technologies Institute of the University of Milan (Segrate, Italy); Illumina and Agilent processing were performed at the Department of Experimental Oncology of the National Cancer Institute (Milan, Italy). For both samples, three technical replicates were performed on each platform, leading to a total of 18 arrays, six for each microarray platform.

2.2. Sample Preparation and Hybridization

The cell lines were cultured to confluence according to the corresponding ATCC datasheets. Total RNA samples were extracted using the miRNeasy kit (Qiagen Spa, Hilden, Germany) and quantified with an ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). Using an Agilent 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA), RNA integrity was assessed on the basis of the RIN (RNA integrity number) factor, and the presence of low molecular weight RNA (5S) was verified. Labeling of total RNA samples was performed using the FlashTag™ Biotin RNA labeling Kit (Genisphere Inc., Hatfield, PA, USA) starting from 1 μg of total RNA. Briefly, the tailing reaction is followed by ligation of the biotinylated signal molecule to the target RNA sample. The labeling reaction is based on Genisphere proprietary 3DNA dendrimer signal amplification technology.
Prior to array hybridization, total RNA labeling was verified using the enzyme-linked oligosorbent assay (ELOSA). After that, biotin-labeled samples were hybridized onto the arrays at 48 °C, then washed and stained using the Affymetrix Fluidics Station 450. Arrays were scanned with the GeneChip© Scanner 3000 7G to acquire fluorescent images of each array and analyzed using the GeneChip Operating Software (GCOS, version 1.2).

2.3. Data Pre-Processing

2.3.1. Affymetrix GeneChip© miRNA Array

Intensity .CEL files were obtained from the scan images and imported into the Affymetrix© miRNA QC Tool software (Version 1.0.33.0) to quantify the signal value. Quality control (QC) was assessed by plotting the average intensity of the oligo spike-in and background probe sets (included in the control target content) across all of the arrays. According to Genisphere, the oligo spike-in 2, 23, 29, 31 and 36 probe sets should show an intensity above 1000 units for the array quality to be accepted. miRNAs were called as detected using the Affymetrix detection algorithm, based on the non-parametric Wilcoxon rank-sum test, applied independently on each array and probe/probe set; a p-value greater than 0.06 stands for “not detected above background” [19]. For data normalization, the “default” method was used, obtaining log2 expression values (expression values data matrix) from the raw data (intensity values data matrix). Briefly, this method involved the following three steps: a background correction, in which the background probe intensities were grouped into bins based on GC content and the median intensity of each bin was used as the correction value for each probe with the same GC content; a quantile normalization; and, finally, a median polish summarization. To obtain a single intensity value for each miRNA mapped on the log2 array, intensity measures for replicated spots were averaged.

2.3.2.
Agilent Human miRNA Microarray (V1)

Images were scanned using the Agilent Feature Extraction (AFE) software (version 11.0.1.1), obtaining the total gene signal (TGS) for all miRNAs on the array. Negative values were transformed by adding, for each array separately, the absolute value of the minimum TGS intensity on the array (as extracted by AFE) plus 2 before log2 transformation [20]. Data extracted from AFE were imported into the R environment [21] and processed using the AgimiRNA package, available in Bioconductor [22,23].

2.3.3. Illumina HumanMI_V2

Raw data were processed using the proprietary BeadStudio software (Version 3.3.8). No background subtraction was performed. Probe-level data were summarized to obtain miRNA-level data, and then, a log2 transformation was applied.

2.3.4. miRNA Selection and Normalization

The Affymetrix platform contained information on 7815 miRNAs, of which 847 (10.83%) were human, whereas the Agilent platform contained 961 miRNAs (851 human, 88.55%), and on the Illumina array 1145 miRNAs were detected, of which 858 (74.93%) were human miRNAs. Human miRNAs common to all platforms were selected according to their name and confirmed by a search on miRBase (Release 18, November 2011). Unlike in other published works, miRNAs were not filtered on a detection basis, because such an approach could introduce a bias in the results. In fact, some of the miRNAs that are filtered out because they are “switched-off” could show patterns of within- and/or between-platform disagreement in another experiment, where they are “turned-on”. Filtering could therefore lead to an over-estimate of the level of reliability. Considering only human miRNAs should circumvent this issue and, at the same time, provide relevant information, since human miRNAs are commonly those of major interest in biomolecular investigation. Moreover, no data normalization was performed.
Almost all works that focused on comparing microarray platforms normalized their data (for instance, [15,16,24]), but this is a non-trivial issue that has to be carefully evaluated. As a matter of fact, normalization for miRNA microarrays has been largely debated to date, with results that have been somewhat discordant [25,26,27,28,29], so that no “gold-standard” method exists. Additionally, normalizing data in the context of assessing platform agreement poses other relevant problems. If data on two different platforms are normalized and then compared, there is no way to separate the effect of the platform from that of the normalization on the results of the concordance/agreement/reproducibility assessment. A high level of between-platform agreement, due not to the platforms themselves, but to the normalization used, might be found. On the other hand, the same normalization on different platforms could highlight patterns of discordance that cannot be ascribed to the platforms. Nonetheless, comparing un-normalized data carries the risk of finding poor concordance because of incidental batch effects occurring in the experiment, which may lead to an underestimate of the “true” agreement between platforms. In this paper, we have chosen to use non-normalized data, so that we could assess the performance of different platforms “per se”. For the sake of comparison, data were also normalized with the quantile and loess algorithms, and results were compared to those obtained on non-normalized data.

2.4. Statistical Analysis

2.4.1. Intra-Platform Reliability

To assess the reliability of the three miRNA microarray platforms, pair-wise concordance correlation coefficients [17] were computed for all possible pairs of technical replicates for all platforms, within each sample.
The CCC $\rho_c$ between two series of $n$ measurements $x$ and $y$ is defined as:

$$\rho_c = \frac{2\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2} = \frac{2\rho\,\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2} \tag{1}$$

where

$$\rho = \frac{\sigma_{xy}}{\sigma_x\sigma_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2\,\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

is the Pearson correlation coefficient between $x$ and $y$, $\mu_x = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\mu_y = \frac{1}{n}\sum_{i=1}^{n} y_i$ are the sample means of $x$ and $y$, and $\sigma_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ and $\sigma_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ are the sample variances of $x$ and $y$. Unlike correlation coefficients, which can only give information about the existence of a linear relationship between two measurement methods, the CCC provides information on both precision (how closely the observations follow the best-fit line) and accuracy (how far the best-fit line deviates from the concordance line) and is thus a better measure to assess platform reliability [30]. Additionally, the pairwise CCCs were combined within each sample and platform into an overall measure of reliability, the overall concordance correlation coefficient (OCCC) [31], a weighted mean of pairwise CCCs, which is defined as follows:

$$\rho_c^{o} = \frac{\sum_{j=1}^{J-1}\sum_{k=j+1}^{J} \xi_{jk}\,\rho_c^{jk}}{\sum_{j=1}^{J-1}\sum_{k=j+1}^{J} \xi_{jk}} \tag{2}$$

where $\rho_c^{jk}$ is the standard Lin’s CCC between the $j$-th and $k$-th replicate measurement series (in this study, these are the replicate arrays), and $\xi_{jk}$ are the weights, specific for each paired comparison:

$$\xi_{jk} = \sigma_j^2 + \sigma_k^2 + (\mu_j - \mu_k)^2 \tag{3}$$

Confidence intervals for the OCCC were computed using the bootstrap [32]. Specifically, 1000 bootstrap samples were extracted, and for each of these samples, sample means, variances, covariances, CCC and OCCC were computed. Then, 95% percentile confidence intervals for the OCCC were obtained from the empirical bootstrap distribution. To evaluate whether pairs of technical replicates are actually in agreement, the non-inferiority approach proposed by Liao and colleagues [33] for gene expression microarrays was followed.
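Equations (1)–(3) and the bootstrap percentile interval can be illustrated with a minimal sketch. The function names (`moments`, `lin_ccc`, `occc`, `occc_percentile_ci`) and the simulated replicates are hypothetical, and resampling miRNAs (rows) with replacement is an assumption, since the text does not state the resampling unit.

```python
import random
from itertools import combinations

def moments(x, y):
    """Sample means, (n - 1)-denominator variances and covariance of two series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return mx, my, vx, vy, cxy

def lin_ccc(x, y):
    """Eq. (1): Lin's concordance correlation coefficient."""
    mx, my, vx, vy, cxy = moments(x, y)
    return 2.0 * cxy / (vx + vy + (mx - my) ** 2)

def occc(replicates):
    """Eq. (2): weighted mean of pairwise CCCs over all replicate pairs.

    `replicates` is a list of equally long series, one per technical replicate.
    """
    num = den = 0.0
    for x, y in combinations(replicates, 2):
        mx, my, vx, vy, _ = moments(x, y)
        weight = vx + vy + (mx - my) ** 2          # Eq. (3)
        num += weight * lin_ccc(x, y)
        den += weight
    return num / den

def occc_percentile_ci(replicates, n_boot=1000, level=0.95, seed=1):
    """Bootstrap percentile interval for the OCCC.

    Assumption: miRNAs (rows) are resampled with replacement, keeping the
    replicate measurements of each miRNA together.
    """
    rng = random.Random(seed)
    n = len(replicates[0])
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(occc([[rep[i] for i in idx] for rep in replicates]))
    stats.sort()
    alpha = 1.0 - level
    lower = stats[round(alpha / 2.0 * n_boot)]             # simple empirical quantiles
    upper = stats[round((1.0 - alpha / 2.0) * n_boot) - 1]
    return lower, upper

# Simulated log2 intensities: one shared signal plus independent replicate noise.
rng = random.Random(0)
signal = [rng.gauss(8.0, 2.0) for _ in range(200)]
replicates = [[s + rng.gauss(0.0, 0.3) for s in signal] for _ in range(3)]

point = occc(replicates)
lower, upper = occc_percentile_ci(replicates)
print(f"OCCC = {point:.3f}, 95% percentile CI = [{lower:.3f}, {upper:.3f}]")
```

The lower bound returned by `occc_percentile_ci` is the quantity that would be compared with an agreement threshold in a non-inferiority assessment.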
The non-inferiority approach consists of defining a threshold, or lower bound, $\rho_c^{(CL)}$ reflecting the minimal value that the CCC should assume to conclude that two methods agree and then testing the following hypothesis:

$$H_0: \rho_c \le \rho_c^{(CL)} \quad \text{vs.} \quad H_1: \rho_c > \rho_c^{(CL)} \tag{4}$$

This can be done using the confidence intervals for both CCC and OCCC, interpreting the results as follows: if the lower confidence bound falls below $\rho_c^{(CL)}$, then the null hypothesis cannot be rejected and the two replicates cannot be said to be in agreement; otherwise, the two replicates are in agreement. To determine the value of $\rho_c^{(CL)}$, the authors define the minimum thresholds of precision and accuracy, and then, since the CCC can be seen as a product of a precision and an accuracy term, $\rho_c^{(CL)}$ is computed as the product of these two thresholds. In their example, they propose a threshold of 0.90, yet in this paper, we have chosen to use the value of 0.96, according to the following formula:

$$\rho_c^{(CL)} = \frac{2\rho_{CL}}{v_{CL} + v_{CL}^{-1} + u_{CL}^2} = \frac{2 \times 0.98}{0.9 + 0.9^{-1} + 0.15^2} = 0.9638 \approx 0.96 \tag{5}$$

where $v = \sigma_1/\sigma_2$ represents the scale shift between the two measurement series and $u = (\mu_1 - \mu_2)/\sqrt{\sigma_1\sigma_2}$ is the location shift relative to the scale. The choice of these values is subjective, but in this case, an attempt has been made to be conservative: a higher value for $\rho_{CL}$ means a relationship between technical replicates that is as linear as possible, though leaving room for small departures due to ineffective probes or small experimental effects. On the other hand, $v_{CL}$ was increased to 0.9 because miRNA measurements are assumed to be less variable than gene expression, so that technical replicates may show very similar patterns of variability. Only $u_{CL}$ is unchanged, because the value proposed in [33] appeared reasonable also for miRNA microarrays.

2.4.2. Between-Platform Agreement

In the microarray literature, concordance between platforms has been often studied using the correlation coefficient.
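The arithmetic of Equation (5) in Section 2.4.1, and the gap between correlation and concordance, can be checked with a short sketch. The helper names and the toy data are hypothetical; the two series are constructed so that one is an exact linear function of the other, which is the situation in which correlation is maximal even though the measurements disagree.

```python
import math

def ccc_threshold(rho_cl, v_cl, u_cl):
    """Eq. (5): CCC lower bound from precision, scale-shift and location-shift thresholds."""
    return 2.0 * rho_cl / (v_cl + 1.0 / v_cl + u_cl ** 2)

def pearson_and_ccc(x, y):
    """Pearson correlation and Lin's CCC (Eq. (1)) from sample moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    pearson = cxy / math.sqrt(vx * vy)
    ccc = 2.0 * cxy / (vx + vy + (mx - my) ** 2)
    return pearson, ccc

# Threshold values taken from the text: rho_CL = 0.98, v_CL = 0.9, u_CL = 0.15.
threshold = ccc_threshold(rho_cl=0.98, v_cl=0.9, u_cl=0.15)

# Hypothetical log2 intensities: y is an exact linear function of x,
# but with a scale shift (slope 1.5) and a location shift (intercept 2).
x = [2.0 + 0.25 * i for i in range(49)]
y = [1.5 * a + 2.0 for a in x]
pearson, ccc = pearson_and_ccc(x, y)

print(f"threshold = {threshold:.4f}")
print(f"Pearson = {pearson:.4f}, CCC = {ccc:.4f}, agreement: {ccc > threshold}")
```

In this toy case the Pearson coefficient is essentially 1 while the CCC falls well below the threshold, because the CCC penalizes the scale and location shifts that correlation ignores.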
Not only is correlation the wrong tool for assessing agreement, but correlation coefficients are also computed assuming that intensity/expression values do not suffer from any measurement error, thus leading to possible underestimates of the real level of correlation between platforms [34]. Here, agreement between platforms was evaluated using a modified version of the Bland–Altman approach. Such a modification, suggested by Liao et al. [35], allows one not only to assess whether two methods of measurement are concordant, but also to provide information on the possible sources of disagreement. In a nutshell (further details are given in the original paper), the method involved the estimation, for each platform pair and separately for each sample, of a measurement error model, i.e., a model in which the independent variable $X$ was also assumed to be affected by uncertainty, of the form:

$$Y_i = a_0 + b_0 X_i^0 + \epsilon_i \tag{6}$$

$$X_i = X_i^0 + \delta_i \tag{7}$$

where $(X_i^0, Y_i^0)$, $i = 1, \ldots, n$, were the unobserved true values of the two measurement methods to be compared, i.e., miRNA intensities on the two platforms, and $\epsilon_i$ and $\delta_i$ were the i.i.d. error components of the model, which followed a normal distribution with mean equal to 0 and variances equal to $\sigma_\epsilon^2$ and $\sigma_\delta^2$, respectively. To estimate this model, the ratio $\lambda$ of the error variances of $Y$ and $X$ had to be known, possibly by means of replication or, when replication is not feasible, by setting it equal to 1, thus assuming equal error variances for both methods. In this study, both strategies were evaluated, using the technical replicates to estimate $\lambda$ by fitting a linear model with the factor “replicate” as the covariate. The estimated residual variance was then used as the sample error variance for the platform.
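For a known ratio λ, a measurement error model of the form (6)–(7) can be fitted with the closed-form Deming regression estimator, sketched below. Whether this coincides in every detail with the estimation procedure of [35] is an assumption; the function names and the simulated data are hypothetical, and λ is taken as an input rather than estimated from replicates.

```python
import math
import random

def deming_fit(x, y, lam=1.0):
    """Closed-form Deming estimates of (a0, b0) in Eqs. (6)-(7).

    lam is the ratio of error variances var(eps) / var(delta); lam = 1
    assumes equal error variances. Requires a non-zero sample covariance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    diff = syy - lam * sxx
    b0 = (diff + math.sqrt(diff ** 2 + 4.0 * lam * sxy ** 2)) / (2.0 * sxy)
    a0 = my - b0 * mx
    return a0, b0

def ols_slope(x, y):
    """Ordinary least squares slope, which ignores the error in x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Simulated data: true a0 = 1, b0 = 1.2, equal error variances (lam = 1).
rng = random.Random(42)
truth = [rng.gauss(8.0, 2.0) for _ in range(5000)]
x = [t + rng.gauss(0.0, 0.5) for t in truth]               # Eq. (7)
y = [1.0 + 1.2 * t + rng.gauss(0.0, 0.5) for t in truth]   # Eq. (6)

a0, b0 = deming_fit(x, y, lam=1.0)
print(f"Deming: a0 = {a0:.3f}, b0 = {b0:.3f}; OLS slope = {ols_slope(x, y):.3f}")
```

In this simulation the ordinary least squares slope is attenuated towards zero by the error in `x`, whereas the Deming estimate stays close to the simulated value, which is the motivation for using a measurement error model when both platforms are measured with error.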
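Once the parameters of the model were estimated, assuming that $Y - X \sim N\left(a_0 + (b_0 - 1)X^0,\ (1+\lambda)\sigma_\delta^2\right)$, modified versions of the agreement interval for $Y - X$ proposed by Bland and Altman [36,37] were estimated according to the bias (fixed or proportional) that needed to be corrected for when comparing two platforms, as follows:

(a) No bias ($a_0 = 0$, $b_0 = 1$):

$$\Delta = \left[-t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta,\ +t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta\right] \tag{8}$$

(b) Fixed bias ($a_0 \neq 0$, $b_0 = 1$):

$$\Delta = \left[a_0 - t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta,\ a_0 + t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta\right] \tag{9}$$

(c) Proportional bias ($a_0 = 0$, $b_0 \neq 1$):

$$\Delta = \left[(b_0 - 1)X_i - t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta,\ (b_0 - 1)X_i + t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta\right] \tag{10}$$

(d) Fixed and proportional bias ($a_0 \neq 0$, $b_0 \neq 1$):

$$\Delta = \left[a_0 + (b_0 - 1)X_i - t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta,\ a_0 + (b_0 - 1)X_i + t_{1-\alpha/2,\,n-1}\sqrt{1+\lambda}\,\hat{\sigma}_\delta\right] \tag{11}$$

where $X_i$ were the actual measured values of the method. Including only the parameter $a_0$ in the agreement interval meant that the two methods differed only by a “fixed” shift that did not depend on the value of $X_i$ (thus, fixed bias). Including only (or also) $b_0$, on the other hand, meant that the differences between the two methods increased proportionally with the value of the measurement $X_i$, according to the value of the parameter $b_0$ itself (thus, proportional bias). Finally, let n be the number of subjects and 0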
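The agreement intervals in Equations (8)–(11) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names and simulated data are hypothetical, the parameters $a_0$, $b_0$, λ and $\hat{\sigma}_\delta$ are supplied as inputs, and the Student t quantile $t_{1-\alpha/2,\,n-1}$ is replaced by the standard normal quantile, a large-$n$ approximation used only to keep the sketch free of external dependencies (for small $n$, a t quantile such as `scipy.stats.t.ppf` would be the appropriate choice).

```python
import random
from math import sqrt
from statistics import NormalDist

def agreement_limits(x, a0=0.0, b0=1.0, lam=1.0, sigma_delta=1.0, alpha=0.05):
    """Lower and upper agreement limits for Y - X at each measured X_i.

    General case of Eq. (11); a0 = 0 and b0 = 1 give Eq. (8), b0 = 1 gives
    Eq. (9), a0 = 0 gives Eq. (10).
    Assumption: the normal quantile stands in for t_{1 - alpha/2, n - 1}.
    """
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = crit * sqrt(1.0 + lam) * sigma_delta
    centres = [a0 + (b0 - 1.0) * xi for xi in x]
    return [(c - half_width, c + half_width) for c in centres]

def share_within(x, y, limits):
    """Fraction of observed differences y_i - x_i inside their limits."""
    inside = sum(lo <= (yi - xi) <= hi for xi, yi, (lo, hi) in zip(x, y, limits))
    return inside / len(x)

# Simulated platforms differing by a fixed bias only (a0 = 1, b0 = 1, lam = 1).
rng = random.Random(7)
truth = [rng.gauss(8.0, 2.0) for _ in range(2000)]
x = [t + rng.gauss(0.0, 0.5) for t in truth]
y = [1.0 + t + rng.gauss(0.0, 0.5) for t in truth]

no_bias = agreement_limits(x, sigma_delta=0.5)               # Eq. (8)
fixed_bias = agreement_limits(x, a0=1.0, sigma_delta=0.5)    # Eq. (9)
share_no_bias = share_within(x, y, no_bias)
share_fixed = share_within(x, y, fixed_bias)
print(f"within limits: no bias = {share_no_bias:.3f}, fixed bias = {share_fixed:.3f}")
```

In this simulation, correcting for the fixed bias recentres the interval on the systematic shift between the two platforms, so a larger share of the differences falls inside the limits than with the uncorrected interval.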
