4. Conclusions

Studies evaluating the reliability of high-throughput platforms by means of correlation coefficients, both Pearson and Spearman [15], are often seen in the literature. Many of these studies claim that high correlation coefficients imply high reliability. This misuse of correlation coefficients has already been pointed out in the microarray literature [30]; nevertheless, several studies still use these measures to assess the repeatability or reproducibility of high-throughput platforms.

In this study, both the within-platform reliability and the between-platform agreement of three miRNA microarray platforms were examined. The results highlighted that Agilent and Illumina were the platforms showing the best pattern of both reliability and agreement, whereas Affymetrix appeared to be affected by technical problems.

In terms of reliability, the OCCC for line A498 was markedly lower than that for line hREF for all platforms and for all sets of miRNAs considered (common human miRNAs, human miRNAs array-wise, all miRNAs). Overall, Illumina (OCCC from 0.989 to 0.994) performed better than Agilent (from 0.975 to 0.994) and, in particular, Affymetrix (from 0.927 to 0.993), and the difference was most evident for line A498. Since this line suffered from technical issues on Affymetrix and, partially, also on Agilent, the related OCCC values possibly underestimated the “true” degree of reproducibility of these platforms. This is better appreciated with line hREF, for which the Affymetrix OCCC ranged from 0.992 (95% CI: 0.990–0.993) to 0.993 (95% CI: 0.993–0.994), values almost identical to those of Agilent and Illumina, suggesting a good performance of this array as well.

These results, however, refer only to the “repeatability” of the assay, meaning that technical replicates were performed under “conditions where independent test results are obtained with the same method on identical test items in the same laboratory by the same operator using the same equipment within short intervals of time”. Reproducibility, by contrast, implies “conditions where test results are obtained with the same method on identical test items in different laboratories by different operators using different equipment”, as defined by the ISO [40]. This distinction is relevant, since studies claiming to have evaluated the reproducibility of arrays actually evaluated only their repeatability [15], whereas reproducibility assessment necessarily requires the involvement of multiple laboratories, as in the MAQC Project [41].

Assessing agreement between different methods of measurement is a task that has rarely been addressed in the microarray literature, where the focus has usually been on evaluating a linear relationship between platforms or with “gold-standard” assays, such as qPCR, mainly via computation of correlation coefficients. Such an approach has often been biased by the selection of miRNAs used for the computations, in that only miRNAs concordantly detectable across platforms [15,16] were often chosen, possibly leading to an overestimate of the real level of correlation between arrays. To avoid this issue, the human miRNAs common (i.e., matched by name) to all of the platforms considered in the experiment were used. By setting a “strong” threshold at 95%, equivalent to 772 miRNAs, Agilent and Illumina arrays were found to be concordant according to the agreement interval derived from the estimation of the measurement error model, irrespective of the method used to choose the value of λ.
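Both the reliability and the agreement analyses rest on concordance-type indices rather than on plain correlation. As a point of reference for this distinction, recall Lin's concordance correlation coefficient, of which the OCCC used for the reliability analyses is the extension to more than two replicates. For two sets of measurements with means \(\mu_1, \mu_2\), standard deviations \(\sigma_1, \sigma_2\) and Pearson correlation \(\rho\),

\[
\rho_c \;=\; \frac{2\rho\,\sigma_1\sigma_2}{\sigma_1^{2} + \sigma_2^{2} + (\mu_1 - \mu_2)^{2}} .
\]

Unlike \(\rho\), \(\rho_c\) is attenuated by systematic location (\(\mu_1 \neq \mu_2\)) and scale (\(\sigma_1 \neq \sigma_2\)) differences, which is why a high correlation alone does not imply high repeatability or agreement.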
The choice of the 95% threshold was subjective and should depend on the issue at hand; the present choice was based on the likelihood that a few miRNAs do not agree because of unwanted technical issues not attributable to the platform itself, and on the need to reduce the number of false positives (i.e., falsely concordant platforms). This last point is crucial in the field of microarrays, since the comparability of results across platforms, as a tool for validating a laboratory’s own results, has gained much relevance in omics research. The basic assumption is that the two platforms are “linked”, in that what they convey about the intensity/expression profile of a sample is similar, net of the different measurement and analytic scales of the platforms themselves.

There is, however, a possible confounding effect in these results that could not be controlled, namely the site where the arrays were processed. As described in the Experimental Section, Affymetrix arrays were processed at one laboratory (say, LAB1) and Agilent and Illumina arrays at another (say, LAB2), so that any comparison between Affymetrix and the other platforms compares different laboratories and different platforms in a completely confounded way. From this perspective, it is not clear to what extent the laboratory effect overlaps with the platform effect, i.e., how much of the difference between Affymetrix and the other platforms was due to the different processing sites and how much to actual assay differences. Similarly, it cannot be excluded that the high degree of concordance between Illumina and Agilent was inflated by this confounding effect. One alternative would have been to process all of the arrays at a single site, so that the shared laboratory effect would cancel out; this was not possible, however, because the facilities involved in the experiment did not have all of the instrumentation needed to run every array. Thus, although we are confident in the validity of our results, we cannot discriminate between “true” agreement, i.e., concordance due to real similarity between platforms, and “technical” agreement, i.e., concordance or discordance related to the laboratory where the platform was processed.

Notably, the measurement error model considered here is just one of the possible models that could be fitted. The scatterplots in Figure 7 show the fitted lines for each comparison when λ = 1. It is easy to see that the linearity of the relationship between discordant platforms can be questioned, so the identified differences could be due to non-linearity or to a lack of fit of the regression line, which could be accounted for by considering different functional forms for the x variable, in either a linear or a non-linear context.

Figure 7. Measurement error model fit. The red line represents the fitted values of the measurement error model, and green crosses represent miRNAs not in agreement after bias correction. For simplicity, we represent the model fit for λ = 1 separately for line A498 (A) and hREF (B).
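As an illustration of this kind of fit, the sketch below implements a standard Deming (errors-in-variables) regression of one platform’s log-intensities on another’s, assuming that λ denotes the ratio of the two platforms’ error variances, so that λ = 1 corresponds to the orthogonal fit shown in Figure 7. This is only a minimal sketch on simulated data, not the estimation procedure or the agreement interval actually used in the study.

```python
import numpy as np

def deming_fit(x, y, lam=1.0):
    """Measurement error (Deming) regression of y on x, where both variables
    are observed with error and `lam` is the assumed ratio of the y-error
    variance to the x-error variance (lam = 1 gives orthogonal regression)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
    sxy = np.cov(x, y)[0, 1]
    slope = (syy - lam * sxx + np.sqrt((syy - lam * sxx) ** 2 + 4 * lam * sxy ** 2)) / (2 * sxy)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

# Hypothetical example: the same miRNAs measured on two platforms (log2 scale)
rng = np.random.default_rng(0)
signal = rng.uniform(4, 14, 500)                  # latent "true" log2 intensity
platform_1 = signal + rng.normal(0, 0.3, 500)     # platform 1, with measurement error
platform_2 = 0.2 + 0.95 * signal + rng.normal(0, 0.3, 500)
b0, b1 = deming_fit(platform_1, platform_2, lam=1.0)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")  # perfect agreement would give (0, 1)
```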
To conclude, these results show that Agilent and Illumina were the most concordant platforms, with good patterns of agreement, whereas the Affymetrix-related comparisons showed poor agreement for both lines. For line A498, this could be explained by technical issues on one replicate, whereas for line hREF, it could possibly be due to a non-linear relationship between the arrays, whose biological or technical source we were not able to ascertain; this suggests considering different functional forms to achieve a better characterization of the relationship. The strength of the proposed method is that it can be used to assess agreement in various contexts and, although it assumes a simple linear relationship between arrays, it allows that relationship to be estimated in terms of model parameters corrected for the presence of measurement error, an issue that is often neglected in microarray studies.
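As a rough illustration of the functional-form check suggested above, a simplified sketch on hypothetical data (treating one platform as the x variable and, for simplicity, ignoring its measurement error) could compare a straight-line fit with a model containing a quadratic term and assess whether the extra term reduces the residual error appreciably:

```python
import numpy as np

# Hypothetical log2 intensities from two platforms with a mildly non-linear relationship
rng = np.random.default_rng(1)
x = rng.uniform(4, 14, 500)
y = 1.0 + 0.9 * x + 0.02 * (x - 9.0) ** 2 + rng.normal(0, 0.3, 500)

def rss(degree):
    """Residual sum of squares of an ordinary polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.sum((y - np.polyval(coeffs, x)) ** 2))

rss_linear, rss_quadratic = rss(1), rss(2)
# Approximate F statistic for the single added quadratic term; a large value
# points to lack of fit of the straight line.
f_stat = (rss_linear - rss_quadratic) / (rss_quadratic / (len(x) - 3))
print(f"RSS linear = {rss_linear:.1f}, RSS quadratic = {rss_quadratic:.1f}, F = {f_stat:.1f}")
```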