6. Results
The p-values for all 48 actual kinome arrays were below the significance level of 0.05 (see Table S1 in Supplementary Materials). Therefore, for every array, the null hypothesis that the average background-corrected intensity values are normally distributed is rejected. This further emphasizes that relying on a normal distribution to simulate kinome array data may lead to unrealistic results.
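The section does not name the normality test behind Table S1. As a minimal sketch, assuming the Shapiro-Wilk test from SciPy (`scipy.stats.shapiro`) stands in for it, the per-array check could look like this. The helper name `normality_rejected` is hypothetical, and the sample below is synthetic, not kinome data.

```python
import numpy as np
from scipy import stats


def normality_rejected(arrays, alpha=0.05):
    """For each array, report whether normality is rejected at level alpha.

    Assumption: Shapiro-Wilk stands in for whichever normality test produced
    Table S1. Each array is a 1-D sequence of average background-corrected
    peptide intensities.
    """
    results = []
    for values in arrays:
        _, p_value = stats.shapiro(np.asarray(values, dtype=float))
        results.append(bool(p_value < alpha))
    return results


# Toy usage with synthetic data (not the kinome arrays): a right-skewed sample.
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=300)
print(normality_rejected([skewed]))
```

Applied to the 48 template arrays, a `True` entry for every array would correspond to the outcome reported above.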
Figure 1 shows an example result of applying Algorithm 1 to an actual kinome array, “A-1”, to generate an inter-array replicate. In the absence of noise, the measurements for each peptide on inter-array replicates would be identical, and all points would lie on the identity line. In practice, this does not happen because of many sources of variability. Algorithm 1 lets the user control the level of variability in a synthesized kinome array through three parameters: the fold-change threshold T, the percentage of noisy peptides θ, and the significance level α. In Figure 1 (T = 2, θ = 0.05), this variability appears as points that deviate from the diagonal y = x. Figure 2 and Figure 3 show inter-array replicates for the same starting array with T = 3, θ = 0.10 and T = 4, θ = 0.15, respectively. In all three plots, the horizontal axis corresponds to the actual array, the vertical axis corresponds to the synthesized inter-array replicate, and each point depicts the average background-corrected intensity value of one peptide.
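Algorithm 1 itself is specified elsewhere in the paper. Purely to illustrate how T and θ shape a scatter plot like Figure 1, here is a hypothetical toy generator, not Algorithm 1: a fraction θ of peptides is perturbed by a random fold change of at most T, and the rest stay on the identity line. The significance level α is not modelled, and the name `toy_replicate` and the uniform fold-change draw are assumptions.

```python
import numpy as np


def toy_replicate(template, T=2.0, theta=0.05, rng=None):
    """Hypothetical toy replicate generator (NOT the paper's Algorithm 1).

    A fraction theta of peptides is picked at random; each is multiplied or
    divided by a random fold change drawn uniformly from [1, T). All other
    peptides are copied unchanged, so they stay on the identity line.
    The significance level alpha of Algorithm 1 is not modelled here.
    """
    rng = np.random.default_rng() if rng is None else rng
    template = np.asarray(template, dtype=float)
    replicate = template.copy()
    n_noisy = int(round(theta * template.size))
    noisy = rng.choice(template.size, size=n_noisy, replace=False)
    fold = rng.uniform(1.0, T, size=n_noisy)
    up = rng.random(n_noisy) < 0.5  # coin flip: increase or decrease
    replicate[noisy] = np.where(up, template[noisy] * fold, template[noisy] / fold)
    return replicate, noisy


# Toy usage on made-up intensities (not array "A-1").
template = np.linspace(100.0, 5000.0, 300)
replicate, noisy = toy_replicate(template, T=2.0, theta=0.05,
                                 rng=np.random.default_rng(0))
print(len(noisy), "of", template.size, "peptides deviate from y = x")
```

Raising T and θ moves more points further from the diagonal, which is the qualitative trend across Figures 1 to 3.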
Scatter plots of original versus replicate array pairs for three template arrays other than “A-1” in the input dataset are shown in Figures S1–S3. They visually demonstrate that the algorithm does not create replicates using a repeating pattern and that generated replicates are reminiscent of actual inter-array replicates.
Figure 4 depicts histograms of the average background-corrected intensity values for an actual kinome array and its inter-array technical replicate. In this figure, the red curve is the estimated distribution of these values [30]. Similar figures for the other template arrays and their replicates are given as Figures S4–S6. The histograms of the actual array and its replicate are not identical, which would be unrealistic for technical replicates, yet they generally follow the same distribution, as shown by the estimated distribution curves.
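The estimator behind the red curves is given in [30] and not restated here. As an assumption, a Gaussian kernel density estimate (`scipy.stats.gaussian_kde`) is a common way to draw such a curve over a histogram. The hypothetical helper below evaluates it for a template array and its replicate on a shared grid so the two curves can be compared; the data are synthetic.

```python
import numpy as np
from scipy import stats


def density_curves(template, replicate, n_points=512):
    """Gaussian KDE of two intensity arrays, evaluated on one shared grid.

    Assumption: a Gaussian KDE with SciPy's default (Scott) bandwidth stands in
    for the estimator of reference [30].
    """
    template = np.asarray(template, dtype=float)
    replicate = np.asarray(replicate, dtype=float)
    lo = min(template.min(), replicate.min())
    hi = max(template.max(), replicate.max())
    grid = np.linspace(lo, hi, n_points)
    return grid, stats.gaussian_kde(template)(grid), stats.gaussian_kde(replicate)(grid)


# Toy usage with synthetic right-skewed intensities.
rng = np.random.default_rng(0)
template = rng.lognormal(mean=6.0, sigma=1.0, size=300)
replicate = template * rng.uniform(0.9, 1.1, size=300)
grid, d_template, d_replicate = density_curves(template, replicate)
```

Plotting `d_template` and `d_replicate` over `grid` on top of the two histograms gives curves analogous to those in Figure 4.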
Figure 1  Scatter plot of background-corrected intensity values for an array and its synthesized inter-array replicate with T = 2 and θ = 0.05.
Figure 2  Scatter plot of background-corrected intensity values for an array and its synthesized inter-array replicate with T = 3 and θ = 0.10.
Figure 3  Scatter plot of background-corrected intensity values for an array and its synthesized inter-array replicate with T = 4 and θ = 0.15.
Figure 4  Histogram of background-corrected intensity values for an actual kinome array (left) and its inter-array technical replicate (right). The green bars show a one-dimensional plot of background-corrected intensity values. The red curve is the estimated distribution of the values.
Figure 5 shows an example result of applying Algorithm 2 to differentially phosphorylate a set of peptides on the array “A-1”. The input replicate (parameter Y) was the one shown in Figure 1. Differentially phosphorylated peptides in Figure 5 are depicted in red; as before, background-corrected intensity values are plotted. Note that although we set the number of candidate peptides for phosphorylation (i.e., the length of the phosphorylated vector in Algorithm 2) to 30, the number of differentially phosphorylated peptides is at most 30 and can be smaller. This happens when the algorithm attempts to phosphorylate a peptide that is already highly phosphorylated, or to dephosphorylate one that is already highly dephosphorylated, which can leave fewer differentially phosphorylated peptides than specified by nd. Scatter plots of original versus artificially phosphorylated replicate array pairs for three template arrays other than “A-1” in the input dataset are shown in Figures S7–S9 in Supplementary Materials.
Figure 5  Scatter plot of background-corrected intensity values for an array and a phosphorylated version of its synthesized inter-array replicate when T = 2. Differentially-phosphorylated peptides are depicted in red.
To check whether synthesized arrays resemble their templates, we used the two-sample Kolmogorov-Smirnov test. Its null hypothesis is that the inter-array technical replicates produced by Algorithm 1, and the phosphorylated arrays produced by Algorithm 2, have the same distribution as the original (template) arrays. The p-values were greater than the significance level (0.05) in 46 of 48 cases for inter-array technical replicates and in 41 of 48 cases for synthesized differentially phosphorylated arrays (see Tables S1–S3). Thus, in most cases the null hypothesis cannot be rejected.
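The tallies above (46 of 48 and 41 of 48) amount to counting the template/synthesized pairs whose two-sample Kolmogorov-Smirnov p-value exceeds 0.05. A sketch of that count with `scipy.stats.ks_2samp`, assuming each array is a 1-D vector of average background-corrected intensities; the helper names are hypothetical and the samples are synthetic.

```python
import numpy as np
from scipy import stats


def ks_not_rejected(template, synthesized, alpha=0.05):
    """True if the two-sample K-S test does NOT reject equal distributions."""
    _, p_value = stats.ks_2samp(np.asarray(template, dtype=float),
                                np.asarray(synthesized, dtype=float))
    return bool(p_value > alpha)


def count_not_rejected(pairs, alpha=0.05):
    """Count (template, synthesized) pairs for which H0 is not rejected."""
    return sum(ks_not_rejected(t, s, alpha) for t, s in pairs)


# Toy usage: an identical pair and a clearly shifted pair.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(count_not_rejected([(x, x), (x, x + 10.0)]))
```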
Figure 6 and Figure 7 illustrate the effects of Log2 and VSN normalization, respectively, for the actual array and the generated, differentially phosphorylated replicate shown in Figure 5. In the scatter plots, the horizontal axis shows the actual array and the vertical axis shows the generated, differentially phosphorylated replicate; the values on both axes were subjected to the indicated transformation. In both figures, the true differentially phosphorylated peptides (set Pq of Step 3 in Section 4.2) are coloured red to distinguish them from other peptides. Comparing Figures 6 and 7 with Figure 5 shows that Log2 destroys the information content of nonpositive average intensity values, whose logarithm is undefined, while VSN preserves almost the same pattern as the raw array. However, VSN does not preserve fold-change values.
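To make the contrast concrete: VSN (as implemented in the Bioconductor `vsn` package) is built on the arsinh, or generalized-log, transformation h(x) = arsinh(a + b·x), whose parameters are fitted to the data. The sketch below fixes a and b by hand, which is a simplification, and compares it with Log2 on a few made-up intensities. Log2 returns NaN for nonpositive values, whereas arsinh stays finite but maps the same two-fold change to different differences depending on intensity.

```python
import numpy as np


def log2_transform(x):
    """Log2 of intensities; nonpositive values have no real log and become NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    positive = x > 0
    out[positive] = np.log2(x[positive])
    return out


def arsinh_transform(x, a=0.0, b=0.01):
    """Simplified VSN-style transform arsinh(a + b*x) with hand-picked a, b.

    The real VSN fits a and b from the data; the values here are assumptions.
    """
    return np.arcsinh(a + b * np.asarray(x, dtype=float))


# Made-up intensities: two nonpositive values, then two 2-fold pairs.
x = np.array([-50.0, 0.0, 10.0, 20.0, 1000.0, 2000.0])
log2_vals = log2_transform(x)
vsn_vals = arsinh_transform(x)
print(log2_vals)  # NaN for the first two entries
print(log2_vals[3] - log2_vals[2], log2_vals[5] - log2_vals[4])  # both approximately 1
print(vsn_vals[3] - vsn_vals[2], vsn_vals[5] - vsn_vals[4])      # unequal
```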
Figure 6  Scatter plot of background-corrected intensity values for an actual array and its generated, differentially-phosphorylated replicate, after Log2 normalization. Seeded differentially-phosphorylated peptides are depicted in red.
Figure 7  Scatter plot of background-corrected intensity values for an actual array and its generated, differentially-phosphorylated replicate, after VSN normalization. Seeded differentially-phosphorylated peptides are depicted in red.
To examine the effect of Log2 and VSN transformations on the detection rate of differentially phosphorylated peptides, we applied the performance evaluation procedure (Section 4.2) to generate 48 pairs of arrays. For each performance measure, we compared Log2 and VSN using Levene’s test for equality of variances (Table 1) and a paired t-test for the difference in means (Table 2). For every measure, the Levene p-value was below the significance level (α = 0.05), so the hypothesis that the population variances are equal is rejected. Therefore, for all measures, paired t-tests, which do not assume equal variances, were performed. Table 2 reports the t-statistics and p-values from the paired t-tests, together with the mean of each performance measure for the Log2 and VSN methods. In all cases, the t-statistic had 47 degrees of freedom (48 pairs minus one). These results indicate that accuracy, sensitivity, and precision were significantly higher for VSN than for Log2, whereas specificity was not (p = 0.996; its mean was slightly higher for Log2). This result is in accordance with other studies of transcriptional DNA microarrays that indicate the superiority of VSN over Log2 [19,31].
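For one performance measure, the two tests can be sketched with SciPy as follows. This is an illustration under assumptions: Levene’s test is run with `center="mean"` (SciPy’s default is the median-centred Brown-Forsythe variant, and the section does not say which was used), and the paired t-test is one-sided with the alternative that the Log2 mean is lower than the VSN mean. A one-sided test appears consistent with Table 2, where the specificity p-value of 0.996 accompanies a positive t-statistic. `compare_log2_vsn` is a hypothetical name, and the scores in the usage example are random toy data, not the paper’s results.

```python
import numpy as np
from scipy import stats


def compare_log2_vsn(log2_scores, vsn_scores, alpha=0.05):
    """Levene's test and a one-sided paired t-test for one performance measure.

    log2_scores[i] and vsn_scores[i] come from the same array pair i.
    """
    log2_scores = np.asarray(log2_scores, dtype=float)
    vsn_scores = np.asarray(vsn_scores, dtype=float)
    # Levene's test, H0: equal population variances (mean-centred, assumed).
    f_value, levene_p = stats.levene(log2_scores, vsn_scores, center="mean")
    # Paired t-test on the differences Log2 - VSN.
    # H1 (assumed): mean(Log2) < mean(VSN).
    t_stat, t_p = stats.ttest_rel(log2_scores, vsn_scores, alternative="less")
    return {
        "levene_F": float(f_value),
        "levene_p": float(levene_p),
        "equal_variances_rejected": bool(levene_p < alpha),
        "t": float(t_stat),
        "t_p": float(t_p),
        "df": log2_scores.size - 1,  # 47 for 48 array pairs
        "log2_mean": float(log2_scores.mean()),
        "vsn_mean": float(vsn_scores.mean()),
    }


# Toy usage: 48 synthetic pairs in which VSN scores are consistently higher.
rng = np.random.default_rng(0)
vsn = rng.uniform(0.80, 1.00, size=48)
log2 = vsn - rng.uniform(0.05, 0.10, size=48)
result = compare_log2_vsn(log2, vsn)
print(result["t"], result["t_p"], result["df"])
```

Because `ttest_rel` is a one-sample t-test on the per-pair differences, its degrees of freedom are n − 1 whether or not the two groups’ variances differ.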
Table 1  Levene’s test for equality of variances.
Performance Measure  F-Value  p-Value
Specificity  6.9639  0.0097360
Sensitivity  24.327  0.0000035
Accuracy  6.3493  0.0134294
Precision  9.3306  0.0029321
Table 2  Paired t-test for comparison of the difference in performance measures between Log2 and VSN.
Performance Measure   t  Log2 Mean  VSN Mean  p-value
Specificity  +2.7739  0.9026  0.8809  9.960 × 10⁻¹
Sensitivity  −31.529  0.3774  0.9601  1.432 × 10⁻³³
Accuracy  −5.7156  0.8520  0.8888  3.617 × 10⁻⁷
Precision  −10.557  0.3096  0.5008  2.703 × 10⁻¹⁴
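The four measures in Tables 1 and 2 are presumably the standard confusion-matrix rates; the exact definitions belong to the procedure in Section 4.2, so the following is a sketch under that assumption. Given the set of seeded peptides (such as Pq), the set flagged as differentially phosphorylated, and all peptides on the array, the hypothetical helper `performance_measures` computes them as:

```python
def performance_measures(true_dp, detected_dp, all_peptides):
    """Specificity, sensitivity, accuracy and precision for one array pair.

    true_dp: seeded differentially phosphorylated peptides (e.g. the set Pq);
    detected_dp: peptides flagged by the analysis; all_peptides: every peptide.
    Assumes true_dp and detected_dp are subsets of all_peptides.
    """
    truth, detected, universe = set(true_dp), set(detected_dp), set(all_peptides)
    tp = len(truth & detected)             # seeded and detected
    fp = len(detected - truth)             # detected but not seeded
    fn = len(truth - detected)             # seeded but missed
    tn = len(universe - truth - detected)  # neither seeded nor detected

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "specificity": ratio(tn, tn + fp),
        "sensitivity": ratio(tp, tp + fn),
        "accuracy": ratio(tp + tn, len(universe)),
        "precision": ratio(tp, tp + fp),
    }


# Toy example: 10 peptides, 3 seeded, 3 detected, 2 of them correctly.
print(performance_measures({0, 1, 2}, {1, 2, 3}, range(10)))
```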