2. Background

This section reviews existing tools for generating synthetic DNA microarray data and the methods that the DNA microarray community uses to address heteroscedasticity of variance.

2.1. DNA Microarray Data Simulators

Several DNA microarray data simulators exist. Albers et al. proposed SIMAGE, a model and web-based software implementation for simulating dual-dye DNA microarray data [8]. Their model requires the specification of up to 29 parameters, both biological and technical. They note that the model parameters depend strongly on the experiment performed and may vary even between experiments performed in the same laboratory. SIMAGE is designed for dual-dye DNA microarray data and cannot generate single-channel microarray data. The authors state that, because each type of DNA microarray has specific properties, creating data simulators for other microarray platforms would be a useful and interesting direction for future research.

Dembélé proposed a model to simulate log2 intensity data or log2 ratio data for DNA microarrays [9]. As pointed out earlier, this is problematic for generating artificial kinome data, where background-corrected values can be negative and therefore have no logarithm. In addition, the model was constructed on the assumption that the intensities for each gene are uniformly distributed around its average. The noise component in the model is normally distributed with zero mean and a standard deviation that is a parameter of the model. Data generated by this method therefore have constant variance, which is unrealistic for kinome microarray data: these data suffer from heterogeneous variance, i.e., the variance depends on the mean.

Nykter et al. [10] used several available error models to formulate biological and measurement variation in order to simulate microarray data with realistic characteristics.
To represent the steps that may affect the quality of microarray data, they used noise, slide, hybridization, scanner, and error models. The models are controlled by a total of 94 parameters. It is not clear what parameter values should be used for generating kinome array data, or how such values would be determined. If nothing else, the task of determining values for such a large number of parameters discourages the method's use.

2.2. Heteroscedasticity of Variance in Microarray Data Analysis

Heteroscedasticity of variance is a formidable challenge confronting almost all types of microarray technologies. Affymetrix GeneChip and Illumina Sentrix BeadChip arrays are advanced DNA microarray technologies. The former is widely studied and used [20], while the latter is relatively new [21]. The major difference between the two platforms is that the Illumina platform offers a larger number of within-array replicates that can be used for further analysis. Variance-stabilizing methods have been used to deal with heterogeneity of variance in Affymetrix arrays [16,19]. With the Illumina platform, the large number of within-array technical replicates has facilitated the use of additional methods [12].

VSN [16] and VST [12] are two variance-stabilizing methods used by the microarray community. Both are built on a model of microarray gene expression measurement noise proposed by Rocke and Durbin [22]:

$$Y = \alpha + \mu e^{\gamma} + \epsilon \tag{1}$$

where $Y$ is the measured intensity value, $\alpha$ is the average intensity of unexpressed genes, $\mu$ is the noise-free intensity value, $\epsilon$ is the additive error term, and $\gamma$ is the multiplicative error term. The error terms $\epsilon$ and $\gamma$ are assumed to be independent, normally distributed random variables with zero mean. Durbin et al. [19] used this model to introduce a transformation for variance stabilization in microarray data.

Huber et al.
[16] employed the model suggested by Rocke and Durbin to design a variance-stabilizing method named VSN. VSN first brings different arrays to the same scale and then transforms the data so that the variance is approximately constant across the entire intensity range. Like the log2 transformation, this method can deal with very high intensities. For weak intensities, however, it acts much like a linear transformation and thus avoids the variance inflation that the log2 transformation causes for weakly expressed genes. Between these two extremes, VSN interpolates smoothly [16].

The VSN method was proposed prior to the advent of Illumina Sentrix BeadChip arrays. This platform offers a statistically large number of within-array replicates that can be used for variance stabilization. Lin et al. [12] used such replicates to estimate the parameters of the model suggested by Rocke and Durbin. Their method, named VST, uses the same transformation as VSN; the two methods differ only in how they estimate the model parameters [12].

Unlike Illumina microarrays, kinome microarrays do not provide a statistically large number of within-array replicates, yet they provide more within-array replicates than Affymetrix arrays. Previously, kinome microarrays provided 2 to 3 replicates for each probe [23]. This number is now about 9 replicates for each probe, which is three times more than in Affymetrix arrays. This difference in the number of within-array replicates may affect how well different variance-stabilizing methods eliminate heterogeneity of variance in kinome arrays.
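
To make the mean-variance relation implied by the Rocke and Durbin model concrete, the following is a minimal sketch in Python. It simulates intensities from Equation (1) at several noise-free levels and compares the standard deviation before and after an arsinh (generalized-log) transformation, the form commonly associated with this error model. All parameter values and function names are hypothetical choices for illustration, and the transformation here is given the true simulation parameters; VSN and VST instead estimate their parameters from the data, which this sketch does not attempt.

```python
import numpy as np


def simulate_rocke_durbin(mu, alpha, sigma_gamma, sigma_eps, n, rng):
    """Draw n intensities Y = alpha + mu * exp(gamma) + eps, as in Equation (1)."""
    gamma = rng.normal(0.0, sigma_gamma, size=n)  # multiplicative error term
    eps = rng.normal(0.0, sigma_eps, size=n)      # additive error term
    return alpha + mu * np.exp(gamma) + eps


def arsinh_transform(y, alpha, sigma_gamma, sigma_eps):
    """Arsinh transform using known (not estimated) model parameters.

    Assumption for this sketch: the scale is sigma_eps divided by the
    standard deviation of exp(gamma), so the transform is nearly linear
    where additive noise dominates and nearly logarithmic where
    multiplicative noise dominates.
    """
    # Variance of exp(gamma) when gamma ~ N(0, sigma_gamma^2) (log-normal).
    s2 = np.exp(sigma_gamma**2) * (np.exp(sigma_gamma**2) - 1.0)
    scale = sigma_eps / np.sqrt(s2)
    return np.arcsinh((y - alpha) / scale)


def sd_by_level(levels, alpha, sigma_gamma, sigma_eps, n, rng):
    """Return (raw_sd, transformed_sd) arrays, one entry per noise-free level."""
    raw_sd, trans_sd = [], []
    for mu in levels:
        y = simulate_rocke_durbin(mu, alpha, sigma_gamma, sigma_eps, n, rng)
        raw_sd.append(y.std())
        trans_sd.append(arsinh_transform(y, alpha, sigma_gamma, sigma_eps).std())
    return np.array(raw_sd), np.array(trans_sd)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    levels = [0.0, 100.0, 1000.0, 10000.0]  # hypothetical noise-free intensities
    raw, trans = sd_by_level(levels, alpha=50.0, sigma_gamma=0.1,
                             sigma_eps=20.0, n=50_000, rng=rng)
    for mu, r, t in zip(levels, raw, trans):
        print(f"mu={mu:>8.0f}  raw sd={r:10.2f}  transformed sd={t:.4f}")
```

Under these assumptions, the raw standard deviation grows with the noise-free intensity, while the transformed standard deviation should stay roughly constant across levels. With only a handful of replicates per probe, as on kinome arrays, the parameters would have to be estimated from far less information than in this idealized setting.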