Selective constraint on CNV genes
A central question in biology is how genomes evolve with respect to size and gene content, and which factors affect and constrain this evolution. Intuitively, CNVs are likely to be subject to selective pressure because large variants, in contrast to SNPs and other small variants, often affect entire protein-coding genes and substantial amounts of flanking DNA sequence. It has long been hypothesized that gene duplications drive the evolution of both genomes and gene function. As described by Ohno (2013), when a gene duplication event first occurs, the two copies of the gene are assumed to be functionally redundant. It is believed that in most instances one copy of the gene will eventually be lost (pseudogenization or nonfunctionalization). However, because natural selection does not “know” which copy of the duplicated gene should be under selection and which should be free of selective constraint, both paralogs experience a period of relaxed selection. During this stage some divergence may be tolerated, and occasionally one copy may acquire a new function and subsequently be maintained by natural selection.
Rates of molecular evolution can be used to understand the selective constraints experienced by genes. In particular, contrasting the rate of protein-changing (nonsynonymous) substitution with the rate of silent (synonymous) substitution at the nucleotide level allows us to identify the type of selection acting on individual genes. We measured selective constraint on cattle genes using the dN/dS ratio, where dS denotes the synonymous substitution rate and dN denotes the nonsynonymous substitution rate. When computed using sequences from divergent species, the dN/dS ratio is a measure of adaptive evolution in protein-coding sequences (Kryazhimskiy and Plotkin, 2008). For this reason we chose Homo sapiens as the comparison species, since it is a well-studied organism that is divergent from cattle.
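As a minimal sketch of the ratio just defined, the snippet below computes dN/dS from per-gene estimates of dN and dS. It is an illustration only, not the pipeline used in this study: the function name, the gene labels, and the numeric values are hypothetical, and the choice to report the ratio as undefined when dS is zero is an assumption about how such cases might be handled.

```python
from typing import Dict, Optional, Tuple


def dn_ds_ratio(dn: float, ds: float) -> Optional[float]:
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if dn < 0 or ds < 0:
        raise ValueError("substitution rates must be non-negative")
    if ds == 0:
        return None
    return dn / ds


# Hypothetical (dN, dS) estimates for three ortholog pairs; not data from the study.
estimates: Dict[str, Tuple[float, float]] = {
    "GENE_A": (0.05, 0.50),
    "GENE_B": (0.30, 0.25),
    "GENE_C": (0.02, 0.00),  # no synonymous change observed: ratio undefined
}

ratios = {gene: dn_ds_ratio(dn, ds) for gene, (dn, ds) in estimates.items()}

for gene, ratio in ratios.items():
    label = "undefined" if ratio is None else f"{ratio:.3f}"
    print(f"{gene}: dN/dS = {label}")
```

Returning None rather than infinity keeps undefined ratios out of downstream summaries such as medians, where they would otherwise have to be filtered explicitly.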
Generally, dN/dS ratios are interpreted as follows. dN/dS = 1 implies equal rates of nonsynonymous and synonymous substitution, meaning that most variation is not driven by natural selection but by random drift of neutral mutant alleles. dN/dS > 1 implies a higher rate of nonsynonymous than synonymous substitution, meaning that there has been evolutionary pressure to escape the ancestral state, i.e., positive selection. Similarly, dN/dS < 1 implies a higher rate of synonymous than nonsynonymous substitution, meaning that there has been evolutionary pressure to conserve the ancestral state, i.e., negative (purifying) selection.
dN/dS ratios were computed for orthologous pairs of genes (both CNV genes and copy number neutral genes) between cattle and human (Supplementary Table 5). We first tested the hypothesis that, in general, CNV genes tend to be under relaxed selective pressure compared to copy number neutral genes. This was done using a one-tailed Wilcoxon rank-sum test of whether the median dN/dS ratio of all CNV genes was significantly higher than the median dN/dS ratio of neutral genes. We found that the dN/dS ratios of CNV genes were significantly shifted toward higher values than those of neutral genes (Table 4), suggesting that CNV genes are subject to reduced selective constraint. This finding is consistent with previous results in both cattle and pigs (Fadista et al., 2010; Li et al., 2012).
Table 4 dN/dS analysis.
Gene category        dN       P-value     dS       P-value     dN/dS    P-value
All CNV genes        0.1418   2.29E-09    0.5589   1.51E-07    0.2813   2.81E-06
Duplication genes    0.1601   1.45E-05    0.5135   0.0072      0.3151   3.01E-05
Deletion genes       0.1308   0.0142      0.5814   0.0083      0.2308   0.1068
Mixed genes          0.1235   1.36E-04    0.5681   4.79E-05    0.2702   0.0068
Neutral genes        0.0793   –           0.4288   –           0.1843   –
Median nonsynonymous (dN) and synonymous (dS) substitution rates and median dN/dS ratios are shown. P-values compare copy number variable genes with copy number neutral genes using a one-tailed Wilcoxon rank-sum test.
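The one-tailed comparison described above can be sketched as follows. This is a self-contained illustration of a Wilcoxon rank-sum test with the alternative that the first sample is shifted toward higher values; it is not the authors' code. It assumes a normal approximation with tie and continuity corrections, which statistical packages may replace with an exact calculation for small samples, and the function names and dN/dS values are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_greater(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """One-tailed rank-sum test of whether x tends to exceed y.

    Returns (U statistic for x, approximate one-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Tie correction: each group of t tied values reduces the variance.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a shift
    z = (u - mean_u - 0.5) / math.sqrt(var_u)  # continuity correction
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return u, p_value


# Hypothetical dN/dS ratios; not data from the study.
cnv_genes = [0.31, 0.28, 0.45, 0.22, 0.39]
neutral_genes = [0.18, 0.12, 0.25, 0.15, 0.20]

u_stat, p = rank_sum_greater(cnv_genes, neutral_genes)
print(f"U = {u_stat}, one-sided p = {p:.4f}")
```

With SciPy available, `scipy.stats.mannwhitneyu(x, y, alternative="greater")` performs the equivalent test and would normally be preferred over a hand-written version.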
We also tested, individually, whether duplication genes, deletion genes, and mixed genes tended to be under relaxed selective constraint compared to neutral genes. Both duplication and mixed genes had significantly higher dN/dS ratios than neutral genes, while the dN/dS ratios of deletion genes did not differ significantly from those of neutral genes (Table 4). The reduction in selective constraint observed in duplication and mixed genes is consistent with Ohno's hypothesis that, following a gene duplication event, one or both duplicates should experience relaxed selective constraint, resulting in elevated rates of sequence evolution.