We analyzed several features that could result in over- or underestimation of effect-size measurements between the family and the population (see Figures S17–S19). First, because effect-size measurements can be influenced by differences in quantification pipelines, we repeated the experiment by using different quantification approaches (TopHat + Cufflinks and GEM34 + Flux Capacitor;31 Figures S13 and S14). Second, effect sizes in the population could be underestimated if the best-associated SNP in the discovery panel is not the causal variant, because subsequent effect-size measurements in the replication panel might then not capture the largest effect. To address this, we examined different discovery-panel sizes (Table S6 and Figure S15) and different criteria (Figure S16) for selecting the best SNP from the population. In addition, we observed through permutation that levels of noise in measurements of effect size (β) differed between the family and the population (Figure S17). To better gauge confidence intervals (CIs) of family effect sizes, we estimated the degree of inflation through permutation and adjusted effect-size CIs by scaling. These adjusted CIs were applied only to comparisons of β values and are denoted by CIadjusted (see Figures S17–S19). In the main manuscript, we report only unadjusted CIs. Furthermore, without using subsampling or permutation, we also compared effect sizes directly with Welch’s t test by applying analytic estimates of the SEs of β. As a correctness check of the subsampling method, we verified that analytic p values from Welch’s t test and empirical p values from subsampling were concordant (Figure S19).
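The direct comparison of two effect sizes by Welch’s t test can be illustrated with a minimal sketch. It assumes that each β comes with an analytic SE and with residual degrees of freedom from its own regression, and that the two estimates are independent. The function names, the numerical integration of the t density (used here only to avoid external dependencies), and the example values are illustrative assumptions, not the code used in this study.

```python
import math


def welch_from_betas(beta1, se1, df1, beta2, se2, df2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for the difference between two independent effect-size estimates.

    df1 and df2 are the residual degrees of freedom behind each SE
    (an assumption of this sketch, e.g. n - 2 for a simple regression).
    """
    var1, var2 = se1 ** 2, se2 ** 2
    t_stat = (beta1 - beta2) / math.sqrt(var1 + var2)
    df = (var1 + var2) ** 2 / (var1 ** 2 / df1 + var2 ** 2 / df2)
    return t_stat, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    scale = math.exp(log_norm) / math.sqrt(df * math.pi)
    return scale * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p value, integrating the density over [0, |t|]
    with Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability of |T| <= |t|
    return max(0.0, 1.0 - central_mass)


if __name__ == "__main__":
    # Hypothetical values: family beta and SE vs. population beta and SE.
    t_stat, df = welch_from_betas(0.8, 0.1, 10, 0.5, 0.1, 10)
    print(f"t = {t_stat:.3f}, df = {df:.1f}, p = {two_sided_p(t_stat, df):.4f}")
```

In practice, a statistics library would normally supply the t distribution tail probability; the explicit integration above only keeps the sketch self-contained. Concordance with subsampling could then be checked by plotting these analytic p values against the empirical ones.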