Subjects and Methods

Study Population, Samples, and Phenotypic Data

The study population is from the coastal village of Nyamisati in the Rufiji delta in Tanzania.29 Human genomic DNA was extracted from whole peripheral blood, as previously described,30 with informed consent from participants or their guardians and approval of the local ethics committees of the Muhimbili University of Health and Allied Sciences and National Institute of Medical Research in Tanzania and the Regional Ethical Committee of Stockholm in Sweden.

Phenotypic data spanning 7 years, from 1993 to 1999, were collected using annual total population surveys and annual records of malarial episodes, as previously described.30 The total population survey provided information on asymptomatic parasite load (parasites per μL) and hemoglobin levels. A single hemoglobin value was generated for each individual from annual total population surveys carried out over 7 years and was corrected for age, sex, and parasite load prior to genetic analysis. A single parasite load value was generated for each individual from asymptomatic parasitemia recorded during annual total population surveys carried out over 7 years and was corrected for age and sex prior to genetic analysis.

All clinical malarial episodes were recorded and confirmed by microscopy. Recurrences were recorded as separate clinical episodes if they occurred more than 4 weeks apart. A small proportion (1%) of individuals presented with a clinical episode at the time of the total population survey, and these samples were included in the analysis. A single clinical episode value was generated for each individual from records of all malarial episodes occurring in the village during a period of 7 years.
The phenotypic data derived for clinical episodes, parasite load, and hemoglobin have previously been described in detail.30, 31

Fluorescence In Situ Hybridization using Single-Molecule DNA Fibers (Fiber-FISH)

The probes used in this study included four fosmid clones selected from the UCSC Genome Browser GRCh37/hg19 assembly and a 3,632-bp PCR product that is specific for the glycophorin E repeat (see below). Probes were made by whole-genome amplification with GenomePlex Whole Genome Amplification Kits (Sigma-Aldrich) as described previously.32 Briefly, the purified fosmid DNA and the PCR product were amplified and then labeled as follows: G248P86579F1 and the glycophorin E repeat-specific PCR product were labeled with digoxigenin-11-dUTP, G248P8211G10 was labeled with biotin-16-dUTP, G248P85804F12 was labeled with DNP-11-dUTP, and G248P80757F7 was labeled with Cy5-dUTP. All labeled dUTPs were purchased from Jena Bioscience. The preparation of single-molecule DNA fibers by molecular combing and fiber-FISH was as previously published,3, 33 with the exception of post-hybridization washes, which consisted of three 5-min washes in 2× SSC at 42°C, instead of two 20-min washes in 50% formamide/50% 2× SSC at room temperature.

Interphase-, Metaphase-FISH, and Karyotyping by Multiplex-FISH

Metaphase chromosomes were prepared from a human lymphoblastoid cell line (HG02554) purchased from Coriell Cell Repositories. Briefly, colcemid (Thermo Fisher Scientific) was added to a final concentration of 0.1 μg/mL for 1 hr, followed by treatment with hypotonic buffer (0.4% KCl in 10 mM HEPES [pH 7.4]) for 10 min and fixation in 3:1 (v/v) methanol:acetic acid. For interphase- and metaphase-FISH, G248P8211G10 labeled with Texas Red-dUTP, G248P85804F12 labeled with Atto488-XX-dUTP (Jena Bioscience), and RP11-325A24 labeled with Atto425-dUTP (Jena Bioscience) were used as probes.
Slide pre-treatments included a 10-min fixation in acetone (Sigma-Aldrich), followed by baking at 65°C for 1 hr. Metaphase spreads on slides were denatured by immersion in an alkaline denaturation solution (0.5 M NaOH, 1.0 M NaCl) for 10 min, followed by rinsing in 1 M Tris-HCl (pH 7.4) solution for 3 min, 1× PBS for 3 min, and dehydration through a 70%, 90%, and 100% ethanol series. The probe mix was denatured at 65°C for 10 min before being applied onto the denatured slides. Hybridization was performed at 37°C overnight. The post-hybridization washes included a 5-min stringent wash in 1× SSC at 73°C, followed by a 5-min rinse in 2× SSC containing 0.05% Tween20 (VWR) and a 2-min rinse in 1× PBS, both at room temperature. Finally, slides were mounted with SlowFade Gold mounting solution containing 4′,6-diamidino-2-phenylindole (Thermo Fisher Scientific).

Multiplex-FISH (M-FISH) with a human 24-color painting probe was performed as previously described.34 Slides were examined using an AxioImager D1 microscope equipped with appropriate narrow-band pass filters for DAPI, Aqua, FITC, Cy3, and Cy5 fluorescence. Digital image capture and processing were carried out using the SmartCapture software (Digital Scientific UK). Ten randomly selected metaphase cells were karyotyped based on the M-FISH and DAPI-banding patterns using the SmartType Karyotyper software (Digital Scientific UK).

PCR for Fiber-FISH Probe Generation

The 3,632-bp glycophorin E repeat-specific PCR product for use as a fiber-FISH probe was generated by long PCR. Long PCRs were performed in a total volume of 25 μL using a Taq/Pfu DNA polymerase blend (0.6 U Taq DNA polymerase/0.08 U Pfu DNA polymerase) and a final concentration of 0.2 μM of primers Specific_glycophorinE_F and Specific_glycophorinE_R (Table S2), in 45 mM Tris-HCl (pH 8.8), 11 mM (NH4)2SO4, 4.5 mM MgCl2, 6.7 mM 2-mercaptoethanol, 4.4 mM EDTA, 1 mM of each dNTP (sodium salt), and 113 μg/mL bovine serum albumin.
Cycling conditions were an initial denaturation of 94°C for 1 min, a first stage consisting of 20 cycles each of 94°C for 15 s and 65°C for 10 min, and a second stage consisting of 12 cycles each of 94°C for 15 s and 65°C for 10 min (plus 15 s/cycle); these were followed by a single incubation phase of 72°C for 10 min.

Illumina Sequencing of DUP4 Samples

1 μg genomic DNA was randomly fragmented to a size of 350 bp by shearing. DNA fragments were then end polished, A-tailed, ligated with the NEBNext adaptor for Illumina sequencing, and further PCR enriched using P5 and indexed P7 oligos. The PCR products were purified (AMPure XP system), and the resulting libraries were analyzed for size distribution on an Agilent 2100 Bioanalyzer and quantified using real-time PCR. Following sequencing on an Illumina platform, the resulting 150 bp paired-end sequences were examined for sequencing quality, aligned using BWA to the human reference genome (hg19 plus decoy sequences), and sorted using samtools,35 and duplicate reads were marked using Picard, generating the final BAM file. Sequencing and initial bioinformatics were performed by Novogene Ltd. Sequence read depth was calculated using samtools to count mapped reads in non-overlapping 5 kb windows across the glycophorin region. Read counts were normalized for coverage to a non-CNV region (chr4:145516270–145842585), then to the first 50 kb of the glycophorin region, which has a diploid copy number of 2.

DUP4 Junction Fragment PCR Genotyping

Primer sequences are shown in Table S2. PCR was conducted in a final volume of 10 μL in 1× Kapa A PCR buffer (a standard ammonium sulfate PCR buffer) with a final concentration of 1.5 mM MgCl2, ∼10 ng genomic DNA, 0.2 mM of each of dATP, dCTP, dGTP, and dTTP, 1 U Taq DNA Polymerase, 0.1 μM each of rs186873296F and rs186873296R, and 0.5 μM each of DUP4F2 and DUP4R2.
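The read-depth normalization described under Illumina Sequencing of DUP4 Samples can be illustrated with a minimal Python sketch. It assumes that both normalization steps are simple ratios, which the text does not specify. The function name, its parameters, and the read counts are hypothetical and are not taken from the study.

```python
from statistics import mean

# Coordinates of the non-CNV control region given in the text (hg19).
CONTROL_START, CONTROL_END = 145_516_270, 145_842_585
CONTROL_LENGTH = CONTROL_END - CONTROL_START + 1  # inclusive length in bp


def estimate_copy_number(window_counts, control_count,
                         window_size=5_000, baseline_span=50_000,
                         baseline_copy_number=2):
    """Turn raw read counts per window into copy-number estimates.

    Assumption: both normalization steps are plain ratios.
    """
    # Step 1: divide each count by the number of reads expected for a
    # window of this size, given the coverage of the control region.
    expected_per_window = control_count / CONTROL_LENGTH * window_size
    relative = [count / expected_per_window for count in window_counts]

    # Step 2: rescale so that the first 50 kb of the region, which is
    # diploid, averages a copy number of 2.
    n_baseline = baseline_span // window_size
    baseline = mean(relative[:n_baseline])
    return [baseline_copy_number * value / baseline for value in relative]


# Hypothetical counts: ten baseline windows, four windows at 1.5x depth,
# then two windows back at baseline depth.
counts = [100] * 10 + [150] * 4 + [100] * 2
copy_numbers = estimate_copy_number(counts, control_count=6_500)
```

With these invented counts, the baseline windows come out at a copy number of 2 and the windows at 1.5-fold depth at a copy number of 3. Under the ratio assumption the first step cancels out of the final estimate; it is kept here only to mirror the two steps named in the text.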
Thermal cycling used an ABI Veriti Thermal cycler with an initial denaturation of 95°C for 2 min, followed by 35 cycles of 95°C for 30 s, 58°C for 30 s, and 70°C for 30 s, and a final extension of 70°C for 5 min. 5 μL of each PCR product was analyzed using standard horizontal electrophoresis on an ethidium-bromide-stained 2% agarose gel. Routine genotyping included a DUP4 positive control in every experiment (sample HG02554). We distinguished homozygotes by quantifying DUP4 and control band intensity on agarose gels using ImageJ and calculating the ratio of DUP4:control band intensity for each individual. At low allele frequencies, homozygotes are expected to be rare. After log2 transformation, a cluster of four outliers of high ratio (>2 SD, log2 ratio > 1.43) was clearly separated from the 278 other DUP4-positive samples, and these four were classified as homozygotes. The remaining 278 DUP4-positive samples were classified as heterozygotes.

Family-Based Association Analysis

Associations between the three clinical phenotypes and DUP4 genotype were tested using QTDT (v.2.6.1)36 on the full dataset of 167 pedigrees, using an orthogonal model. The heritability for all the clinical phenotypes was initially estimated using a model for polygenic variance. A test for total evidence of association was performed which included all individuals within the samples, retaining as much information as possible. This total test of association included environmental, polygenic heritability, and additive major locus variance components within the model. To control for population stratification within family, association was tested in an orthogonal model including environmental, polygenic heritability, and additive major locus variance components.
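The homozygote call described under DUP4 Junction Fragment PCR Genotyping can be sketched in Python as follows. The sketch assumes that the ">2 SD" criterion means more than two sample standard deviations above the mean log2 ratio of all DUP4-positive samples, which is only one reading of the text. The function name, sample names, and band intensities are hypothetical.

```python
import math
from statistics import mean, stdev


def classify_dup4_positive(intensities, n_sd=2.0):
    """Call homozygotes among DUP4-positive samples from gel band intensities.

    intensities maps a sample name to (DUP4 band, control band) intensity.
    Assumption: a homozygote has a log2 ratio more than n_sd sample
    standard deviations above the mean of all DUP4-positive samples.
    """
    log_ratios = {
        sample: math.log2(dup4 / control)
        for sample, (dup4, control) in intensities.items()
    }
    values = list(log_ratios.values())
    threshold = mean(values) + n_sd * stdev(values)
    calls = {
        sample: "homozygote" if value > threshold else "heterozygote"
        for sample, value in log_ratios.items()
    }
    return calls, threshold


# Hypothetical intensities: 19 samples with a ratio near 1 and one sample
# with a fourfold ratio.
example = {f"S{i:02d}": (0.9 if i % 2 else 1.1, 1.0) for i in range(1, 20)}
example["S20"] = (4.0, 1.0)
calls, threshold = classify_dup4_positive(example)
```

In this invented example only S20 exceeds the threshold and is called a homozygote. In practice the separation of the outlier cluster would still need to be checked by eye, as the text implies.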
Direction of effect was estimated in the 262 unrelated individuals from the Nyamisati cohort with hemoglobin level values by comparing hemoglobin levels, expressed as residuals from the regression model used to correct for age and sex, between individuals carrying (n = 70, mean = 1.67 g/L, standard deviation = 16.0 g/L) and not carrying (n = 192, mean = −0.12 g/L, standard deviation = 14.1 g/L) the DUP4 variant.
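The comparison of residual means can be reproduced from the summary statistics quoted above. The following Python sketch is illustrative only: the unequal-variance standard error is added to give a rough sense of uncertainty and is not a test described in the text, and the variable and function names are hypothetical.

```python
import math

# Summary statistics quoted in the text (hemoglobin residuals, g/L).
carriers = {"n": 70, "mean": 1.67, "sd": 16.0}
non_carriers = {"n": 192, "mean": -0.12, "sd": 14.1}


def mean_difference(group_a, group_b):
    """Return the difference in means (a minus b) and its standard error.

    The standard error uses the unequal-variance formula; this is an
    illustrative choice, not the authors' stated method.
    """
    diff = group_a["mean"] - group_b["mean"]
    se = math.sqrt(group_a["sd"] ** 2 / group_a["n"]
                   + group_b["sd"] ** 2 / group_b["n"])
    return diff, se


diff, se = mean_difference(carriers, non_carriers)
direction = "higher" if diff > 0 else "lower"
print(f"DUP4 carriers: {direction} by {diff:.2f} g/L (SE {se:.2f} g/L)")
```

The difference, 1.67 − (−0.12) = 1.79 g/L, is positive, so the residual hemoglobin level is on average higher in DUP4 carriers than in non-carriers in this subset.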