Material and Methods

Samples and Clinical Data

Tumor samples and adjacent, histologically normal tissues were obtained from 104 ESCC-affected individuals recruited from the Taihang Mountains of north-central China. All individuals gave their informed consent, and all samples were obtained before treatment according to the guidelines of the institutional review board of Shanxi Medical University (approval no. 2009029) and the ethics committee of Henan Cancer Hospital (approval no. 2009xjs12). The tumor samples of all affected individuals had a tumor cell content of at least 40%–50%. This study was approved by the ethics committees of Shanxi Medical University and Henan Cancer Hospital. The ESCC-affected individuals in this study were staged according to the Cancer Staging Standards of the American Joint Committee on Cancer (seventh edition, 2010). The cohort of 104 individuals included 96 smoking and 8 non-smoking individuals. Different subsets of individuals were assayed on each platform: 14 tumors and 14 matched normal samples had WGS data (65× coverage), and 90 samples had WES data (132× coverage); in addition, 96 of the 104 samples underwent target capture deep resequencing (365× coverage). A detailed description of the clinical characteristics of the analyzed samples is presented in Table S1.

We also analyzed our previously published ESCC mutation dataset⁶ of 17 WGS and 71 WES samples from individuals recruited from the Chaoshan District of Guangdong Province, another area of high ESCC prevalence in China. This cohort included 57 smoking and 31 non-smoking individuals.⁶ The summary of next-generation sequencing analyses in this study is shown in Figure S1.

Sequencing

For WGS, genomic DNAs extracted from 14 tumors and matched normal tissues were randomly fragmented and purified. Standard paired-end adaptors were ligated according to the manufacturer’s (Illumina) protocol.
Adaptor-ligated fragments were purified with preparatory gel electrophoresis, and identical bands were excised, resulting in two libraries per sample with inserts averaging 500 bp. Four lanes of each of the resulting WGS libraries were sequenced on an Illumina HiSeq 2000. Target depth (65× for tumors and normal samples) and at least 30× haploid coverage were achieved in all samples.

For WES, the qualified genomic DNAs from 90 tumors and matched normal tissues were randomly fragmented by Covaris, ligated to Illumina sequencing adapters, and selected for lengths from 150 to 200 bp. Extracted DNAs (150–200 bp) were then amplified by ligation-mediated PCR (LM-PCR), purified, and hybridized to the NimbleGen SeqCap EZ Exome (44M) array for enrichment. Hybridized fragments were bound to the streptavidin beads, whereas non-hybridized fragments were washed out after 24 hr. We then analyzed captured LM-PCR products on the Agilent 2100 Bioanalyzer to estimate the magnitude of enrichment. We independently loaded each captured library from the process described above on three lanes of an Illumina HiSeq 2000 platform with 90-bp paired-end reads for high-throughput sequencing to ensure that each sample met the desired average coverage. Raw image files were processed by Illumina base-calling software (v.1.7) with default parameters. The mean coverage achieved was 130× in tumors and 133× in normal tissue. A detailed description is presented in Table S2.

Mutation Detection

For detection of somatic point mutations, sequencing reads from an Illumina HiSeq 2000 were aligned to the human reference genome (UCSC Genome Browser hg19) with the Burrows-Wheeler Aligner. After duplicate reads (redundant information produced by PCR) were removed with SAMtools, an in-house pipeline was used to call somatic mutations. In brief, we used SAMtools (v.0.1.18) and VarScan (v.2.2.5) to call somatic variants.
We required a minimum depth of 10× and variant frequency of 10% for both normal and tumor samples in order to call a specific variant at that locus. A single-nucleotide variant (SNV) was labeled highly confident if it met the following requirements: (1) the locus was not enriched with reads of low mapping quality, (2) reads that supported the SNV were not significantly overrepresented with bases of low quality, (3) reads that supported the SNV showed no bias toward the read end, (4) no gaps were found near the SNV locus, and (5) the SNV was not located in a short repeat region. Indels were called with the Genome Analysis Toolkit SomaticIndelDetector with default parameters. The highly confident indels were identified by an in-house pipeline and further annotated as germline or somatic on the basis of whether any evidence of the event at the same locus was observed in the normal data. Finally, highly confident SNVs were annotated with ANNOVAR and used in follow-up analysis. A full list of mutation events is presented in Table S3.

Analysis of DNA Copy Number

To detect DNA copy-number alterations (CNAs), we used SegSeq¹¹ to infer somatic CNAs in ESCC genomes on the basis of WGS reads. Copy numbers ≤ 1.5 were considered to indicate deletions, and those ≥ 2.5 were considered amplifications. To infer recurrently amplified or deleted genomic regions, we re-implemented the GISTIC algorithm¹² by using copy numbers in 1-kb windows as markers instead of SNP array probes. G-scores were calculated for genomic and gene-coding regions on the basis of the frequency and amplitude of amplification or deletion affecting each gene. A significant CNA region was defined as having amplification or deletion with a G-score > 0.1, corresponding to a p value threshold of 0.05 from the permutation-derived null distribution. A full list of CNAs is presented in Table S4.
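The copy-number thresholds stated above (≤ 1.5 for deletions, ≥ 2.5 for amplifications) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names, the segment tuple layout, and the example values are hypothetical, and only the two cutoffs come from the text.

```python
from collections import Counter

# Cutoffs taken from the text; everything else below is illustrative.
DELETION_MAX = 1.5
AMPLIFICATION_MIN = 2.5


def classify_copy_number(copy_number):
    """Label one inferred copy number as deletion, amplification, or neutral."""
    if copy_number <= DELETION_MAX:
        return "deletion"
    if copy_number >= AMPLIFICATION_MIN:
        return "amplification"
    return "neutral"


def summarize_segments(segments):
    """Count labels over (chrom, start, end, copy_number) tuples.

    The tuple layout is an assumption, not the SegSeq output format.
    """
    return Counter(classify_copy_number(cn) for _, _, _, cn in segments)


# Made-up segments, for demonstration only.
example_segments = [
    ("chr17", 77_700_000, 77_900_000, 3.4),
    ("chr5", 100_000_000, 101_000_000, 1.2),
    ("chr1", 1_000_000, 2_000_000, 2.0),
]
summary = summarize_segments(example_segments)
```

Both cutoffs are treated as inclusive, matching the "≤" and "≥" in the text; values strictly between them are reported as neutral.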
Identification of Significantly Mutated Genes

To identify significantly mutated genes (SMGs) from the mutation data, we applied MutSigCV (mutation significance with covariates) with default parameters.

Pathway-Enrichment Analysis

We extended our significance analysis beyond single genes by looking at gene sets. Pathway-enrichment analyses of genes with non-synonymous mutations were performed with KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment. In brief, we performed the pathway-enrichment analysis with the Database for Annotation, Visualization, and Integrated Discovery (v.6.7) by examining the distribution of the non-synonymously mutated genes identified within KEGG. Significantly altered pathways were determined by p values calculated on the basis of the hypergeometric distribution with Benjamini correction.

Cell Lines

Normal esophageal epithelium cell lines SHEE and HEEPIC and the following ESCC lines were used in this study: EC9706, Eca109, TE1, TE12, TE13, Caes17, KYSE2, KYSE30, KYSE140, KYSE150, KYSE410, KYSE510, KYSE450, and KYSE680. All were tested and found to be free of mycoplasma contamination. HEK293T cells were used as a packaging cell line to produce virus. All cells were grown in DMEM with nutrient mixture F-12 (DMEM/F-12) supplemented with 10% fetal bovine serum at 37°C in 5% CO2. Endogenous products encoded by the genes of interest were detected through real-time PCR and/or immunoblotting analyses. For functional analysis of specific genes, the ESCC lines with high endogenous gene expression were used for knockdown experiments, and ESCC lines with low gene expression were used for forced-expression experiments.

Knockdown and Overexpression of Genes of Interest in ESCC Lines

Lentiviral vector pLKO.1-puro and its packaging plasmids pMD2.G and psPAX2 were obtained from Addgene.
Knockdown experiments for the specific genes were performed in at least two ESCC lines with high endogenous protein levels. Two independent short hairpin RNAs (shRNAs) were cloned into the pLKO.1-puro vector as described previously.¹³ Specifically, to produce lentiviruses for titration and infection, we transfected HEK293T cells with the packaging plasmids along with the lentiviral shRNA vector by using Lipofectamine 2000 reagent (Invitrogen) according to the manufacturer’s instructions, and we changed the medium after 6 hr. Virus was harvested 24 hr after transfection, passed through 0.22-μm filters, and used fresh for shRNA infection. To perform lentiviral infections, we plated the target ESCC cells at 40%–50% confluence and incubated them overnight (16 hr). On the day of infection, the culture medium was replaced with the appropriately titered viral supernatant (1.5 ml/well), and cells were incubated at 37°C for 24 hr; afterward, the viral supernatant was replaced with fresh medium. Forty-eight hours later, infected cell populations were selected in puromycin (2 μg/ml). After 5 days of selection, shRNA-knockdown efficiency was determined by immunoblot analysis for the respective proteins with the use of specific antibodies. For knockdown of specific genes in ESCC cells, two independent shRNA constructs that had been cloned into the pLKO.1-puro vector were used (Table S5). A non-targeting scrambled shRNA (SCR) was also cloned into the pLKO.1-puro vector and used as a control. Relative amounts of each gene product were normalized to β-actin levels.

pMSCV-puro empty vector and wild-type pMSCV-puro-ZNF750 were generous gifts from Paul A. Khavari (Stanford University). pcDNA3-RFP empty vector and wild-type pcDNA3-RFP-AJUBA (MIM 609066) were generous gifts from Alejandra Garcia-Cattaneo (National Heart and Lung Institute, Imperial College London).
The wild-type versions of these genes of interest were cloned into the pLV-EGFP(2A)-puro-GFP vector and validated by sequencing. For overexpression experiments, we used the pLV-EGFP(2A)-puro-GFP vector as a control. Viruses were produced as previously described. ESCC cells with low endogenous protein levels were infected with viruses as previously described.¹⁴ Twenty-four hours after infection, cells were subjected to subsequent experiments. The mutants of genes of interest were generated with the QuikChange II Site-Directed Mutagenesis Kit (Agilent).

Fluorescence In Situ Hybridization Analysis

Tumor and matched normal tissues of ESCC individuals were cut into pieces in PBS, swollen in 65 mmol/l KCl for 5 min at 37°C, fixed in cold acetic acid and methanol for 5 min at 4°C, dropped onto slides, and dried at room temperature. For interphase fluorescence in situ hybridization (FISH) analysis, slides were stained with Cytocell enumeration probes against chromosomal region 5q, CBX8 (chr17: 77,768,175–77,770,915), or CBX4 (MIM 603079; chr17: 77,806,954–77,813,213). These probes were conjugated with fluorescein isothiocyanate (FITC) or Cy3.5 (Rainbow Scientific). Probes against chromosomal region 5q or TMC8 (MIM 605829; located near the CBX4 and CBX8 regions) were used as controls for verification of focal CNAs of CBX4 or CBX8. Staining was carried out according to the manufacturer’s protocol. FISH samples were viewed with a fully automated, upright Zeiss Axio-ImagerZ.1 microscope with a 20× objective and DAPI, FITC, and Rhodamine filter cubes. Images were produced with the AxioCam MRm CCD camera and the Axiovision v.4.5 software suite. p values were calculated with a two-sample test for equality of proportions with continuity correction.

qPCR Copy-Number Analysis

CBX4 and CBX8 copy number was assessed in seven frozen tumor samples and matched normal tissues.
Copy numbers were determined by real-time PCR with the DNA-binding dye SYBR Green I and specific primer pairs that flanked coding exons of each gene. In a final volume of 25 μl, 20 ng DNA was amplified with SYBR Green PCR Master Mix (QIAGEN) in triplicate. RPPH1 (ribonuclease P RNA component H1 [MIM 608513]; Life Technologies, 4403328) was used as a diploid control, and TMC8 (chr17: 76,126,858–76,139,049) was used as a control located in the region near genes CBX4 and CBX8. Data were analyzed via the comparative Ct (delta-Ct) method.

Immunoblotting

Cells were lysed for 30 min in Triton buffer (1% Triton X-100, 50 mM Tris-HCl, pH 7.6, 150 mM NaCl, 1% sodium deoxycholate, and 0.1% SDS) supplemented with protease and phosphatase inhibitors (1 mM PMSF, 2 mM sodium pyrophosphate, 2 mM sodium β-glycerophosphate, 1 mM sodium fluoride, 1 mM sodium orthovanadate, 10 μg/ml leupeptin, and 10 μg/ml aprotinin). Lysates were cleared by centrifugation at 15,000 × g at 4°C for 15 min, and protein concentrations were determined by the Bradford method. Fifty micrograms of protein was separated by SDS-PAGE and transferred onto Immobilon-P membranes. Proteins were detected with specific antibodies. Antibody binding was detected with horseradish-peroxidase-labeled anti-mouse (Sigma) or anti-rabbit (Cell Signaling) antibodies, and chemiluminescence was detected with a LAS4000 device (Fuji). Equal protein loading was confirmed with antibodies against β-actin (Transgen). Detailed information on antibodies is shown in Table S5.

MTT Assay

A total of 4 × 10³ cells were seeded in 96-well plates and incubated in normal conditions for 24 hr. Cells were treated with 100 μl of 5 mg/ml MTT (Invitrogen) solution for 4 hr at 37°C until crystals were formed. MTT solution was removed from each well, and 100 μl of DMSO was added to each well to dissolve the crystals. Color intensity was measured with a microplate reader (Bio-Rad) at 490 nm.
Each experiment consisted of four replicates, and at least three independent experiments were carried out. For cell-death analysis, cells were treated, in duplicate, with BKM120 (10 μM), GANT61 (20 μM), or both for up to 72 hr prior to flow cytometric analysis for determining the extent of cell death by Annexin V/PI staining.

Migration and Invasion Assays

Migration and invasion assays were performed in 16-well CIM plates in an xCELLigence RTCA DP System (ACEA Biosciences) with Matrigel Basement Membrane Matrix (BD) for real-time cell-migration analysis as described previously.¹⁵ In brief, 30,000 cells per well were seeded in five replicates in serum-free medium in the upper compartment of CIM plates coated with or without Matrigel. Serum-supplemented medium was added to the lower compartment of the chamber, and then measurement began in the xCELLigence RTCA DP system. We analyzed the cell-index curves to determine cell-invasion activity. For negative controls, we added serum-free medium to both the upper and lower chambers. The cell index representing the amount of migrated cells was calculated with RTCA Software from ACEA Biosciences. At least three independent experiments were carried out; for each independent experiment, five replicates were performed for each group.

Colony-Formation Assay

The assay was performed as described previously.¹⁶ In brief, cells were seeded at 300–500 cells per well in 6-well plates containing complete DMEM/F-12 on day 0 and incubated at 37°C and 5% CO2 for 10 days. On day 10, cells were fixed with 4% paraformaldehyde for 15 min and stained with 1% crystal violet before quantification. The experiments were performed in triplicate, and colonies containing more than 50 cells were counted under a microscope.

Real-Time qPCR

Real-time qPCR was used for measuring expression levels of genes of interest in a subset of tumor samples or ESCC lines. The probes or kits used in this study are shown in Table S5.
All qPCR reactions were performed in triplicate with an Applied Biosystems StepOnePlus. The relative expression of genes of interest was determined by normalization to GAPDH expression via a standard-curve method with ten serial dilutions according to the manufacturer’s instructions. All real-time PCR experiments included a no-template control.

Immunohistochemistry Analysis

Immunohistochemistry was performed as previously described.¹⁷ In brief, sections were incubated with the specific antibody at its optimal dilution for 14 hr at 4°C and then detected with PV8000 (Zhongshan) and the DAB detection kit (Maixin), producing a dark-brown precipitate. Slides were counterstained with hematoxylin. All images were captured at 100×. The nuclear amount of the protein of interest was analyzed with Aperio Nuclear v.9 software, and the cytoplasmic protein amount was quantified with Aperio Cytoplasma 2.0 software. Statistical analyses were performed with GraphPad Prism 5.0. The staining intensity was scored as 0 (negative), 1 (weak), 2 (moderate), or 3 (strong). The immunoreactive score (IRS) was calculated as the product of the extent score and the intensity score. IRS values ranged from 0 to 9 and were graded as follows: 0 (negative), 1–3 (weak), 4–6 (moderate), and 7–9 (strong). The median IRS was used to classify individuals, and the ratio of the IRS in tumor tissue to that in matched normal tissue (TIRS/NIRS) was used to categorize the amount of the protein of interest in tumor tissue relative to matched normal tissue as significantly high (TIRS/NIRS > 2), high (1 < TIRS/NIRS < 2), low (TIRS/NIRS < 1), or unchanged (TIRS/NIRS = 1). All antibodies used in this study are shown in Table S5.

Mouse Xenograft Assay and Immunohistochemistry

To determine the effects of ZNF750 on tumorigenesis in vivo, we used a mouse xenograft assay with 12- to 14-week-old BALB/c nude female mice.
We injected 2 × 10⁶ KYSE150 cells stably depleted of ZNF750, scrambled control vector, wild-type ZNF750, or p.Ser70* ZNF750 into nude mice (n = 6 mice/group). The growth rates of xenograft tumors were measured for 4 weeks, after which mice were sacrificed. After 28 days, tumors were removed, snap frozen in liquid nitrogen, and stored at –80°C. Tumor size was measured with calipers. Additionally, formalin-fixed paraffin-embedded xenograft tumors were immunohistochemically stained with a monoclonal mouse Ki-67 antibody (Zhongshan). In brief, sections were incubated with the Ki-67 antibody working dilution for 14 hr at 4°C and then detected with PV8000 (Zhongshan) and the DAB detection kit (Maixin), producing a dark-brown precipitate. Slides were counterstained with hematoxylin. All images were captured at 20×. The nuclear amount of Ki-67 was analyzed with Aperio Nuclear v.9 software. Statistical analyses were performed with GraphPad Prism 5.0.

Statistical Analysis

The SPSS Statistics 17.0 package was used to correlate clinical and biological variables by means of Fisher’s test or a non-parametric test when necessary. Experiments were performed in triplicate, and data are presented as mean ± SD. Student’s t test was used for two-group comparisons, and data from more than two groups were analyzed by one-way ANOVA in SPSS Statistics 19.0 followed by Fisher’s least significant difference t test. Results were considered significant when p < 0.05. Association tests on ZNF750 genotype and levels were performed on log-transformed expression values by linear regression or t test.
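As an illustration of the last step, the sketch below log-transforms expression values and computes a pooled-variance (Student's) t statistic for two genotype groups. It is a minimal example under stated assumptions, not the SPSS procedure used in the study: the log base (2), the pseudocount, and the function names are hypothetical choices that the text does not specify. Converting the statistic to a p value requires the t distribution with the returned degrees of freedom, which is not part of the Python standard library.

```python
import math
from statistics import mean, variance


def log2_transform(values, pseudocount=1.0):
    """Log2-transform expression values.

    The base and the pseudocount are assumptions; the text only says
    that expression values were log-transformed.
    """
    return [math.log2(v + pseudocount) for v in values]


def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for a two-sample
    Student's t test with pooled variance (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Made-up expression values for two hypothetical genotype groups.
wild_type = log2_transform([15.0, 31.0, 63.0])
mutant = log2_transform([1.0, 3.0, 7.0])
t_stat, dof = student_t(wild_type, mutant)
```

The pooled-variance form matches Student's t test as named in the text; if group variances differed markedly, Welch's variant would be the usual alternative, but the source does not mention it.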