Methods

CNV detection and evaluation

Whole-genome microarray analysis (Agilent 244K) was performed to detect structural variants in a clinical cohort. Subjects were eligible if they had a height measurement recorded between the ages of 2 and 20 years and had a chromosomal microarray performed as part of their clinical evaluation. All information was obtained with appropriate consent from Boston Children’s Hospital [M09-06-0290]. Subjects with aneuploidy or poor microarray quality were excluded, leaving a final sample of 4,411 individuals: 415 patients with short stature, 196 patients with tall stature and 3,800 patients with normal stature. All CNVs were called with NEXUS software (BioDiscovery, El Segundo, California). The recurrent copy-number variations involving the ARID1B gene were validated by multiplex ligation-dependent probe amplification (MLPA), using probes and reagents from MRC Holland (SALSA MLPA P433 ARID1A-ARID1B probemix). Data analysis and visualization were done with Coffalyser software.

A CNV was defined as non-benign when it did not overlap with any CNV reported in the Database of Genomic Variants (DGV), or when it overlapped with a CNV that has a population frequency of less than 1% in DGV. Non-benign recurrent or overlapping CNVs were identified in the subjects with short stature, and their occurrence was compared with that in the normal stature population. The custom track feature of the UCSC Genome Browser was used to depict the overlapping nature of the CNVs and to delineate the minimal region of overlap (MRO).

Literature review of height of patients with ARID1B deletion or mutation

We identified a total of 70 individuals carrying ARID1B deletions or mutations from a PubMed search and the DECIPHER database. Sixty-five of them had information on height. We converted all available height parameters to Z-scores based on the CDC growth charts (http://www.cdc.gov/growthcharts/zscore.htm).
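The non-benign definition and the minimal region of overlap (MRO) described under CNV detection can be illustrated with a short sketch. This is not the pipeline used in the study (calls were made in NEXUS and the MRO was delineated in the UCSC Genome Browser). The `Interval` type, the function names, and all coordinates are hypothetical. The sketch also assumes that any base-pair overlap counts as overlap, and that a CNV is benign as soon as it overlaps at least one DGV entry with a frequency of 1% or more. Both are interpretations, not statements from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """Genomic interval with a 0-based start (inclusive) and an exclusive end."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_non_benign(
    cnv: Interval,
    dgv_entries: Iterable[Tuple[Interval, float]],
    max_freq: float = 0.01,
) -> bool:
    """Assumed rule: non-benign if no overlapping DGV entry reaches max_freq.

    A CNV with no DGV overlap at all is therefore also non-benign.
    """
    return all(
        freq < max_freq
        for region, freq in dgv_entries
        if overlaps(cnv, region)
    )


def minimal_region_of_overlap(cnvs: Iterable[Interval]) -> Optional[Interval]:
    """Segment shared by every CNV, or None if there is no common segment."""
    cnvs = list(cnvs)
    if not cnvs or len({c.chrom for c in cnvs}) != 1:
        return None
    start = max(c.start for c in cnvs)
    end = min(c.end for c in cnvs)
    return Interval(cnvs[0].chrom, start, end) if start < end else None


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    dgv = [
        (Interval("chr6", 100, 200), 0.05),
        (Interval("chr6", 500, 600), 0.001),
    ]
    patient_cnvs = [
        Interval("chr6", 550, 900),
        Interval("chr6", 700, 1200),
    ]
    print([is_non_benign(c, dgv) for c in patient_cnvs])
    print(minimal_region_of_overlap(patient_cnvs))
```

A stricter analysis might require a minimum reciprocal overlap before treating a DGV entry as matching. That threshold is not specified in the text and is left out here.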
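The conversion of height to a Z-score can be sketched as follows. The CDC growth charts publish, for each sex and age, three parameters: L (Box-Cox power), M (median) and S (coefficient of variation). The Z-score is computed from these with the LMS formula. The function name is hypothetical and the L, M, S values in the example are placeholders, not values from the CDC tables. A real analysis would look them up for the subject's sex and age.

```python
import math


def lms_zscore(x: float, L: float, M: float, S: float) -> float:
    """Z-score of measurement x under the LMS method.

    Z = ((x / M) ** L - 1) / (L * S)   when L != 0
    Z = ln(x / M) / S                  when L == 0
    """
    if x <= 0 or M <= 0 or S <= 0:
        raise ValueError("x, M and S must be positive")
    if abs(L) < 1e-12:
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)


if __name__ == "__main__":
    # Placeholder parameters, for illustration only (not CDC values).
    height_cm = 110.0
    L, M, S = 1.0, 120.0, 0.04
    print(round(lms_zscore(height_cm, L, M, S), 2))
```

A measurement equal to the median M always yields a Z-score of 0, which is a quick sanity check when wiring in the real reference tables.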
Mutation screening in 48 non-syndromic short stature Chinese patients

Forty-eight Chinese patients with non-syndromic short stature were recruited at Shanghai Children’s Medical Center (SCMC). Their age, gender and height information is provided in Additional file 1. The inclusion criteria were height below the 3rd percentile and no clinical diagnosis of intellectual disability or developmental delay. All information was obtained with appropriate consent based on the requirements of Shanghai Children’s Medical Center [SCMC-IRB-K2013007]. Subjects were randomly selected from among patients with non-syndromic short stature. Because it is not yet clear whether ARID1B affects hormone-related pathways, we did not use hormonal status as a criterion for subject selection.

Genomic DNA was extracted from the peripheral blood of all participants using the QIAamp Blood DNA Mini kit®. Mutation screening of all coding regions of ARID1B was done by polymerase chain reaction (PCR) amplification followed by Sanger sequencing. Sequence variants were evaluated with Mutation Surveyor (SoftGenetics, State College, PA), and their potential functional impact was predicted using in silico prediction programs including SIFT [5], PolyPhen-2 [6], Condel [7] and Align-GVGD [8]. Paternity tests were performed with short tandem repeat (STR) markers for the two probands with de novo variants (using the AmpFLSTR® Identifiler® PCR amplification kit). The study was reviewed and approved by the SCMC ethics committee, and all participants or their parents signed an informed consent form.

In addition, we compared the variant frequency in the ARID1B coding regions with that detected by exome sequencing in 494 normal Chinese controls. The controls were age- and gender-matched Chinese individuals of normal height and weight, recruited from multiple geographic areas as part of an effort to create a database of common sequence variants in normal Chinese children.
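The text does not state which statistical test was used to compare variant frequencies between the 48 patients and the 494 controls. One common choice for a 2×2 carrier table with small counts is Fisher's exact test, sketched here using only the standard library. The choice of test, the function name and the carrier counts in the example are all assumptions made for illustration. They are not results from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: patients / controls. Columns: carriers / non-carriers.
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed table.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Probability that row 1 contains exactly x carriers.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    carriers_patients, n_patients = 3, 48
    carriers_controls, n_controls = 5, 494
    p = fisher_exact_two_sided(
        carriers_patients, n_patients - carriers_patients,
        carriers_controls, n_controls - carriers_controls,
    )
    print(f"two-sided p = {p:.4f}")
```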