Whole-exome sequencing data analysis A total of 20 DNA samples were obtained from 10 pairs of bone marrow (tumor) and germline (normal) cells collected from 10 NK-AML patients. We illustrated the workflow for whole-exome sequencing data analysis in Fig. 1. The raw reads in FASTQ file format were mapped to the human reference, which was developed by the 1,000 genome project (human_g1k_v 37.fasta), by using the Burrows-Wheeler aligner (BWA v 0.6.1), which generates SAM format files [13]. The SAM files were converted into binary format files (BAM) by samtools v0.2.6, which reduces the file size and improves computing efficiency [14]. The read group information of the sequencing machine was added to the head of the BAM files. The aligned sequences were sorted in the order of chromosomes by Picard v1.79 (http://picard.sourceforge.net) and went through a PCR duplicate marking process, which enables the Genome Analysis Toolkit (GATK) to ignore duplicates in subsequent processing [15]. Finally, the BAM files were indexed by bamtools v2.2.0 [16]. We performed a local realignment prior to recalibration, which gives the most accurate quality scores for each sample. Local realignment with known indel sites (Mills_and_1000G_gold_standard.indels.b37.vcf, 1000G_phase1.indels.b37.vcf) for each individual does not require multiple sample realignments, which demand extreme computational power. However, we carried out a local realignment with the matched tumor and normal samples together to prevent misalignment due to the differences between these two tissue types. Recalibration was performed with multiple known sites (dbSNV_137.b37, Mills_and_1000G_gold_standard.indels.b37.vcf, and 1000G_phase1.indels.b37.vcf), which may increase recalibration accuracy. We reduced the BAM file size to about 1/100 of the original file size by using the GATK tool, which saved variant calling time without losing any essential information. We used the UnifiedGenotyper of GATK for variant calling, followed by variant recalibration with known sites (hapmap_3.3.b37.vcf, 1000G_omni2.5.b37.vcf, dbsnp_137.b37.vcf, and Mills_and_1000G_gold_standard.indels.b37.vcf), and annotated by them using snpEff v2.0.57 [17].