Bat Genome Projects

Two bat whole-genome sequencing projects are in progress, but neither is complete, and one of them will provide coverage too low to produce a reliable assembly. In fact, many sequencing projects now underway are not intended to reach full coverage [27]. The Myotis lucifugus genome sequencing project has so far sequenced to approximately 7-fold coverage, with 27,486,306 traces. As of this writing, the genome is being assembled (http://www.broadinstitute.org/science/projects/mammals-models/brown-bat/little-brown-bat). The Pteropus vampyrus genome sequencing project has produced 8,051,001 genome traces (http://www.hgsc.bcm.tmc.edu/project-species-m-Megabat.hgsc?pageLocation=Megabat) and is slated for completion at 2-fold coverage. The Pteropus project is based on samples from a single individual bat (personal communication).

Incomplete coverage exacerbates one of the shortcomings of whole-genome shotgun sequencing: the difficulty of resolving repetitive DNA segments [28]. If two sequencing traces contain regions of similarity, it may be difficult or impossible to determine whether they derive from the same underlying DNA or from two distinct DNA segments that are themselves paralogous. This difficulty is not limited to the assembly of highly repetitive intergenic regions; it extends to the inference of gene families as well. The latter problem is particularly unfortunate, because gene duplication is a major source of innovative potential in evolution, and the comparative study of gene families among related species is therefore of great interest [29]. Furthermore, a large proportion of genes in eukaryotic genomes reside in families, including the genes that encode the type-I interferons.

In this paper, we describe a method for the inference of gene family members from unassembled sequencing traces.
The method is conceptually straightforward. It is based on an information-theoretic model that accounts for both sequencing error and evolutionary divergence and thereby provides the means to encode the set of sequencing traces. We then seek the partial assemblies that minimize the total description length of the combined set of sequencing traces. This reconstruction provides an estimate of the number of genes in the family and posterior probability mass functions over the DNA sequences of these genes.

We first present the model and the algorithm we have developed to minimize the description length and thereby infer the structure of the gene family. We validate the method by reconstructing the human type-I interferon family genes from sequencing traces from the human genome project. We then use the method to infer the type-I interferon families from the sequencing traces of the Myotis lucifugus and Pteropus vampyrus genome sequencing projects, and we compare these genes with the orthologous families in humans and other mammals. Finally, we confirm our inferences by cloning and sequencing genes from four of the interferon families.
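The description-length criterion can be illustrated with a toy example. The Python sketch below is not the model developed in this paper. It is a deliberately simplified two-part code that assumes pre-aligned reads of equal length, a uniform per-base error rate of 1%, a cost of 2 bits per consensus base, and no explicit term for evolutionary divergence. The function names `consensus` and `description_length` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def consensus(reads):
    """Column-wise majority base of pre-aligned, equal-length reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def description_length(clusters, error_rate=0.01):
    """Two-part code length, in bits, of a partition of reads into clusters.

    Illustrative assumptions (not taken from the paper):
      - each cluster is summarized by one consensus sequence (2 bits per base);
      - each read pays log2(k) bits to name its cluster;
      - each read base pays -log2(1 - e) if it matches the consensus,
        or -log2(e / 3) if it differs (uniform substitution errors).
    """
    k = len(clusters)
    bits = 0.0
    for reads in clusters:
        cons = consensus(reads)
        bits += 2.0 * len(cons)            # cost of stating the consensus
        for read in reads:
            bits += math.log2(k)           # cost of the cluster label
            for base, ref in zip(read, cons):
                p = 1.0 - error_rate if base == ref else error_rate / 3.0
                bits -= math.log2(p)       # cost of the read given the consensus
    return bits


if __name__ == "__main__":
    gene_a = "ACGTACGTACGTACGTACGT"
    gene_b = "ACGAACGTTCGTACCTACGA"        # differs from gene_a at 4 positions
    reads_a = [gene_a] * 5
    reads_b = [gene_b] * 5

    one_gene = description_length([reads_a + reads_b])
    two_genes = description_length([reads_a, reads_b])
    print(f"one consensus:  {one_gene:.1f} bits")
    print(f"two consensuses: {two_genes:.1f} bits")
```

In this toy setting, reads drawn from two sequences that differ at several positions are encoded more cheaply with two consensus sequences than with one, whereas a single read carrying one isolated mismatch is encoded more cheaply as a sequencing error against an existing consensus. This is the trade-off that a description-length criterion uses to decide how many family members the traces support.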