PMC:4375429 / 7421-8749
Annotations
2_test
{"project":"2_test","denotations":[{"id":"25748353-20981092-2053673","span":{"begin":229,"end":230},"obj":"20981092"},{"id":"25748353-21460063-2053674","span":{"begin":677,"end":679},"obj":"21460063"}],"text":"My experience early in the project greatly influenced and, indeed, changed how I think about the analysis of large complex datasets. After assembling all the whole-genome sequence data collected in the first phase of the project,9 we set out to identify an optimal strategy for converting the raw sequence data (which consisted of many short sequence fragments with many errors) into high-quality lists of variants and genotypes for the individuals we were studying. I was sure that we would be able to compare several alternative strategies and agree on one superior strategy—ideally, this optimal strategy would be the one developed by my group at the University of Michigan.10 Instead, we were deadlocked. Each of the analyses carried out by teams at the Broad Institute, Michigan, and the Wellcome Trust Sanger Institute was optimal in some way. Eventually, we resolved the problem not by deciding on which of these strategies was superior but by combining their respective solutions into a consensus or ensemble prediction. Remarkably, this consensus solution was better in all respects than the solutions each of our teams had spent time crafting and optimizing. The advantages of ensemble predictors are common to many areas of computation, biology, and society—but I had not come across them in such a direct way before."}