article-title
|
2014 Curt Stern Award: Adventures in Human Genetics
|
body
|
Main Text
Thank you very much. I would first like to thank the Awards Committee, The Society, and all of you who are here this morning. For the past 15 years, I have treasured coming to this meeting, which is always full of energy, ideas, and brilliant scientists. To be honest, I find it hard to see myself in Aravinda Chakravarti’s and Michael Boehnke’s kind description of me, and in the next few minutes, I will probably disprove their claim that I am a good communicator. I stand before you trembling.
If you will indulge me, I would like to tell you about the work I have done in human genetics and about what I have learned in the process. The past several years have been remarkable in that they’ve shown exponential growth in the size of human genetic studies. While preparing this talk, I reviewed the studies in which I have been involved since attending my first American Society of Human Genetics Annual Meeting. If we use the total number of genotypes characterized as a measure of study size, these studies have grown in size by a factor of about 4 every year, about 1,000 every 5 years, and about 1,000,000 every 10 years. It’s easy to be hopeful and imagine that this pace might continue for several years. For someone working on computational methods, this exponential growth presents many opportunities and challenges. It has also been a lot of fun, and I feel all of us are extremely lucky to witness and participate in this era of human genetics.
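The three growth figures quoted above are consistent with one another: a roughly fourfold annual increase compounds to about a thousandfold over 5 years and about a millionfold over 10. A minimal sketch of that arithmetic follows; the function name `growth_after` is illustrative, and the factor of 4 is the approximate figure from the talk, not a fitted value.

```python
# Compound an approximate fourfold yearly increase in genotypes characterized.
ANNUAL_FACTOR = 4  # approximate per-year growth quoted in the talk


def growth_after(years, annual_factor=ANNUAL_FACTOR):
    """Return the cumulative growth factor after the given number of years."""
    return annual_factor ** years


if __name__ == "__main__":
    for years in (1, 5, 10):
        # 4**5 = 1,024 (about 1,000) and 4**10 = 1,048,576 (about 1,000,000)
        print(f"{years:>2} years: x{growth_after(years):,}")
```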
I started my career in human genetics at the Wellcome Trust Center for Human Genetics at a time when complex-trait studies were shifting from linkage to association. I originally started working in the lab on developing methods for “high-throughput” SNP discovery and genotyping. At that time, our “high-throughput” methods enabled characterization of perhaps hundreds of genotypes per day. Those rudimentary “high-throughput” methods enabled us to generate data at a pace that made analysis with the then-available tools cumbersome. My first paper, published in The American Journal of Human Genetics,1 described methods for analyzing association in samples that typically had been collected for family linkage studies. A companion computer program was able to handle hundreds or thousands of genetic markers and could examine the evidence for association in samples of families or unrelated individuals. During my time at Oxford, I was greatly aided by great colleagues and two very generous mentors, Professors William Cookson and Lon Cardon. They gave me the opportunity to explore interesting datasets and the flexibility to pursue my own ideas, and they shared their time generously.
Soon after, I worked on a method for rapid pedigree analysis,2 which Mike described charitably in his introduction. As I reminisce on this work, I would like to share two impressions with you. First, it is clear that I was greatly aided by Leonid Kruglyak and Mark Daly’s earlier work3 on GeneHunter. (Mark is the 2014 co-recipient of the Curt Stern Award.) Their work served me as a useful guide and reference, allowing me to validate implementations of my new method and sparing me many sleepless nights. Second, I noticed an evaluation we’d carried out with a 10-SNP dataset generated by Bernard Keavney and colleagues.4 At the time, our new very fast and memory-efficient approach enabled analyses of that dataset to be completed in ∼40 s. That might have been an advance then, but it also represents a level of performance that is unacceptable now that datasets can include information on millions of genetic markers. Time will continue to put our best computational approaches to the test and demand that we continually improve or replace them.
In part because of this early work on computational methods for SNP analysis, I had the privilege of participating in many early efforts characterizing variation at a genomic scale, starting with the International HapMap Consortium. There, I encountered several greats of modern human genetics, including Mark Daly (again) and also Aravinda Chakravarti, Peter Donnelly, and David Altshuler. Aided by a second set of remarkable characters, I soon proceeded to apply methods to characterize genomic variation at scale to study a variety of complex human traits and diseases. Among the remarkable colleagues with whom I worked at the time, a few stand out: Michael Boehnke, Francis Collins, Anand Swaroop, David Schlessinger, Serena Sanna, Joel Hirschhorn, James Elder, Sekar Kathiresan, and Cristen Willer.
For me, one of the remarkable lessons from that time was to learn that by combining data and results across studies, we could rapidly accelerate the pace of discovery. This lesson was helped by the discovery of genotype imputation5,6 and by new tools for meta-analysis of genetic association studies.7 Genotype imputation uses stretches of chromosome shared between apparently unrelated individuals to reconstruct genotypes at any genetic marker and greatly facilitates comparisons between studies relying on different marker sets. In our first genome-wide association study (GWAS) meta-analysis, aided by imputation, we combined three studies each with little evidence of new loci and discovered 17 loci associated with blood lipid levels.8 Through advances in genotyping, imputation, and meta-analysis, these meta-analyses can now include hundreds of thousands of individuals. It is now straightforward to generate very large catalogs of loci associated with complex traits. It is also clear that, even 10 years ago, these catalogs would have been a great achievement and perhaps hard to imagine. Still, most of us are now unsatisfied with simply cataloguing complex-trait loci and believe it is very important to move on to the next phase, which is to translate these catalogs into insights about human disease biology and medicine.
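To illustrate why combining studies can reveal loci that no single study detects, here is a minimal sketch of a fixed-effects, inverse-variance-weighted meta-analysis for one variant. This is one common weighting scheme and is not presented as the specific method used in the work described above. The effect sizes and standard errors are invented for illustration, and the threshold of 5e-8 is the conventional genome-wide significance level.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def two_sided_p(z):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def fixed_effects_meta(betas, std_errs):
    """Combine per-study effect estimates using inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    std_err = math.sqrt(1.0 / total_weight)
    z = beta / std_err
    return beta, std_err, z, two_sided_p(z)


if __name__ == "__main__":
    # Hypothetical results for one variant in three studies (made-up numbers).
    betas = [0.10, 0.10, 0.10]
    std_errs = [0.03, 0.03, 0.03]

    for b, se in zip(betas, std_errs):
        print(f"single study: z = {b / se:.2f}, p = {two_sided_p(b / se):.2e}")

    beta, std_err, z, p = fixed_effects_meta(betas, std_errs)
    print(f"combined:     z = {z:.2f}, p = {p:.2e}")
    print("genome-wide significant:", p < GENOME_WIDE_P)
```

In this toy setting, each study on its own falls well short of the threshold, while the pooled estimate passes it because the combined standard error shrinks as the weights add up.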
At the same time that interesting things happen in genetics, other interesting things happen in the rest of our lives. I would like to share with you a memory from this time, when we were conducting our early genome-wide analyses of genetic variation and its relationship with many human traits. I visited Mark Daly in Boston, and he invited me to join him for dinner at the end of the day. We walked out of his office, walked through a grocery store near Massachusetts General Hospital, and picked up a truly large cut of salmon. I must have told Mark we’d have trouble finishing such a big fish, but he was undaunted. He drove us home and introduced me to Mary Pat and four charming children. Mark then proceeded to prepare and cook the fish. When dinner was ready, Mark’s four wonderful kids helped make the entire salmon disappear quickly. It was quite an inspiration to see a very successful scientist get home in time to cook dinner and share it with his family.
Studies of human genetic variation have been gradually shifting from arrays to full genome sequencing. One of the first projects to explore the possibilities was the 1000 Genomes Project, which also showed me how different very competitive, very able people could come together to make special things happen. If you know my three project co-leaders—David Altshuler, Gil McVean, and Richard Durbin—you’ll know that they are extremely competitive, extremely able scientists. Working together with them made the project a memorable experience for me.
My experience early in the project greatly influenced and, indeed, changed how I think about the analysis of large complex datasets. After assembling all the whole-genome sequence data collected in the first phase of the project,9 we set out to identify an optimal strategy for converting the raw sequence data (which consisted of many short sequence fragments with many errors) into high-quality lists of variants and genotypes for the individuals we were studying. I was sure that we would be able to compare several alternative strategies and agree on one superior strategy—ideally, this optimal strategy would be the one developed by my group at the University of Michigan.10 Instead, we were deadlocked. Each of the analyses carried out by teams at the Broad Institute, Michigan, and the Wellcome Trust Sanger Institute was optimal in some way. Eventually, we resolved the problem not by deciding on which of these strategies was superior but by combining their respective solutions into a consensus or ensemble prediction. Remarkably, this consensus solution was better in all respects than the solutions each of our teams had spent time crafting and optimizing. The advantages of ensemble predictors are common to many areas of computation, biology, and society—but I had not come across them in such a direct way before.
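The idea of a consensus call set can be sketched with a simple voting rule: keep a variant only if enough independent pipelines report it. The talk does not describe the project's actual consensus procedure, so this is a deliberately simplified stand-in. The function name, the `min_support` parameter, and the example variants are all hypothetical.

```python
from collections import Counter


def consensus_calls(call_sets, min_support=2):
    """Return variants reported by at least `min_support` of the call sets."""
    # Each call set contributes at most one vote per variant.
    votes = Counter(variant for calls in call_sets for variant in set(calls))
    return {variant for variant, n in votes.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical variant calls as (chromosome, position, ref, alt) tuples.
    pipeline_a = {("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("2", 500, "G", "A")}
    pipeline_b = {("1", 1000, "A", "G"), ("2", 500, "G", "A"), ("3", 42, "T", "C")}
    pipeline_c = {("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("3", 99, "A", "T")}

    consensus = consensus_calls([pipeline_a, pipeline_b, pipeline_c])
    for variant in sorted(consensus):
        print(variant)
```

With a two-of-three rule, variants seen by only one pipeline are dropped, which tends to remove caller-specific errors while retaining calls that another pipeline happened to miss.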
Most recently, we have applied the same methods and ideas that were used for sequencing the 1000 Genomes Project participants to other interesting studies, especially ones where rich health and medical information is available. One of the most interesting of these has been our ongoing study of the genetics of isolated Sardinian populations11 in collaboration with a very able team of Sardinian geneticists led by Francesco Cucca. To reduce cost, we sequenced each genome at low coverage.10 Although this sequencing strategy is not well suited to the analysis of a single genome, it has a remarkable property: as more and more individuals are sequenced, we are able to interpret available sequence data for each genome better. In practice, this means that as we sequence more genomes, we are able to examine progressively more genetic variants and, at the same time, to more accurately characterize the genotypes at each variant. There is not enough time to describe the results in detail, but given that you will soon hear from William Allan Award winner Stuart Orkin—who has done remarkable work studying the causes and mechanisms of the thalassemias and sickle cell anemia—I want to highlight one interesting and unpublished observation from our studies of low-density lipoprotein (LDL) cholesterol, a strong risk factor for cardiovascular disease. When we sequenced 2,000 Sardinians and examined the list of loci associated with LDL cholesterol, we observed many of the usual suspects seen in other GWASs (including the loci where APOB, APOE, LDLR, PCSK9, and SORT1 reside). Remarkably, we were also able to show a very strong effect on LDL cholesterol levels for the p.Gln39∗ variant in HBB—a variant that Dr. Orkin will know well as the cause of β-thalassemia on the island. Our analysis suggests that p.Gln39∗ is also one of the biggest drivers of LDL cholesterol levels in the Sardinian population. 
Association between the HBB locus and LDL cholesterol is not typically seen in GWASs because the loss-of-function alleles, which are typically rare and population specific, cannot be well captured by standard arrays.
The secret of success is to have many able collaborators, mentees, and students. I certainly wouldn’t be here without all their criticism, encouragement, ideas, and hard work. I am truly grateful to each of them.
It is probably premature, but also tempting, to reminisce on the lessons I have learned. I certainly have many more to learn. I share the lessons I have learned in the hopes that they might be helpful to audience members who are starting their careers in our wonderful field of human genetics.
1. One person and a good idea can make a difference. As human genetic studies increase in size and complexity, it is sometimes easy to get disheartened and think that one can’t compete with much larger teams, studies, and budgets. Instead, we should remember that novel algorithmic insights can often improve computational performance by several orders of magnitude and thus enable even small teams to beat much larger and better resourced teams. Even more important, approaching a problem from a new angle can lead to completely new, previously overlooked answers and insights.
2. The best students, postdocs, and collaborators generally know something that you don’t. This insight is a bit disconcerting at first. When evaluating a prospective student, postdoc, or collaborator (which you should do carefully!), it is not sufficient to rank them according to what you know. Instead, it is perhaps most important to explore how much of what they know is something you don’t and how well you might complement each other. Many accomplishments depend on just the right partnership and would not be possible otherwise.
3. Take the time to be amazed. Drop everything, and explore a new idea. Some of my colleagues consider this a personality defect (we complement each other!), but I believe that on occasion it is important to drop the day-to-day things that one should be doing and instead take the time to explore something new, interesting, and fun. In many ways, that is where the fun of science, of genetics, and of discovery comes from.
4. Keep learning. There are so many ideas out there. Perhaps the truth in this statement is most obvious when attending The Society’s annual meetings. It is really a privilege to come to this meeting and to see so many developments in human genetics. I really find it surprising that, with all the great science being presented at this year’s meeting, I would have been lucky enough to be selected for the 2014 Curt Stern Award.
5. The most valuable tools and algorithms are often extremely simple. It’s sometimes tempting to think that great discoveries are complicated. I have learned over time that sometimes our most valuable discoveries are very simple. They just need to be applied and tried at the right time.
Looking ahead, it seems clear that we’ll sequence thousands of genomes and that we’ll discover better computational methods and strategies for managing, analyzing, and interpreting the resulting data. Still, some of the biggest challenges in our field are not about the scale of experiments or computational efficiency. In my view, there remain big challenges in how we enable different scientists to interact with data, understand it, choose powerful study designs, and answer important questions. As the field of human genetics and its applications get more complex and more diverse, I feel that we need to spend quite a bit of time facilitating communication of our findings to experts in other disciplines and across the different subfields in our society.
To conclude, I wanted to take a moment to thank my family. The past 10 years have also been a very interesting time for them. A little over 10 years ago, I married Cristen Willer, whom I met during my doctoral studies at Oxford. Cristen has been wonderfully encouraging, supportive, and patient. Those of you who know her will know that she is also an accomplished geneticist and member of The Society. We now have four kids, and another one is due in March. They are a little smaller than Mark’s, so they’re probably not as much of a handful, but they really make my days at the office worthwhile. I encourage anybody who is thinking about a career in genetics not to be concerned that it gets in the way of family life. Don’t trade. Do both.
Thank you all very much.
|
sec
|
Main Text
Thank you very much. I would first like to thank the Awards Committee, The Society, and all of you who are here this morning. For the past 15 years, I have treasured coming to this meeting, which is always full of energy, ideas, and brilliant scientists. To be honest, I find it hard to see myself in Aravinda Chakravarti’s and Michael Boehnke’s kind description of me, and in the next few minutes, I will probably disprove their claim that I am a good communicator. I stand before you trembling.
If you will indulge me, I would like to tell you about the work I have done in human genetics and about what I have learned in the process. The past several years have been remarkable in that they’ve shown exponential growth in the size of human genetic studies. While preparing this talk, I reviewed the studies in which I have been involved since attending my first American Society of Human Genetics Annual Meeting. If we use the total number of genotypes characterized as a measure of study size, these studies have grown in size by a factor of about 4 every year, about 1,000 every 5 years, and about 1,000,000 every 10 years. It’s easy to be hopeful and imagine that this pace might continue for several years. For someone working on computational methods, this exponential growth presents many opportunities and challenges. It has also been a lot of fun, and I feel all of us are extremely lucky to witness and participate in this era of human genetics.
I started my career in human genetics at the Wellcome Trust Center for Human Genetics at a time when complex-trait studies where shifting from linkage to association. I originally started working in the lab on developing methods for “high-throughput” SNP discovery and genotyping. At that time, our “high-throughput” methods enabled characterization of perhaps hundreds of genotypes per day. Those rudimentary “high-throughput” methods enabled us to generate data at a pace that made analysis with the then-available tools cumbersome. My first paper, published in The American Journal of Human Genetics,1 described methods for analyzing association in samples that typically had been collected for family linkage studies. A companion computer program was able to handle hundreds or thousands of genetic markers and could examine the evidence for association in samples of families or unrelated individuals. During my time at Oxford, I was greatly aided by great colleagues and two very generous mentors, Professors William Cookson and Lon Cardon. They gave me the opportunity to explore interesting datasets and the flexibility to pursue my own ideas, and they shared their time generously.
Soon after, I worked on a method for rapid pedigree analysis,2 which Mike described charitably in his introduction. As I reminisce on this work, I would like to share two impressions with you. First, it is clear that I was greatly aided by Leonid Krulyak and Mark Daly’s earlier work3 on GeneHunter. (Mark is the 2014 co-recipient of the Curt Stern Award.) Their work served me as a useful guide and reference, allowing me to validate implementations of my new method and sparing me many sleepless nights. Second, I noticed an evaluation we’d carried out with a 10-SNP dataset generated by Bernard Keavney and colleagues.4 At the time, our new very fast and memory-efficient approach enabled analyses of that dataset to be completed in ∼40 s. That might have been an advance then, but it also represents a level of performance that is unacceptable now that datasets can include information on millions of genetic markers. Time will continue to put our best computational approaches to the test and demand that we continually improve or replace them.
In part because of this early work on computational methods for SNP analysis, I had the privilege of participating in many early efforts characterizing variation at a genomic scale, starting with the International HapMap Consortium. There, I encountered several greats of modern human genetics, including Mark Daly (again) and also Aravinda Chakravarti, Peter Donnelly, and David Altshuler. Aided by a second set of remarkable characters, I soon proceeded to apply methods to characterize genomic variation at scale to study a variety of complex human traits and diseases. Among the remarkable colleagues with whom I worked at the time, a few stand out: Michael Boehnke, Francis Collins, Anand Swaroop, David Schlessinger, Serena Sanna, Joel Hirschhorn, James Elder, Sekar Kathiresan, and Cristen Willer.
For me, one of the remarkable lessons from that time was to learn that by combining data and results across studies, we could rapidly accelerate the pace of discovery. This lesson was helped by the discovery of genotype imputation5,6 and by new tools for meta-analysis of genetic association studies.7 Genotype imputation uses stretches of chromosome shared between apparently unrelated individuals to reconstruct genotypes at any genetic marker and greatly facilitates comparisons between studies relying on different marker sets. In our first genome-wide association study (GWAS) meta-analysis, aided by imputation, we combined three studies each with little evidence of new loci and discovered 17 loci associated with blood lipid levels.8 Through advances in genotyping, imputation, and meta-analysis, these meta-analyses can now include hundreds of thousands of individuals. It is now straightforward to generate very large catalogs of loci associated with complex traits. It is also clear that, even 10 years ago, these catalogs would have been a great achievement and perhaps hard to imagine. Still, most of us are now unsatisfied with simply cataloguing complex-trait loci and believe it is very important to move on to the next phase, which is to translate these catalogs into insights about human disease biology and medicine.
At the same time that interesting things happen in genetics, other interesting things happen in the rest of our lives. I would like to share with you a memory from this time, when we were conducting our early genome-wide analyses of genetic variation and its relationship with many human traits. I visited Mark Daly in Boston, and he invited me to join him for dinner at the end of the day. We walked out of his office, walked through a grocery store near Massachusetts General Hospital, and picked up a truly large cut of salmon. I must have told Mark we’d have trouble finishing such a big fish, but he was undaunted. He drove us home and introduced me to Mary Pat and four charming children. Mark then proceeded to prepare and cook the fish. When dinner was ready, Mark’s four wonderful kids helped make the entire salmon disappear quickly. It was quite an inspiration to see a very successful scientist get home in time to cook dinner and share it with his family.
Studies of human genetic variation have been gradually shifting from arrays to full genome sequencing. One of the first projects to explore the possibilities was the 1000 Genomes Project, which also showed me how different very competitive, very able people could come together to make special things happen. If you know my three project co-leaders—David Altshuler, Gil McVean, and Richard Durbin—you’ll know that they are extremely competitive, extremely able scientists. Working together with them made the project a memorable experience for me.
My experience early in the project greatly influenced and, indeed, changed how I think about the analysis of large complex datasets. After assembling all the whole-genome sequence data collected in the first phase of the project,9 we set out to identify an optimal strategy for converting the raw sequence data (which consisted of many short sequence fragments with many errors) into high-quality lists of variants and genotypes for the individuals we were studying. I was sure that we would be able to compare several alternative strategies and agree on one superior strategy—ideally, this optimal strategy would be the one developed by my group at the University of Michigan.10 Instead, we were deadlocked. Each of the analyses carried out by teams at the Broad Institute, Michigan, and the Wellcome Trust Sanger Institute was optimal in some way. Eventually, we resolved the problem not by deciding on which of these strategies was superior but by combining their respective solutions into a consensus or ensemble prediction. Remarkably, this consensus solution was better in all respects than the solutions each of our teams had spent time crafting and optimizing. The advantages of ensemble predictors are common to many areas of computation, biology, and society—but I had not come across them in such a direct way before.
Most recently, we have applied the same methods and ideas that were used for sequencing the 1000 Genomes Project participants to other interesting studies, especially ones where rich health and medical information is available. One of the most interesting of these has been our ongoing study of the genetics of isolated Sardinian populations11 in collaboration with a very able team of Sardinian geneticists led by Francesco Cucca. To reduce cost, we sequenced each genome at low coverage.10 Although this sequencing strategy is not well suited to the analysis of a single genome, it has a remarkable property: as more and more individuals are sequenced, we are able to interpret available sequence data for each genome better. In practice, this means that as we sequence more genomes, we are able to examine progressively more genetic variants and, at the same time, to more accurately characterize the genotypes at each variant. There is not enough time to describe the results in detail, but given that you will soon hear from William Allan Award winner Stuart Orkin—who has done remarkable work studying the causes and mechanisms of the thalassemias and sickle cell anemia—I want to highlight one interesting and unpublished observation from our studies of low-density lipoprotein (LDL) cholesterol, a strong risk factor for cardiovascular disease. When we sequenced 2,000 Sardinians and examined the list of loci associated with LDL cholesterol, we observed many of the usual suspects seen in other GWASs (including the loci where APOB, APOE, LDLR, PCSK9, and SORT1 reside). Remarkably, we were also able to show a very strong effect on LDL cholesterol levels for the p.Gln39∗ variant in HBB—a variant that Dr. Orkin will know well as the cause of β-thalassemia on the island. Our analysis suggests that p.Gln39∗ is also one of the biggest drivers of LDL cholesterol levels in the Sardinian population. 
Association between the HBB locus and LDL cholesterol is not typically seen in GWASs because the loss-of-function alleles, which are typically rare and population specific, cannot be well captured by standard arrays.
The secret of success is to have many able collaborators, mentees, and students. I certainly wouldn’t be here without all their criticism, encouragement, ideas, and hard work. I am truly grateful to each of them.
It is probably premature, but also tempting, to reminisce on the lessons I have learned. I certainly have many more to learn. I share the lessons I have learned in the hopes that they might be helpful to audience members who are starting their careers in our wonderful field of human genetics.1. One person and a good idea can make a difference. As human genetic studies increase in size and complexity, it is sometimes easy to get disheartened and think that one can’t compete with much larger teams, studies, and budgets. Instead, we should remember that novel algorithmic insights can often improve computational performance by several orders of magnitude and thus enable even small teams to beat much larger and better resourced teams. Even more important, approaching a problem from a new angle can lead to completely new, previously overlooked answers and insights.
2. The best students, postdocs, and collaborators generally know something that you don’t. This insight is a bit disconcerting at first. When evaluating a prospective student, postdoc, or collaborator (which you should do carefully!), it is not sufficient to rank them according to what you know. Instead, it is perhaps most important to explore how much of what they know is something you don’t and how well you might complement each other. Many accomplishments depend on just the right partnership and would not be possible otherwise.
3. Take the time to be amazed. Drop everything, and explore a new idea. Some of my colleagues consider this a personality defect (we complement each other!), but I believe that on occasion it is important to drop the day-to-day things that one should be doing and instead take the time to explore something new, interesting, and fun. In many ways, that is where the fun of science, of genetics, and of discovery comes from.
4. Keep learning. There are so many ideas out there. Perhaps the truth in this statement is most obvious when attending The Society’s annual meetings. It is really a privilege to come to this meeting and to see so many developments in human genetics. I really find it surprising that, with all the great science being presented at this year’s meeting, I would have been lucky enough to be selected for the 2014 Curt Stern Award.
5. The most valuable tools and algorithms are often extremely simple. It’s sometimes tempting to think that great discoveries are complicated. I have learned over time that sometimes our most valuable discoveries are very simple. They just need to be applied and tried at the right time.
Looking ahead, it seems clear that we’ll sequence thousands of genomes and that we’ll discover better computational methods and strategies for managing, analyzing, and interpreting the resulting data. Still, some of the biggest challenges in our field are not about the scale of experiments or computational efficiency. In my view, there remain big challenges in how we enable different scientists to interact with data, understand it, choose powerful study designs, and answer important questions. As the field of human genetics and its applications get more complex and more diverse, I feel that we need to spend quite a bit of time facilitating communication of our findings to experts in other disciplines and across the different subfields in our society.
To conclude, I wanted to take a moment to thank my family. The past 10 years have also been a very interesting time for them. A little over 10 years ago, I married Cristen Willer, whom I met during my doctoral studies at Oxford. Cristen has been wonderfully encouraging, supportive, and patient. Those of you who know her will know that she is also an accomplished geneticist and member of The Society. We now have four kids, and another one is due in March. They are a little smaller than Mark’s, so they’re probably not as much of a handful, but they really make my days at the office worthwhile. I encourage anybody who is thinking about a career in genetics not to be concerned that it gets in the way of family life. Don’t trade. Do both.
Thank you all very much.
|
title
|
Main Text
|
p
|
Thank you very much. I would first like to thank the Awards Committee, The Society, and all of you who are here this morning. For the past 15 years, I have treasured coming to this meeting, which is always full of energy, ideas, and brilliant scientists. To be honest, I find it hard to see myself in Aravinda Chakravarti’s and Michael Boehnke’s kind description of me, and in the next few minutes, I will probably disprove their claim that I am a good communicator. I stand before you trembling.
|
p
|
If you will indulge me, I would like to tell you about the work I have done in human genetics and about what I have learned in the process. The past several years have been remarkable in that they’ve shown exponential growth in the size of human genetic studies. While preparing this talk, I reviewed the studies in which I have been involved since attending my first American Society of Human Genetics Annual Meeting. If we use the total number of genotypes characterized as a measure of study size, these studies have grown in size by a factor of about 4 every year, about 1,000 every 5 years, and about 1,000,000 every 10 years. It’s easy to be hopeful and imagine that this pace might continue for several years. For someone working on computational methods, this exponential growth presents many opportunities and challenges. It has also been a lot of fun, and I feel all of us are extremely lucky to witness and participate in this era of human genetics.
|
p
|
I started my career in human genetics at the Wellcome Trust Center for Human Genetics at a time when complex-trait studies where shifting from linkage to association. I originally started working in the lab on developing methods for “high-throughput” SNP discovery and genotyping. At that time, our “high-throughput” methods enabled characterization of perhaps hundreds of genotypes per day. Those rudimentary “high-throughput” methods enabled us to generate data at a pace that made analysis with the then-available tools cumbersome. My first paper, published in The American Journal of Human Genetics,1 described methods for analyzing association in samples that typically had been collected for family linkage studies. A companion computer program was able to handle hundreds or thousands of genetic markers and could examine the evidence for association in samples of families or unrelated individuals. During my time at Oxford, I was greatly aided by great colleagues and two very generous mentors, Professors William Cookson and Lon Cardon. They gave me the opportunity to explore interesting datasets and the flexibility to pursue my own ideas, and they shared their time generously.
|
p
|
Soon after, I worked on a method for rapid pedigree analysis,2 which Mike described charitably in his introduction. As I reminisce about this work, I would like to share two impressions with you. First, it is clear that I was greatly aided by Leonid Kruglyak and Mark Daly’s earlier work3 on GeneHunter. (Mark is the 2014 co-recipient of the Curt Stern Award.) Their work served as a useful guide and reference, allowing me to validate implementations of my new method and sparing me many sleepless nights. Second, I noticed an evaluation we’d carried out with a 10-SNP dataset generated by Bernard Keavney and colleagues.4 At the time, our new, very fast and memory-efficient approach enabled analyses of that dataset to be completed in ∼40 s. That might have been an advance then, but it also represents a level of performance that is unacceptable now that datasets can include information on millions of genetic markers. Time will continue to put our best computational approaches to the test and demand that we continually improve or replace them.
|
p
|
In part because of this early work on computational methods for SNP analysis, I had the privilege of participating in many early efforts characterizing variation at a genomic scale, starting with the International HapMap Consortium. There, I encountered several greats of modern human genetics, including Mark Daly (again) and also Aravinda Chakravarti, Peter Donnelly, and David Altshuler. Aided by a second set of remarkable characters, I soon proceeded to apply methods to characterize genomic variation at scale to study a variety of complex human traits and diseases. Among the remarkable colleagues with whom I worked at the time, a few stand out: Michael Boehnke, Francis Collins, Anand Swaroop, David Schlessinger, Serena Sanna, Joel Hirschhorn, James Elder, Sekar Kathiresan, and Cristen Willer.
|
p
|
For me, one of the remarkable lessons from that time was to learn that by combining data and results across studies, we could rapidly accelerate the pace of discovery. This lesson was helped by the discovery of genotype imputation5,6 and by new tools for meta-analysis of genetic association studies.7 Genotype imputation uses stretches of chromosome shared between apparently unrelated individuals to reconstruct genotypes at any genetic marker and greatly facilitates comparisons between studies relying on different marker sets. In our first genome-wide association study (GWAS) meta-analysis, aided by imputation, we combined three studies each with little evidence of new loci and discovered 17 loci associated with blood lipid levels.8 Through advances in genotyping, imputation, and meta-analysis, these meta-analyses can now include hundreds of thousands of individuals. It is now straightforward to generate very large catalogs of loci associated with complex traits. It is also clear that, even 10 years ago, these catalogs would have been a great achievement and perhaps hard to imagine. Still, most of us are now unsatisfied with simply cataloguing complex-trait loci and believe it is very important to move on to the next phase, which is to translate these catalogs into insights about human disease biology and medicine.
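To illustrate why pooling results across studies can reveal a signal that no single study shows clearly, here is a minimal sketch of one standard approach, fixed-effect inverse-variance weighting. It is not a description of the specific tools cited in the talk; the function name and the per-study numbers are invented for illustration only.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Pool per-study effect estimates using fixed-effect inverse-variance weights."""
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Invented numbers for one variant in three studies (not real data).
study_betas = [0.10, 0.12, 0.08]
study_ses = [0.05, 0.05, 0.05]
pooled_beta, pooled_se, z, p_value = inverse_variance_meta(study_betas, study_ses)
```

With these invented inputs, no single study has a z-score above 2.4, while the pooled z-score is about 3.5, because the pooled standard error shrinks as studies are added.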
|
p
|
At the same time that interesting things happen in genetics, other interesting things happen in the rest of our lives. I would like to share with you a memory from this time, when we were conducting our early genome-wide analyses of genetic variation and its relationship with many human traits. I visited Mark Daly in Boston, and he invited me to join him for dinner at the end of the day. We walked out of his office, walked through a grocery store near Massachusetts General Hospital, and picked up a truly large cut of salmon. I must have told Mark we’d have trouble finishing such a big fish, but he was undaunted. He drove us home and introduced me to Mary Pat and four charming children. Mark then proceeded to prepare and cook the fish. When dinner was ready, Mark’s four wonderful kids helped make the entire salmon disappear quickly. It was quite an inspiration to see a very successful scientist get home in time to cook dinner and share it with his family.
|
p
|
Studies of human genetic variation have been gradually shifting from arrays to full genome sequencing. One of the first projects to explore the possibilities was the 1000 Genomes Project, which also showed me how different very competitive, very able people could come together to make special things happen. If you know my three project co-leaders—David Altshuler, Gil McVean, and Richard Durbin—you’ll know that they are extremely competitive, extremely able scientists. Working together with them made the project a memorable experience for me.
|
p
|
My experience early in the project greatly influenced and, indeed, changed how I think about the analysis of large complex datasets. After assembling all the whole-genome sequence data collected in the first phase of the project,9 we set out to identify an optimal strategy for converting the raw sequence data (which consisted of many short sequence fragments with many errors) into high-quality lists of variants and genotypes for the individuals we were studying. I was sure that we would be able to compare several alternative strategies and agree on one superior strategy—ideally, this optimal strategy would be the one developed by my group at the University of Michigan.10 Instead, we were deadlocked. Each of the analyses carried out by teams at the Broad Institute, Michigan, and the Wellcome Trust Sanger Institute was optimal in some way. Eventually, we resolved the problem not by deciding on which of these strategies was superior but by combining their respective solutions into a consensus or ensemble prediction. Remarkably, this consensus solution was better in all respects than the solutions each of our teams had spent time crafting and optimizing. The advantages of ensemble predictors are common to many areas of computation, biology, and society—but I had not come across them in such a direct way before.
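The idea of combining call sets rather than choosing one can be sketched with a simple majority vote. This is a deliberately simplified stand-in, not the consensus procedure the project actually used; the site names, genotype strings, and function names below are hypothetical.

```python
from collections import Counter


def consensus_call(calls, min_agreement=2):
    """Return the genotype reported by at least min_agreement callers, else None."""
    counts = Counter(call for call in calls if call is not None)
    if not counts:
        return None
    # With three callers and min_agreement=2, at most one genotype can qualify.
    genotype, support = counts.most_common(1)[0]
    return genotype if support >= min_agreement else None


def consensus_callset(callsets, min_agreement=2):
    """Merge per-caller {site: genotype} dicts, keeping only sites with a consensus."""
    sites = set().union(*(callset.keys() for callset in callsets))
    merged = {}
    for site in sorted(sites):
        call = consensus_call([cs.get(site) for cs in callsets], min_agreement)
        if call is not None:
            merged[site] = call
    return merged


# Hypothetical calls from three independent pipelines (not real data).
caller_a = {"site1": "0/1", "site2": "1/1"}
caller_b = {"site1": "0/1", "site2": "0/1", "site3": "0/1"}
caller_c = {"site1": "0/0", "site2": "1/1"}
consensus = consensus_callset([caller_a, caller_b, caller_c])
```

Here `site3`, reported by only one pipeline, is dropped, while disagreements at `site1` and `site2` are resolved in favor of the genotype that two pipelines support.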
|
p
|
Most recently, we have applied the same methods and ideas that were used for sequencing the 1000 Genomes Project participants to other interesting studies, especially ones where rich health and medical information is available. One of the most interesting of these has been our ongoing study of the genetics of isolated Sardinian populations11 in collaboration with a very able team of Sardinian geneticists led by Francesco Cucca. To reduce cost, we sequenced each genome at low coverage.10 Although this sequencing strategy is not well suited to the analysis of a single genome, it has a remarkable property: as more and more individuals are sequenced, we are able to interpret available sequence data for each genome better. In practice, this means that as we sequence more genomes, we are able to examine progressively more genetic variants and, at the same time, to more accurately characterize the genotypes at each variant. There is not enough time to describe the results in detail, but given that you will soon hear from William Allan Award winner Stuart Orkin—who has done remarkable work studying the causes and mechanisms of the thalassemias and sickle cell anemia—I want to highlight one interesting and unpublished observation from our studies of low-density lipoprotein (LDL) cholesterol, a strong risk factor for cardiovascular disease. When we sequenced 2,000 Sardinians and examined the list of loci associated with LDL cholesterol, we observed many of the usual suspects seen in other GWASs (including the loci where APOB, APOE, LDLR, PCSK9, and SORT1 reside). Remarkably, we were also able to show a very strong effect on LDL cholesterol levels for the p.Gln39* variant in HBB—a variant that Dr. Orkin will know well as the cause of β-thalassemia on the island. Our analysis suggests that p.Gln39* is also one of the biggest drivers of LDL cholesterol levels in the Sardinian population. Association between the HBB locus and LDL cholesterol is not typically seen in GWASs because the loss-of-function alleles, which are typically rare and population specific, cannot be well captured by standard arrays.
|
p
|
The secret of success is to have many able collaborators, mentees, and students. I certainly wouldn’t be here without all their criticism, encouragement, ideas, and hard work. I am truly grateful to each of them.
|
p
|
It is probably premature, but also tempting, to reminisce on the lessons I have learned. I certainly have many more to learn. I share the lessons I have learned in the hopes that they might be helpful to audience members who are starting their careers in our wonderful field of human genetics.
1. One person and a good idea can make a difference. As human genetic studies increase in size and complexity, it is sometimes easy to get disheartened and think that one can’t compete with much larger teams, studies, and budgets. Instead, we should remember that novel algorithmic insights can often improve computational performance by several orders of magnitude and thus enable even small teams to beat much larger and better resourced teams. Even more important, approaching a problem from a new angle can lead to completely new, previously overlooked answers and insights.
2. The best students, postdocs, and collaborators generally know something that you don’t. This insight is a bit disconcerting at first. When evaluating a prospective student, postdoc, or collaborator (which you should do carefully!), it is not sufficient to rank them according to what you know. Instead, it is perhaps most important to explore how much of what they know is something you don’t and how well you might complement each other. Many accomplishments depend on just the right partnership and would not be possible otherwise.
3. Take the time to be amazed. Drop everything, and explore a new idea. Some of my colleagues consider this a personality defect (we complement each other!), but I believe that on occasion it is important to drop the day-to-day things that one should be doing and instead take the time to explore something new, interesting, and fun. In many ways, that is where the fun of science, of genetics, and of discovery comes from.
4. Keep learning. There are so many ideas out there. Perhaps the truth in this statement is most obvious when attending The Society’s annual meetings. It is really a privilege to come to this meeting and to see so many developments in human genetics. I really find it surprising that, with all the great science being presented at this year’s meeting, I would have been lucky enough to be selected for the 2014 Curt Stern Award.
5. The most valuable tools and algorithms are often extremely simple. It’s sometimes tempting to think that great discoveries are complicated. I have learned over time that sometimes our most valuable discoveries are very simple. They just need to be applied and tried at the right time.
|
p
|
Looking ahead, it seems clear that we’ll sequence thousands of genomes and that we’ll discover better computational methods and strategies for managing, analyzing, and interpreting the resulting data. Still, some of the biggest challenges in our field are not about the scale of experiments or computational efficiency. In my view, there remain big challenges in how we enable different scientists to interact with data, understand it, choose powerful study designs, and answer important questions. As the field of human genetics and its applications get more complex and more diverse, I feel that we need to spend quite a bit of time facilitating communication of our findings to experts in other disciplines and across the different subfields in our society.
|
p
|
To conclude, I wanted to take a moment to thank my family. The past 10 years have also been a very interesting time for them. A little over 10 years ago, I married Cristen Willer, whom I met during my doctoral studies at Oxford. Cristen has been wonderfully encouraging, supportive, and patient. Those of you who know her will know that she is also an accomplished geneticist and member of The Society. We now have four kids, and another one is due in March. They are a little smaller than Mark’s, so they’re probably not as much of a handful, but they really make my days at the office worthwhile. I encourage anybody who is thinking about a career in genetics not to be concerned that it gets in the way of family life. Don’t trade. Do both.
|
p
|
Thank you all very much.
|
back
|
Acknowledgments
I am deeply grateful to the members of The Society for this prestigious award. Major support for my research has come from the University of Michigan, the NIH, the Pew Charitable Trusts, GlaxoSmithKline, and the Foundation for the NIH. I would also like to thank my assistants, Laura Baker and Irene Felicetti, for their valuable help over the years and with this presentation. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
|