We estimated the overall quality of the HPO annotations by inspecting the automatically extracted annotations for a set of 41 common diseases randomly chosen from 13 upper-level DO45 categories that had a MeSH disease identifier and thus could be analyzed analogously to the common MeSH diseases. The process involved manually validating all HPO annotations extracted by the CR process and comparing them to the results of detailed manual curation in order to estimate the true-positive, false-positive, and false-negative rates. We note that it is not informative to calculate a true-negative rate across the entire HPO because even if the CR process flags several hundred terms, the great majority of the over 10,000 HPO terms will be true negatives. We found that maximizing the overall F-score (i.e., the harmonic mean of precision and recall) led to a mean F-score of 45.1% (i.e., a mean precision of around 60% accompanied by a mean recall of around 40%). In separate experiments, we found that a CR run with parameters designed to maximize the precision in each of the 13 categories achieved a mean precision of 66.8% (data not shown). However, we chose to use the annotations derived from the F-score procedure for the remainder of the analysis. The complete set of annotations associated with the 41 common diseases, including flags for true positives, false positives, and false negatives, can be found in Tables S1–S41.
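For readers unfamiliar with the metric, the following minimal sketch shows how precision, recall, and the F-score are computed from true-positive, false-positive, and false-negative counts. The counts used here are hypothetical and chosen only to illustrate precision near 60% and recall near 40%; they are not taken from the curated data. Note also that a mean of per-disease F-scores (as reported above) need not equal the harmonic mean of the mean precision and mean recall.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for a single disease: 24 true positives,
# 16 false positives, 36 false negatives.
# precision = 24/40 = 0.60, recall = 24/60 = 0.40
print(round(f_score(24, 16, 36), 2))  # 0.48
```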