Background

Reviewers and users of reviews draw conclusions about the overall quality of the evidence that is reviewed. Similarly, people making recommendations and users of those recommendations draw conclusions about the strength of the recommendations that are made. Systematic approaches to doing this can help protect against errors by both doers and users, and can facilitate critical appraisal and communication of the conclusions that are drawn.

The GRADE Working Group began as an informal collaboration of people with an interest in addressing shortcomings in systems for grading evidence and recommendations. We report elsewhere a critical appraisal of six prominent systems for grading evidence and recommendations [1]. Based on this critical appraisal and a series of discussions, we reached agreement on the key attributes of a system that would address the major shortcomings we identified. Drawing on this critical assessment of existing approaches, the agreement we had reached about the key elements of an approach for grading the level of evidence and strength of recommendations, and our previous experience, we put together a suggested grading system. We then applied the suggested system to a series of examples, and we discussed and revised the system based on this experience and on the consideration of other examples. Examples were selected to challenge our thinking; all of the examples used in this pilot study were questions about interventions.

We describe here the pilot study of this system. The aims of the pilot study were to test whether the approach is sensible when applied to diverse examples of evidence and recommendations, and to agree on necessary changes to the approach, its decision rules, and the way the evidence profiles used in the pilot study were constructed. The revised approach is described elsewhere [16].