Solar flares produce radiation which can have an almost immediate effect on the near-Earth environment, making it crucial to forecast flares in order to mitigate their negative effects. The number of published approaches to flare forecasting using photospheric magnetic field observations has proliferated, with varying claims about how well each works. Because of the different analysis techniques and data sets used, it is essentially impossible to compare the results from the literature. This problem is exacerbated by the low event rates of large solar flares. The challenges of forecasting rare events have long been recognized in the meteorology community, but have yet to be fully acknowledged by the space weather community. During the interagency workshop on “all clear” forecasts held in Boulder, CO in 2009, the performance of a number of existing algorithms was compared on common data sets, specifically line-of-sight magnetic field and continuum intensity images from MDI, with consistent definitions of what constitutes an event. We demonstrate the importance of making such systematic comparisons, and of using standard verification statistics to determine what constitutes a good prediction scheme. When a comparison was made in this fashion, no one method clearly outperformed all others, which may in part be due to the strong correlations among the parameters used by different methods to characterize an active region. For M-class flares and above, the set of methods tends towards a weakly positive skill score (as measured with several distinct metrics), with no participating method proving substantially better than climatological forecasts.