Background: Self-report measures of alcohol problems are commonly included in studies evaluating treatment of and recovery from alcohol use disorder (AUD), but no prior study has examined whether the measurement of alcohol problems replicates across studies that use different measures and diverse samples. Further, it is unclear which items may be better indicators of alcohol problems for particular patient subgroups. In the present study, we integrated data from four large alcohol treatment studies to develop a commensurate measure of alcohol problems using moderated nonlinear factor analysis (MNLFA). Methods: Data were drawn from the COMBINE study, Project MATCH, the Relapse Replication and Extension Project (RREP), and the United Kingdom Alcohol Treatment Trial (UKATT), yielding a total sample size of 4,414. MNLFA was applied to the Drinker Inventory of Consequences (COMBINE, MATCH, RREP) and the Alcohol Problems Questionnaire (UKATT). Results: We created a 78-item commensurate measure of alcohol problems and examined differential item functioning (DIF) by study membership, time, and sociodemographic characteristics. Sixty-two items demonstrated intercept DIF, suggesting that clients with the same underlying level of alcohol problems endorsed these items at different rates across patient subgroups. Six items demonstrated loading DIF, suggesting that the extent to which these items reflected alcohol problems differed across patient subgroups. Conclusions: The self-reported measurement of alcohol problems replicates across measures and diverse samples. Items with DIF have clinical implications for the treatment of AUD. Finally, MNLFA scores can be used to test substantive research questions across these studies.