TY - JOUR
T1 - How well are marginalised groups represented in electronic records? A codelist development project and cross-sectional analysis of UK electronic health records
AU - Perchyk, Tetyana
AU - De Vere Hunt, Isabella Joy
AU - Nicholson, Brian D.
AU - Mounce, Luke
AU - Sykes, Kate
AU - Lyratzopoulos, Georgios
AU - Lemanska, Agnieszka
AU - Whitaker, Katriina L.
AU - Kerrison, Robert S.
PY - 2025/8/11
Y1 - 2025/8/11
N2 - Objectives: Primary care electronic health records provide a rich source of information for inequalities research. However, the reliability and validity of the research derived from these records depend on the completeness and resolution of the codelists (ie, collections of medical terms/codes) used to identify populations of interest. The aim of this project was to develop comprehensive codelists for identifying people from ethnic minority groups, people with learning disabilities (LDs), people with severe mental illness (SMI) and people who are transgender. Design: We followed a three-stage process to define and extract relevant codelists. First, groups of interest were defined a priori. Next, relevant clinical codes, relating to the groups, were identified by searching Clinical Practice Research Datalink (CPRD) publications, codelist repositories and the CPRD Code Browser. Relevant codelists were extracted and merged according to group, and duplicates were removed. Finally, the remaining codes were reviewed by two general practitioners (GPs). Setting: The curated codelists were compared using a representative sample in the UK. The frequencies of individuals identified using the curated codelists were assessed and compared with widely used alternative codelists. Participants: Comprehensiveness was assessed in a representative CPRD population of 10 966 759 people. Results: After removal of duplicates and GP review, codelists were finalised with 325 unique codes for ethnicity, 558 for LD, 499 for SMI and 38 for transgender. Compared with comparator codelists, an additional 48 017 (76.6%), 52 953 (68.9%) and 508 (36.9%) people with LD, SMI or transgender code were identified. The proportions identified for ethnicity, meanwhile, were consistent with expectations for the UK (eg, 6.50% Asian, 2.66% black and 1.44% mixed). Conclusions: The curated codelists are more sensitive than those widely used in practice and research. Discrepancies between national estimates and primary care records suggest potential record/retention issues. Resolving these requires further investigation and could lead to improved data quality for research.
AB - Objectives: Primary care electronic health records provide a rich source of information for inequalities research. However, the reliability and validity of the research derived from these records depend on the completeness and resolution of the codelists (ie, collections of medical terms/codes) used to identify populations of interest. The aim of this project was to develop comprehensive codelists for identifying people from ethnic minority groups, people with learning disabilities (LDs), people with severe mental illness (SMI) and people who are transgender. Design: We followed a three-stage process to define and extract relevant codelists. First, groups of interest were defined a priori. Next, relevant clinical codes, relating to the groups, were identified by searching Clinical Practice Research Datalink (CPRD) publications, codelist repositories and the CPRD Code Browser. Relevant codelists were extracted and merged according to group, and duplicates were removed. Finally, the remaining codes were reviewed by two general practitioners (GPs). Setting: The curated codelists were compared using a representative sample in the UK. The frequencies of individuals identified using the curated codelists were assessed and compared with widely used alternative codelists. Participants: Comprehensiveness was assessed in a representative CPRD population of 10 966 759 people. Results: After removal of duplicates and GP review, codelists were finalised with 325 unique codes for ethnicity, 558 for LD, 499 for SMI and 38 for transgender. Compared with comparator codelists, an additional 48 017 (76.6%), 52 953 (68.9%) and 508 (36.9%) people with LD, SMI or transgender code were identified. The proportions identified for ethnicity, meanwhile, were consistent with expectations for the UK (eg, 6.50% Asian, 2.66% black and 1.44% mixed). Conclusions: The curated codelists are more sensitive than those widely used in practice and research. Discrepancies between national estimates and primary care records suggest potential record/retention issues. Resolving these requires further investigation and could lead to improved data quality for research.
KW - EPIDEMIOLOGY
KW - MENTAL HEALTH
KW - Primary Care
KW - Transgender Persons
UR - https://www.scopus.com/pages/publications/105012995672
U2 - 10.1136/bmjopen-2024-098305
DO - 10.1136/bmjopen-2024-098305
M3 - Article
AN - SCOPUS:105012995672
SN - 2044-6055
VL - 15
JO - BMJ Open
JF - BMJ Open
IS - 8
M1 - e098305
ER -