TY - JOUR
T1 - Consequences of arbitrarily binning the midpoint category in survey data: an illustration with student satisfaction in the National Student Survey
AU - Pollet, Thomas V.
AU - Bilalic, Merim
AU - Shepherd, Lee
PY - 2024/11/1
Y1 - 2024/11/1
N2 - Arbitrarily placing cut-offs in data, i.e. binning, is recognised as poor statistical practice. We explore the consequences of using arbitrary cut-offs in two large datasets from the National Student Survey (2019 and 2022). These are nationwide surveys aimed at capturing student satisfaction amongst UK undergraduates. For these survey data, it is common to group the responses to the question on student satisfaction, measured on a five-point Likert scale, into a ‘% satisfied’ figure based on two categories. These ‘% satisfied’ figures are then used in further metrics. We examine the consequences of using three rather than two categories for the rankings of courses and institutions, as well as the consequences of excluding the midpoint from the calculations. Across all courses, grouping the midpoint with satisfied leads to a median shift in satisfaction of 8.40% and 11.41% for 2019 and 2022, respectively. Excluding the midpoint from the calculations leads to a median shift in satisfaction of 4.20% and 5.70% for 2019 and 2022, respectively. While the overall stability of the rankings is largely preserved, individual courses or institutions exhibit sizeable shifts. Depending on the analysis, the most extreme shifts in rankings are between 13 and 79 ranks for courses and between 24 and 416 ranks for institutions. Our analysis thus illustrates the potentially profound consequences of arbitrarily grouping categories for individual institutions and courses. We offer some recommendations on how this issue can be addressed, but primarily we caution against reliance on the arbitrary grouping of response categories in survey data such as the NSS.
AB - Arbitrarily placing cut-offs in data, i.e. binning, is recognised as poor statistical practice. We explore the consequences of using arbitrary cut-offs in two large datasets from the National Student Survey (2019 and 2022). These are nationwide surveys aimed at capturing student satisfaction amongst UK undergraduates. For these survey data, it is common to group the responses to the question on student satisfaction, measured on a five-point Likert scale, into a ‘% satisfied’ figure based on two categories. These ‘% satisfied’ figures are then used in further metrics. We examine the consequences of using three rather than two categories for the rankings of courses and institutions, as well as the consequences of excluding the midpoint from the calculations. Across all courses, grouping the midpoint with satisfied leads to a median shift in satisfaction of 8.40% and 11.41% for 2019 and 2022, respectively. Excluding the midpoint from the calculations leads to a median shift in satisfaction of 4.20% and 5.70% for 2019 and 2022, respectively. While the overall stability of the rankings is largely preserved, individual courses or institutions exhibit sizeable shifts. Depending on the analysis, the most extreme shifts in rankings are between 13 and 79 ranks for courses and between 24 and 416 ranks for institutions. Our analysis thus illustrates the potentially profound consequences of arbitrarily grouping categories for individual institutions and courses. We offer some recommendations on how this issue can be addressed, but primarily we caution against reliance on the arbitrary grouping of response categories in survey data such as the NSS.
KW - Survey methodology
KW - binning
KW - metrics
KW - rankings
KW - student satisfaction
UR - http://www.scopus.com/inward/record.url?scp=85178049126&partnerID=8YFLogxK
U2 - 10.1080/03075079.2023.2284808
DO - 10.1080/03075079.2023.2284808
M3 - Article
SN - 0307-5079
VL - 49
SP - 1945
EP - 1964
JO - Studies in Higher Education
JF - Studies in Higher Education
IS - 11
ER -