Abstract
With the steady advancement of data collection technologies, practical users of statistical methods increasingly turn to the mixture of factor analysers (MFA) for model-based clustering and dimensionality reduction. However, as the number of measurements grows, so does the likelihood of missing data and outliers, which can lead to biased parameter estimates, reduced stability and robustness, and ultimately inaccurate inferences. This paper presents a new variant of the MFA model that can accommodate missing data and mild outliers. The main assumption of the proposed model is that the latent factors and idiosyncratic errors jointly follow a contaminated-normal distribution, which incorporates parameters for automatic outlier detection. We develop ECM and AECM algorithms to compute maximum likelihood (ML) parameter estimates. Asymptotic standard errors of the parameters are derived using an information-based approach. Several simulation experiments are conducted to examine the asymptotic properties of the ML estimators and to assess the model’s ability to mitigate the influence of missing data and outliers. We further illustrate the model’s practical applicability in social data analysis and image reconstruction, using cost-of-living data and the Barbara image as case studies. Software implementing the presented methodology is available at https://github.com/leila-shahriari/CNMFA-Model.
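
To make the outlier-detection idea concrete, the following is a minimal univariate sketch of a contaminated-normal density, written under one common parameterization and not taken from the paper or its software. The names `alpha` (proportion of typical points), `eta` (variance inflation factor greater than 1), and the 0.5 classification threshold are illustrative assumptions; the model in the paper is multivariate and factor-analytic, and its notation may differ.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Density of a univariate normal N(mu, sigma2) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / sigma2) / math.sqrt(2.0 * math.pi * sigma2)


def contaminated_normal_pdf(x, mu, sigma2, alpha, eta):
    """Two-component scale mixture with a shared mean.

    Assumed parameterization: a share `alpha` of points has variance
    `sigma2` (typical points), the remaining share has the inflated
    variance `eta * sigma2` (mild outliers).
    """
    typical = alpha * normal_pdf(x, mu, sigma2)
    inflated = (1.0 - alpha) * normal_pdf(x, mu, eta * sigma2)
    return typical + inflated


def prob_typical(x, mu, sigma2, alpha, eta):
    """Posterior probability that x came from the non-inflated component."""
    typical = alpha * normal_pdf(x, mu, sigma2)
    return typical / contaminated_normal_pdf(x, mu, sigma2, alpha, eta)


def is_outlier(x, mu, sigma2, alpha, eta):
    """Flag x as an outlier when the typical component is less likely (assumed 0.5 rule)."""
    return prob_typical(x, mu, sigma2, alpha, eta) < 0.5


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    params = dict(mu=0.0, sigma2=1.0, alpha=0.9, eta=10.0)
    for x in (0.0, 2.0, 5.0):
        print(x, prob_typical(x, **params), is_outlier(x, **params))
```

In this sketch, a point near the mean is assigned to the typical component, while a point far in the tail is assigned to the inflated-variance component and flagged. Setting `eta` to 1 collapses the density to an ordinary normal.
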
| Original language | English |
|---|---|
| Number of pages | 33 |
| Journal | Advances in Data Analysis and Classification |
| Early online date | 1 Nov 2025 |
| Publication status | E-pub ahead of print - 1 Nov 2025 |
Keywords
- Contaminated-normal distribution
- Dimension reduction
- Factor analysis
- Heavy-tailed mixtures
- Missing at random