Abstract
In recent years, social media platforms such as Twitter have become a vital source of information for detecting mental health issues, especially depression. Despite increasing efforts using traditional domain features and machine learning classifiers, a research gap remains in applying general-purpose large language models to this specific domain task. This thesis develops a novel approach to depression detection from tweets using a transformer-based language model. By leveraging the rich semantic understanding of the pre-trained model, it presents an effective method for identifying and analysing depressive language patterns.

Our contributions address three key research challenges. First, we examine different language models on a public dataset of tweets labelled as depressed or non-depressed, enabling them to extract linguistic indicators of depression. Next, we apply multi-instance learning (MIL), which allows the model to capture the complex, multi-faceted nature of mental health and to assess the presence of depressive symptoms from annotations of a few thousand Twitter users rather than millions of individual tweets, as sketched below.
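As a minimal sketch of this user-level MIL setup (not the thesis implementation), the example below treats one user's tweets as a bag, pools BERT-style tweet embeddings with an attention mechanism, and predicts a single bag-level label; the class name, dimensions, and pooling choice are illustrative assumptions.

```python
# Minimal MIL sketch: a "bag" is one user's tweets, and only the bag carries a label.
import torch
import torch.nn as nn

class MILDepressionClassifier(nn.Module):
    def __init__(self, embed_dim=768):
        super().__init__()
        # Attention weights decide how much each tweet contributes to the user-level score.
        self.attention = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.Tanh(), nn.Linear(128, 1)
        )
        self.classifier = nn.Linear(embed_dim, 2)  # depressed vs. non-depressed

    def forward(self, tweet_embeddings):
        # tweet_embeddings: (num_tweets, embed_dim), e.g. BERT [CLS] vectors for one user.
        weights = torch.softmax(self.attention(tweet_embeddings), dim=0)  # (num_tweets, 1)
        bag_embedding = (weights * tweet_embeddings).sum(dim=0)           # (embed_dim,)
        return self.classifier(bag_embedding), weights  # user-level logits + per-tweet attention
```

The attention weights also indicate which individual tweets drove the user-level prediction, which is one common way MIL is used for instance-level insight when only bag labels are available.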
The third contribution addresses the prior assumption of multi-instance learning in the context of depression detection. Unlike existing transformer-based language models, which perform inference in a purely data-driven manner, the proposed prior filter equation analyses and infers each tweet with reference to historical tweets from the training gallery. The diagnosis made by the prior filter is therefore more interpretable than the black-box predictions of a language model: each test user receives a depressive-factor analysis linked to a full list of reference users and tweets in the training set that exhibited similar symptoms.
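The following is a hypothetical sketch of a retrieval-style prior of this kind, assuming cosine similarity over tweet embeddings and a similarity-weighted vote over the nearest gallery tweets; the function name, weighting scheme, and k are illustrative assumptions and not the thesis's actual prior filter equation.

```python
# Hypothetical sketch: score a test tweet against a gallery of labelled training tweets
# so the prediction can be traced back to concrete reference tweets and users.
import numpy as np

def prior_filter_score(test_embedding, gallery_embeddings, gallery_labels, gallery_meta, k=5):
    # Cosine similarity between the test tweet and every gallery tweet.
    gallery_norm = gallery_embeddings / np.linalg.norm(gallery_embeddings, axis=1, keepdims=True)
    test_norm = test_embedding / np.linalg.norm(test_embedding)
    sims = gallery_norm @ test_norm
    top_k = np.argsort(sims)[::-1][:k]
    # Similarity-weighted prior that the tweet is depressive (labels are 0/1),
    # plus the gallery references behind that score.
    prior = float(np.dot(sims[top_k], gallery_labels[top_k]) / (sims[top_k].sum() + 1e-8))
    references = [(gallery_meta[i], float(sims[i]), int(gallery_labels[i])) for i in top_k]
    return prior, references
```

Returning the reference list alongside the score is what gives this kind of prior its interpretability: every diagnosis points to the training users and tweets it was based on.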
Our results demonstrate the efficacy of this approach in detecting depression with high accuracy and few false positives. The proposed method outperforms existing techniques, showcasing the potential of transformer-based language models and multi-instance learning for addressing mental health challenges. In particular, the prior filter equation improves on the BERT-with-MIL baseline by 10.5% in precision, 21.7% in F1-score, and 10.9% in accuracy. This research contributes to improving mental health monitoring through social media and offers valuable insights for the future development of automated depression detection systems.
| Date of Award | 25 Jul 2024 |
| --- | --- |
| Original language | English |
| Awarding Institution | |
| Supervisor | Honglei Li (Supervisor) |
Keywords
- Digital Health
- Sentiment Analysis
- Mental Wellbeing
- Text Generation
- Social Media Data Analysis