|Title of host publication||Human-Computer Interaction. Human Values and Quality of Life - Thematic Area, HCI 2020, Held as Part of the 22nd International Conference, HCII 2020, Proceedings|
|Number of pages||18|
|Publication status||Published - 19 Jul 2020|
|Event||22nd International Conference on Human Computer Interaction - Denmark (held virtually)|
Duration: 19 Jul 2020 → 24 Jul 2020
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Conference||22nd International Conference on Human Computer Interaction|
|Abbreviated title||HCI International 2020|
|Period||19/07/20 → 24/07/20|
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
This study expands upon the existing literature by applying a multidisciplinary, experimental approach and empirical analysis using real healthcare data to address three key questions:
1. How is user understanding related to user trust of an XAI system?
2. How does the type of algorithm visualisation affect users’ perceived understanding and/or trust?
3. Do users make accurate decisions based upon an XAI system, or do they show any biases?
Combining computer science and psychological approaches, this study investigates three Machine Learning (ML) algorithms differing in explainability: Decision Trees (DT), Logistic Regression (LR), and Neural Networks (NN). During the study, 70 international participants, aged 18-65 years, were presented with biopsy results (predicted by the ML models mentioned above) for three hospital patients being tested for the presence of breast cancer. The results for each patient were presented using a different XAI-driven visualisation of each model (DT, LR or NN). Each participant was presented with all three visualisations, but the order of presentation was randomised to prevent order effects.
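The abstract does not specify how the three models were configured or trained; as a rough illustrative sketch only, the three model families could be fitted as below. This uses scikit-learn and its bundled Wisconsin breast-cancer dataset as a stand-in for the study's real hospital data; all hyperparameters shown are assumptions, not the study's settings.

```python
# Sketch: fitting three model families of differing explainability on
# breast-cancer biopsy features. scikit-learn's bundled Wisconsin dataset
# stands in for the study's real hospital data; hyperparameters are
# illustrative assumptions, not the paper's configuration.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # Shallow tree: directly inspectable decision rules.
    "DT": DecisionTreeClassifier(max_depth=3, random_state=0),
    # Linear model: per-feature coefficients are interpretable.
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    # Small neural network: typically treated as more "black box".
    "NN": make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(20,),
                                      max_iter=2000, random_state=0)),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))
```

A tool built on such models would then render a model-appropriate visualisation (tree plot, coefficient weights, network diagram) alongside each prediction, as in Figs. 1-3.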
Fig 1. Example of Logistic Regression Algorithm Output
Fig 2. Example of Neural Network Algorithm Output
Fig 3. Example of Decision Tree Algorithm Output
In addition to the visual output, participants were presented with the suggested diagnosis, e.g., “The tool suggests the following diagnosis: [Benign, i.e., non-cancerous or Malignant, i.e., cancerous]”. The diagnosis provided for each algorithm was counterbalanced, to control for any potential influence of the nature of the diagnosis (i.e., malignant or benign) upon users’ understanding and/or trust of the system. Counterbalancing means that, for each algorithm condition (DT, LR or NN), half of the participants received a malignant result and the other half received a benign result, enabling the researchers to identify any biases driven by the nature of the diagnosis.
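The paper does not give its exact assignment procedure; the following is a minimal sketch of one way to implement randomised presentation order plus counterbalanced diagnoses. The function `assign` and the alternating parity scheme are hypothetical, chosen only so that each algorithm receives a benign diagnosis for exactly half of the participants.

```python
# Sketch (hypothetical scheme): randomise the order in which each
# participant sees the three visualisations, and counterbalance the
# diagnosis so each algorithm is "benign" for half the participants.
import random

ALGORITHMS = ["DT", "LR", "NN"]

def assign(n_participants, seed=0):
    rng = random.Random(seed)
    assignments = []
    for i in range(n_participants):
        order = ALGORITHMS[:]
        rng.shuffle(order)  # randomised presentation order (order effects)
        # Alternate diagnosis by participant/algorithm parity so that,
        # per algorithm, half of participants see each diagnosis.
        diagnosis = {a: ("benign" if (i + j) % 2 == 0 else "malignant")
                     for j, a in enumerate(ALGORITHMS)}
        assignments.append({"order": order, "diagnosis": diagnosis})
    return assignments

# With 70 participants, each algorithm gets 35 benign / 35 malignant.
plan = assign(70)
```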
For each of the three algorithm visualisations, participants were asked to rate the degree to which they trusted the accuracy of the diagnosis, and the degree to which they understood how the system had arrived at this diagnosis.
We highlight three key findings:
1. Level of user understanding did not differ significantly between the algorithm visualisations. This contradicts previous non-experimental research, which suggests that more accurate, but typically more ‘black-box’, approaches such as NN would be less well understood.
2. Despite no significant difference in user understanding, user trust did differ significantly between the algorithm visualisations. This suggests that understanding and explainability are not the only factors contributing to user trust in AI, and raises concerns over research and design focusing solely upon explainability.
3. Users display biases in trust – and to a lesser degree, understanding – depending upon the nature of the XAI diagnosis (benign or malignant). Specifically, participants show a negativity bias, i.e., a bias towards malignant results. This raises important issues around how humans can be encouraged to make more accurate judgements of XAI outcomes. Whilst trust is important when designing XAI-based systems for healthcare, accuracy and performance must also be taken into account – particularly if users do not appear to be reaching their decisions based upon deliberated assessment and explicit understanding.
We discuss possible reasons behind our findings, including users’ reliance upon mental heuristics (‘rules of thumb’) and the potential impact of context upon negativity bias (e.g., the ‘symmetry rule’, which suggests that the presence of a symptom, i.e., a lump, may lead to illness beliefs). We discuss the implications for XAI design, ethics, healthcare and future research. It is critical that design facilitates an appropriate level of trust that is reflective of the system’s capabilities, limitations and accuracy. Design must avoid promoting blind trust and over-reliance.
To summarise, the study contributions are threefold: Firstly, we believe that this is the first paper to investigate both user understanding and trust in relation to three different XAI approaches. Secondly, we use real-world healthcare data to provide empirical testing of the algorithms. Thirdly, we show how user biases may influence perceptions of understanding and/or trust in an XAI system.