User Trust and Understanding of Explainable AI: Exploring Algorithm Visualisations and User Biases

Dawn Branley-Bell, Rebecca Whitworth, Lynne Coventry

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

21 Citations (Scopus)

Abstract

Artificial intelligence (AI) has the potential to increase productivity and relieve staff workload in high-pressure jobs such as healthcare [1], where staff report high levels of exhaustion that are detrimental to their own wellbeing and to patient safety [2]. AI-based healthcare tools have generally failed, and research suggests that a lack of HCI considerations is a key reason for this failure [1,3]. For AI to be effective, it is vital that users can understand how the system is processing data. Explainable AI (XAI) moves away from the traditional ‘black box’ AI approach, aiming to make the processes behind the system more transparent [4]. To help achieve this, it uses algorithm visualisations. However, current work tends to focus on explainability while overlooking a key psychological factor: user trust [5]. Transparency is vital to trust formation; therefore XAI has the potential to significantly impact human-computer synergy [6]. However, it is also vital that we establish whether users make accurate choices when their decisions are aided by XAI, as inaccurate decisions could be detrimental to patient wellbeing.
This study expands upon the existing literature by applying a multidisciplinary, experimental approach and empirical analysis using real healthcare data [7], to address three key questions:
1. How is user understanding related to user trust of an XAI system?
2. How does the type of algorithm visualisation affect users’ perceived understanding and/or trust?
3. Do users make accurate decisions based upon an XAI system, or do they show any biases?
Combining computer science and psychological approaches, this study investigates three Machine Learning (ML) algorithms differing in explainability [8]: Decision Trees (DT), Logistic Regression (LR), and Neural Networks (NN). During the study, 70 international participants, aged 18-65 years, were presented with biopsy results (predicted by the ML models mentioned above) for three hospital patients being tested for the presence of breast cancer. The results for each patient were presented using a different XAI-driven model visualisation (DT, LR or NN). Each participant was presented with all three visualisations, but the order of presentation was randomised to prevent order effects.
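As an illustrative sketch only (not the authors' pipeline), models of the three families compared in the study could be fitted to an openly available breast cancer dataset roughly as follows; the scikit-learn dataset, hyperparameters and train/test split below are assumptions made purely for demonstration.

```python
# Illustrative sketch: fitting the three model families compared in the study
# (DT, LR, NN) to an openly available breast cancer dataset. The dataset,
# hyperparameters and library choices here are assumptions, not the authors'
# actual pipeline.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "DT": DecisionTreeClassifier(max_depth=3, random_state=0),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "NN": make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(16,),
                                      max_iter=2000, random_state=0)),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))
```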
Fig 1. Example of Logistic Regression Algorithm Output
Fig 2. Example of Neural Network Algorithm Output
Fig 3. Example of Decision Tree Algorithm Output
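The figures themselves are not reproduced here. As a rough, hypothetical illustration of how decision-tree and logistic-regression outputs of this kind can be rendered (the neural network visualisation is omitted), one could use scikit-learn and matplotlib as below; this is not the study's actual visualisation code.

```python
# Hypothetical sketch of model-output visualisations broadly in the spirit of
# Figs 1 and 3; not the figures used in the study.
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target

# Decision-tree output (cf. Fig 3): the fitted tree is shown directly.
dt = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
fig, ax = plt.subplots(figsize=(10, 6))
plot_tree(dt, feature_names=data.feature_names,
          class_names=data.target_names, filled=True, ax=ax)

# Logistic-regression output (cf. Fig 1): coefficients as feature weights.
X_std = StandardScaler().fit_transform(X)
lr = LogisticRegression(max_iter=1000).fit(X_std, y)
plt.figure(figsize=(8, 8))
plt.barh(data.feature_names, lr.coef_[0])
plt.xlabel("Coefficient (standardised features)")
plt.tight_layout()
plt.show()
```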
In addition to the visual output, participants were presented with the suggested diagnosis, e.g., “The tool suggests the following diagnosis: [Benign, i.e., non-cancerous or Malignant, i.e., cancerous]”. The diagnosis provided for each algorithm was counterbalanced. This was to control for any potential influence of the nature of diagnosis (i.e., malignant or benign) upon users’ understanding and/or trust of the system. Counterbalancing means that – for each algorithm condition (DT, LR or NN) – half of the participants received a malignant result, and the other half received a benign result; enabling the researchers to identify any biases driven by the nature of diagnosis.
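A minimal sketch of one possible counterbalanced assignment scheme consistent with the description above; the function name, data structure and exact rotation rule are hypothetical and not taken from the paper.

```python
# Hypothetical counterbalancing schedule: rotate presentation order to
# neutralise order effects, and alternate the suggested diagnosis so that
# each algorithm is split 50/50 between benign and malignant.
from itertools import permutations

ALGORITHMS = ["DT", "LR", "NN"]
ORDERS = list(permutations(ALGORITHMS))  # the 6 possible presentation orders

def build_schedule(n_participants: int):
    schedule = []
    for i in range(n_participants):
        order = ORDERS[i % len(ORDERS)]  # cycle through presentation orders
        # Alternate diagnoses per algorithm across participants: for an even
        # sample size, each algorithm receives 'malignant' for exactly half.
        diagnoses = {
            alg: "malignant" if (i + k) % 2 == 0 else "benign"
            for k, alg in enumerate(ALGORITHMS)
        }
        schedule.append({"participant": i + 1,
                         "order": list(order),
                         "diagnoses": diagnoses})
    return schedule

# With 70 participants, each algorithm is labelled 'malignant' for 35
# participants and 'benign' for the other 35 under this scheme.
schedule = build_schedule(70)
```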
For each of the three algorithm visualisations, participants were asked to rate the degree to which they trusted the accuracy of the diagnosis, and the degree to which they understood how the system had arrived at this diagnosis.
We highlight three key findings:
1. Level of user understanding did not differ significantly between the algorithm visualisations. This contradicts previous non-experimental research, which suggests that more accurate (but typically more ‘black box’) approaches, such as NN, would be less well understood.
2. Despite no significant difference in user understanding, user trust did differ significantly between the algorithm visualisations. This suggests that understanding and explainability are not the only factors contributing to user trust in AI, and raises concerns over research and design that focus solely upon explainability.
3. Users display biases in trust (and, to a lesser degree, in understanding) depending upon the nature of the XAI diagnosis (benign or malignant). Specifically, participants show a negativity bias, i.e., a bias towards malignant results. This raises important issues around how humans can be encouraged to make more accurate judgements of XAI outcomes. Whilst trust is important when designing XAI-based systems for healthcare, accuracy and performance must also be taken into account, particularly if users do not appear to be reaching their decisions through deliberate assessment and explicit understanding.
We discuss possible reasons behind our findings, including users’ reliance upon mental heuristics (‘rules of thumb’) and the potential impact of context upon negativity bias (e.g., the ‘symmetry rule’, which suggests that the presence of a symptom, i.e., a lump, may lead to illness beliefs [9]). We discuss the implications for XAI design, ethics, healthcare and future research. It is critical that design facilitates an appropriate level of trust that is reflective of the system capabilities, limitations and accuracy. Design must avoid promoting blind trust and over-reliance [10].
To summarise, the study contributions are threefold: Firstly, we believe that this is the first paper to investigate both user understanding and trust in relation to three different XAI approaches. Secondly, we use real-world healthcare data to provide empirical testing of the algorithms. Thirdly, we show how user biases may influence perceptions of understanding and/or trust in an XAI system.
Original language: English
Title of host publication: Human-Computer Interaction. Human Values and Quality of Life - Thematic Area, HCI 2020, Held as Part of the 22nd International Conference, HCII 2020, Proceedings
Editors: Masaaki Kurosu
Publisher: Springer
Pages: 382-399
Number of pages: 18
ISBN (Print): 9783030490645
Publication status: Published - 19 Jul 2020
Event: 22nd International Conference on Human Computer Interaction - Denmark (now virtual), Denmark
Duration: 19 Jul 2020 - 24 Jul 2020
http://2020.hci.international/index.html

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12183 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Human Computer Interaction
Abbreviated title: HCI International 2020
Country/Territory: Denmark
Period: 19/07/20 - 24/07/20
Internet address: http://2020.hci.international/index.html

Keywords

  • Artificial intelligence
  • Cognitive biases
  • Explainable AI
  • Health
  • Healthcare
  • Machine Learning
  • Medical diagnoses
  • Trust
  • Understanding
