Evaluating (semi-)autonomous systems in policing and national security: a new matrix framework of evaluation and grading, based on lessons from existing processes designed to define and assess ‘intelligence’

Research output: Chapter in Book/Report/Conference proceeding · Conference contribution (peer-reviewed)


Artificial intelligence, semi-autonomous systems and algorithmic models (together referred to as ‘models’) within policing and national security are being used to triage, prioritise, predict and manage data overload, and to inform some of the most important decisions within our society, including whether someone is a victim of modern slavery, a child at risk of harm or a potential perpetrator of terrorism. Such a determination can have a significant impact on that person's future, their progress through the criminal justice system, and the deployment of limited resources. There is an urgent stakeholder need in policing and national security for legitimacy and earned trust when technology is used to make decisions. It is crucial that models safeguard rather than undermine fundamental freedoms. Outputs of these models may be one factor deployed to satisfy a legal test (such as having ‘reasonable grounds’ for suspicion). Therefore, policing and national security bodies need expert decision support frameworks to determine whether they should rely on a model to help them make an operational decision that may impact upon individual rights. Such methods must facilitate evaluation and critique by subject matter experts, with a model's errors and uncertainties highlighted. These methods must be context-aware and reflective of concepts that make sense to practitioners; yet satisfactory frameworks do not yet exist.

This paper outlines our new matrix framework of evaluation and grading, based on lessons from existing processes designed to define and assess ‘intelligence’, a concept with no formal legal definition in England and Wales.

In the context of national security and policing, ‘intelligence’ is both a noun and a verb. It refers to the agencies and organisations that protect national security and fight organised crime, and to the people who work within those agencies. It means stealing secrets. It describes collecting information from covert human intelligence sources or ‘agents’, and from technological monitoring and surveillance, including from open sources. It could describe a procedural stage, not yet formally defined as a criminal investigation, conducted by a law enforcement or intelligence agency. And it also means the process of combining intelligence with other information, analysing and assessing it, and the product of such analysis. But whatever it is, intelligence is inherently uncertain and subjective. Outputs of models share the same qualities of uncertainty and subjectiveness as ‘intelligence’. Yet such models can provide new and valuable insights, just as intelligence can. The national security and policing communities are familiar with processes designed to assess the reliability and certainty of intelligence, when and how it should be used in decision-making, when it should be ignored, and therefore whether its use is likely to be fair and proportionate.

The methodology used for this study combines an empirical component with a literature-based component. The literature-based component reviews existing work on areas such as traditional intelligence-led policing, the varying definitions of intelligence, and the implications of the use of AI systems in law enforcement. The key literature on the use of intelligence in the UK is published by the College of Policing through its ‘authorised professional practice’ guidelines for police officers. Key concepts from these guidelines include the intelligence grading matrix and the probability yardstick, which are used to assess intelligence before it informs policing decisions. Our draft matrix incorporated elements of these intelligence assessment matrices, as well as key factors identified by legal experts in the field of automated decision-making.

The empirical aspect of this study involved research conducted with police forces and other criminal justice stakeholders. Relevant police forces were contacted to set up meetings, in-person research visits, and workshops at which experienced officers could give input on our draft matrix. There has been keen interest from police forces, as well as from key stakeholders at the Police Digital Service (PDS), the Centre for Data and Analytics (CDAP), and the National Police Chiefs’ Council (NPCC). This process of consulting criminal justice stakeholders has also allowed us to refine the matrix accordingly. The project has received internal ‘seed’ funding from Northumbria University to conduct the empirical activities required for the study.

To improve the effectiveness of governance and oversight, we argue that similar processes should be developed and applied to the use of models, in policing, national security and more widely, with the output of those models categorised as ‘intelligence’ rather than forced into unfitting definitions of special category ‘personal data’. Our interactive poster presentation will display our proposed matrix framework for the evaluation of models used in national security and policing. It will explain the factors incorporated within the matrix and how our research arrived at them. It will invite conference participants to consider a fictional case study based on a model likely to be used in a policing or national security context, to evaluate and grade that model using the matrix approach, and to provide feedback on the matrix’s effectiveness as a method of evaluating the reliability and legitimacy of a model.

When considering which factors from similar contemporary HUMINT-focused matrices should be carried over, we found that factors relating to source evaluation translated well to algorithmic models, and many were suitable for inclusion in an algorithm-specific intelligence matrix. Handling codes and conditions for intelligence dissemination likewise translated very well, and the matrix was developed with this in mind. Though information quality assessment was also important, we found that fewer of its factors were directly transferable. Feedback from policing and law enforcement stakeholders, including organised crime units and national intelligence agencies, indicated an emphasis on matrices being simple enough for non-specialist frontline officers to get to grips with quickly. This was a major influence on the design of the matrix factors.

The matrix considers factors relating to ‘Data Inputs’, ‘Output/Analysis’, and ‘Additional Factors’. These are based on factors that are either a key component of computer science interpretability and testing theory or that relate to the causes of past algorithmic policing errors in negative use cases. Examples include bias testing, the extent of data cleaning undertaken, whether the tool was specifically designed for the task at hand, and whether officers have been trained both on general algorithmic reliability theory and on the specific tool producing the information being fed into the intelligence matrix.

A points system awarding between 10 and 60 points for positive or negative factors allows law enforcement to assess algorithmically produced information for its intelligence value quickly. It also acts as a soft influence on policing and intelligence procurement, shaping the types of algorithmic software that police forces will procure externally or develop internally.

Computer science considerations with a binary answer (such as whether or not the testing report for the tool is publicly available) translated well for use in a matrix, but the often contextual nature of data in real-world settings means that many factors take different point values depending on the extent to which a statement is true. This balances the need for simplicity against the need for the matrix to work as intended: different ‘tiers’ of the matrix award different points for the same factor depending on the extent to which that factor is true or false.

Once the matrix is applied to either an algorithmic tool or the information it has produced, it enables the user to arrive at an integer score which can be compared against a set of integer ranges to assess quickly whether the algorithmic tool can generally be trusted to produce results that are reliable and/or valuable. Though no matrix evaluation can ever be perfect, it provides a good starting point for non-expert end users of algorithmic information and intelligence, enabling potentially flawed algorithmic outputs to be discarded early. It is anticipated that these factors could be applied to algorithmic outputs such as those being trialled in traffic cameras in the United Kingdom to detect vehicle and driver safety violations on public roads, as well as to more complicated algorithmic use cases such as facial recognition, anti-fraud, or anti-organised-crime tools.
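The scoring mechanism described above can be illustrated as a minimal Python sketch. The factor names, tier labels, point values, and threshold bands below are hypothetical assumptions for illustration only: the abstract specifies only that each factor is worth between 10 and 60 points, that some factors are binary while others are tiered, and that the total is compared against integer ranges.

```python
# Illustrative sketch of the matrix scoring logic.
# Factor names, tier point values, and threshold bands are HYPOTHETICAL;
# the paper specifies only the 10-60 point range per factor and the use
# of integer ranges to grade the total.

from dataclasses import dataclass


@dataclass
class Factor:
    name: str
    # Points per tier: binary factors have two tiers ("no"/"yes");
    # contextual factors have more, graded by how true the statement is.
    tier_points: dict


# Hypothetical factors spanning the three matrix categories
# ('Data Inputs', 'Output/Analysis', 'Additional Factors').
FACTORS = [
    Factor("bias_testing_performed", {"no": 10, "partial": 30, "full": 60}),
    Factor("data_cleaning_extent", {"none": 10, "partial": 30, "thorough": 50}),
    Factor("tool_designed_for_task", {"no": 10, "yes": 60}),       # binary
    Factor("testing_report_public", {"no": 10, "yes": 60}),        # binary
    Factor("officers_trained_on_tool", {"no": 10, "general_only": 30, "both": 60}),
]

# Hypothetical integer ranges mapping a total score to a trust grading.
BANDS = [
    (250, "generally reliable"),
    (150, "use with caution"),
    (0, "discard / seek expert review"),
]


def grade(assessment: dict) -> tuple:
    """Sum the tier points for each factor, then map the total to a band."""
    total = sum(f.tier_points[assessment[f.name]] for f in FACTORS)
    for threshold, label in BANDS:
        if total >= threshold:
            return total, label


# Example: a well-tested, well-documented tool whose testing report
# has not been published.
total, label = grade({
    "bias_testing_performed": "full",
    "data_cleaning_extent": "thorough",
    "tool_designed_for_task": "yes",
    "testing_report_public": "no",
    "officers_trained_on_tool": "both",
})
# total == 240, label == "use with caution"
```

Note how the single unpublished testing report drags an otherwise strong tool below the hypothetical top band, which is the ‘soft influence’ on procurement described above: vendors and in-house developers are nudged towards practices that score well.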
Original language: English
Title of host publication: TAS '23
Subtitle of host publication: Proceedings of the First International Symposium on Trustworthy Autonomous Systems
Place of publication: New York, US
Number of pages: 2
ISBN (Electronic): 9798400707346
ISBN (Print): 9798400707346
Publication status: Published - 11 Jul 2023
Event: The First International Symposium on Trustworthy Autonomous Systems 2023 (TAS '23) - Edinburgh, United Kingdom
Duration: 11 Jul 2023 - 12 Jul 2023

Publication series: Proceedings of the First International Symposium on Trustworthy Autonomous Systems

