Abstract
Approaches to authorship attribution have traditionally been constrained by the length of the messages to which they can be successfully applied, making them unsuitable for analysing short messages such as SMS text messages, micro-blog posts (e.g. on Twitter) or instant messages. A large pool of potential authors for a set of texts, as is typical in an online context, has also proved problematic for traditional descriptive methods, which have tended to be applied successfully in cases where there is a small, closed set of possible authors.
This paper reports the findings of a project which aimed to develop and automate techniques from forensic linguistics that have been successfully applied to the analysis of short message content in criminal cases. Using data drawn from UK-focused online groups within Twitter, the research extends the applicability of Grant’s (2007; 2010) stylistic and statistical techniques for the analysis of authorship of short texts into the online environment. Initial identification of distinctive textual features commonly found within short messages allows for the development of a taxonomy which can then be used when calculating the ‘distance’ between messages containing instances of these feature types. The end result is an automated process with a high level of success in assigning tweets to the correct author. The research has the potential to extend the scope of reliable and valid authorship analysis into hitherto unexplored contexts. Given the relative anonymity of the internet and the availability of cloaking technology, linguistic research of this nature represents a crucial contribution to the investigative toolkit.
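The pipeline described above (a feature taxonomy, a 'distance' between messages, and assignment to the closest author) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the distance measure, so Jaccard distance over binary feature presence is an assumption here, and the feature names, regular expressions and sample messages are hypothetical.

```python
import re

# Hypothetical feature taxonomy: the names and patterns are illustrative only
# and are not taken from the paper.
FEATURES = {
    "number_for_word": re.compile(r"\b[24]\b"),   # "2" for "to", "4" for "for"
    "u_for_you": re.compile(r"\bu\b", re.IGNORECASE),
    "repeated_punct": re.compile(r"[!?]{2,}"),    # "!!", "??", "?!"
    "ellipsis": re.compile(r"\.{3,}"),
    "lowercase_i": re.compile(r"\bi\b"),          # pronoun "I" written as "i"
}


def extract_features(message: str) -> frozenset:
    """Return the set of feature types that occur at least once in the message."""
    return frozenset(name for name, pattern in FEATURES.items()
                     if pattern.search(message))


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty feature sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def attribute(message: str, known: dict) -> str:
    """Assign the message to the author whose known messages are closest on average.

    `known` maps an author label to a non-empty list of that author's messages.
    """
    query = extract_features(message)

    def mean_distance(author: str) -> float:
        distances = [jaccard_distance(query, extract_features(m))
                     for m in known[author]]
        return sum(distances) / len(distances)

    return min(known, key=mean_distance)


# Invented sample data, for illustration only.
known_messages = {
    "author_a": ["c u 2 nite!!", "u ok??"],
    "author_b": ["I will see you later...", "Well... I suppose so"],
}
predicted = attribute("r u there!!", known_messages)
```

Binary presence is used here rather than frequency because very short messages rarely contain a feature more than once; a frequency-based or weighted measure would be an equally plausible choice.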
Original language | English |
---|---|
Title of host publication | Proceedings of the International Association of Forensic Linguists’ 10th Biennial Conference, Aston University, Birmingham, UK, July 2011 |
Editors | Samuel Tomblin, Nicci MacLeod, Rui Sousa-Silva, Malcolm Coulthard |
Place of Publication | Birmingham |
Publisher | Aston University |
Pages | 210-224 |
Number of pages | 25 |
ISBN (Electronic) | 9781854494320 |
Publication status | Published - 2012 |
Event | The International Association of Forensic Linguists Tenth Biennial Conference - Aston University, Birmingham, United Kingdom; Duration: 11 Jul 2011 → 14 Jul 2011; https://www2.aston.ac.uk/lss/news/events/2010-11-archive/iafl2011 |
Conference
Conference | The International Association of Forensic Linguists Tenth Biennial Conference |
---|---|
Abbreviated title | IAFL10 |
Country/Territory | United Kingdom |
City | Birmingham |
Period | 11/07/11 → 14/07/11 |
Keywords
- authorship analysis
- stylistic methods
- statistical methods
- online messaging