SPAR.txt, a cheap Shallow Parsing approach for Regulatory texts

Ruben Kruiper*, Ioannis Konstas, Alasdair Gray, Farhad Sadeghineko, Richard Watson, Bimal Kumar

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review



Automated Compliance Checking (ACC) systems aim to semantically parse building regulations into a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPAR.txt, and train a sequence tagger that achieves an F1-score of 79.93 on the test set. We then show through manual evaluation that the model identifies most (89.84%) of the defined terms in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWE) are discovered with reasonable accuracy (70.3%).
Original language: English
Title of host publication: Proceedings of the Natural Legal Language Processing Workshop 2021
Editors: Nikolaos Aletras, Ion Androutsopoulos, Leslie Barrett, Catalina Goanta, Daniel Preotiuc-Pietro
Place of Publication: Punta Cana, Dominican Republic
Publisher: Association for Computational Linguistics (ACL)
Publication status: Published - 10 Nov 2021
Event: Natural Legal Language Processing Workshop 2021 - Punta Cana, Dominican Republic
Duration: 10 Nov 2021 → …


Workshop: Natural Legal Language Processing Workshop 2021
Abbreviated title: NLLP Workshop 2021
Country/Territory: Dominican Republic
City: Punta Cana
Period: 10/11/21 → …