Abstract
Automated Compliance Checking (ACC) systems aim to semantically parse building regulations into a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPAR.txt, and train a sequence tagger that achieves an F1-score of 79.93 on the test set. We then show through manual evaluation that the model identifies most (89.84%) defined terms in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWEs) are discovered with reasonable accuracy (70.3%).
Original language | English |
---|---|
Title of host publication | Proceedings of the Natural Legal Language Processing Workshop 2021 |
Editors | Nikolaos Aletras, Ion Androutsopoulos, Leslie Barrett, Catalina Goanta, Daniel Preotiuc-Pietro |
Place of Publication | Punta Cana, Dominican Republic |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 129-143 |
Publication status | Published - 10 Nov 2021 |
Event | Natural Legal Language Processing Workshop 2021, Punta Cana, Dominican Republic. Duration: 10 Nov 2021 → …. Website: https://nllpw.org/workshop/ |
Workshop
Workshop | Natural Legal Language Processing Workshop 2021 |
---|---|
Abbreviated title | NLLP Workshop 2021 |
Country/Territory | Dominican Republic |
City | Punta Cana |
Period | 10 Nov 2021 → … |