MDC-Net: Multimodal Detection And Captioning Network For Steel Surface Defects

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


In the highly competitive steel sector, product quality, particularly surface integrity, is critical. Surface defect detection (SDD) is essential to maintaining high production standards, as it directly affects product quality and manufacturing efficiency. Traditional SDD approaches, which rely primarily on manual inspection or conventional computer vision techniques, are plagued by difficulties, including reduced accuracy and potential health risks to inspectors. This research describes an innovative solution that uses a transformer-based sequence generation model to improve defect detection during the manufacture of hot-rolled steel sheets and to generate captions describing each defect and its spatial location. By treating object detection as a sequence generation problem, the method allows for a more sophisticated understanding of image content and a complete, contextually rich investigation of surface defects, while also providing captions. While this method can potentially improve detection accuracy, its actual power rests in its scalability and its flexibility across various industrial applications. Furthermore, the technique could be extended to visual question-answering applications, opening up opportunities for interactive and intelligent image analysis.
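The abstract does not say how detections are serialized, so the following is only a minimal sketch of one common way to frame detection as sequence generation: each bounding box is quantized into discrete coordinate tokens followed by a class token, which a transformer decoder could then emit one token at a time. The bin count, the defect label names, and the function names `quantize`, `box_to_tokens`, and `tokens_to_box` are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple

NUM_BINS = 1000  # assumed number of coordinate bins (hypothetical)
CLASS_NAMES = ["crazing", "inclusion", "scratches"]  # hypothetical defect labels

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    bin_index = int(value * num_bins / size)
    return min(max(bin_index, 0), num_bins - 1)


def box_to_tokens(box: Box, label: str, width: float, height: float) -> List[int]:
    """Serialize one labeled box as four coordinate tokens plus one class token."""
    xmin, ymin, xmax, ymax = box
    coord_tokens = [
        quantize(xmin, width),
        quantize(ymin, height),
        quantize(xmax, width),
        quantize(ymax, height),
    ]
    # Class tokens are placed after the coordinate vocabulary so the two never collide.
    class_token = NUM_BINS + CLASS_NAMES.index(label)
    return coord_tokens + [class_token]


def tokens_to_box(tokens: List[int], width: float, height: float) -> Tuple[Box, str]:
    """Invert box_to_tokens, returning bin-center coordinates and the label."""
    x0, y0, x1, y1, class_token = tokens
    box = (
        (x0 + 0.5) * width / NUM_BINS,
        (y0 + 0.5) * height / NUM_BINS,
        (x1 + 0.5) * width / NUM_BINS,
        (y1 + 0.5) * height / NUM_BINS,
    )
    return box, CLASS_NAMES[class_token - NUM_BINS]


tokens = box_to_tokens((50.0, 20.0, 100.0, 80.0), "scratches", width=200.0, height=200.0)
decoded_box, decoded_label = tokens_to_box(tokens, width=200.0, height=200.0)
```

Under this scheme a caption can share the same output sequence as the box tokens, which is one reason a sequence formulation lends itself to joint detection and captioning; the round trip loses at most one bin width of precision per coordinate.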
Original language: English
Title of host publication: Robotics, Computer Vision and Intelligent Systems
Subtitle of host publication: 4th International Conference, ROBOVIS 2024, Rome, Italy, February 25–27, 2024, Proceedings
Editors: Joaquim Filipe, Juha Röning
Place of Publication: Cham, Switzerland
Publisher: Springer Nature
ISBN (Electronic): 9783031590573
ISBN (Print): 9783031590566
Publication status: Accepted/In press - 29 Jan 2024
Event: Robotics, Computer Vision and Intelligent Systems 2024 - Rome, Italy
Duration: 25 Feb 2024 to 27 Feb 2024

Publication series

Name: Communications in Computer and Information Science
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937


Conference: Robotics, Computer Vision and Intelligent Systems 2024
Abbreviated title: ROBOVIS
