Occupational accidents are an urgent problem in construction. Machine learning (ML) methods for analyzing large amounts of data, together with the availability of accident report data, have raised hopes of gaining new insights. Yet data quality, in terms of input, inner availability, and output, is an issue in many ML development projects. This paper investigates strategies to define, understand, and tackle poor data quality in a contracting company's accident reports. A selective literature review of software system data quality and ML shows differing foci on external or internal data. A set of occupational accident records is then analyzed. There are many missing entries on causality, as well as shallow descriptions, which hinder the discovery of new risks, possibly because of the data collection format and procedures. The low number of complete entries calls for new repair strategies, both externally and internally. © ASCE 2023. All rights reserved.