Data-driven prediction of child neglect and abuse using integrated municipal sources

14.06.2026 | Parush Shear Yashuv N, Salem R, Abramson O, Carmeli-Messinger A, Neumbourg A, Ron A, Holzman N, Akiva P, Reis BY, Dadia-Molad M, Sobol A, Bivas-Benita M, Amit G

Abstract

Background: Child neglect and abuse are prevalent worldwide, often underreported, and frequently associated with long-term adverse physical and mental health outcomes. Municipal-level administrative data contain indicators relevant to detecting child neglect and abuse, which machine learning algorithms can aggregate to help identify children at risk and facilitate timely interventions. However, this valuable information is typically stored in isolated data silos across different municipal services, limiting its effective use.

Objective: This study aimed to assess whether machine learning models applied to integrated municipal data can accurately predict the risk of child neglect and abuse in a large population of children residing in Jerusalem, Israel.

Participants and setting: A large, deidentified dataset representing over 470,000 children, linked across multiple municipal systems, including population registry, education, public health, local taxation and welfare services.

Methods: We defined neglect and abuse outcomes based on the child's welfare records, and constructed models to predict the current risk and the future 2-year risk for each outcome, using a multitude of variables extracted from the dataset. Two main use cases were addressed: (1) risk prediction in the general child population using non-welfare data, and (2) risk prediction within the subpopulation already known to welfare services using both welfare and non-welfare data. The models were trained with incremental inclusion of data sources, and their performance was evaluated using the area under the receiver operating characteristic curve (AUC) and sensitivity at fixed levels of specificity.
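
The two evaluation metrics named above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels and the toy scores are all hypothetical, and the abstract does not say how thresholds were chosen, so the sketch simply assumes the most sensitive threshold that still meets the specificity target.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(
    labels: Sequence[int], scores: Sequence[float], specificity: float
) -> float:
    """Highest sensitivity over all thresholds whose specificity is at
    least the requested level. A case is flagged when score >= threshold."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    best = 0.0
    for threshold in sorted(set(scores)):
        spec = sum(n < threshold for n in neg) / len(neg)
        if spec >= specificity:
            sens = sum(p >= threshold for p in pos) / len(pos)
            best = max(best, sens)
    return best


# Hypothetical toy data: 1 = outcome recorded, 0 = no outcome recorded.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.9]

toy_auc = auc(toy_labels, toy_scores)
toy_sens = sensitivity_at_specificity(toy_labels, toy_scores, 0.8)
```

The pairwise AUC is quadratic in the number of cases and is meant only to make the definition concrete; a dataset of the size described here would call for a rank-based computation.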

Results: The prediction models demonstrated good performance, with AUCs ranging from 0.75 to 0.88, depending on the use case and the time window for risk estimation. Accuracy improved with the integration of additional data sources, particularly education and taxation records. In a scenario where the top 5% of children ranked by algorithm-estimated risk are assessed by municipal services, 32% of neglected children and 34% of abused children would be identified up to 2 years in advance. Predictive performance was generally consistent across sex groups, but AUCs were slightly lower for Arab children than for Jewish children.
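
The top-5% screening scenario amounts to ranking children by predicted risk, flagging a fixed fraction, and asking what share of the children with the outcome fall inside the flagged group. A minimal sketch under stated assumptions follows: the function name and the 40-child toy cohort are hypothetical, and the rounding rule for the size of the flagged group is a choice made here, not something the abstract specifies.

```python
from typing import Sequence


def capture_rate_at_top_fraction(
    labels: Sequence[int], scores: Sequence[float], fraction: float
) -> float:
    """Share of positive cases found among the highest-scoring `fraction`
    of the population (at least one child is always flagged)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("need at least one positive case")
    # Assumed rounding rule: nearest whole number of children, minimum 1.
    n_flagged = max(1, round(len(scores) * fraction))
    # Rank children from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    captured = sum(label for _, label in ranked[:n_flagged])
    return captured / total_positive


# Hypothetical cohort of 40 children with evenly spaced risk scores.
cohort_scores = [i / 40 for i in range(40)]
cohort_labels = [0] * 40
for index in (10, 30, 39):  # three children with a recorded outcome
    cohort_labels[index] = 1

# Flagging the top 5% (2 of 40 children) captures 1 of the 3 cases.
capture = capture_rate_at_top_fraction(cohort_labels, cohort_scores, 0.05)
```

In this framing the capture rate is the sensitivity at the threshold that flags the chosen fraction of children, which is how the 32% and 34% figures above can be read.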

Conclusions: Machine learning models utilizing multi-source municipal data can effectively identify children at risk of maltreatment. Such tools may support municipal welfare systems by enhancing early detection, guiding resource allocation, and improving outcomes for vulnerable children. However, ethical considerations, cultural sensitivity, and human oversight are essential to ensure responsible implementation.

Child Abuse Negl. 2026 Feb;172:107872
