How is data science helping Flint, Michigan recover from its water crisis? A data-driven approach to infrastructure improvement
Abstract
The Flint water crisis highlights a number of serious problems: a public health outbreak, inadequate urban infrastructure, operational failures, political mistrust, and environmental injustice. But a key challenge in Flint’s recovery has received less attention: a lack of information. Who is most at risk? What predicts that risk? Where should resources be allocated? These problems bear a surprising similarity to those in management science and customer analytics. Households differ in their lead contamination, but without testing every home’s water, how can we predict which homes are at greatest risk? Further, city officials face a dynamic resource allocation problem: given uncertain records and costly construction, which homes’ pipes should be replaced next? Support for recovery continues, as the U.S. Congress appropriated $100 million for Flint out of $9 billion for nationwide water infrastructure, but the efficient use of these funds is critical. To contribute to the recovery efforts, we assembled rich datasets, including thousands of water samples, information on pipe materials, and city records. Working with local government, we have been able to estimate the greatest risks more accurately, to develop a clearer picture of the source of the problems, and to direct resources towards recovery more efficiently. Specifically, we employ ensembles of classification algorithms and active learning, and we have developed apps for coordinating with contractors and residents. We present our approach, which combines statistical machine learning tools with data collection efforts, as a replicable method for other cities to follow. We contend that Flint, Michigan can serve as a national model for how to improve water infrastructure with a data-driven approach.