Author: Kentaro Toyama
In 2008, in what was among the earliest and most prominent applications of AI to social good, Google released “Google Flu Trends,” an online system that claimed to provide early warnings of flu outbreaks in various geographic regions, based on the incidence of Google searches related to the flu. Its creators claimed that the system generated output that was spectacularly correlated (correlation coefficients of 0.90 and higher) with the surveillance data of the U.S. Centers for Disease Control and Prevention (CDC), but 1-2 weeks earlier. The system was seminal, inspiring efforts to track other diseases, to use other types of data, and to provide disease surveillance to countries with weaker public health systems. Google Flu Trends offered the possibility of free, rapid, effortless disease surveillance.

Critics, however, found errors: Google Flu Trends “completely [missed] the first wave of the 2009 influenza A/H1N1 pandemic, and greatly [overestimated] the intensity of the A/H3N2 epidemic during the 2012-2013 season”; the latter predictions were off by a factor of 2 or worse. A range of causes was suggested: overfitting of data; spurious correlations with irrelevant search terms; changing user search habits; media-influenced search behavior; differences between suspected and actual illness; and so on.

Some researchers have proposed that the Google Flu Trends algorithm could be improved upon, or that it needs to be routinely updated, and they are undoubtedly right – the original algorithm was based on simple linear regressions trained on a static data set. But, even with that new wisdom, the world has yet to see a “big data” illness prediction system that improves upon the reliable surveillance systems of public health organizations such as the CDC – and this despite a decade of dramatic advances in machine learning, the rise of passionate AI proponents seeking social impact, and a global COVID pandemic stressing the need for better outbreak prediction.
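To make the technical simplicity concrete: the published description of the original system fit a linear model on log-odds-transformed values, relating the fraction of flu-related search queries to the CDC's influenza-like-illness (ILI) rate. The sketch below is not Google's code; it is a minimal schematic reconstruction of that kind of model, using synthetic data and hypothetical variable names (`query_fraction`, `ili_rate`) purely for illustration.

```python
import numpy as np

def logit(p):
    # Log-odds transform; the original model regressed logit(ILI) on logit(query fraction)
    return np.log(p / (1.0 - p))

def inv_logit(x):
    # Inverse of the log-odds transform, mapping back to a rate in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# Synthetic weekly data (hypothetical, for illustration only):
# fraction of searches that are flu-related, over 52 weeks
rng = np.random.default_rng(0)
query_fraction = rng.uniform(0.001, 0.05, size=52)
# Pretend the ILI rate is a noisy function of query volume with true slope 1.1
ili_rate = inv_logit(-4.0 + 1.1 * logit(query_fraction)
                     + rng.normal(0.0, 0.1, size=52))

# Fit logit(ili_rate) = b0 + b1 * logit(query_fraction) by ordinary least squares
X = np.column_stack([np.ones_like(query_fraction), logit(query_fraction)])
y = logit(ili_rate)
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# Predictions come from plugging current query fractions into the fitted line
predicted_ili = inv_logit(b0 + b1 * logit(query_fraction))
```

The fragility the critics identified is visible in this structure: the model is a single static line fit to one historical window, so any shift in search behavior (media coverage, changed user habits) moves `query_fraction` without moving actual illness, and the predictions drift with it.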