ML in Health: Allen alliance fights Corona
Since my last post on Health and (Federated) Machine Learning, the tech sector’s involvement has multiplied almost day by day. Microsoft Research and AI2 (the Allen Institute for AI) partnered with leading research groups to prepare and freely distribute the COVID-19 Open Research Dataset (CORD-19). This vast and growing body of data is a goldmine for analytics, NLP, and Deep Learning.
The CORD-19 AI Challenge
The Canadian company BlueDot and a couple of similar AI-based warning systems raised the alarm weeks ahead of the WHO. This showed that public authorities worldwide were relying on outdated analytics tools for forecasting and early warning. Hence the Tasks section of the CORD-19 dataset: “a call to action to the world’s artificial intelligence experts to develop text and data mining tools that can help the medical community develop answers … …data mining approaches to find answers to questions within, and connect insights across, this content… …What do we know about: ”
- Virus Transmission, incubation, stability, seasonality, persistence, protective gear
- Medical Care, challenges, solutions, best practices, management, shortages, telemed, home care
- Risk Factors, mitigation measures, pre-existing diseases, smoking, etc.
- Virus Origins & Evolution, variations, different strains, animal/livestock hosts
- Interventions to prevent community spread, such as school closures or travel bans
- Vaccines and treatments, drugs in development, clinical studies
- Diagnostics & Surveillance, screening, early detection, sampling methods, tradeoffs between speed, accuracy and accessibility (of tests).
- Ethical Considerations, social science research, needs of caregivers, identifying misinformation during outbreaks etc.
- Information Sharing, data-collection standards, communication methods, coordination of local and Federal, private, public, non-commercial and academic communities.
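The tasks above boil down to text mining that finds answers to questions inside a large corpus. One simple baseline is to rank documents by TF-IDF overlap with a question. The sketch below is a minimal illustration under stated assumptions: the three "abstracts" are hypothetical toy strings rather than real CORD-19 papers, the tokenizer is deliberately naive, and `tfidf_rank` is an illustrative helper, not part of any challenge toolkit. Serious CORD-19 work would run over the full dataset and usually use stronger NLP models.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/digits; a deliberately naive tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(question, docs):
    """Rank docs by summed TF-IDF weight of the question's terms.

    Returns a list of (doc_index, score) pairs, highest score first.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    q_terms = set(tokenize(question))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        length = len(toks) or 1
        score = 0.0
        for term in q_terms:
            if tf[term]:
                idf = math.log((1 + n) / (1 + df[term])) + 1.0  # smoothed IDF
                score += (tf[term] / length) * idf
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy abstracts, standing in for real CORD-19 papers.
docs = [
    "Incubation period estimates from early case reports of the novel coronavirus.",
    "Hospital supply shortages of ventilators and protective gear.",
    "Seasonality and environmental stability of respiratory viruses.",
]

for idx, score in tfidf_rank("What is known about the incubation period?", docs):
    print(f"{score:.3f}  {docs[idx]}")
```

A ranker like this only surfaces candidate passages. Connecting insights across papers, as the call to action asks, would need further steps on top of it.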
And the response of tech firms was?
Overwhelming. Within a few days, AI people volunteered by the thousands. At one company alone (Ericsson, to pick just one example), 350+ employees took part: data scientists, data engineers, data visualizers, PMs, task managers, leaders…
Operational AI needs much more data: up to date, with less “noise” and fewer outliers (“far-off” values). This is improving as we go, keeping pace with the spread of COVID-19 and with international collaboration. Google, FB, and others are also trying to reduce the noise “out there.” During pandemics, text mined from social media is a useful source of training data for ML systems, but for the same reason it is also a source of noise, outliers, and outright fakes.
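Screening out "far-off" values before training is one of the simplest ways to reduce this noise. The sketch below uses the modified z-score, which is based on the median and the median absolute deviation (MAD) rather than the mean and standard deviation, so a single extreme value cannot hide itself by inflating the spread. The 3.5 cutoff is a common rule of thumb, not a universal constant. The helper name `drop_far_off` and the daily counts are hypothetical.

```python
from statistics import median


def drop_far_off(values, threshold=3.5):
    """Split values into (kept, outliers) using the modified z-score.

    score = 0.6745 * (x - median) / MAD, where MAD is the median absolute
    deviation. Points with |score| > threshold count as "far-off" values.
    """
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        # No spread to measure against, so leave the data untouched.
        return list(values), []
    kept, outliers = [], []
    for x in values:
        score = 0.6745 * (x - med) / mad
        (outliers if abs(score) > threshold else kept).append(x)
    return kept, outliers


# Hypothetical daily report counts with one suspicious spike.
daily_counts = [12, 15, 14, 13, 16, 240, 15, 14]
kept, outliers = drop_far_off(daily_counts)
print(outliers)  # → [240]
```

A flagged value is not necessarily wrong. A real spike in cases looks just like a data-entry error, so in practice flagged points would usually be reviewed rather than silently dropped.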
Several open-source apps are emerging. Among the first is CovNet, which supports diagnosis by recognizing patterns in lung images. It is meant to complement COVID test kits and thereby lift the accuracy of diagnosis above 98%.
IMO, the underlying constraints on ownership/tenancy/privacy/transfers of medical data, along with urgency and tight deadlines, will make Federated ML and distributed techniques a lot more popular in health apps, if not absolutely necessary (see also the last paragraphs of the previous post).
Federated ML as a UML sequence diagram (from the Informator course AI, Architecture, and Machine Learning)
Instead of sending the sensitive data to the computation, architects should remember the “D” in SOLID (Dependency Inversion) and do their best to send the computation the other way, to the privately owned data. Here, the computation travels as a vector of parameters.
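The idea of sending the parameter vector to the data, rather than the data to the computation, can be illustrated with a federated averaging loop in the style of FedAvg. The sketch below is a minimal version under loud assumptions: a toy one-feature linear model, three hypothetical "hospital" clients with synthetic data drawn from y = 3x + 1 plus noise, and illustrative helper names (`local_update`, `federated_averaging`) that do not come from any particular federated-learning framework. It does not reproduce the exact protocol in the sequence diagram. The key property holds, though: only `[w, b]` crosses the boundary between server and client.

```python
import random


def local_update(params, data, lr=0.1, epochs=10):
    """Plain gradient descent on one client's private (x, y) pairs.

    Only the updated parameter vector [w, b] is returned; the data stays local.
    """
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_averaging(clients, rounds=100):
    """Server loop: broadcast params, collect local updates, average them.

    Each client's update is weighted by its number of samples.
    """
    params = [0.0, 0.0]
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        updates = [(local_update(params, d), len(d)) for d in clients]
        params = [
            sum(u[i] * n for u, n in updates) / total
            for i in range(len(params))
        ]
    return params


rng = random.Random(0)


def make_client(n):
    # Hypothetical private dataset: y = 3x + 1 plus small Gaussian noise.
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]


clients = [make_client(n) for n in (30, 50, 20)]
w, b = federated_averaging(clients)
print(f"w = {w:.2f}, b = {b:.2f}")
```

With this synthetic data, the averaged parameters should land close to the generating values 3 and 1 without the server ever seeing a single (x, y) pair. Real deployments add secure aggregation, client sampling, and handling for clients whose data distributions differ.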
Learn more at Milan’s courses: AI, machine learning