Thinking about artificial intelligence (AI) has changed dramatically over time and has become about as hard to describe as art, says Jeff Fried, director of product management for data management company InterSystems. Machine learning (ML) is commonly considered a subset of AI (often a necessary component, in fact), but it also encompasses methodologies, such as logistic regression, that were once referred to simply as statistics.
The data is in any case “much more important than the algorithms, which are pretty much the same as they were 20 years ago,” says Fried. Data scientists spend most of their time choosing, gathering, combining, structuring, and organizing data so an algorithm can generate meaningful patterns.
While the volume of digital healthcare data has grown exponentially, Fried says, accessing it can be a challenge because of ever-present security and privacy concerns and because it’s often locked in silos, even across departments within the same organization. But barriers have fast become opportunities thanks to two developments: greater availability of open-source data from places like the National Library of Medicine, which gives ML algorithms more data to chew on, and the FHIR standard developed by HL7 International, which enables seamless, on-demand information exchange.
Enabling data interoperability and sharing on a statewide and national basis is also a core capability of InterSystems, he adds, as is capitalizing on the potential of AI in medicine now that computing resources are available in the cloud. Computing is also more affordable for those doing it in their own data center.
Pathology Project
InterSystems’ approach to AI is a sensible one: pick the right problem, get an early win, build on that momentum and repeat. A recent project with Massachusetts General Hospital (MGH), to improve the accuracy of genomic data generated by its Center for Integrated Diagnostics (CID), is a case in point, says Fried. It began by having an ML model quietly train in the background to identify potential risk patterns for cancer based on what pathologists had themselves discovered.
The primary output of the genomic sequencer, roughly 400 million discrete readouts of short fragments of DNA per run, defied human interpretation, Fried notes. For a single tumor sample on one patient, about 2,000 variants are detected, but generally only two or three get reported to oncologists; everything else is considered “noise” and of no use in guiding treatment decisions.
As part of the project, InterSystems helped MGH build a data lake to break down data silos within the CID. It serves as a single source of data from the laboratory information system and houses an open-access ML library used for data exploration, production, and deployment. Maciej Pacula, team lead for computational pathology at MGH, has compared it to a “Facebook activity feed” used by everyone from lab technicians and attendings to people on the bioinformatics team, Fried says.
The ML calculations use the Random Forest algorithm, which shows the justification for positive or negative decisions, says Fried. The model makes a highly sensitive call but also sifts through the noise pathologists would have rejected, looking for any diamonds in the rough worthy of additional manual inspection. Overall, pathologists end up spending less time per case with the ML-powered decision support tool, he adds.
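The triage Fried describes, where a sensitive call is paired with a second look at what would otherwise be discarded, can be pictured with a short sketch. This is only an illustration, not MGH's pipeline: the function name `triage_variants`, the bucket names, and both thresholds are hypothetical, and the scores are assumed to come from some already-trained classifier.

```python
from typing import Dict, Iterable, List, Tuple


def triage_variants(
    scored: Iterable[Tuple[str, float]],
    report_at: float = 0.9,   # hypothetical cutoff for a confident call
    review_at: float = 0.3,   # hypothetical cutoff for "worth a manual look"
) -> Dict[str, List[str]]:
    """Sort variants into report / manual_review / noise by model score."""
    if not 0.0 <= review_at <= report_at <= 1.0:
        raise ValueError("expected 0 <= review_at <= report_at <= 1")

    buckets: Dict[str, List[str]] = {"report": [], "manual_review": [], "noise": []}
    for variant_id, score in scored:
        if score >= report_at:
            buckets["report"].append(variant_id)
        elif score >= review_at:
            # Candidates a pathologist might have rejected but the model
            # considers worth inspecting by hand.
            buckets["manual_review"].append(variant_id)
        else:
            buckets["noise"].append(variant_id)
    return buckets
```

Lowering `review_at` sends more candidates to manual inspection, which is the trade-off between sensitivity and pathologist time that the paragraph above alludes to.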
The cross-check system for ensuring next-generation sequencing data is tied to the right patient includes verifying that the predicted gender matches what’s in the electronic health record, he continues. Previous genotyped results, when available, also get compared with current results to be sure the overlap between detected alterations is as high as expected.
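A cross-check of this kind might look roughly like the sketch below. The article does not say how overlap is measured, so the Jaccard index, the `min_overlap` threshold, and every name here (`cross_check`, `SampleCheck`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SampleCheck:
    sex_matches: bool
    overlap: Optional[float]  # None when no earlier genotype is available
    passed: bool


def jaccard(a: set, b: set) -> float:
    """Shared items divided by all items seen; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cross_check(
    predicted_sex: str,
    ehr_sex: str,
    current_variants: Iterable[str],
    previous_variants: Optional[Iterable[str]] = None,
    min_overlap: float = 0.8,  # hypothetical threshold, not from the article
) -> SampleCheck:
    """Flag a sequencing result that may be attached to the wrong patient."""
    sex_matches = predicted_sex.strip().lower() == ehr_sex.strip().lower()

    overlap = None
    overlap_ok = True  # nothing to compare against, so this check cannot fail
    if previous_variants is not None:
        overlap = jaccard(set(current_variants), set(previous_variants))
        overlap_ok = overlap >= min_overlap

    return SampleCheck(sex_matches, overlap, sex_matches and overlap_ok)
```

A failed check would not prove a sample swap; it would only mark the case for a person to investigate.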
InterSystems also helped develop an open-access Survival Portal that allows MGH oncologists to look at a cohort of past patients with the same genetic signature as a current patient to determine which drug might deliver the best outcomes, says Fried. The patterns it reveals might support hypothesis generation for future clinical trials.
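The cohort lookup can be sketched in a few lines. This is a deliberately simplified illustration, not the portal's method: the record fields (`signature`, `drug`, `survival_months`) and the function name are hypothetical, and a plain median ignores censored patients, which a real survival analysis would need to handle (for example with a Kaplan-Meier estimate).

```python
from collections import defaultdict
from statistics import median
from typing import Dict, FrozenSet, Iterable, Mapping


def median_survival_by_drug(
    patients: Iterable[Mapping],
    signature: FrozenSet[str],
) -> Dict[str, float]:
    """Group past patients who share a genetic signature by the drug they received."""
    months_by_drug = defaultdict(list)
    for patient in patients:
        # Exact-match on the signature; a real tool might allow partial matches.
        if patient["signature"] == signature:
            months_by_drug[patient["drug"]].append(patient["survival_months"])
    return {drug: median(months) for drug, months in months_by_drug.items()}


# Synthetic records for illustration only; not real patient data.
toy_cohort = [
    {"signature": frozenset({"gene_X"}), "drug": "drug_A", "survival_months": 12},
    {"signature": frozenset({"gene_X"}), "drug": "drug_A", "survival_months": 18},
    {"signature": frozenset({"gene_X"}), "drug": "drug_B", "survival_months": 8},
    {"signature": frozenset({"gene_Y"}), "drug": "drug_A", "survival_months": 30},
]
summary = median_survival_by_drug(toy_cohort, frozenset({"gene_X"}))
```

With cohorts this small the numbers would be suggestive at best, which fits the article's point that such patterns mainly support hypothesis generation.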
Next up will be a tool that uses mutations in the targeted cancer panel to predict microsatellite instability in cancer patients, who can then be treated with Keytruda, Fried says. MGH hopes to turn the tool into a clinical screening test.
MGH plans to replicate what it did in its Center for Integrated Diagnostics in other areas of the organization such as cytometry, says Fried. It is also making a big push into digital pathology and deep learning.
Holdups in the Healthcare Arena
One reason AI has had a hard time getting out of the research realm and into the high-stakes clinical arena is that results are “data-dependent, fuzzy and non-deterministic,” says Fried, plus the technology is changing rapidly. “If you come at this as an IT-oriented project it will drive you crazy. You can’t set a quality threshold and a time goal at the same time.” Even after a model is put into production, some data attributes might change in ways that affect it unpredictably, he says.
But AI clearly makes sense in clinical settings to address operational inefficiencies and “turbo charge” bread-and-butter processes, says Fried. For example, InterSystems partner HBI Solutions uses AI to predict both the risk of patient readmission and entry to the emergency room. “Accuracy rates are dependent on the local data and environment, but even at the low end of the accuracy range you can get a lot of benefit.”
Improving hospital workflow is one way that AI tends to be used in clinical settings, Fried says. The other is for decision support in specialty areas like pathology and radiology with lots of data and “smart people who are going to pay attention, so the training sets end up being really good and the risk of error is mitigated by the expert.”
Bias in any ML model is practically a given, he adds. “Outside of healthcare we run into this everywhere. Banks that do risk predictions on loans have well-known biases that end up being racist, and even conscientious scientists end up with very pernicious models. The college admissions process is completely screwed up in part because everyone is gaming their US News & World Report ratings and their very sophisticated ML models use flawed proxies, like someone’s likeliness to contribute to their alma mater after they graduate.”
One of the best antidotes to bias is to put humans in the loop to bring in some common sense, Fried says. Of course, humans have their own biases but, unlike machines, tend to know they do. “Machine learning algorithms can also be over-trained by giving them too many examples of the same thing,” he says. As an example, he cites an ML model built to predict car color in a certain geography that might over-predict for the popular color white.
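One way to read the car-color example is as class imbalance: when one label dominates the training set, always guessing that label already looks accurate. The sketch below, with hypothetical function names and synthetic labels, measures that baseline and computes the common "balanced" per-example weights (total examples divided by the number of classes times the class count). It is one possible mitigation, not something the article attributes to Fried.

```python
from collections import Counter
from typing import List, Sequence


def majority_baseline_accuracy(labels: Sequence[str]) -> float:
    """Accuracy achieved by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)


def balanced_sample_weights(labels: Sequence[str]) -> List[float]:
    """Weight each example by n_samples / (n_classes * count_of_its_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return [n_samples / (n_classes * counts[label]) for label in labels]


# Synthetic labels for illustration only.
car_colors = ["white"] * 8 + ["red"] * 2
baseline = majority_baseline_accuracy(car_colors)
weights = balanced_sample_weights(car_colors)
```

A model that barely beats `baseline` has learned little beyond "most cars are white," and the weights give the rarer colors proportionally more influence during training.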
In pathology, an ML algorithm might be similarly over-trained on old and perhaps abandoned practices. “Don’t be afraid of retraining,” he advises. “In the domain of online merchandising people typically retrain from scratch at least every two weeks because consumer behavior changes a lot, plus there’s no dearth of data or data access problems and the risks of being wrong are low.”
Machine learning might one day be used in scenarios not yet imagined, such as having push cart vendors monitor hospital air quality or people monetizing their personal health data over the course of the flu, says Fried. He notes that the Port of New Bedford is a leader in an initiative of the National Oceanic and Atmospheric Administration that equips fishing vessels with sensors to find fish and pays fishermen a small stipend to record water temperature on their daily travels.