Machine Learning for Cancer Research and Discovery
Dr. Issam El Naqa, Moffitt Cancer Center
The Department of Machine Learning at Moffitt Cancer Center has been developing AI technologies for personalizing cancer care and accelerating cancer discovery. These applications utilize multiscale data from molecular testing, medical imaging, and electronic health records (multi-omics), creating an interesting but challenging application of AI. Discover how deep learning methods are used for better data representation, actuarial analysis to predict time to events, and reinforcement learning for optimizing decision making. The department has also been at the forefront of coupling quantum computing with machine learning to improve robustness. In this session, see examples of cancer research and discovery applications and hear about the successes and inherent challenges of the work.
Published: 7 May 2023
Hello, everyone. My name is Issam El Naqa. I chair the machine learning department at Moffitt Cancer Center in Tampa, Florida. I'll be talking today about machine learning for cancer research and discovery. As you may know, there is national and global interest in AI and machine learning applications. These cover several areas, starting, of course, with security and industry, as well as health care. We and others have been responsive to task group requests for applications from the White House.
That request for applications resulted in a report that is currently under review by Congress. On the flip side of that, there has been tremendous growth in AI applications regulated by the FDA. This chart here highlights some of the different areas, primarily radiology applications, followed closely by oncology applications. Before delving any deeper into machine learning techniques, I want to make a quick distinction between deep learning and conventional machine learning applications.
Deep learning is a subset of machine learning techniques that deals with data representation. In this figure, you can see that there are two pipelines: one for conventional machine learning techniques, which rely on feature extraction, feature selection, and then application of a classification or detection technique, so to speak. In deep learning, these processes have been combined into one framework.
This can reduce the bias associated with the feature selection and feature extraction procedures and lead to more successful application of machine and deep learning in different areas, including oncology, radiological applications, and our own area of work as well, medical physics.

The block diagram in the lower part of the presentation shows one example of such a case. This is a PET/CT image from which a region of interest is extracted. There is the data representation portion, which consists of convolutional layers, and the data selection part, which is done by pooling layers that perform the reduction process. Then there is the learning task, classification, in this case distinguishing malignant versus benign.
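To make that block diagram concrete, here is a minimal sketch in PyTorch of such an end-to-end pipeline. It is not the model from the talk; the single-channel 64x64 ROI size and the layer widths are my own illustrative assumptions. Convolutional layers handle the data representation, pooling layers perform the reduction, and a small head carries out the classification task.

```python
# Minimal sketch (PyTorch): an end-to-end CNN that learns its own data
# representation, in contrast to a hand-engineered feature pipeline.
# The 64x64 single-channel ROI and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ROIClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # data representation
            nn.ReLU(),
            nn.MaxPool2d(2),                             # data selection / reduction
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # learning task: malignant vs. benign logit
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ROIClassifier()
roi = torch.randn(8, 1, 64, 64)  # a batch of hypothetical 64x64 ROI patches
logits = model(roi)
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (8, 1)).float())
```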
There has been a tremendous increase in the utilization of AI and machine learning. Though the origin of these applications actually traces back to the '80s and even earlier, more recently there has been this surge, especially in radiology as well as in the therapeutic application of radiological techniques in radiation oncology, and I will be focusing some of my examples on that specific area.
One key question to ask is: why AI and machine learning in oncology? You cannot open any medical journal, your favorite one, without finding an article related to machine learning applications, generally in medicine or in cancer research. And that spans the whole spectrum of the literature.
And the key answer is related to this image in the middle: with the growth of patient-specific information, whether that's genetics, clinical data, imaging, you name it, there is a need for computer algorithms to sift through this complex data, and machine learning techniques are well poised to achieve that goal.
This resonated well with the group here at Moffitt Cancer Center, which resulted in the establishment of the first machine learning department in oncology, with a vision to transform personalized cancer care and accelerate scientific discovery with machine and deep learning techniques. The mission is to design, develop, and translate state-of-the-art, patient-centered machine learning techniques. And that speaks to the values we hold dear here.
First, patient-centered machine learning techniques; then generalizable and interpretable machine learning and deep learning technology, so it can garner the trust of the end user, in this case our clinicians. And the value is that we are not limited only to working in research areas, but also to translation into clinical practice, moving AI from the sidelines to the front lines in cancer research and cancer care.
In terms of the strategic priorities that we have here at Moffitt, they're listed in this chart, as you can see. First and foremost is the integration of machine learning and deep learning techniques into the research and clinical care fabric here at Moffitt, and spreading that nationally and internationally by example.
Establishing translational research programs in priority areas: that includes, but is not limited to, radiological and pathological imaging applications, information retrieval with NLP, outcome modeling and decision support systems, molecular and computational biology, and in silico trial designs.
Areas that will be supported in the development of machine learning for clinical implementation include visual analytics; explainable and interpretable machine learning and AI, which I'll talk more about; automated ML architectures and evolutionary learning; physics-based quantum machine learning techniques; and hybrid systems. I will be highlighting some of these applications as well.
Developing team science initiatives that constitute collaborations between clinicians, biologists, and data scientists is critical in order to solve existing problems, as well as to ensure translation into clinical care. We are also very much interested in the development of a training program, and we are working toward that, whether at the graduate level or as a residency program, specifically for machine learning applications in oncology.
In order to meet these strategic priorities, we established a team, and our team consists of multiple stakeholders. There is our primary faculty, which includes five faculty members in areas related to our strategic priorities in translational and basic machine learning applications, as well as secondary faculty consisting primarily of physicians. So we have Dr. Louis, who is a pathologist; Dr. Denig, who is a radiation oncologist; and Dr. Furio, who is a radiologist, covering some of the main areas. We are also supported by staff and machine learning engineers. And I'll be showing some of that work later on, too.
Focusing a little, from a narrower angle, on what we do in our lab: our lab focuses on optimizing decision making for applications in oncology using centralized as well as federated learning techniques. We also apply this technology for image guidance in radiation oncology, looking at sight and sound.
We are part of a major effort for improving the quality of data and making it accessible, especially in the medical imaging area, and also in adaptive treatment regimens with more advanced technology. We are also looking at data science applications for optimizing therapy in different cancers, including prostate cancer, and at patient-reported outcomes versus conventional clinical outcomes, which can provide more insight into the patient's side of things and improve quality of life. These are the team members who actually carry most if not all of the workload and produce some of the work that I'll be presenting in the next few slides.
Taking a step back to talk about precision medicine, or precision oncology, there is something called the pan-omics of oncology. There are many different data resources available in oncology, and this highlights some of these data sets: starting with the specimens (tissue, blood, saliva, urine) and images, then going through the different omics, which could be genomics, proteomics, transcriptomics, metabolomics. And in the imaging world, we're going to focus on what we call radiomics.

This data needs to be annotated with clinical outcomes in order to be used by machine learning techniques. Just a little bit about radiomics: there are conventional radiomics techniques that rely, again, on feature extraction and feature selection, which are slowly being replaced by deep learning techniques, as I alluded to earlier. These are some of the toolkits that are necessary in order to conduct successful AI applications in imaging, or radiomics.
These tools allow us to do pre-processing, deblurring, denoising, image registration, conventional and deformable image segmentation and auto-contouring, as well as feature extraction and integration with the clinical workflow. These tools are all actually built using MATLAB and different frameworks from MATLAB. This tool here uses an SDK that supports MATLAB and is integrated with commercial software for ease of use by our clinician colleagues.

The other aspect, besides feature analysis and extraction, is how you take these features and build models. This is a software tool that focuses primarily on developing response models using imaging information, as well as other clinical and dosimetric information.
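For contrast with the end-to-end deep learning pipeline shown earlier, here is a minimal sketch, in Python with scikit-learn, of the conventional radiomics workflow just described: extracted features go through selection and then a classifier. The synthetic data and feature counts are illustrative assumptions; in practice the features would come from toolkits like the ones above. Keeping feature selection inside each cross-validation fold is one way to limit the selection bias mentioned earlier.

```python
# Minimal sketch (scikit-learn) of the conventional radiomics pipeline:
# hand-crafted features in, then feature selection, then a classifier.
# The synthetic data and feature count are illustrative assumptions.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 200))   # 120 patients x 200 extracted image features
y = rng.integers(0, 2, size=120)  # hypothetical binary outcome labels

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),     # feature selection step
    ("classify", LogisticRegression(max_iter=1000)),
])

# Cross-validation keeps selection inside each fold, reducing selection bias.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print("AUC: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```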
Now, moving from more conventional machine learning applications into deep learning, this is an example showing a combination of conventional machine learning techniques with deep learning techniques for predicting survival of liver cancer patients after treatment with radiation therapy. And you can see that you can actually separate the two populations, the patients who will benefit versus the patients who will not benefit accordingly. But there is still room for improvement in the performance of these types of algorithms.
This is another example, highlighting how we can deal with longitudinal data with AI techniques and enable cost-saving applications. In this case, we're looking at the function of a liver that has cancer and is being treated with radiation. The first column shows a pre-treatment acquisition of contrast-enhanced MRI images. The second shows another collection of these images, but acquired during therapy, which can be inconvenient for the patient as well as costly for the institution.

So we used GAN-type techniques, different kinds of GAN technologies, to see if we could actually substitute for that second acquisition using the initial imaging information as well as clinical information, and see what happened to our predictions. And lo and behold, you can see that, by using these techniques, you can pretty much conduct the whole experiment without the need for a second acquisition of MRI images.

However, this does not work in all cases. In the top cases, you can see it working very well when the liver status is in relatively good shape. However, when the disease status is very severe, this would require more work, and the second acquisition may actually be necessary. So at least it gives us a cost-saving application in certain scenarios.
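As a rough illustration of the idea, here is a minimal sketch in PyTorch of a paired image-to-image generator that maps a pre-treatment image to a synthetic during-treatment image. It is a toy encoder-decoder trained only with an L1 reconstruction loss; an actual GAN approach would add a discriminator and an adversarial term. All shapes, layer sizes, and data here are hypothetical, not the architecture from the talk.

```python
# Minimal sketch (PyTorch) of the image-to-image idea: a generator maps the
# pre-treatment MRI to a synthetic during-treatment image. This toy network
# uses only a pixel-wise L1 loss; a full GAN would add a discriminator.
# Shapes and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),          # synthetic mid-treatment image
)

pre_mri = torch.randn(4, 1, 128, 128)        # pre-treatment acquisitions
mid_mri = torch.randn(4, 1, 128, 128)        # during-treatment targets (training only)

optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
fake = generator(pre_mri)
loss = nn.L1Loss()(fake, mid_mri)            # pixel-wise reconstruction term
loss.backward()
optimizer.step()
```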
Now, speaking of multi-omics: moving from one imaging modality to combination and integration with other data sets that include genetic information, protein profiles, as well as dosimetry, we look at prediction of outcomes after lung cancer treatment. In this case, we're looking at a competing-risk-type application, considering a radiation inflammation toxicity called radiation pneumonitis, as well as local control.

And you can see from this figure that machine learning techniques that model not only the events but also the time to event, using these survival-network-type architectures, provide a significant improvement over the more conventional models used in the literature. That is primarily due to the ability to integrate heterogeneous and diverse information into a single framework.
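To show what modeling the time to event can look like, here is a minimal sketch in PyTorch of a survival network trained with the negative Cox partial likelihood, which accounts for censored follow-up. This is a generic construction, not the competing-risk architecture from the talk, and the feature counts and data are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of a survival network: a small network maps
# multi-omics inputs to a risk score trained with the negative Cox partial
# likelihood, so the model learns time-to-event rather than a plain label.
# Feature count, network size, and data are illustrative assumptions.
import torch
import torch.nn as nn

def cox_partial_likelihood_loss(risk, time, event):
    """Negative Cox partial likelihood (ties ignored for simplicity)."""
    order = torch.argsort(time, descending=True)   # sort so risk sets are prefixes
    risk, event = risk[order], event[order]
    log_cumsum = torch.logcumsumexp(risk, dim=0)   # log-sum over each risk set
    return -torch.sum((risk - log_cumsum) * event) / event.sum()

net = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 1))

x = torch.randn(64, 50)                     # imaging + genomic + dosimetric features
time = torch.rand(64) * 36                  # follow-up time in months
event = torch.randint(0, 2, (64,)).float()  # 1 = event observed, 0 = censored

risk = net(x).squeeze(-1)
loss = cox_partial_likelihood_loss(risk, time, event)
loss.backward()
```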
In order to facilitate translation of AI into clinical applications, we've been developing software tools. These include recommender-type tools for adaptive interventions in radiotherapy, as well as tools to facilitate collaboration with other institutions, as I mentioned with the medical imaging data resource, and to ensure better interpretability of this information. In these two applications, we focus specifically on user factors associated with AI implementation.

There is much evidence to suggest that applying AI in retrospective studies is different from applying AI in prospective studies, when there is a patient on the bed, so to speak. These tools help us understand those aspects of AI implementation, which is critical for successful deployment and future use of AI in the clinic.
Having said that, I'll be the first to tell you that AI and machine learning are far from perfect, and there are many examples in the public domain to show it. You are probably familiar with issues related to race and gender discrimination with some of these techniques, as well as premature deployment, like the example of a model in the EHR system called Epic. There are also other intriguing failures of AI applications.

For instance, these algorithms can learn from artifacts in images when predicting skin cancer, or, in a similar case, latch onto a tube when trying to predict what's happening in lung cancer, which are not things that are considered by the clinical team. But the algorithms have this ability to home in on artifacts, and that needs to be mitigated by the team designing these algorithms. COVID-19 also revealed many issues associated with AI applications and their unreliable implementation.
So there are two critical aspects for successful AI applications. One is related to the quality of the data itself. The other is related to the context, which requires domain expertise to be involved in the development and deployment of these algorithms. Therefore, several societies, including this one from the medical physics community, came up with different checklists in order to ensure reproducible AI implementations.
These checklists ensure that the data and methods being used are transparent, and they highlight the significance of the results, the justification for AI use, as well as the interpretability and actionability of the methods. This is an example of the checklist that any author submitting to the Medical Physics journal needs to fill in for the submission to proceed in the review process.
Talking about the interpretability aspects: deep learning techniques tend to be more accurate but less interpretable, while decision trees, on the flip side, tend to be more interpretable but less accurate. There are techniques to improve the interpretability of deep learning using proxy models or, in the case of decision trees, using examples. Here are a couple of examples showing the application of some of these techniques.
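Here is a minimal sketch, using scikit-learn, of the proxy-model idea: fit an interpretable decision tree to mimic a black-box model's predictions and then inspect the tree's rules. The random forest stands in for any opaque model, and the data is synthetic.

```python
# Minimal sketch (scikit-learn) of a proxy (surrogate) model: train an
# interpretable decision tree to mimic a black-box model's outputs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                # synthetic feature matrix
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)

black_box = RandomForestClassifier(n_estimators=100).fit(X, y)

# Train the surrogate on the black box's predictions, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, black_box.predict(X))
print(export_text(surrogate))                # human-readable decision rules
print("fidelity:", (surrogate.predict(X) == black_box.predict(X)).mean())
```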
In the case of imaging, for example, for the liver cancer case, looking at the images more carefully with Grad-CAM, for instance, highlighted certain areas that our radiologists were able to discern were related to excessive cirrhosis. And this is what was driving the prediction of the algorithm.
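For reference, here is a minimal sketch in PyTorch of the Grad-CAM computation itself: gradients of a class logit with respect to the last convolutional feature maps weight those maps into a coarse heatmap over the input. The tiny CNN and random input are stand-ins, not the liver model from the talk.

```python
# Minimal sketch (PyTorch) of Grad-CAM: gradient-weighted feature maps
# form a coarse heatmap showing which regions drive a class prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(8, 8, 3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))

x = torch.randn(1, 1, 64, 64)         # stand-in for an image ROI
fmaps = conv(x)                       # last conv feature maps, kept for CAM
fmaps.retain_grad()
logits = head(fmaps)
logits[0, 1].backward()               # gradient of the target class logit

weights = fmaps.grad.mean(dim=(2, 3), keepdim=True)  # global-average gradients
cam = F.relu((weights * fmaps).sum(dim=1))           # weighted sum of maps
cam = F.interpolate(cam.unsqueeze(1), size=(64, 64),
                    mode="bilinear", align_corners=False)
print(cam.shape)                      # heatmap aligned with the input ROI
```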
That's easier to visualize in images than with multi-omics data, where expertise is necessary in order to understand what the algorithm has actually learned and whether that conforms with clinical practice and clinical understanding, in order to ensure better application as well as transferability of these techniques into practice.
Another approach to mitigate some of the challenges, the bias, and the uncertainty of AI techniques and limited data sets is using what's called the human in the loop, or intelligence augmentation. Here you can see a Bayesian network looking at prediction of response in lung cancer, with the two endpoints of local control and radiation pneumonitis, as I mentioned earlier. The one on the top is purely data-driven. For the bottom one, we showed the top network to our physicians.

It did not feature some of the variables that they thought should be there, so we reconstructed the algorithm to include their expertise. The result was a similar prediction, because both are data-driven techniques; however, the confidence intervals were tighter, suggesting better generalizability of these approaches.
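As a sketch of how expert knowledge can be folded into such a model, here is a toy example using the pgmpy library: a small Bayesian network whose structure includes an expert-added edge, with parameters refit from data. The variable names, the edge, and the data are entirely hypothetical, not the network from the talk.

```python
# Minimal sketch (pgmpy) of the human-in-the-loop idea: start from a
# data-driven structure, add an edge the physicians expect, and refit.
import numpy as np
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

rng = np.random.default_rng(0)
data = pd.DataFrame({
    "dose": rng.integers(0, 2, 500),       # discretized dosimetric variable
    "biomarker": rng.integers(0, 2, 500),  # variable the clinicians flagged
    "local_control": rng.integers(0, 2, 500),
})

# Data-driven edge plus one expert-added edge (biomarker -> local_control).
model = BayesianNetwork([("dose", "local_control"),
                         ("biomarker", "local_control")])
model.fit(data, estimator=MaximumLikelihoodEstimator)

posterior = VariableElimination(model).query(
    variables=["local_control"], evidence={"dose": 1, "biomarker": 1})
print(posterior)
```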
A more intriguing case is when there is data imbalance. This is looking at prediction of local control in liver cancer, which runs at about 90%, so there is only 10% in one class versus 90% in the other class. Looking at the human prediction alone, you can see a modest performance, an AUC of 0.6.

The machine did much better, sifting through more complex data, giving an AUC of 0.8. But when we combined the human expertise with the machine learning algorithm, this resulted in significantly improved performance, an AUC of about 0.86, as seen here, highlighting the importance of combining human expertise with machine algorithms in order to reap the benefits of both together.
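A minimal sketch of this kind of late fusion, in Python with scikit-learn: average the human's and the machine's probability scores and compare AUCs. The simulated scores are illustrative only; the 0.6, 0.8, and 0.86 figures in the talk came from the actual cohort, not from this toy.

```python
# Minimal sketch (scikit-learn) of combining human and machine predictions
# by simple score averaging on an imbalanced classification problem.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = (rng.random(200) < 0.1).astype(int)  # ~10% positive class, as in the talk

# Simulated scores: each source is informative plus noise (hypothetical).
human = 0.4 * y + rng.random(200)
machine = 0.8 * y + rng.random(200)
combined = (human + machine) / 2          # simple late-fusion average

for name, s in [("human", human), ("machine", machine), ("combined", combined)]:
    print(name, round(roc_auc_score(y, s), 2))
```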
Another area that we think will have a significant impact on improving machine learning applications and their robustness is the combination of physics plus AI. This includes using quantum mechanics-type techniques for treatment planning, resulting in significant speedups in performance; more robustness when looking at the deformation changes that happen during treatment, for image guidance applications; as well as improving decision making through this combination of quantum computing and machine learning techniques, resulting in more robust performance following what's called the Shaw principle.
So to recap, I hope I was able to demonstrate that AI and machine learning techniques offer new opportunities to develop a better understanding of oncology and its diagnosis, prognosis, and treatment regimens. Machine learning algorithms come in different flavors that vary in accuracy and interpretability, and one needs to make an executive choice about the proper algorithm for the proper application.
Proper development and deployment of AI needs to involve using guidelines, one of which I mentioned, called CLAIM, and adhering to ethical AI standards, which used to sit at the back end and need to be at the front. And to overcome current barriers in AI and machine learning techniques, there needs to be more use of interpretable AI applications, using, for instance, visualization techniques, behavioral science, human-in-the-loop approaches, or even physics-based techniques that improve robustness, like quantum computing. This will require collaboration between stakeholders, data scientists, biologists, physicists, economists, clinical practitioners, regulators, and vendors as well, in order to ensure safe and beneficial application of AI in medicine. I'll leave you with some of these references and a highlight of our team here. And thank you for your attention.