Leveraging Diffusion Models and AI to Combat Data Scarcity and Data Imbalance: Introducing a Multitask Brain Tumor Inpainting Algorithm

AI Researchers at Mayo Clinic Present a Machine Learning-Based Method for Leveraging Diffusion Models to Construct a Multitask Brain Tumor Inpainting Algorithm

In recent years, the number of publications on AI, and on machine learning (ML) in particular, related to medical imaging has increased dramatically. A PubMed search using the MeSH keywords "artificial intelligence" and "radiology" yielded 5,369 results for 2021, five times as many as for 2011. From classification to object detection, semantic segmentation, and image generation, ML models are continually being developed to improve the efficiency and outcomes of healthcare. Many published reports, such as those in diagnostic radiology, indicate that ML can perform as well as or better than medical experts at specific tasks.

When used correctly, AI is a powerful tool that can help radiologists reduce their workload. Despite the growing interest in developing ML models for medical imaging, significant challenges can limit their practical application or even make them susceptible to bias. Two of these challenges are data scarcity and data imbalance. Because privacy concerns may make it impossible to pool institutional datasets or make them public, medical imaging datasets tend to be much smaller than datasets of natural photographs, such as ImageNet. Even when data scientists do have access to medical imaging datasets, those datasets can be imbalanced.

The volume of medical images for patients with specific pathologies is significantly smaller than that for patients with common pathologies or for healthy people. Training or evaluating ML models on datasets that are too small or imbalanced can lead to systemic biases. Alongside deidentified medical image datasets and the endorsement of strategies such as federated learning, which allows ML models to be developed on multi-institutional data without sharing the data itself, synthetic image generation is a primary strategy for combating data scarcity and data imbalance.

Source:

AI Researchers At Mayo Clinic Introduce A Machine Learning-Based Method For Leveraging Diffusion Models To Construct A Multitask Brain Tumor Inpainting Algorithm