Introducing a Multitask Inpainting Brain Tumor Algorithm Using Machine Learning and Diffusion Models

In recent years, the number of publications on AI, and on machine learning (ML) in particular, related to medical imaging has increased dramatically. A PubMed search using the MeSH keywords "artificial intelligence" and "radiology" yielded 5,369 results for 2021, five times more than for 2011. From classification, object detection, and semantic segmentation to image generation, ML models are continually being developed to improve the efficiency and outcomes of healthcare. Many published reports in diagnostic radiology, for instance, indicate that ML models can perform as well as or better than medical experts at specific tasks such as anomaly and pathology detection.

When used correctly, AI is a powerful tool that can help radiologists reduce their workload. Despite the increasing interest in developing ML models for medical imaging, significant challenges can limit their practical application or even make the resulting models more biased. Two of these challenges are data scarcity and data imbalance. Medical imaging datasets tend to be smaller than natural photography datasets such as ImageNet, and pooling institutional datasets or making them public may not be possible due to privacy concerns. Even when data scientists do have access to medical imaging datasets, those datasets may be imbalanced.

The volume of medical images for patients with specific pathologies is significantly lower than that for patients with common pathologies or for healthy people. Datasets that are too small or imbalanced can lead to systemic biases when used to train or test an ML model. Alongside the release of deidentified medical image datasets and the endorsement of strategies such as federated learning, which allows ML models to be developed on multi-institutional data without sharing that data, synthetic image generation is a primary strategy for combating data scarcity and data imbalance.


