Project details
Our basic development process will be as follows:
- Data Preparation: Curate and preprocess the ocular image datasets, ensuring proper labeling and annotation of disease severity levels.
- Training:
  - Variational Auto-Encoder (VAE) [12]: Train a VAE model on the real ocular image data. The VAE will learn to encode the images into a latent-space representation and to generate new images from this latent space. We extend the basic VAE to a conditional VAE by conditioning the generation process on the desired disease severity level, which allows us to generate images with specified severity levels. We bring Photrek's risk-aware methodologies to bear by using and assessing the Coupled VAE in this context.
  - Generative Adversarial Network (GAN) [7]: Implement the TripleGAN architecture, which consists of three components: a generator, a discriminator, and a classifier. The generator produces synthetic images, the discriminator distinguishes between real and generated images, and the classifier predicts disease severity levels. The TripleGAN is trained adversarially: the generator tries to produce realistic images that fool the discriminator, while the discriminator tries to tell real images from generated ones. The classifier provides additional supervision by classifying the disease severity levels of both real and generated images.
- Evaluation: Once the generator is trained, we generate a large dataset of synthetic ocular images with varying disease severity levels and evaluate the performance of disease classification algorithms on this synthetic dataset, comparing the results to their performance on real data.
- Iterative Improvement: Use the evaluation results to identify weaknesses in the generated images or the classification algorithms. Refine the TripleGAN architecture, training process, or classification models as needed, and repeat the evaluation process.
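The three-player training objective described above can be sketched schematically. This is an illustrative toy computation, not our actual implementation: the discriminator probabilities and classifier softmax outputs below are placeholder values, and a real TripleGAN would compute these from deep networks over image batches.

```python
import numpy as np

def bce(p, target):
    # Binary cross-entropy over a batch of "is real" probabilities.
    eps = 1e-8
    return float(-np.mean(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps)))

def ce(probs, labels):
    # Cross-entropy over severity levels from softmax outputs.
    eps = 1e-8
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

# Illustrative discriminator outputs: probability that a sample is real.
d_real = np.array([0.9, 0.8, 0.95])   # on real images
d_fake = np.array([0.2, 0.3, 0.1])    # on generated images

# Discriminator loss: label real images 1, generated images 0.
loss_d = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

# Generator loss: the generator wants the discriminator to call fakes "real".
loss_g = bce(d_fake, np.ones_like(d_fake))

# Classifier loss: severity classification (3 illustrative levels) applied to
# both real and generated images for additional supervision.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6]])
labels = np.array([0, 1, 2])
loss_c = ce(probs, labels)
```

In training, the generator's and discriminator's losses are minimized in alternation, with the classifier's loss tying generated images to their intended severity labels.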
This approach leverages the power of VAEs for generating diverse and realistic ocular images, while the TripleGAN architecture ensures that the generated images accurately represent the desired disease severity levels. The ability to create labeled synthetic data on demand can be invaluable for training and testing ocular disease classification models, potentially improving their performance and robustness.
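The conditional-VAE idea above, where generation is conditioned on a severity label, can be sketched with the reparameterization trick. This is a minimal toy forward pass under stated assumptions: the linear "encoder" and "decoder" weights, dimensions, and flattened-image input are all placeholders standing in for a deep convolutional model.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, N_SEVERITY, IMG_DIM = 8, 5, 64   # illustrative sizes only

# Toy linear weights; a real model would use deep encoder/decoder networks.
W_mu     = rng.normal(size=(IMG_DIM + N_SEVERITY, LATENT_DIM)) * 0.01
W_logvar = rng.normal(size=(IMG_DIM + N_SEVERITY, LATENT_DIM)) * 0.01
W_dec    = rng.normal(size=(LATENT_DIM + N_SEVERITY, IMG_DIM)) * 0.01

def one_hot(level):
    v = np.zeros(N_SEVERITY)
    v[level] = 1.0
    return v

def encode(x, level):
    # Condition the encoder by concatenating the severity one-hot to the image.
    h = np.concatenate([x, one_hot(level)])
    return h @ W_mu, h @ W_logvar

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu, logvar.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, level):
    # The decoder is also conditioned, so sampling z at a chosen severity
    # level yields an image with that severity.
    return np.concatenate([z, one_hot(level)]) @ W_dec

x = rng.normal(size=IMG_DIM)       # stand-in for a flattened ocular image
mu, logvar = encode(x, level=3)    # condition on severity level 3
z = reparameterize(mu, logvar)
x_hat = decode(z, level=3)
```

At generation time, one samples z from the prior and decodes it together with the desired severity label, which is what lets the model produce labeled synthetic images on demand.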
We anticipate pursuing an ensemble strategy along the following lines:
- Data Preprocessing and Augmentation:
  - Preprocess the ocular image data from various sources (e.g., fundus photographs, OCT scans, fluorescein angiography) to ensure consistency in image size, color space, and other relevant properties.
  - Perform data augmentation techniques such as rotation, flipping, scaling, and brightness adjustments to increase the diversity of the training data and improve the model's generalization capability.
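The augmentation step above can be sketched as follows. This is a minimal illustration using numpy on an array stand-in for a preprocessed image; in practice we would use a dedicated augmentation library and tune the transform ranges to what is clinically plausible for fundus images.

```python
import numpy as np

def augment(img, rng):
    """Apply simple random augmentations to a square H x W x C image array."""
    if rng.random() < 0.5:
        img = np.fliplr(img)                                    # horizontal flip
    img = np.rot90(img, k=int(rng.integers(0, 4)))              # random 90-degree rotation
    img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)        # brightness jitter
    return img

rng = np.random.default_rng(42)
image = rng.random((32, 32, 3))    # stand-in for a preprocessed fundus image
augmented = augment(image, rng)
```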
- Ensemble of Generative Models:
  - Train multiple TripleGAN models on different subsets of the ocular image data, each focusing on specific types of ocular images or disease conditions.
  - Employ different architectures, hyperparameters, and initialization strategies for each TripleGAN model in the ensemble to capture diverse representations and features.
- Ensemble of Discriminators and Classifiers:
  - For each TripleGAN model in the ensemble, train multiple discriminators and classifiers with different architectures, loss functions, and regularization techniques.
  - This diversity in the discriminators and classifiers can help capture various aspects of the image data, leading to more robust and accurate evaluation of the generated images.
- Ensemble Strategy:
  - During inference, generate images from each TripleGAN model in the ensemble, and combine the outputs using an appropriate ensemble strategy.
  - Possible ensemble strategies include averaging, majority voting, stacking, or more advanced techniques such as mixture of experts or boosting.
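Two of the simpler combination strategies mentioned above, weighted soft averaging and majority voting, can be sketched for classifier outputs. The prediction matrix and model weights below are illustrative placeholders, not measured results.

```python
import numpy as np

# Illustrative softmax outputs over 3 severity levels from three ensemble
# members, for a single image (one row per model).
preds = np.array([
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
])

# Soft averaging: weighted mean of the probability vectors. The weights
# would be set during ensemble calibration on a held-out validation set.
weights = np.array([0.5, 0.3, 0.2])
avg = weights @ preds
soft_label = int(np.argmax(avg))

# Majority voting: each model casts a hard vote for its top class.
votes = np.argmax(preds, axis=1)
hard_label = int(np.bincount(votes, minlength=preds.shape[1]).argmax())
```

Soft averaging preserves each model's confidence, while majority voting is more robust to a single badly miscalibrated member; stacking would replace the fixed weights with a learned combiner.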
- Ensemble Calibration and Refinement:
  - Evaluate the performance of the ensemble on a held-out validation set containing diverse ocular image types and disease conditions.
  - Use the evaluation results to calibrate the ensemble weights, retrain or fine-tune individual models, or adjust the ensemble strategy as needed.
  - Iteratively refine the ensemble until satisfactory performance is achieved across all types of ocular images.
- Continuous Learning and Adaptation:
  - Implement a continuous learning approach to update and adapt the ensemble as new ocular image data becomes available.
  - Periodically retrain or fine-tune individual models in the ensemble, or introduce new models, to maintain the ensemble's relevance and performance over time.
By employing an ensemble of TripleGAN models, discriminators, and classifiers, we leverage the strengths of different architectures and capture diverse representations of the ocular image data. This approach improves the generalization capability of the overall system, enabling it to work effectively on various types of ocular images while maintaining novelty and diversity in the generated samples.
Data Acquisition and Ethics
Obtaining ocular image data, especially involving patient information, requires careful handling from an ethical standpoint. Researchers typically must partner with hospitals or clinics and obtain approval from ethics review boards to access de-identified/anonymized patient data for research purposes.
For this phase of the project, we will use publicly available datasets like those from Kaggle competitions or research repositories, which have already gone through ethical vetting.
The synthetic data generation approach explored in this proposal minimizes reliance on real patient data during development and testing. Regardless of the approach, maintaining data privacy, obtaining informed consent where applicable, and following ethical guidelines established by organizations such as the World Medical Association will be crucial.
Technical Feasibility
For image processing, state-of-the-art deep learning models such as convolutional neural networks (CNNs) and vision transformers will be employed. Specific architectures such as EfficientNet and ResNet will be customized for the ocular disease classification task.
For generative modeling, techniques like Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and more recent approaches like Diffusion Models could be leveraged. These are established methods and we do not anticipate technical challenges beyond the implementation process.
Cloud computing resources (TPUs/GPUs) will provide the computational power required to train these large AI models efficiently.
User Interaction Design
The user interface should prioritize ease of use for medical professionals. We will work towards a simple upload interface for ocular images, with clear result visualization.
Advanced users could be provided with options to adjust parameters like desired disease severity levels. Interactive visualizations could allow users to explore the model's decision-making process.
Following accessibility guidelines and involving target users through user testing would be critical for an effective design.
Bibliography
[1] V. Gulshan et al., “Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs,” JAMA, vol. 316, no. 22, p. 2402, Dec. 2016, doi: 10.1001/jama.2016.17216.
[2] S. K. Sattigeri, “Eye Disease Identification Using Deep Learning,” vol. 09, no. 07, 2022.
[3] D. S. W. Ting et al., “Artificial intelligence and deep learning in ophthalmology,” Br. J. Ophthalmol., vol. 103, no. 2, pp. 167–175, Feb. 2019, doi: 10.1136/bjophthalmol-2018-313173.
[4] S. Kumar, T. Arif, A. S. Alotaibi, M. B. Malik, and J. Manhas, “Advances Towards Automatic Detection and Classification of Parasites Microscopic Images Using Deep Convolutional Neural Network: Methods, Models and Research Directions,” Arch. Comput. Methods Eng., vol. 30, no. 3, pp. 2013–2039, Apr. 2023, doi: 10.1007/s11831-022-09858-w.
[5] A. A. Marouf, M. M. Mottalib, R. Alhajj, J. Rokne, and O. Jafarullah, “An Efficient Approach to Predict Eye Diseases from Symptoms Using Machine Learning and Ranker-Based Feature Selection Methods,” Bioengineering, vol. 10, no. 1, p. 25, Dec. 2022, doi: 10.3390/bioengineering10010025.
[6] T. Babaqi, M. Jaradat, A. E. Yildirim, S. H. Al-Nimer, and D. Won, “Eye Disease Classification Using Deep Learning Techniques.” arXiv, Jul. 19, 2023. Accessed: Mar. 27, 2024. [Online]. Available: http://arxiv.org/abs/2307.10501
[7] M. Fan, X. Peng, and X. Gong, “Ocular Disease Recognition and Classification using TripleGAN,” in Proceedings of the 2023 8th International Conference on Biomedical Signal and Image Processing, Chengdu China: ACM, Jul. 2023, pp. 7–11. doi: 10.1145/3613307.3613309.
[8] K. Jin et al., “FIVES: A Fundus Image Dataset for Artificial Intelligence Based Vessel Segmentation,” Sci. Data, vol. 9, 2022.
[9] Z. Yu, Q. Xiang, J. Meng, C. Kou, Q. Ren, and Y. Lu, “Retinal image synthesis from multiple-landmarks input with generative adversarial networks,” Biomed. Eng. OnLine, vol. 18, no. 1, p. 62, Dec. 2019, doi: 10.1186/s12938-019-0682-x.
[10] Shang et al., “SynFundus-1M: A High-Quality Million-Scale Synthetic Fundus Images Dataset,” 2024.
[11] R. Nuzzi, G. Boscia, P. Marolo, and F. Ricardi, “The Impact of Artificial Intelligence and Deep Learning in Eye Diseases: A Review,” Front. Med., vol. 8, p. 710329, Aug. 2021, doi: 10.3389/fmed.2021.710329.
[12] D. P. Kingma and M. Welling, “An Introduction to Variational Autoencoders,” Found. Trends® Mach. Learn., vol. 12, no. 4, pp. 307–392, 2019, doi: 10.1561/2200000056.
Gombilla
Jun 2, 2024 | 5:10 PM
Hello Photrek. I would like to comment on possible concerns associated with this project, which may include the ethical implications of generating ocular images, particularly in terms of privacy and consent. There may also be challenges related to the accuracy and realism of the generated images, as well as ensuring that they are representative of diverse populations and ocular conditions. Are there approaches in place to keep these considerations in check?
photrek
Project Owner Jun 18, 2024 | 3:16 PM
Hi Gombilla, we'll be training on publicly available data, at least initially. So, while we may need to work with one of our academic IRBs in the case of data collection, that's downstream from our immediate work. Regarding accuracy and realism, I think it's fair to say that this is one of the cornerstones of the project. We're looking at an ensemble approach, which should help us on the creation side. We'll also, as you are pointing out, need to ensure that our metrics don't just measure realism in a generic and perhaps artificial way, but actually produce clinically authentic images.