Next Prescription Predictor: GPT Medical Forecast

Ammar Khairi
Project Owner

Funding Requested

$65,000 USD

Expert Review
0
Community
3.7 (7)

Overview

The Next Prescription Predictor (NPP) is an AI system that forecasts future medical events for patients based on their health records. By extracting relevant medical terms from electronic health records (EHRs) using Named Entity Recognition (NER), the NPP learns patterns of symptoms, diagnoses, and treatments. The core NPP model is a focused language model that predicts the likely progression of a patient's condition. A companion module translates and explains the predictions in simple terms across multiple languages. NPP also allows simulating possible scenarios to avoid risks, conduct trials, or support education.

Proposal Description

How our project will contribute to the growth of the decentralized AI platform

Our project will contribute to the growth of the AI platform in two main ways:

1. NPP will serve as a foundational medical model, enabling the development of various downstream AI applications tailored for healthcare by providing a robust model trained on medical data.

2. NPP's ability to forecast future medical events and provide multilingual explanations will attract both business (healthcare providers, researchers) and consumer (patient education) users to the SNET platform.

The core problem we are aiming to solve

The key problem the Next Prescription Predictor aims to address is the inability to effectively leverage the wealth of unstructured, free-form text data contained in electronic health records (EHRs). Current AI models are primarily trained on structured data and used for diagnostic predictions at a single point in time, lacking the ability to provide foresight and preventive recommendations. Using generative models on free-form text data will enable forecasting future medical events and patient trajectories.

While EHRs store vast information about a patient's medical history, the majority exists as unstructured text like clinician notes and discharge summaries. The unstructured nature of this data, combined with complex medical terminology, poses a significant obstacle in utilizing EHRs for predictive and prognostic purposes. As a result, healthcare providers often cannot anticipate a patient's evolving conditions, missing opportunities for timely interventions and personalized care planning.

Our specific solution to this problem

The Next Prescription Predictor (NPP) tackles the challenge of leveraging unstructured electronic health record (EHR) data through a self-supervised learning approach. The core solution is a specialized language model trained on a large corpus of EHR text data like clinical notes. Utilizing self-supervised techniques, the model learns patterns between medical terms, symptoms, diagnoses, and treatments without requiring labeled data.
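As a rough illustration of why no labels are needed, the sketch below turns a single unlabeled timeline of medical terms into (context, target) training pairs, where each target is simply the next term in the sequence. The function name and the sample timeline are hypothetical and are not taken from the NPP codebase.

```python
from typing import List, Tuple


def next_token_examples(timeline: List[str]) -> List[Tuple[List[str], str]]:
    """Turn one unlabeled timeline into (context, target) training pairs.

    The "label" for each example is just the next term in the record,
    which is what makes the setup self-supervised.
    """
    return [(timeline[:i], timeline[i]) for i in range(1, len(timeline))]


# Hypothetical patient timeline of medical terms (illustrative only).
timeline = ["chest pain", "ecg", "myocardial infarction", "aspirin"]
for context, target in next_token_examples(timeline):
    print(context, "->", target)
```

A timeline of n terms yields n - 1 training pairs, so every record contributes supervision without any expert annotation.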

A Named Entity Recognition (NER) component extracts relevant medical entities from the text as input tokens for the language model. The model then predicts the likely sequence of future medical events based on the patient's historical data. A separate module classifies and explains these predictions in plain language, ensuring interpretability for healthcare providers and patients.

This modular solution enables accurate forecasting of a patient's medical trajectory from unstructured EHR data, facilitating proactive care and personalized treatment planning. Training a specialized model with medical terminology as tokens addresses the hallucination problem faced by many large language models. The self-supervised approach eliminates the need for costly expert labeling while allowing the incorporation of multiple information sources. The classification and interpretation module ensures the trustworthiness of the generated predictions. Strict anonymization protocols are followed to protect patient privacy during data collection for model training.

Project details

The NPP project employs a modular, data-driven approach to forecast future medical events from unstructured electronic health record (EHR) data. The key components and aspects of the project are:

Data Ingestion and Preprocessing:

The system ingests EHR data from a variety of sources, including structured databases, free-text clinical notes, discharge summaries, and synthetic/anonymized datasets. To access large-scale EHR text data, the project will actively pursue:

  • Utilization of public data resources like the MIMIC-III database, n2c2 shared tasks, and i2b2 challenges as starting points.
  • Using synthetic datasets available on competition websites like Kaggle, and using medical generative models like Med-Gemini to augment the data.
  • Possible partnerships and collaborations with healthcare organizations, research institutions, or consortiums that have access to EHR databases in Sudan, the UAE, and the UK.

A robust data preprocessing pipeline is employed to clean, normalize, and format the data, preparing it for further processing.
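A minimal sketch of what the clean, normalize, and format steps could look like is shown below. It uses only the Python standard library; the function names, the record layout (a date plus a text field), and the specific cleaning rules are assumptions made for illustration, not the project's actual pipeline.

```python
import re
import unicodedata
from typing import Dict, List, Tuple


def clean_note(raw: str) -> str:
    """Clean and normalize one free-text note."""
    text = unicodedata.normalize("NFKC", raw)   # unify Unicode variants
    text = re.sub(r"<[^>]+>", " ", text)        # strip HTML-like tags
    text = text.lower()                         # case-normalize
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


def prepare(notes: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Format (iso_date, raw_text) pairs as cleaned records in chronological order.

    Notes that are empty after cleaning are dropped.
    """
    records = [{"date": date, "text": clean_note(text)} for date, text in notes]
    return sorted((r for r in records if r["text"]), key=lambda r: r["date"])


# Hypothetical input (illustrative only).
print(prepare([("2024-02-01", "  Follow-up:  stable "), ("2024-01-01", "<p>Chest Pain</p>")]))
```

Sorting by ISO date string keeps the records in the chronological order that the downstream modules rely on.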

Named Entity Recognition (NER) Module:

A crucial component is the NER module, tailored for the medical domain. It extracts relevant entities like symptoms, diagnoses, treatments, and procedures from unstructured text. The NER will be based on existing models and will output a timeline of medical terms (tokens) given any form of textual data. It will operate as a sophisticated tokenizer for a comparatively simple language model.
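To make the "NER as tokenizer" idea concrete, the sketch below produces a dated timeline of medical terms from free text. It uses a tiny hand-written lexicon as a stand-in for the existing NER models the proposal refers to; the lexicon, labels, and function name are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon; a real module would use a trained medical NER model.
LEXICON: Dict[str, str] = {
    "chest pain": "symptom",
    "myocardial infarction": "diagnosis",
    "aspirin": "treatment",
    "angioplasty": "procedure",
}


def extract_timeline(notes: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Return entities ordered by note date, then by position within the note.

    notes: (iso_date, text) pairs in any order.
    """
    timeline = []
    for date, text in sorted(notes, key=lambda n: n[0]):
        lowered = text.lower()
        hits = []
        for term, label in LEXICON.items():
            for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
                hits.append((match.start(), term, label))
        for _, term, label in sorted(hits):
            timeline.append({"date": date, "term": term, "label": label})
    return timeline


notes = [
    ("2024-01-05", "Started aspirin after myocardial infarction."),
    ("2024-01-01", "Patient reports chest pain."),
]
for entity in extract_timeline(notes):
    print(entity)
```

The output is a sequence of terms rather than raw words, which is the form the core language model would consume as tokens.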

Core Self-Supervised Language Model:

The core is a specialized language model architecture designed for sequence modeling and prediction, trained on a large EHR corpus using self-supervised techniques without labeled data. In theory, the model can generate a future of any length from even the most basic history description. However, because we will employ a mid-size model with a GPT-2-like architecture to make the most of our data, the model's context will be limited. Hence, the expected ideal use of our model is a sequence length of around 128 tokens (history and prediction combined).
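The 128-token budget has a practical consequence: history and prediction share the same window, so a long history has to be trimmed before generation. The sketch below shows one way to do that, under the assumption (ours, not stated in the proposal) that the most recent events are the ones worth keeping. The function name is hypothetical.

```python
from typing import List

MAX_SEQ_LEN = 128  # from the proposal: history plus prediction


def fit_history(history: List[str], n_predict: int, max_len: int = MAX_SEQ_LEN) -> List[str]:
    """Keep the most recent history tokens so history + prediction fits in max_len."""
    if not 0 < n_predict < max_len:
        raise ValueError("n_predict must be between 1 and max_len - 1")
    budget = max_len - n_predict  # tokens left for the history
    return history[-budget:]


# A 200-token history with 28 tokens reserved for the forecast keeps the last 100.
history = [f"t{i}" for i in range(200)]
kept = fit_history(history, n_predict=28)
print(len(kept), kept[0], kept[-1])
```

Histories shorter than the budget pass through unchanged.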

Classification and Explanation Module:

This component categorizes predicted medical events into understandable labels and provides plain language explanations tailored for healthcare professionals and patients, with multilingual support. The classification module could be a simple setup of static word embeddings plus an MLP classifier with 3-4 classes. This will give the model's output a general structure. The explanation will be an optional feature, integrated through API calls.
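The "static word embedding + MLP" setup can be sketched as a forward pass in plain Python. Everything here is a placeholder: the four class names, the dimensions, and the randomly initialised weights stand in for parameters that would be learned during training.

```python
import math
import random
from typing import Dict, List

CLASSES = ["symptom", "diagnosis", "treatment", "procedure"]  # hypothetical label set
EMBED_DIM, HIDDEN_DIM = 8, 16

rng = random.Random(0)


def rand_matrix(rows: int, cols: int) -> List[List[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Untrained, randomly initialised parameters: stand-ins for learned weights.
EMBEDDINGS: Dict[str, List[float]] = {}
W1, B1 = rand_matrix(HIDDEN_DIM, EMBED_DIM), [0.0] * HIDDEN_DIM
W2, B2 = rand_matrix(len(CLASSES), HIDDEN_DIM), [0.0] * len(CLASSES)


def embed(term: str) -> List[float]:
    """Static lookup: the same term always maps to the same vector."""
    if term not in EMBEDDINGS:
        EMBEDDINGS[term] = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
    return EMBEDDINGS[term]


def linear(W: List[List[float]], b: List[float], x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def classify(term: str) -> Dict[str, float]:
    """One hidden layer with ReLU, then a softmax over the classes."""
    hidden = [max(0.0, h) for h in linear(W1, B1, embed(term))]
    logits = linear(W2, B2, hidden)
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(CLASSES, exps)}


print(classify("aspirin"))
```

Because the weights are untrained, the printed probabilities are meaningless; the point is only the shape of the computation, which is small enough to run cheaply on every predicted term.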

User Interface, Integration, and Community Engagement:

For end-users, the Next Prescription Predictor will be a full pipeline where they can input any historical health data in text format, which will then be processed, forecasted, classified, and explained, providing comprehensive predictions about future medical events.
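The end-to-end flow described above (process, forecast, classify, explain) can be sketched as a composition of interchangeable stages. The function name, the stage signatures, and the stub stages below are hypothetical; each stub stands in for the corresponding NPP module.

```python
from typing import Callable, Dict, List, Optional


def run_pipeline(
    text: str,
    extract: Callable[[str], List[str]],
    forecast: Callable[[List[str]], List[str]],
    classify: Callable[[str], str],
    explain: Optional[Callable[[str], str]] = None,
) -> Dict[str, object]:
    """Chain the stages: text -> history terms -> predicted terms -> labelled events."""
    history = extract(text)
    predicted = forecast(history)
    events = [{"term": term, "category": classify(term)} for term in predicted]
    if explain is not None:  # explanation is optional in the proposal
        for event in events:
            event["explanation"] = explain(event["term"])
    return {"history": history, "predictions": events}


# Stub stages (illustrative only; not real model behaviour).
result = run_pipeline(
    "Patient reports chest pain.",
    extract=lambda text: ["chest pain"],
    forecast=lambda history: ["ecg"],
    classify=lambda term: "procedure",
    explain=lambda term: f"A likely next step is: {term}.",
)
print(result)
```

Passing the stages in as callables mirrors the modular design: any one module can be swapped or fine-tuned without touching the others.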

Beyond just the end-user experience, the NPP's modular design allows each component - the Named Entity Recognition module, core language model, and classification/explanation models - to be made available to developers on SingularityNET. Developers can integrate these into their own solutions or train and customize them for specific use cases.

Imagine a virtual medical assistant using the NER module to understand conversational inputs, or a researcher fine-tuning the language model to study treatment effectiveness. By hosting these components as services on SNET, we empower the community to build upon the NPP's foundations, driving healthcare innovation through customized AI applications, whether using the modules as-is or tailoring them to niche domains.

Our Team

The Next Prescription Predictor project benefits from a multidisciplinary team with expertise spanning machine learning, natural language processing, language model deployment, and healthcare informatics.

  • Selma Mohamed brings substantial knowledge in machine learning algorithms and techniques, along with extensive experience in scraping, processing, and cleaning large-scale textual data from diverse sources.
  • Mukhtar Mohammed's prior accomplishments in deploying and operationalizing language models make him ideally suited to handle the deployment, integration, and onboarding aspects of the NPP system.
  • Ammar Khairi will spearhead the training and evaluation of the project's core models, leveraging his previous experience with large language models (LLMs), as well as his current research work in forecasting lab measurements and identifying gene mutations. His expertise in these areas will be invaluable for developing accurate and robust prediction capabilities within the Next Prescription Predictor system.
  • To ensure medical expertise and domain knowledge, we have allocated 50% of the data milestone budget and 25% of the validation milestone budget to onboard a medical professional with a pathology background. This individual will play a crucial role in accessing and curating relevant electronic health record data, as well as providing valuable insights during the model validation and interpretation phases.

By combining technical proficiency with domain-specific medical knowledge, our team is well-equipped to undertake the development, deployment, and continuous improvement of the Next Prescription Predictor system.

Continuous Improvement:

The project incorporates feedback loops, expands data sources, and enables fine-tuning models on specific medical domains/use cases for specialized applications.

The competition and our USPs

No

Our team

  • Ammar Khairi: Leads model training and evaluation, leveraging LLM and medical forecasting expertise.
  • Selma Mohamed: Machine learning expert with textual data processing skills.
  • Mukhtar Mohammed: Experienced in deploying and operationalizing language models.
  • Medical Pathology Expert: Dedicated budget (50% of milestone 2 + 25% of milestone 5) for onboarding a medical professional to guide data curation and model validation. 

What we still need besides budget?

No

Existing resources we will leverage for this project

No

Open Source Licensing

Apache

Links and references

https://arxiv.org/abs/2212.08072

AI services (New or Existing)

Next Prescription Predictor (NPP)

Type

New AI service

Purpose

The NPP forecasts a patient's future medical prescriptions, symptoms, or diagnoses based on past medical terms given in any form of chronological text.

AI inputs

NPP can use either structured or unstructured data (like clinical notes) from a patient's health records. Any form of text is acceptable. For accurate generations, the text should contain past medical information in chronological order.

AI outputs

NPP provides a timeline of predicted medical events and easy-to-understand explanations. It categorises events (like symptoms or treatments). Output is in JSON.
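The proposal does not specify the JSON schema, so the sketch below is only one plausible shape for such a response: an ordered timeline where each event carries a term, a category, and an optional explanation. All field names are assumptions.

```python
import json
from typing import Dict, List


def to_json(events: List[Dict[str, str]], language: str = "en") -> str:
    """Serialize predicted events as an ordered JSON timeline (hypothetical schema)."""
    payload = {
        "language": language,
        "timeline": [
            {
                "step": i,
                "term": event["term"],
                "category": event["category"],
                "explanation": event.get("explanation"),  # None when not requested
            }
            for i, event in enumerate(events, start=1)
        ],
    }
    return json.dumps(payload, ensure_ascii=False, indent=2)


print(to_json([{"term": "aspirin", "category": "treatment"}]))
```

Setting `ensure_ascii=False` keeps non-Latin scripts readable, which matters for the multilingual explanations.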

NPP: Named Entity Recognition (NER) Module

Type

New AI service

Purpose

Extract relevant medical entities (symptoms, diagnoses, treatments, procedures) from unstructured text data.

AI inputs

Clinical notes, discharge summaries, or any textual health record data.

AI outputs

Sequence of tagged medical entities/terms from the input text.

NPP: Core Language Model

Type

New AI service

Purpose

Predict sequences of future medical events based on patterns learned from past health data.

AI inputs

Timeline of medical terms extracted by the NER module or from other sources.

AI outputs

Generated terms describing forecasted medical events and patient trajectory.

NPP: Classification and Explanation Module

Type

New AI service

Purpose

Categorize predicted medical events into understandable labels and provide plain language explanations.

AI inputs

Text output from the core language model, the NER model, or any other source containing predicted events.

AI outputs

Classified labels for medical events along with multi-lingual textual explanations.

  • Total Milestones: 7

  • Total Budget: $65,000 USD

  • Last Updated: 20 May 2024

Milestone 1 - API Calls & Hostings

Description

This milestone represents the required reservation of 25% of your total requested budget for API calls or hosting costs. Because it is required we have prefilled it for you and it cannot be removed or adapted.

Deliverables

You can use this amount for payment of API calls on our platform. Use it to call other services or use it as a marketing instrument to have other parties try out your service. Alternatively you can use it to pay for hosting and computing costs.

Budget

$16,250 USD

Milestone 2 - Data Collection

Description

Collecting diverse and comprehensive electronic health record (EHR) data necessary for training the AI model.

Deliverables

Compilation of cleaned and formatted structured and unstructured EHR data ready for model training.

Budget

$8,500 USD

Milestone 3 - Named Entity Recognition Model

Description

Developing a Named Entity Recognition (NER) model to extract relevant medical entities from unstructured EHR text data.

Deliverables

Trained NER model capable of identifying and categorising medical entities (e.g. symptoms, medications) from clinical notes.

Budget

$10,000 USD

Milestone 4 - Core Language Model Training

Description

Training the core language model using the collected EHR data to understand medical context and predict future medical events.

Deliverables

Fully trained core language model with capabilities to generate accurate predictions based on patient EHR inputs.

Budget

$15,000 USD

Milestone 5 - Evaluation and Testing

Description

Evaluating the performance of the AI model through benchmark and field expert testing and validation procedures.

Deliverables

Comprehensive evaluation report highlighting the model's accuracy, reliability, and potential areas for improvement.

Budget

$5,250 USD

Milestone 6 - Classification and Explanation

Description

Developing a system to classify predicted medical events and provide understandable explanations to users.

Deliverables

Implemented classification system with easy-to-understand explanations for predicted medical events.

Budget

$5,000 USD

Milestone 7 - Interface and API development

Description

Creating user-friendly interfaces and APIs useful for users and developers building on the service.

Deliverables

Functional interfaces for users and APIs allowing seamless integration of the NPP service's modules into the SNET community.

Budget

$5,000 USD
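As a quick consistency check on the figures above, the sketch below sums the seven milestone budgets, confirms the 25% reservation for Milestone 1, and computes the amount set aside for the medical expert (50% of Milestone 2 plus 25% of Milestone 5, as stated in the team section). The variable names are ours.

```python
# Milestone budgets in USD, copied from the milestone list.
MILESTONES_USD = {
    "1 API Calls & Hostings": 16_250,
    "2 Data Collection": 8_500,
    "3 Named Entity Recognition Model": 10_000,
    "4 Core Language Model Training": 15_000,
    "5 Evaluation and Testing": 5_250,
    "6 Classification and Explanation": 5_000,
    "7 Interface and API development": 5_000,
}
TOTAL_REQUESTED_USD = 65_000

total = sum(MILESTONES_USD.values())
reserve = TOTAL_REQUESTED_USD * 25 // 100  # required 25% reservation
expert = 0.50 * MILESTONES_USD["2 Data Collection"] + 0.25 * MILESTONES_USD["5 Evaluation and Testing"]

print(total)    # 65000
print(reserve)  # 16250
print(expert)   # 5562.5
```

The milestones add up to the requested total, Milestone 1 matches the required 25%, and the expert allocation comes to $5,562.50.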

Join the Discussion (3)


3 Comments
  • 0
    HenriqC
    May 18, 2024 | 1:49 PM

    I want to also comment on the question about data acquisition. There is plenty of freely available public health data and it is definitely valuable to get as much out of it as possible. In my view, however, the future of this field will belong to those who can provide individuals with the best incentives to generate high quality high value data. The more value you, as a service provider, are able to create with the data you have, the more you can reward those data contributors. Well-resourced data contributors are able to invest in generating the best possible bio/health information.

    This view is not too crucial a factor in the scope of this current proposal but maybe something to think about going forward.

    • 0
      Ammar Khairi
      May 19, 2024 | 9:43 PM

      Hey Henriq, First off, thank you so much for taking the time to review our project proposal and share your valuable insights. It is really appreciated. You made an excellent point about listing the different NPP components (NER module, language model, classification/explanation) as individual AI services in addition to the full end-to-end pipeline. We've updated that section accordingly, as providing that flexibility and modularity for developers to pick and choose pieces is a great suggestion. I can already see the number of proposals in the biomedical field growing this round, and it will be exciting to provide our tools to them. And great call on the long-term data acquisition play through incentivizing high-quality individual contributors. Will keep that in mind, but it's worth noting that for this initial project, we will be relying solely on public and synthetic datasets. Collecting real-world medical data requires rigorous checks and compliance with regulations to ensure proper consent and adherence to rules around patient privacy and data handling. Again, we're grateful you took the time to thoroughly review and share these valuable tips!

  • 1
    HenriqC
    May 18, 2024 | 1:47 PM

    I like the way the NPP is going to be composed. I also like the fact that you are applying for funding independently for the AI service development specifically. It is a really smart choice and definitely increases your chances to get funded.

    So you are going to launch the NPP in the SNET marketplace as a holistic service, correct? In addition to that, wouldn't it be possible to launch the different components independently there as well? If that's the case, in my opinion, you can list those different components in the "AI services (New or Existing)" section.

    There are currently many AI based biomedical/health projects building end-user applications and I believe this proposed service may find surprisingly high demand in the markets.

    Upvoted by Project Owner

Reviews & Rating


7 ratings
  • 0
    TrucTrixie
    May 9, 2024 | 1:47 AM

    Overall

    4

    • Feasibility 3
    • Viability 3
    • Desirability 4
    • Usefulness 4
    Should develop the proposal with a long-term vision

    The centralized language model is deployed towards decentralized things in the long term future. This should be a long-term proposal with its interest and feasibility. I look forward to seeing the team develop this proposal over many different periods of time, accompanying SNET through many Rounds to add a great impact for the community.

  • 0
    GhostlyGaze
    May 6, 2024 | 7:49 AM

    Overall

    3

    • Feasibility 3
    • Viability 4
    • Desirability 3
    • Usefulness 3
    A Complex yet Innovative AI System for Medical

    The Next Prescription Predictor (NPP) project presents a complex yet innovative AI system designed to forecast future medical events for patients based on their health records, earning a three-star rating. Utilizing Named Entity Recognition (NER) to extract relevant medical terms from electronic health records (EHRs), NPP leverages pattern recognition to predict the progression of a patient's condition.

    One of the project's notable strengths is its focused language model, which forms the core of NPP's predictive capabilities. This model's ability to analyze symptoms and diagnoses patterns shows promise in assisting healthcare professionals in making informed decisions regarding patient care.

    Moreover, the inclusion of a companion module that translates and explains predictions in simple terms across multiple languages enhances the accessibility and usability of NPP for a wider audience. The functionality to simulate possible scenarios for risk avoidance, conducting trials, or educational purposes adds further value to the system.

    However, there are areas where NPP could be improved. Ensuring the accuracy and reliability of predictions generated by the AI system is crucial for its adoption and effectiveness in clinical settings. Additionally, addressing concerns related to data privacy and security when dealing with sensitive health records is paramount for user trust and compliance with regulations.

  • 0
    BlackCoffee
    May 5, 2024 | 1:12 AM

    Overall

    5

    • Feasibility 4
    • Viability 4
    • Desirability 5
    • Usefulness 5
    More detailed presentation about decentralization

    The decentralized AI platform is what interests me most when reading this proposal. I emphasize the word "Decentralized": it is completely different from other proposals where there is still centralization. To achieve this, the team needs to make great efforts in the main milestones (7 milestones). Special attention should be paid to NER and the core Self-Supervised Language Model (which should be described in more detail). These are two stages containing important technology to gradually move towards decentralization.

  • 0
    Max1524
    May 2, 2024 | 11:39 PM

    Overall

    4

    • Feasibility 4
    • Viability 3
    • Desirability 3
    • Usefulness 4
    I highly appreciate the feasibility of proposal

    Based on the available data, conditions are favorable for the successful application of the proposal, provided that the data sources are accurate (I think the team should consider this carefully when implementing).
    Publicizing the team's identity is quite good, although there are no member photos attached to the profile.

  • 0
    Viclex Ad
    May 3, 2024 | 2:17 AM

    Overall

    4

    • Feasibility 5
    • Viability 4
    • Desirability 5
    • Usefulness 5
    AI Tool for Development and Healthcare Innovation

    Impact assessment study describing SingularityNET's projects in Africa and their results and efficacy. Identifying effective programs and scaling best practices. Techniques were created to maximize program impact and explore areas for improvement. A strategy for implementing initiatives that are successful in scaling up and incorporating improvements into upcoming projects and activities.

    Feasibility:

    The proposal describes a workable artificial intelligence (AI) system that uses electronic health records (EHRs) and named entity recognition (NER) to forecast future medical events. The technique is compatible with Deep Funding's capabilities and is potentially feasible.

    Viability: 

    The lack of specific reference to the team's AI and healthcare experience may have an impact on trust in the project's successful completion. Nonetheless, the funding allotted for significant benchmarks demonstrates a methodical strategy, augmenting feasibility.


    Desirability:

    Resolving the difficulty of using unstructured EHR data for forecasting fulfills a vital healthcare need. The project is made more desirable by the possible effects it may have on preventive actions and individualized care.

    Usefulness: 

    The NPP has made a significant contribution to the development of the decentralized AI platform. It attracts a wide range of users, provides a basic medical model for downstream AI applications, and improves platform utility with multilingual explanations.

    Overall Review: 

    The plan shows good desirability, utility, and practicality. A clearer understanding of the team's experience and performance history could raise the viability grade. All things considered, it makes a strong case for funding and makes a major contribution to the expansion of the decentralized AI platform.

  • 0
    Joseph Gastoni
    May 1, 2024 | 11:28 AM

    Overall

    3

    • Feasibility 3
    • Viability 3
    • Desirability 3
    • Usefulness 3
    The project has a high potential

    The Next Prescription Predictor (NPP) has a high potential for success due to its feasibility, viability, desirability, and significant usefulness in the healthcare domain.

    Feasibility:

    • The project leverages existing techniques like Named Entity Recognition (NER) and self-supervised learning with a focus on medical language models. These approaches are well-established and have shown success in similar applications.
    • Data availability is crucial. Access to a large, diverse corpus of EHR data is essential for training the model effectively. Data privacy and anonymization protocols need to be robust.

    Viability:

    • Moderate: The project has the potential for significant commercial success by attracting healthcare providers, researchers, and patients.
    • Challenges include regulatory hurdles in the medical field and the need for rigorous validation of the model's predictions. Gaining trust from healthcare professionals is critical.

    Desirability:

    • High: The ability to predict future medical events and personalize care plans is highly desirable for both healthcare providers and patients.
    • The multilingual explanation module caters to a broader audience and improves patient education.

    Usefulness:

    • High: Early detection and prevention of potential health problems can significantly improve patient outcomes and reduce healthcare costs.
    • NPP can serve as a foundation for developing other AI applications for healthcare, further enhancing its usefulness.

    But the project should consider:

    • The ability to simulate different scenarios for risk assessment and education is a valuable feature.
    • Addressing the limitations of current AI models that rely primarily on structured data strengthens the project's value proposition.

  • 0
    Aokishi
    May 1, 2024 | 2:26 AM

    Overall

    3

    • Feasibility 2
    • Viability 1
    • Desirability 5
    • Usefulness 5
    A potential project has not proven its feasibility

    The Next Prescription Predictor is a project that has a positive impact not only on the medical industry but also attracts many users to the SNET ecosystem. NPP positions itself as a foundational medical model. With the ability to predict future medical events for patients based on health records, NPP enables the development of many AI applications suitable for the healthcare sector. However, there is an issue the team needs to clarify. The team claims there are no available resources or data to utilize for this project. I wonder how the team can access large EHR text databases such as clinical notes to train specialized language models. Data such as clinical notes are also not easily accessible. I have not seen the team offer any solution to access text data such as clinical notes. Besides, training the model requires an extremely large amount of data. The team's profile also does not show that they are experts with authority to access healthcare industry data.

    Ammar Khairi
    May 1, 2024 | 4:36 PM
    Project Owner

    Hi Aokishi, Thanks a lot for taking the time to review our proposal, it is appreciated!

    Some information was missing from the proposal as we were having issues with the 'long description' section. It's all there now, but here's a quick summary regarding your points.

    You've raised a crucial point about accessing data, and we totally understand the challenge. However, we've been actively pursuing various avenues to obtain the necessary data for our project. In fact, we already have around 3GB of clinical notes in our possession. Basically, we look for public data resources, synthetic datasets from platforms like Kaggle, and data augmentation.

    Our team does have expertise in the medical field, which is where we encountered the challenge of medical foresight and developed the idea behind NPP. However, as you pointed out, having a specialist on board would greatly benefit our project. That's why we've allocated part of our budget to include a contract for a junior pathologist.

     

Summary

Overall Community

3.7

from 7 reviews
  • 5 stars: 1
  • 4 stars: 3
  • 3 stars: 3
  • 2 stars: 0
  • 1 star: 0

Feasibility

3.4

from 7 reviews

Viability

3.1

from 7 reviews

Desirability

4

from 7 reviews

Usefulness

4.1

from 7 reviews