Aid for Speech Impairment using AI (ASAI)

Project Owner: khasyahfr


Expert Rating: n/a

  • Proposal for: BGI Nexus 1
  • Funding Request: $50,000 USD
  • Funding Pool: Beneficial AI Solutions
  • Total Milestones: 4

Overview

Aid for Speech Impairment using AI (ASAI) is an AI service designed to improve communication for individuals with speech impairments, particularly those affected by dysarthria, a motor speech disorder characterized by slow or slurred speech. Unlike traditional speech recognition systems, which often fail to accurately process impaired speech, ASAI leverages machine learning models to translate audio from these individuals into clear, readable text. By integrating ASAI into SingularityNET's AI marketplace, we provide a highly specialized service that addresses an underserved community while expanding the platform's inclusive offerings, in line with BGI's mission of promoting social good.

Proposal Description

How Our Project Will Contribute To The Growth Of The Decentralized AI Platform

ASAI supports BGI's mission by addressing a pressing social challenge: the communication barriers faced by individuals with speech impairments, particularly those with dysarthria. By providing a practical, accessible tool for clear communication, ASAI empowers individuals to express themselves with ease and confidence, enhancing their social participation and quality of life. Through SingularityNET, ASAI contributes to BGI's goal of driving social good by improving the lives of underserved communities.

Our Team

Our team is well suited to build ASAI. Khasyah, a professional software engineer, was previously funded in Deep Funding Round 4. Pandu, a professional data scientist and machine learning engineer, brings years of experience building AI systems. Together, we have already identified the key dataset and model architecture for the project and have a clear implementation plan in place. With our lean team structure, we can maintain focus and deliver fast, efficient development.

AI services (New or Existing)

Dysarthric Speech Recognition AI

Type

New AI service

Purpose

To convert speech from individuals with dysarthria into clear, readable text. The service enables those with speech impairments to communicate effectively by providing real-time, accurate transcription of their speech, overcoming the limitations of traditional speech recognition systems.

AI inputs

Audio recordings or real-time audio streams of dysarthric speech, typically containing slow or slurred pronunciations due to speech impairments. Audio is accepted in WAV format.

AI outputs

Transcribed text that accurately reflects the intended speech content, with reduced recognition errors even when the speech is distorted. The output is provided as plain text for use in communication interfaces, messaging, or other applications.

The core problem we are aiming to solve

Communication for individuals with speech impairments, particularly dysarthria, remains a challenge due to the limitations of existing technology. Dysarthria distorts speech, making it slurred and difficult to understand, which leaves affected individuals isolated. Current speech recognition systems are ineffective because they are built on models designed for clear speech, not the distorted patterns produced by speech impairments. With over 170 million people facing speech impairment difficulties, this is a problem that urgently demands attention. Recent advancements in AI, however, make it possible to address this gap, providing a solution that has the potential to transform the lives of millions.

Our specific solution to this problem

Our solution addresses the problem of speech recognition for individuals with dysarthria by developing a specialized AI model trained on a diverse, dysarthria-specific dataset. By collecting and augmenting speech data from various sources, such as the TORGO and UASpeech datasets, we ensure the model learns from a wide range of speech impairments across different severity levels and genders. We preprocess the data by extracting audio features, such as MFCCs and spectrograms, which represent the key patterns of dysarthric speech. Our model is trained using Connectionist Temporal Classification (CTC) loss, a method designed for aligning speech-to-text sequences without requiring frame-level annotations. By optimizing the model's performance through hyperparameter tuning and evaluating its accuracy using Word Error Rate (WER), we ensure it can handle distorted speech more effectively than standard systems. Finally, by deploying this model on SingularityNET, we create a scalable service that reliably converts impaired speech into clear text, giving users with speech impairments a practical tool for everyday communication.
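For illustration, Word Error Rate can be computed with a standard word-level edit distance. The sketch below (Python) is a minimal example with our own naming, not the project's final evaluation code:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: two errors over six reference words -> WER of about 0.33
print(word_error_rate("the quick brown fox jumps today",
                      "the quick brown box jump today"))
```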

 

Given the rapid advancements in AI and machine learning, we remain open to adjusting the model or implementation approach as needed, while always prioritizing the delivery of high-quality output for the AI service.

Open Source Licensing

Apache License

Was there any event, initiative or publication that motivated you to register/submit this proposal?


Proposal Video

Placeholder for Spotlight Day pitch presentations. Videos will be added by the DF team when available.

  • Total Milestones: 4
  • Total Budget: $50,000 USD
  • Last Updated: 24 Feb 2025

Milestone 1 - Dataset Preparation

Description

This milestone is focused on gathering and preparing the dataset necessary for training the dysarthric speech recognition model. The goal is to ensure that the dataset contains a sufficient number of speech samples paired with corresponding text transcriptions, reflecting a wide variety of dysarthric speech patterns. This includes variations across gender, speech distortion, and severity levels. Data augmentation techniques will be used to simulate dysarthric speech patterns and add variability to the dataset. The diversity and quantity of the dataset will ensure robust training for the model.

Deliverables

- Search for and collect datasets from various sources such as TORGO and UASpeech.
- Verify proper consent and licensing for all collected datasets, ensuring ethical data usage.
- Apply data augmentation techniques such as changing pitch, adding white noise or babble noise, and applying speed perturbation (speeding up or slowing down the audio).
- Combine multiple datasets (if necessary) to ensure diversity across gender, speech distortions, and severity levels.
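The sketch below illustrates the augmentation steps listed above, assuming librosa and NumPy. The specific parameter values (semitone shifts, noise amplitude, stretch rates) are placeholder assumptions rather than final settings, and babble-noise mixing (which requires multi-speaker background recordings) is omitted:

```python
import numpy as np
import librosa  # assumed audio-processing library

def augment_sample(path: str, sr: int = 16000) -> dict:
    """Return several augmented variants of one recording (illustrative parameters only)."""
    y, sr = librosa.load(path, sr=sr)

    # Pitch shift by +/- 2 semitones to vary speaker characteristics.
    pitched_up = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)
    pitched_down = librosa.effects.pitch_shift(y, sr=sr, n_steps=-2)

    # Additive white noise at a small amplitude to simulate recording conditions.
    noisy = y + 0.005 * np.random.randn(len(y))

    # Speed perturbation: slow down / speed up the audio.
    slowed = librosa.effects.time_stretch(y, rate=0.9)
    sped_up = librosa.effects.time_stretch(y, rate=1.1)

    return {
        "original": y,
        "pitch_up": pitched_up,
        "pitch_down": pitched_down,
        "white_noise": noisy,
        "slow": slowed,
        "fast": sped_up,
    }
```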

Budget

$15,000 USD

Success Criterion

- The dataset contains a minimum of 50k recordings paired with their corresponding transcriptions.
- The dataset reflects diversity in terms of gender, severity, and speech distortion.
- Data augmentation techniques are successfully applied to enhance the dataset and simulate dysarthric speech patterns.

Milestone 2 - Preprocessing

Description

This milestone involves preparing the collected speech data for use in training the speech recognition model. The focus will be on extracting key audio features that can effectively represent dysarthric speech, while applying techniques to enhance the quality of the data. Feature extraction will include common methods like Mel-Frequency Cepstral Coefficients (MFCCs) and/or log-mel spectrograms. Noise reduction and audio enhancement techniques will be used to clean up the data, followed by normalization or standardization to make the features suitable for machine learning algorithms. Finally, the dataset will be split into training, validation, and test sets.

Deliverables

- Extract key audio features from the dataset, such as MFCCs and log-mel spectrograms.
- Apply noise reduction techniques such as Wiener filtering and spectral subtraction to improve the quality of the audio.
- Normalize or standardize the extracted features using methods like Z-score normalization or Min-Max scaling.
- Implement secure data handling practices during the feature extraction and preprocessing stages.
- Split the dataset into training (70%), validation (15%), and test (15%) sets to ensure proper evaluation during model development.
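A minimal sketch of the feature extraction, normalization, and splitting steps, assuming librosa and scikit-learn. The feature dimensions and the two-stage 70/15/15 split are illustrative assumptions, and the noise-reduction step (Wiener filtering or spectral subtraction) is omitted here:

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split

def extract_features(y: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Extract MFCCs and a log-mel spectrogram, then z-score normalize per feature dimension."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)            # (13, frames)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80)   # (80, frames)
    log_mel = librosa.power_to_db(mel)
    feats = np.concatenate([mfcc, log_mel], axis=0)               # (93, frames)
    return (feats - feats.mean(axis=1, keepdims=True)) / (feats.std(axis=1, keepdims=True) + 1e-8)

def split_dataset(samples, transcripts, seed=42):
    """70/15/15 split: hold out 30%, then split the holdout evenly into validation and test."""
    x_train, x_rest, y_train, y_rest = train_test_split(
        samples, transcripts, test_size=0.30, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.50, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```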

Budget

$20,000 USD

Success Criterion

- All relevant audio features (MFCCs, log-mel spectrograms, etc.) are extracted successfully and stored in a usable format.
- The audio data is cleaned and enhanced through noise reduction and normalization techniques.
- The dataset is split into training, validation, and test sets without any bias, ensuring a fair evaluation of the model.

Milestone 3 - Training

Description

The goal of this milestone is to train an automatic speech recognition model on the preprocessed dataset. The model will use the extracted features, such as MFCCs and spectrograms, together with the corresponding transcriptions to learn the relationship between dysarthric speech and text. The primary loss function will be Connectionist Temporal Classification (CTC), which is designed for sequence-to-sequence tasks like speech recognition. We will iterate on hyperparameters to optimize the model's performance, with the primary evaluation metric being Word Error Rate (WER). This milestone will involve model fine-tuning to ensure the highest possible accuracy.

Deliverables

- Input the preprocessed features (MFCCs, log-mel spectrograms) along with the corresponding text transcriptions into the model.
- Use the Connectionist Temporal Classification (CTC) loss function for alignment between predicted and ground-truth transcriptions.
- Perform hyperparameter optimization, adjusting parameters like the learning rate to improve the model's performance.
- Monitor the training process to ensure consistent performance across different demographic groups in the dataset.
- Evaluate the model using Word Error Rate (WER) and fine-tune the parameters until the model achieves optimal performance.
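A minimal PyTorch sketch of CTC-based training. The bidirectional LSTM architecture, feature dimension, vocabulary size, and learning rate shown here are placeholder assumptions for illustration, not the project's final design:

```python
import torch
import torch.nn as nn

# Minimal acoustic model: frame-level features -> BiLSTM -> per-frame character logits.
class SpeechRecognizer(nn.Module):
    def __init__(self, n_feats: int, n_chars: int, hidden: int = 256):
        super().__init__()
        self.rnn = nn.LSTM(n_feats, hidden, num_layers=3,
                           bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_chars)  # output size includes the CTC blank symbol

    def forward(self, x):                  # x: (batch, frames, n_feats)
        h, _ = self.rnn(x)
        return self.out(h)                 # (batch, frames, n_chars)

model = SpeechRecognizer(n_feats=93, n_chars=30)   # 13 MFCC + 80 log-mel bins; ~29 characters + blank
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate is a tunable hyperparameter

def training_step(feats, feat_lengths, targets, target_lengths):
    """One optimization step: CTC loss between per-frame predictions and the reference text."""
    logits = model(feats)                                    # (batch, frames, chars)
    log_probs = logits.log_softmax(dim=-1).transpose(0, 1)   # CTC expects (frames, batch, chars)
    loss = ctc_loss(log_probs, targets, feat_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```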

Budget

$10,000 USD

Success Criterion

- The model demonstrates good generalization across the training, validation, and test sets.
- A comprehensive report is produced documenting the training process, including hyperparameter settings, model performance, and results.

Milestone 4 - Safety Review and Platform Deployment

Description

This milestone involves implementing core safety and ethical considerations, followed by deploying the trained dysarthric speech recognition model to the SingularityNET platform. The goal is to ensure the service adheres to fundamental ethical principles, particularly regarding data privacy and inclusion, while delivering reliable speech recognition capabilities on the platform. Any issues identified during testing will be addressed and resolved, ensuring that the service is fully operational and accessible to users.

Deliverables

- Implement zero-retention data handling:
  1. Ensure no storage of user audio or transcribed data.
  2. Configure the service for immediate data disposal after processing.
- Verify diverse representation in the training dataset across demographics.
- Set up the necessary configurations for the dysarthric speech recognition service within SingularityNET.
- Deploy the service to the SingularityNET platform, making it available to users.
- Conduct comprehensive testing on SingularityNET, verifying the functionality, performance, and reliability of the service.
- Resolve any issues identified during testing to ensure smooth operation.
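A minimal sketch of the zero-retention principle listed above: audio is processed entirely in memory and no copy is persisted. The `model.transcribe` call is hypothetical, and the actual deployment would wrap equivalent logic inside the service handler exposed through SingularityNET:

```python
import io

def handle_transcription_request(audio_bytes: bytes, model) -> str:
    """Process one request with zero retention: no disk writes, no logging of the audio."""
    buffer = io.BytesIO(audio_bytes)        # audio lives only in memory for this call
    transcript = model.transcribe(buffer)   # hypothetical inference call
    del buffer                              # drop the in-memory copy immediately after use
    return transcript                       # only the transcribed text leaves the function
```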

Budget

$5,000 USD

Success Criterion

- The service is successfully deployed and accessible to users on the SingularityNET platform.
- The service passes all functional and performance tests, meeting the required standards for reliability and performance on the platform.
