
khasyahfr
Project Owner: Khasyah, a software engineer and previously funded Deep Funding proposer, will be responsible for managing the end-to-end technical development of ASAI.
Aid for Speech Impairment using AI (ASAI) is an AI service designed to improve communication for individuals with speech impairments, particularly those affected by dysarthria, a motor speech disorder characterized by slow or slurred speech. Unlike traditional speech recognition systems, which often fail to accurately process impaired speech, ASAI leverages machine learning models to translate audio from these individuals into clear, readable text. By integrating ASAI into SingularityNET's AI marketplace, we provide a highly specialized service that addresses an underserved community while expanding the platform's inclusive offerings, in line with BGI's mission of promoting social good.
New AI service
To convert speech from individuals with dysarthria into clear, readable text. The service enables those with speech impairments to communicate effectively by providing real-time, accurate transcription of their speech, overcoming the limitations of traditional speech recognition systems.
Audio recordings or real-time audio streams of dysarthric speech, typically containing slow or slurred pronunciations due to speech impairment. Input is accepted as WAV audio files.
Transcribed text that accurately reflects the intended speech content, with reduced recognition errors even when the speech is distorted. The output is provided as plain text for use in communication interfaces, messaging, or other applications.
This milestone is focused on gathering and preparing the dataset necessary for training the dysarthric speech recognition model. The goal is to ensure that the dataset contains a sufficient number of speech samples paired with corresponding text transcriptions, reflecting a wide variety of dysarthric speech patterns. This includes variations across gender, speech distortion, and severity levels. Data augmentation techniques will be used to simulate dysarthric speech patterns and add variability to the dataset. The diversity and quantity of the dataset will ensure robust training for the model.
- Search for and collect datasets from sources such as TORGO and UASpeech.
- Verify proper consent and licensing for all collected datasets, ensuring ethical data usage.
- Apply data augmentation techniques such as pitch shifting, adding white or babble noise, and speed perturbation (speeding up or slowing down the audio); see the sketch after this list.
- Combine multiple datasets (if necessary) to ensure diversity across gender, speech distortions, and severity levels.
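The augmentation step could look like the following sketch, using librosa and numpy (one common tooling choice, not a committed dependency). The file name `sample.wav`, the noise scale, and the shift/stretch amounts are illustrative assumptions:

```python
import numpy as np
import librosa

def augment(y, sr):
    """Return pitch-shifted, noisy, and speed-perturbed variants of one clip."""
    return {
        # Shift pitch up two semitones without changing duration.
        "pitch": librosa.effects.pitch_shift(y, sr=sr, n_steps=2),
        # Mix in low-level white noise; babble noise would instead mix in
        # recorded background speech at a similar level.
        "white_noise": y + 0.005 * np.random.randn(len(y)),
        # Speed perturbation: rate > 1 speeds the clip up, rate < 1 slows it down.
        "fast": librosa.effects.time_stretch(y, rate=1.1),
        "slow": librosa.effects.time_stretch(y, rate=0.9),
    }

y, sr = librosa.load("sample.wav", sr=16000)  # hypothetical input clip
variants = augment(y, sr)
```

Each variant is written out as a new recording paired with the original transcription, multiplying the effective dataset size.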
$15,000 USD
- The dataset contains a minimum of 50k recordings paired with their corresponding transcriptions.
- The dataset reflects diversity in terms of gender, severity, and speech distortion.
- Data augmentation techniques are successfully applied to enhance the dataset and simulate dysarthric speech patterns.
This milestone involves preparing the collected speech data for training the speech recognition model. The focus will be on extracting key audio features that effectively represent dysarthric speech while applying techniques to enhance the quality of the data. Feature extraction will include common methods like Mel-Frequency Cepstral Coefficients (MFCCs) and/or log-mel spectrograms. Noise reduction and audio enhancement techniques will be used to clean up the data, followed by normalization or standardization to make the features suitable for machine learning algorithms. Finally, the dataset will be split into training, validation, and test sets.
- Extract key audio features from the dataset, such as MFCCs and log-mel spectrograms; a sketch of this pipeline follows the list.
- Apply noise reduction techniques such as Wiener filtering and spectral subtraction to improve audio quality.
- Normalize or standardize the extracted features using methods like Z-score normalization or Min-Max scaling.
- Implement secure data handling practices during the feature extraction and preprocessing stages.
- Split the dataset into training (70%), validation (15%), and test (15%) sets to ensure proper evaluation during model development.
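A minimal sketch of this pipeline, assuming 16 kHz mono audio and using librosa and scipy (illustrative choices); the 13/80 feature dimensions and the fixed split seed are assumptions, not final design decisions:

```python
import numpy as np
import librosa
from scipy.signal import wiener

def extract_features(y, sr):
    """Wiener-denoise a clip, then stack z-scored MFCCs and log-mel features."""
    y = wiener(y)  # simple noise reduction; spectral subtraction is an alternative
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    log_mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80))
    feats = np.vstack([mfcc, log_mel])  # shape: (13 + 80, n_frames)
    # Z-score normalization per feature dimension.
    return (feats - feats.mean(axis=1, keepdims=True)) / (feats.std(axis=1, keepdims=True) + 1e-8)

# 70/15/15 utterance-level split; splitting by speaker would further
# guard against speaker leakage between sets.
n_utterances = 50_000
rng = np.random.default_rng(seed=0)
idx = rng.permutation(n_utterances)
train_idx, val_idx, test_idx = np.split(idx, [int(0.70 * n_utterances), int(0.85 * n_utterances)])
```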
$20,000 USD
- All relevant audio features (MFCCs, log-mel spectrograms, etc.) are extracted successfully and stored in a usable format.
- The audio data is cleaned and enhanced through noise reduction and normalization techniques.
- The dataset is split into training, validation, and test sets without bias, ensuring a fair evaluation of the model.
The goal of this milestone is to train an Automatic Speech Recognition (ASR) model on the preprocessed dataset. The model will use the extracted features, such as MFCCs and spectrograms, together with the corresponding transcriptions to learn the relationship between dysarthric speech and text. The primary loss function will be Connectionist Temporal Classification (CTC), which is designed for sequence-to-sequence tasks like speech recognition. We will iterate on hyperparameters to optimize the model's performance, with the primary evaluation metric being Word Error Rate (WER). This milestone will involve model fine-tuning to ensure the highest possible accuracy.
- Input the preprocessed features (MFCCs, log-mel spectrograms) along with the corresponding text transcriptions into the model.
- Use the Connectionist Temporal Classification (CTC) loss function for alignment between predicted and ground-truth transcriptions; a minimal training-step sketch follows this list.
- Perform hyperparameter optimization, adjusting parameters like the learning rate to improve the model's performance.
- Monitor the training process to ensure consistent performance across different demographic groups in the dataset.
- Evaluate the model using Word Error Rate (WER) and fine-tune the parameters until the model achieves optimal performance.
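One CTC training step in PyTorch might look like the sketch below; the model interface, tensor shapes, and optimizer wiring are illustrative assumptions rather than the project's final architecture:

```python
import torch
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

def train_step(model, optimizer, feats, feat_lens, targets, target_lens):
    """One optimization step.

    feats:   (batch, time, n_features) padded feature batch
    targets: concatenated label ids for the whole batch
    """
    optimizer.zero_grad()
    logits = model(feats)                                # (batch, time, n_classes)
    log_probs = logits.log_softmax(-1).transpose(0, 1)   # CTC expects (time, batch, classes)
    loss = ctc_loss(log_probs, targets, feat_lens, target_lens)
    loss.backward()
    optimizer.step()
    return loss.item()
```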
$10,000 USD
- The model demonstrates good generalization across the training, validation, and test sets.
- A comprehensive report documents the training process, including hyperparameter settings, model performance, and results.
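As a concrete illustration of the WER metric reported here, a quick check with the jiwer package (one common implementation; the sentences are made-up examples):

```python
import jiwer

reference  = "i would like a glass of water"
hypothesis = "i would like glass of water"
print(jiwer.wer(reference, hypothesis))  # one deletion over 7 words: ~0.143
```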
This milestone involves implementing core safety and ethical considerations, followed by deploying the trained dysarthric speech recognition model to the SingularityNET platform. The goal is to ensure the service adheres to fundamental ethical principles, particularly regarding data privacy and inclusion, while delivering reliable speech recognition capabilities through the SingularityNET platform. Any issues identified during testing will be addressed and resolved, ensuring that the service is fully operational and accessible to users on the platform.
- Implement zero-retention data handling (see the sketch after this list):
  1. Ensure no storage of user audio or transcribed data.
  2. Configure the service for immediate data disposal after processing.
- Verify diverse representation in the training dataset across demographics.
- Set up the necessary configurations for the dysarthric speech recognition service within SingularityNET.
- Deploy the service to the SingularityNET platform, making it available to users.
- Conduct comprehensive testing on SingularityNET, verifying the functionality, performance, and reliability of the service.
- Resolve any issues identified during testing to ensure smooth operation.
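On SingularityNET, a service is typically exposed as a gRPC backend that the snet daemon proxies to. The sketch below shows how zero-retention handling might look in such a backend; the `asai_pb2`/`asai_pb2_grpc` stubs, message fields, port, and `run_model` helper are all hypothetical placeholders pending the actual service definition:

```python
from concurrent import futures
import grpc
import asai_pb2, asai_pb2_grpc  # hypothetical stubs generated from a service .proto

def run_model(audio_bytes):
    """Hypothetical inference helper: WAV bytes in, transcribed text out."""
    raise NotImplementedError

class AsaiServicer(asai_pb2_grpc.AsaiServicer):
    def Transcribe(self, request, context):
        # Zero retention: audio stays in this request's memory only; nothing
        # is written to disk, and the buffer is discarded when the call returns.
        text = run_model(request.audio)
        return asai_pb2.TranscribeReply(text=text)

server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
asai_pb2_grpc.add_AsaiServicer_to_server(AsaiServicer(), server)
server.add_insecure_port("[::]:7003")  # port the snet daemon forwards requests to
server.start()
server.wait_for_termination()
```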
$5,000 USD
- The service is successfully deployed and accessible to users on the SingularityNET platform.
- The service passes all functional and performance tests, meeting the required standards for reliability and performance on the platform.