Milestone & Budget
Milestone 1:
Goal: Early development of the sentiment analysis tool and initial categorization.
Deliverables:
1. Functional API interfaces equipped with test data.
2. Curated dataset for initial testing.
Budget Breakdown:
- API development, model development, data collection & processing: $7,000
- Computational costs for model training: $3,000
- Provisioning of machines: $2,650
- Marketing Campaigns (3 months): $3,000
Estimated total hours: ~400 hours
Total Milestone 1 Cost: $15,650
Milestone 2:
Goal: Onboarding of a model capable of binary categorization onto the SNET marketplace.
Deliverables: Deployment of the model on the SNET marketplace.
Estimated total hours: ~200 hours
Total Milestone 2 Cost: $6,500 (10% of the proposal)
Milestone 3:
Goal: Model enrichment, incorporation of additional sentiments, and dataset expansion.
Deliverables:
1. Enlarged dataset with more category-specific data.
2. Retrained model accommodating additional sentiments.
3. Integration of the Boredom and Anger sentiments into the model.
4. Analysis and introduction of the next set of emotions.
Budget Breakdown:
- Data enrichment & augmentation: $3,000
- Model retraining with new sentiments: $4,000
- Computational costs for model refinement: $3,000
- Provisioning of machines: $1,800
- Marketing Campaigns (4 months): $3,000
Estimated total hours: ~500 hours
Total Milestone 3 Cost: $14,800
Milestone 4:
Goal: Model finalization, ensuring comprehensive coverage of sentiments.
Deliverables:
1. Model training and integration for the remaining 2 or 3 sentiments.
2. A detailed performance report outlining the model's efficiency across all sentiments.
Budget Breakdown:
- Model training for final sentiments: $4,500
- Computational costs for final model training: $2,500
- Provisioning of machines: $1,800
- Marketing Campaigns (3 months): $3,000
Estimated total hours: ~300 hours
Total Milestone 4 Cost: $11,800
Milestone 5: Hosting/API calls
Goal: Cover costs related to cloud hosting and user trials.
Estimated cost: $16,250 (25% of the proposal)
Total for All Milestones: $65,000
*It may be necessary to hire a developer and a linguistics consultant to achieve these milestones.
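The line items above can be cross-checked against the $65,000 total with a few lines of Python. This is a minimal sketch: the dictionary simply transcribes the figures listed in each milestone, and Milestones 2 and 5 appear as single lump sums because no breakdown is given for them.

```python
# Line items per milestone, transcribed from the budget above (USD).
MILESTONES: dict[str, list[int]] = {
    "Milestone 1": [7_000, 3_000, 2_650, 3_000],
    "Milestone 2": [6_500],   # lump sum, no breakdown given
    "Milestone 3": [3_000, 4_000, 3_000, 1_800, 3_000],
    "Milestone 4": [4_500, 2_500, 1_800, 3_000],
    "Milestone 5": [16_250],  # lump sum for hosting/API calls
}
PROPOSAL_TOTAL = 65_000


def milestone_totals(items: dict[str, list[int]]) -> dict[str, int]:
    """Sum the line items of each milestone."""
    return {name: sum(costs) for name, costs in items.items()}


if __name__ == "__main__":
    totals = milestone_totals(MILESTONES)
    for name, cost in totals.items():
        # Print each milestone with its share of the overall proposal.
        print(f"{name}: ${cost:,} ({cost / PROPOSAL_TOTAL:.1%})")
    grand_total = sum(totals.values())
    print(f"Total: ${grand_total:,}")
    assert grand_total == PROPOSAL_TOTAL
```

The per-milestone figures add up to exactly $65,000, and Milestones 2 and 5 match their stated 10% and 25% shares.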
Long Description
Company Name
Celeste AI
São Paulo, Brazil
Summary
Celeste AI is a groundbreaking startup that offers advanced transcription services based on speech-to-text recognition. Solutions in our pipeline include AI-powered tools and audio analytics that transform voice files into data-rich sources.
Celeste seeks to bridge the language gap in AI model development for countries that speak Latin-descended languages, most of which are in Latin America.
Since our beta launch on July 1, we've onboarded over 1,000 users and have successfully completed our first sales. We are being accelerated by the prestigious São José dos Campos (Nexus), an arm of Brazil's Ministry of Science and Technology.
We believe that AI solutions are driving a global wealth transfer, with countries that speak Latin-descended languages lagging behind in AI because English is the dominant language for training models.
With our models, we seek to automatically transcribe audio and video files, provide detailed analytics, and uncover trends and patterns in spoken content. Celeste is seeking grant funding to support the development and training of one of these models, which will bridge the language gap for AI-driven transcription and analytics in Portuguese.
We believe the development of our models is strongly aligned with SingularityNET's purpose of driving the development of benevolent, democratic, and inclusive Artificial General Intelligence.
Funding Amount
$65,000
The Problem to be Solved
The global AI landscape has seen unprecedented growth over the last decade. However, this growth has been disproportionately oriented towards dominant languages like English. Latin-descended languages, including Portuguese, have largely been underserved in terms of AI solutions tailored to them. As a result, businesses, researchers, and developers who rely on accurate sentiment analysis for Portuguese content often find themselves lacking the tools they need. This gap in the market not only hinders the potential of AI to serve diverse linguistic groups but also curtails the holistic growth and decentralization of AI technologies.
Our Solution
Sentiment Analysis model
Our Sentiment Analysis model is tailored for Portuguese, ensuring accuracy and cultural relevance. It can empower developers and aid researchers in designing projects specifically for Portuguese-speaking populations, addressing a significant gap in the market. This tool is intended to promote linguistic inclusivity, drive holistic growth, and decentralize the AI landscape.
We envision businesses and developers using our sentiment analysis API to improve sales and human resources processes, analyze user feedback, conduct market research, and monitor social media, among other needs that will emerge as the technology evolves.
Sentiment Analysis API Proposal:
- Develop an AI model for sentiment analysis of audio/video files (rather than text) that analyzes the tone, amplitude, noise, and interruptions of speech, leveraging Automatic Speech Recognition (ASR) to provide multi-label categorization of the speech.
- Achieve an F1-score above 75% in identifying speech categories.
- Deploy the models as an API in the SingularityNET ecosystem.
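The 75% F1 target can be measured in more than one way for multi-label output, and the proposal does not say which averaging it intends. The sketch below computes both micro-F1 (counts pooled over all labels) and macro-F1 (unweighted mean of per-label scores) using F1 = 2TP / (2TP + FP + FN). The clip labels are hypothetical.

```python
def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: list[set[str]], y_pred: list[set[str]]) -> float:
    """Pool true positives, false positives and false negatives over all labels and samples."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    return _f1(tp, fp, fn)


def macro_f1(y_true: list[set[str]], y_pred: list[set[str]], labels: list[str]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = []
    for label in labels:
        tp = sum(label in t and label in p for t, p in zip(y_true, y_pred))
        fp = sum(label not in t and label in p for t, p in zip(y_true, y_pred))
        fn = sum(label in t and label not in p for t, p in zip(y_true, y_pred))
        scores.append(_f1(tp, fp, fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical gold labels and predictions for three audio clips.
    y_true = [{"positive", "joy"}, {"negative", "anger"}, {"neutral"}]
    y_pred = [{"positive"}, {"negative", "anger"}, {"negative"}]
    labels = ["positive", "negative", "neutral", "joy", "anger"]
    TARGET = 0.75
    micro = micro_f1(y_true, y_pred)
    macro = macro_f1(y_true, y_pred, labels)
    print(f"micro-F1={micro:.3f} macro-F1={macro:.3f} meets target: {micro > TARGET}")
```

In this toy example micro-F1 is 2/3 and macro-F1 is 8/15, both below 0.75. Macro-F1 is the stricter choice when rare emotions matter, because each label counts equally regardless of frequency.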
Our goal is to dive deep into different layers:
- Polarity-based Analysis: Understand if the sentiment is positive, negative, or neutral.
- Emotion Detection: Explore specific emotions like joy, anger, sadness, happiness, optimism, and more.
- Aspect-Based Analysis: Determine sentiments related to particular aspects or features of a product or service.
- Intent Detection: Gauge user intentions such as purchase intent or churn signals.
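One way to return these four layers in a single API response is sketched below. The field names, the `Polarity` enum, and the JSON shape are hypothetical illustrations, not a published Celeste or SingularityNET interface.

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class Polarity(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass
class SentimentResult:
    """Hypothetical per-clip result covering the four analysis layers."""
    polarity: Polarity                                          # polarity-based analysis
    emotions: dict[str, float] = field(default_factory=dict)    # emotion -> confidence score
    aspects: dict[str, Polarity] = field(default_factory=dict)  # aspect -> polarity
    intents: list[str] = field(default_factory=list)            # e.g. "purchase", "churn"

    def to_dict(self) -> dict:
        """Convert enums to their string values so the result is JSON-serializable."""
        return {
            "polarity": self.polarity.value,
            "emotions": dict(self.emotions),
            "aspects": {name: p.value for name, p in self.aspects.items()},
            "intents": list(self.intents),
        }

    def to_json(self) -> str:
        # ensure_ascii=False keeps Portuguese characters such as "ç" readable.
        return json.dumps(self.to_dict(), ensure_ascii=False)


if __name__ == "__main__":
    result = SentimentResult(
        polarity=Polarity.NEGATIVE,
        emotions={"anger": 0.81, "sadness": 0.12},
        aspects={"preço": Polarity.NEGATIVE, "atendimento": Polarity.POSITIVE},
        intents=["churn"],
    )
    print(result.to_json())
```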
Marketing Strategy
Target Audience:
- Business: Enterprises seeking customer feedback analysis.
- Developers: Portuguese-speaking developers focused on AI and sentiment analysis.
- Business Professionals: Individuals in HR, sales, market research, and media working in Portuguese-speaking countries.
Platforms: Google Ads, LinkedIn Ads, Meta Ads, Twitter & GitHub.
Activities: Blog posts, community webinars, presence on social media, workshops and trials with universities and tech communities. Social media marketing campaigns to generate leads and sign up trial users.
Risk and Mitigation
1. Risk: Bias in the Training Data
If the data used to train the sentiment analysis model is biased, the model's predictions could be skewed and not representative of diverse real-world situations.
Mitigation:
- Use a diverse and comprehensive dataset that captures various sentiments across different contexts and demographics.
- Employ techniques like data augmentation to create synthetic training examples and diversify the training set.
- Continuously review and audit model predictions and refine the training dataset accordingly.
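For audio data, one common augmentation technique is mixing Gaussian noise into a clip at a chosen signal-to-noise ratio (SNR), which yields extra training examples that resemble noisier real-world recordings. The sketch below assumes clips are already decoded into lists of float samples. The function names, the 16 kHz sample rate, and the 20 dB default are illustrative choices, not parameters from the proposal.

```python
import math
import random
from typing import Optional


def mean_power(samples: list[float]) -> float:
    """Average power of a signal: the mean of the squared samples."""
    return sum(s * s for s in samples) / len(samples)


def add_noise_at_snr(samples: list[float], snr_db: float = 20.0,
                     rng: Optional[random.Random] = None) -> list[float]:
    """Return a copy of `samples` with white Gaussian noise at the requested SNR.

    SNR_dB = 10 * log10(P_signal / P_noise), so P_noise = P_signal / 10**(SNR_dB / 10).
    """
    if rng is None:
        rng = random.Random()
    noise_power = mean_power(samples) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)  # std-dev of zero-mean noise equals sqrt(power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


if __name__ == "__main__":
    sr = 16_000  # assumed sample rate (Hz)
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    noisy = add_noise_at_snr(tone, snr_db=20.0, rng=random.Random(42))
    noise = [a - b for a, b in zip(noisy, tone)]
    measured = 10 * math.log10(mean_power(tone) / mean_power(noise))
    print(f"measured SNR ~ {measured:.1f} dB")
```

Passing a seeded `random.Random` keeps augmented datasets reproducible between training runs.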
2. Risk: Overfitting
The model might perform exceptionally well on the training data but fail to generalize to unseen data.
Mitigation:
- Utilize techniques like cross-validation to make sure the model's performance is consistent across different subsets of the data.
- Employ regularization techniques.
- Use dropout in neural network architectures.
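A minimal k-fold cross-validation splitter is sketched below. As an illustrative design choice (not something the proposal specifies), folds are formed by speaker rather than by individual clip, so the same voice never appears in both the training and validation sets. Without this, validation scores can overstate how well the model handles new speakers. The speaker IDs are hypothetical.

```python
import random
from collections.abc import Iterator


def group_k_fold(groups: list[str], k: int, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, validation_indices) pairs, keeping each group in a single fold."""
    unique = sorted(set(groups))
    if not 2 <= k <= len(unique):
        raise ValueError("k must be between 2 and the number of distinct groups")
    random.Random(seed).shuffle(unique)
    # Assign shuffled groups to folds round-robin so each fold gets a similar number of groups.
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        val = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        yield train, val


if __name__ == "__main__":
    speakers = ["spk1", "spk1", "spk2", "spk3", "spk3", "spk4"]  # one entry per audio clip
    for train, val in group_k_fold(speakers, k=2):
        print("train:", train, "val:", val)
```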
3. Risk: Misinterpretation of Complex Sentiments
Sentiment is multifaceted and can be subtle; AI models may not be able to capture sarcasm, irony, or mixed emotions.
Mitigation:
- Curate a dataset specifically targeting complex sentiments for training.
- Use ensemble models or hybrid models that combine rule-based and machine learning methods.
- Continuously gather feedback from real-world deployments and refine the model.
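To illustrate the ensemble option, the sketch below combines class probabilities from several models by weighted soft voting, for example an audio-based model and a text-based model run on the transcript. The model outputs and weights are hypothetical. A rule-based component could be added as one more input distribution.

```python
from typing import Optional


def soft_vote(distributions: list[dict[str, float]],
              weights: Optional[list[float]] = None) -> dict[str, float]:
    """Weighted average of per-model label probabilities, renormalized to sum to 1."""
    if weights is None:
        weights = [1.0] * len(distributions)
    if len(weights) != len(distributions):
        raise ValueError("one weight per model is required")
    combined: dict[str, float] = {}
    for dist, weight in zip(distributions, weights):
        for label, prob in dist.items():
            combined[label] = combined.get(label, 0.0) + weight * prob
    total = sum(combined.values())
    if total == 0:
        raise ValueError("distributions carry no probability mass")
    return {label: score / total for label, score in combined.items()}


if __name__ == "__main__":
    audio_model = {"positive": 0.6, "negative": 0.3, "neutral": 0.1}  # e.g. cheerful tone
    text_model = {"positive": 0.2, "negative": 0.7, "neutral": 0.1}   # e.g. negative wording
    vote = soft_vote([audio_model, text_model], weights=[0.4, 0.6])
    print(max(vote, key=vote.get), vote)
```

A strong disagreement like this one (cheerful tone, negative wording) can be a sign of sarcasm, so such clips may also be worth flagging for human review.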
4. Risk: High Computational Costs
Training deep learning models for sentiment analysis can be computationally expensive.
Mitigation:
- Explore transfer learning, where a pre-trained model is fine-tuned for the specific sentiment analysis task, reducing the training time and resources.
- Utilize cloud-based services that offer scalable computational resources.
5. Risk: Data Privacy Concerns
Using real-world data might breach privacy regulations if not handled properly.
Mitigation:
- Ensure all training data is anonymized and stripped of personally identifiable information.
- Use synthetic or simulated data where possible.
- Comply with local data protection regulations, such as Brazil's LGPD.
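As a first pass over transcripts, obvious identifiers can be masked with regular expressions before text enters a training set. The patterns below (e-mail addresses, CPF numbers, and Brazilian phone numbers) are illustrative assumptions. They will not catch names or addresses, which typically need a named-entity model or manual review, and the audio itself would need separate handling.

```python
import re

# Illustrative patterns, applied in order: e-mails first, then the digit-based identifiers.
PII_PATTERNS = [
    ("[EMAIL]", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("[CPF]", re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b")),      # e.g. 123.456.789-09
    ("[PHONE]", re.compile(r"\(?\b\d{2}\)?\s?9?\d{4}-?\d{4}\b")),    # e.g. (11) 91234-5678
]


def redact(text: str) -> str:
    """Replace each matched identifier with a placeholder tag."""
    for tag, pattern in PII_PATTERNS:
        text = pattern.sub(tag, text)
    return text


if __name__ == "__main__":
    sample = "Meu CPF é 123.456.789-09, e-mail ana@exemplo.com.br, telefone (11) 91234-5678."
    print(redact(sample))
```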
Voluntary Revenue
We will onboard the service on the platform. If the service's revenue exceeds $1,000 per month, 5% of the revenue above that threshold will be fed back into the SNET/DeepFunding wallets. This condition will remain valid for 5 years after the service is first onboarded and will apply to this service and to any subsequent iteration of it on the platform.
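Read literally, this commitment works out to 5% of any monthly revenue above $1,000, for the first 60 months after onboarding. The sketch below encodes that reading. The function name and the month-counting convention (month 0 is the first month on the platform) are assumptions, and `Decimal` is used to avoid floating-point rounding on currency amounts.

```python
from decimal import Decimal

THRESHOLD = Decimal("1000")  # monthly revenue threshold (USD)
SHARE = Decimal("0.05")      # 5% of the revenue above the threshold
WINDOW_MONTHS = 5 * 12       # commitment lasts 5 years after first onboarding


def monthly_contribution(revenue: Decimal, months_since_onboarding: int) -> Decimal:
    """Amount fed back to SNET/DeepFunding for one month of service revenue."""
    if not 0 <= months_since_onboarding < WINDOW_MONTHS:
        return Decimal("0")
    excess = max(revenue - THRESHOLD, Decimal("0"))
    return excess * SHARE


if __name__ == "__main__":
    print(monthly_contribution(Decimal("3000"), months_since_onboarding=6))   # 5% of $2,000
    print(monthly_contribution(Decimal("800"), months_since_onboarding=6))    # below threshold
    print(monthly_contribution(Decimal("3000"), months_since_onboarding=60))  # window has ended
```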
Open Source
Celeste is not yet open source, but we intend to open-source our APIs as we develop our own AI models, supporting the expansion of AI technology in Portuguese, Spanish, Italian, French, and Romanian.
Our Team
Marcos Lima: co-founder, 15+ years as a software engineer and scrum master, working with solutions architecture. Postgraduate studies in Big Data and Complex Data Mining at Universidade Estadual de Campinas.
Eder Rosa: co-founder, 15+ years as a software engineer working with web and mobile solutions architecture. Computer Science degree. Node.js, React, React Native, TypeScript, AWS.
Artur Rosa: co-founder, 7+ years as a full-stack developer working with web systems, REST APIs, and databases. Information Systems degree. Python, AngularJS, TypeScript, React, Node.js.
Ana Paula Pereira: co-founder, 13+ years as a financial journalist and in corporate communications. Degrees in both journalism and economics. 8+ years of experience in team coordination and project management.
Vinicius Abreu: full-stack developer, 3+ years working with C#, Delphi, React, Node.js, REST, MySQL, PostgreSQL. Software Engineering degree. Tech Resident in Data Science and Python at PUC-Campinas.
Thalita Peres: 12+ years in business development, serving as sales manager for publishing companies.
Beatriz Gimenez: 10+ years working with performance marketing, digital content, and marketing strategies.
Paula Rocha: 10+ years in corporate communications, crisis management, media training, and PR strategies.
Isabella Velleda: 2+ years working in startup operations. Provides communications and operations support. Master of Business Administration at Universidade de São Paulo.
Q&A
Learn more about our proposal based on SNET's community concerns:
Data Privacy - Legal compliance is one of our top priorities, especially in Brazil, where consumer data privacy regulations have been implemented in recent years. We have a team of three lawyers with expertise in data privacy, LGPD, taxation, and blockchain technology assisting our operations. We believe our legal counsel can provide adequate support for continuous compliance as AI regulations evolve. Additionally, all information will be used exclusively to set the parameters for training the models. Training will be done in a restricted environment with data that the platform has already anonymized by default. Our platform was designed with privacy at its core, so the risk of leaking sensitive information is very low.
Potential Bias - Our platform itself is a data source, since users can review the content of their transcriptions; human feedback is therefore a core component of our learning process and improves model training. We also plan to use journalistic sources, such as datasets, audio, text, and video, as a complementary source of pre-classified data. We have two journalists on the team to provide technical support for this process. Further, the model can be optimized through cross-validation against text-based sentiment analysis of the transcriptions, which is usually more accurate.
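One way to quantify the cross-check between the audio model and text-based sentiment analysis of the same transcripts is an agreement statistic such as Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement rate and p_e is the agreement expected by chance from each labeler's label frequencies. This is a sketch; the label lists are hypothetical.

```python
from collections import Counter


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Chance-corrected agreement between two labelings of the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both labelers pick the same label independently.
    expected = sum(count_a[label] * count_b[label]
                   for label in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    audio_labels = ["pos", "neg", "neg", "neu", "pos", "neg"]  # hypothetical audio-model output
    text_labels = ["pos", "neg", "pos", "neu", "pos", "neg"]   # hypothetical transcript-model output
    print(f"kappa = {cohen_kappa(audio_labels, text_labels):.3f}")
```

Individual clips where the two models disagree are natural candidates for the human review loop described above.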
Cultural Relevance - For this sentiment analysis model, we would work with a diverse and segmented database that includes different "variants" of Portuguese, covering regional and cultural expressions and nuances. We believe community feedback is also crucial to keep the model culturally relevant and accurate, so we would like to incorporate feedback from our users and the SNET community. We went live in beta just 8 weeks ago (with over 2,000 users so far), so we want to foster community feedback not only to refine our models but also for marketing engagement.
Related Links
www.celeste-ai.com