![team-member](https://deepfunding.ai/wp-content/uploads/2024/02/4bb8e31e03818ef9d338e64e23f6a666_deepfund-1.png)
Anthony Oliko
Project OwnerOversee the project and ensure alignment with objectives and timelines. Design and implement scripts/tools. Manage the generation of synthetic data to supplement existing resources. Documentation
This proposal aims to create a high-quality MeTTa corpus designed to train or fine-tune a natural language-to-MeTTa language model (LLM). The corpus will feature up to 10,000 diverse instruction-code pairs, serving as a foundation for building an AI-powered coding assistant. Along with the corpus, we’ll deliver all necessary scripts and clear documentation to ensure transparency and reproducibility. This project is set to make MeTTa more accessible, simplify its learning curve, and contribute to AGI research within the Hyperon framework. Our team is made up of passionate members from the SingularityNET community, all deeply interested in the OpenCog Hyperon framework and its potential.
Develop a MeTTa language corpus to enable the training or fine-tuning of an LLM and/or LoRAs aimed at supporting developers by providing a natural language coding assistant for the MeTTa language.
In order to protect this proposal from being copied, all details are hidden until the end of the submission period. Please come back later to see all details.
The project begins with gathering and reviewing all available MeTTa resources including official documentation community contributions GitHub repositories and tutorials. This phase will also involve evaluating these resources for their quality relevance and coverage of MeTTa features. Additionally a framework for data collection and formatting will be defined along with establishing validation criteria. This phase ensures a structured foundation for building the corpus. Time: Month 1
1. A comprehensive list of MeTTa resources categorized by type and relevance. 2. A detailed plan outlining the methods for data extraction formatting and validation. 3. Initial drafts of scripts and tools for automated resource extraction where applicable. 4. Weekly progress updates and a milestone completion report.
$5,000 USD
1. Resource Compilation: A complete and well-organized list of MeTTa resources is compiled, categorized by type (e.g., documentation, tutorials, repositories) and assessed for relevance and coverage of key MeTTa features. 2. Framework Definition: A clear and actionable plan is developed, outlining the methods for data extraction, formatting, and validation, including measurable validation criteria. 3. Tool Development: Initial versions of scripts and tools for automated resource extraction are created, tested for functionality, and meet the requirements for scalability and accuracy. 4. Progress Tracking: Weekly updates are provided, detailing completed tasks, challenges encountered, and planned actions, ensuring transparency and accountability throughout the milestone. 5. Milestone Report: A comprehensive milestone completion report is delivered, summarizing achievements, insights, and any modifications to the project plan, demonstrating readiness to proceed to the next phase.
This phase focuses on transforming raw resources into usable data. The extracted material will be formatted into instruction-output pairs with gaps addressed through the generation of synthetic examples. Coverage will include all key features and functionalities of MeTTa. The emphasis will be on creating diverse accurate and comprehensive data. Time: Month 2
1. A structured dataset of 5000 validated instruction-output pairs. 2. New synthetic examples to fill coverage gaps. 3. Scripts for data formatting generation and validation. 4. Midpoint evaluation report to ensure quality and alignment with project goals.
$12,000 USD
1. Dataset Creation: A structured dataset of 5,000 validated instruction-output pairs is developed, demonstrating accuracy, diversity, and alignment with MeTTa’s core features and functionalities. 2. Synthetic Data Generation: New synthetic examples are created to address identified coverage gaps, ensuring comprehensive representation of MeTTa's capabilities. 3. Tool Availability: Fully functional scripts for data formatting, synthetic generation, and validation are delivered, with thorough testing to confirm usability and reliability. 4. Quality Assurance: All data passes validation checks based on predefined criteria, ensuring high standards of correctness and relevance. 5. Midpoint Evaluation: A detailed evaluation report is submitted, assessing progress, highlighting achievements, identifying potential risks, and confirming alignment with project goals.
The primary task here is to complete the corpus by expanding it to 10000 validated pairs. Rigorous quality assurance processes will be implemented to ensure that the corpus meets the defined standards for accuracy diversity and usability. Feedback from stakeholders will be incorporated during this phase. Time: Month 3
1. A finalized corpus with 10000 high-quality validated instruction-output pairs. 2. Comprehensive quality assurance reports detailing validation processes and outcomes. 3. Updated scripts/tools for corpus refinement and replication. 4. Weekly progress updates and milestone completion report.
$8,000 USD
1. Corpus Completion: A finalized corpus of 10,000 high-quality, validated instruction-output pairs is delivered, meeting predefined standards for accuracy, diversity, and usability. 2. Quality Assurance: Comprehensive quality assurance processes are conducted, with detailed reports documenting validation criteria, methodologies, and outcomes, ensuring the dataset's reliability and robustness. 3. Tool Updates: Scripts and tools for corpus refinement and replication are updated and optimized for efficiency and scalability, with clear documentation for future use. 4. Stakeholder Feedback: Feedback from stakeholders is effectively integrated into the final corpus, addressing any concerns or suggestions to enhance its utility and relevance. 5. Progress Transparency: Weekly updates are provided, tracking milestones and ensuring alignment with project objectives, culminating in a thorough milestone completion report.
The final phase focuses on preparing and delivering the project outputs. Comprehensive documentation will be created covering the corpus creation process validation methods known limitations and guidelines for future use. All scripts tools and data will be organized tested and delivered as open-source resources. Time: Month 4
1. Full documentation detailing the project including data sources methods and use instructions. 2. All scripts and tools necessary for replicating or extending the corpus creation process. 3. Finalized corpus shared as an open-source deliverable. 4. Presentation to stakeholders summarizing project outcomes. 5. Final project report summarizing milestones challenges and future recommendations.
$5,000 USD
1. Comprehensive Documentation: Detailed, user-friendly documentation is delivered, clearly describing data sources, corpus creation methods, validation processes, known limitations, and guidelines for replication or extension. 2. Script and Tool Delivery: All scripts and tools required for reproducing or extending the corpus creation process are finalized, thoroughly tested, and organized, ensuring they are functional and accessible as open-source resources. 3. Corpus Accessibility: The finalized corpus is published and made available as an open-source deliverable, meeting all requirements for usability and proper licensing. 4. Stakeholder Presentation: A well-structured presentation is conducted, effectively summarizing project goals, achievements, challenges, and the practical utility of deliverables. 5. Final Project Report: A comprehensive final report is submitted, detailing milestones achieved, obstacles encountered, solutions implemented, and recommendations for future work.
Reviews & Ratings
Please create account or login to write a review and rate.
Check back later by refreshing the page.
© 2025 Deep Funding
Join the Discussion (0)
Please create account or login to post comments.