Anthony Oliko
Project Owner: Oversee the project and ensure alignment with objectives and timelines. Design and implement scripts/tools. Manage the generation of synthetic data to supplement existing resources. Write the documentation.
This proposal is to create a high-quality MeTTa corpus for training or fine-tuning a large language model (LLM) that translates natural language into MeTTa. The corpus will consist of up to 10,000 diverse instruction-code pairs, aiding in the development of an AI-powered coding assistant. Deliverables include the corpus, scripts, and documentation, ensuring transparency and reproducibility. This project will accelerate adoption of MeTTa, reduce its learning curve, and support AGI research in the Hyperon framework.
Develop a MeTTa language corpus to enable the training or fine-tuning of an LLM and/or LoRAs aimed at supporting developers by providing a natural language coding assistant for the MeTTa language.
The project begins with gathering and reviewing all available MeTTa resources, including official documentation, community contributions, GitHub repositories, and tutorials. This phase will also involve evaluating these resources for their quality, relevance, and coverage of MeTTa features. Additionally, a framework for data collection and formatting will be defined, along with establishing validation criteria. This phase ensures a structured foundation for building the corpus. Time: Month 1
1. A comprehensive list of MeTTa resources categorized by type and relevance.
2. A detailed plan outlining the methods for data extraction, formatting, and validation.
3. Initial drafts of scripts and tools for automated resource extraction where applicable.
4. Weekly progress updates and a milestone completion report.
$5,000 USD
This phase focuses on transforming raw resources into usable data. The extracted material will be formatted into instruction-output pairs, with gaps addressed through the generation of synthetic examples. Coverage will include all key features and functionalities of MeTTa. The emphasis will be on creating diverse, accurate, and comprehensive data. Time: Month 2
1. A structured dataset of 5,000 validated instruction-output pairs.
2. New synthetic examples to fill coverage gaps.
3. Scripts for data formatting, generation, and validation.
4. Midpoint evaluation report to ensure quality and alignment with project goals.
$12,000 USD
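As an illustration of the instruction-output format described for this phase, the following is a minimal sketch of how a single pair might be represented, checked, and written out as JSON Lines. The record fields (`instruction`, `output`, `source`), the helper names, and the JSONL layout are assumptions made for this example and are not specified by the proposal. The parenthesis check is only a cheap syntactic filter; it does not replace running the code through a MeTTa interpreter.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Pair:
    """One corpus record. Field names are hypothetical, chosen for this sketch."""
    instruction: str  # natural-language request
    output: str       # MeTTa code that answers the request
    source: str       # e.g. "docs", "github", "synthetic"


def parens_balanced(code: str) -> bool:
    """Return True if parentheses balance outside string literals and ';' comments."""
    depth = 0
    in_string = False
    in_comment = False
    escaped = False
    for ch in code:
        if in_comment:
            if ch == "\n":
                in_comment = False
        elif in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ";":
            in_comment = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and not in_string


def validate(pair: Pair) -> list:
    """Collect human-readable problems; an empty list means the pair passed."""
    problems = []
    if not pair.instruction.strip():
        problems.append("empty instruction")
    if not pair.output.strip():
        problems.append("empty output")
    elif not parens_balanced(pair.output):
        problems.append("unbalanced parentheses")
    return problems


def to_jsonl(pairs) -> str:
    """Serialize only the pairs that pass validation, one JSON object per line."""
    return "\n".join(json.dumps(asdict(p)) for p in pairs if not validate(p))
```

For example, `Pair("Define a function that doubles a number.", "(= (double $x) (* 2 $x))", "synthetic")` passes validation, while a pair whose output is `!(double 3` is rejected for unbalanced parentheses and left out of the JSONL output.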
The primary task here is to complete the corpus by expanding it to 10,000 validated pairs. Rigorous quality assurance processes will be implemented to ensure that the corpus meets the defined standards for accuracy, diversity, and usability. Feedback from stakeholders will be incorporated during this phase. Time: Month 3
1. A finalized corpus with 10,000 high-quality, validated instruction-output pairs.
2. Comprehensive quality assurance reports detailing validation processes and outcomes.
3. Updated scripts/tools for corpus refinement and replication.
4. Weekly progress updates and milestone completion report.
$8,000 USD
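One part of the quality assurance work in this phase could be a duplicate check plus a short summary report. The sketch below assumes each record is a dictionary with `instruction`, `output`, and `source` keys; those keys, the function names, and the normalization rules are hypothetical choices for this example, not the proposal's defined process. Instructions are compared case-insensitively, while MeTTa code is compared with whitespace collapsed but case preserved, since symbol names are case-sensitive.

```python
import hashlib
from collections import Counter


def collapse_whitespace(text: str) -> str:
    """Collapse runs of whitespace so trivially reformatted text compares equal."""
    return " ".join(text.split())


def pair_key(instruction: str, output: str) -> str:
    """Stable fingerprint of a pair, used only for duplicate detection."""
    raw = collapse_whitespace(instruction).lower() + "\x00" + collapse_whitespace(output)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first occurrence of each distinct pair, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        key = pair_key(rec["instruction"], rec["output"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def qa_report(records, target=10_000):
    """Summarize corpus size, duplicates, and distance to the target size."""
    unique = deduplicate(records)
    return {
        "total": len(records),
        "unique": len(unique),
        "duplicates": len(records) - len(unique),
        "remaining_to_target": max(0, target - len(unique)),
        "by_source": dict(Counter(r["source"] for r in unique)),
    }
```

The default `target` of 10,000 mirrors the corpus size stated for this milestone. Exact-match fingerprints only catch near-verbatim repeats; detecting paraphrased instructions would need a separate similarity measure.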
The final phase focuses on preparing and delivering the project outputs. Comprehensive documentation will be created, covering the corpus creation process, validation methods, known limitations, and guidelines for future use. All scripts, tools, and data will be organized, tested, and delivered as open-source resources. Time: Month 4
1. Full documentation detailing the project, including data sources, methods, and use instructions.
2. All scripts and tools necessary for replicating or extending the corpus creation process.
3. Finalized corpus shared as an open-source deliverable.
4. Presentation to stakeholders summarizing project outcomes.
5. Final project report summarizing milestones, challenges, and future recommendations.
$5,000 USD