Yeabsira Derese
Project Owner, Team Lead
This proposal aims to develop a comprehensive and versatile MeTTa code corpus that serves as a critical resource for fine-tuning and training large language models (LLMs) within the Hyperon ecosystem. The corpus will include diverse examples covering MeTTa-specific features, algorithmic implementations, and problem-solving scenarios, ensuring compatibility with LLM training needs while showcasing MeTTa's advantages in complex reasoning tasks.
Develop a MeTTa language corpus that enables training or fine-tuning an LLM and/or LoRA adapters to power a natural-language coding assistant for developers working in MeTTa.
In this initial phase, the primary task will be to identify relevant repositories and sources that can provide the necessary data for the corpus. This will involve searching public GitHub repositories, documentation, and other available resources. Additionally, scraping scripts will be developed or refined to automate data extraction from these repositories. The team will also begin executing the automated mining and reimplementation tasks, ensuring that quality data is collected. This phase will also focus on drafting problem-specific scenarios and common error-handling cases that reflect the language's features and paradigms. Finally, algorithmic and data structure problems will be selected to ensure comprehensive coverage of MeTTa's capabilities.
A collection of repositories and data sources, working scraping scripts, initial sets of problem-specific scenarios and error-handling examples, and a list of algorithmic problems selected for inclusion.
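The proposal does not describe how the scraping scripts will work. As a minimal sketch, assuming the candidate repositories have already been cloned locally and that MeTTa sources use the `.metta` extension, a collector could walk each checkout and keep one copy of every distinct file. The function name `collect_metta_files` and the record fields are hypothetical.

```python
import hashlib
from pathlib import Path


def collect_metta_files(repo_root):
    """Return {"path", "sha256", "source"} records for every .metta file
    under repo_root, skipping files whose content was already seen."""
    seen = set()
    records = []
    # Sorting makes the output (and which duplicate is kept) deterministic.
    for path in sorted(Path(repo_root).rglob("*.metta")):
        if not path.is_file():
            continue
        source = path.read_text(encoding="utf-8", errors="replace")
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # identical content already collected (e.g. a vendored copy)
        seen.add(digest)
        records.append({
            "path": str(path.relative_to(repo_root)),
            "sha256": digest,
            "source": source,
        })
    return records
```

Deduplicating by content hash rather than by file name keeps copied examples from being over-represented in the training data.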
$7,000 USD
Successful identification of at least five high-quality repositories, completion of functional scraping scripts, and the creation of at least 100 problem-specific scenarios, algorithm-solution pairs, and error-handling examples.
This milestone involves structuring and annotating the collected examples to prepare them for integration into the corpus. The collected MeTTa code will be organized by category, with detailed annotations explaining the logic and context of the code. Additionally, the team will prepare MeTTa code examples by reimplementing code from other programming languages such as Haskell, Prolog, and Lisp to add variety and robustness to the corpus. This will involve carefully rewriting the selected code snippets in MeTTa while preserving their correctness and functionality.
A structured and annotated collection of MeTTa code examples and reimplemented code repositories from other programming languages.
$12,000 USD
Completion of at least 5,000 annotated examples, with successful reimplementations of code from Haskell, Prolog, and Lisp, ensuring the examples are accurate, well-documented, and properly categorized.
The third phase will focus on preparing scenario-based and error-handling MeTTa code, which is essential for ensuring that the corpus represents real-world use cases and challenges. This includes writing MeTTa solutions for algorithmic problems, focusing on areas such as pattern matching, knowledge representation, and error handling. Furthermore, a script will be developed to convert the MeTTa files into a common format that aligns with the corpus standards. Using this script, all previously prepared MeTTa code will be standardized into a uniform format, facilitating easy access and integration into the training process.
A set of scenario-based MeTTa code examples and algorithmic problem solutions, and a script to organize and standardize the MeTTa code.
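The proposal does not define the target "common format". The sketch below assumes a hypothetical convention: a `; category: <name>` comment header (MeTTa uses `;` for line comments), LF line endings, no trailing whitespace, and no more than one blank line in a row. `normalize_metta` is an illustrative name, not an existing tool.

```python
import re


def normalize_metta(source, category):
    """Return source in a hypothetical uniform layout: a '; category: <name>'
    header, LF line endings, no trailing whitespace, at most one consecutive
    blank line, and exactly one final newline."""
    text = source.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    # Drop any existing category header so that re-running is idempotent.
    lines = [line for line in lines if not line.startswith("; category:")]
    body = "\n".join(lines).strip("\n")
    body = re.sub(r"\n{3,}", "\n\n", body)  # collapse runs of blank lines
    return f"; category: {category}\n{body}\n"
```

Because any existing header is stripped before a new one is written, running the function twice yields the same output, so it can safely be re-applied to the whole corpus after each batch of additions.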
$10,000 USD
Completion of at least 100 scenario-based solutions and algorithmic problem solutions, successful development and execution of the script that organizes the corpus into the required format, and the addition of 3,000 further data entries to the corpus.
In the final phase, comprehensive documentation will be written detailing the corpus creation process, including the steps taken for data collection, organization, and annotation. The documentation will also include guidelines for extending the corpus with additional data or scenarios in the future. Test users will validate the corpus to confirm its accuracy and usability, and a final review of the deliverables will ensure consistency and quality across all materials. Once validation and quality checks are complete, the finalized corpus and its documentation will be submitted.
A complete and finalized corpus with full documentation including validation feedback and any revisions based on test user input.
$6,000 USD
Successful submission of the finalized corpus with 10,000 instruction-code pair data entries.
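The proposal does not fix a storage format for the instruction-code pairs. JSON Lines (one JSON object per line) is a common choice for fine-tuning data; the sketch below assumes that format with hypothetical field names `instruction`, `code`, and `category`, and adds a cheap parenthesis-balance check as a pre-filter. It is not a substitute for actually running each snippet through a MeTTa interpreter.

```python
import json


def parens_balanced(code):
    """Cheap structural check: parentheses must balance, ignoring text
    inside "string" literals and after ';' line comments."""
    depth = 0
    in_string = in_comment = escaped = False
    for ch in code:
        if in_comment:
            if ch == "\n":
                in_comment = False
        elif in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == ";":
            in_comment = True
        elif ch == '"':
            in_string = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing paren with no matching opener
    return depth == 0 and not in_string


def to_jsonl(entries):
    """Serialize instruction-code pairs as JSON Lines, rejecting entries
    with an empty field or unbalanced parentheses in the code."""
    lines = []
    for entry in entries:
        if not entry["instruction"].strip() or not entry["code"].strip():
            raise ValueError("empty instruction or code")
        if not parens_balanced(entry["code"]):
            raise ValueError(f"unbalanced parentheses: {entry['instruction']!r}")
        lines.append(json.dumps(entry, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```

For example, an entry such as `{"instruction": "Define a function that doubles a number.", "code": "(= (double $x) (* 2 $x))", "category": "basics"}` passes both checks and becomes one line of the output file.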
© 2025 Deep Funding