Remo Start
Project Owner: As the project owner, Remostart will be in charge of organising and delivering the project according to the defined standards.
The methodology used to build a corpus for an LLM is as important as the quality of the dataset itself. We propose to create a corpus for MeTTa using the prompt-instruct-response approach, so that the model learns both the theoretical and the practical aspects of the MeTTa programming language. The prompt-instruct-response approach is a way of structuring LLM datasets in which the model learns not only from the response but also from the instruction and the prompt. As a result, a coding assistant trained on such a corpus better understands the context around each entry in the dataset. This methodology performs best where context, efficiency, and practicality are essential.
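A minimal sketch of what a single prompt-instruct-response entry might look like, assuming the corpus is stored as one JSON object per line (JSONL). The class `PairRecord`, the field names `prompt`, `instruction`, and `response`, and the MeTTa snippet are illustrative assumptions, not the project's finalized schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PairRecord:
    """Hypothetical corpus entry: the user's prompt, the instruction framing
    the task, and the expected MeTTa-centric response."""
    prompt: str
    instruction: str
    response: str

    def to_jsonl_line(self) -> str:
        # json.dumps escapes embedded newlines, so each entry stays on one line.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_jsonl_line(cls, line: str) -> "PairRecord":
        # Unexpected keys in the JSON object raise a TypeError here.
        return cls(**json.loads(line))


example = PairRecord(
    prompt="How do I define a function that doubles a number in MeTTa?",
    instruction="Answer with a short MeTTa definition and a call that evaluates it.",
    response="(= (double $x) (* $x 2))\n!(double 3)",
)

line = example.to_jsonl_line()
print(line)
```

Keeping the instruction as its own field, rather than folding it into the prompt, is what lets a model see how the same kind of request should be answered under different framings.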
Develop a MeTTa language corpus that enables training or fine-tuning an LLM and/or LoRA adapters, with the aim of supporting developers through a natural-language coding assistant for the MeTTa language.
Milestone 1: Establish the foundational framework for the project by finalizing the methodology, identifying data sources, and developing initial scripts for data extraction and formatting.
Deliverables: A project plan and timeline. A list of identified MeTTa resources (e.g., tutorials, GitHub repositories, documentation) to be used as data sources. Initial data extraction and formatting scripts.
Budget: $7,000 USD
Success criteria: The project plan and timeline are approved by the team. At least three MeTTa resources are identified and documented. At least ten samples of extracted and formatted data are produced.
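An initial extraction and formatting script for Markdown sources such as tutorials and documentation could look like the sketch below. It assumes MeTTa snippets are marked as fenced code blocks tagged `metta`; the function names, the draft-entry fields, and the `source` provenance field are hypothetical choices for illustration.

```python
import re
from typing import Dict, List

TICKS = "`" * 3  # a Markdown code fence, built indirectly so this example nests cleanly

# Assumption: source documents tag MeTTa snippets with the "metta" info string.
FENCE_RE = re.compile(r"`{3}metta[ \t]*\n(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def extract_metta_snippets(markdown: str) -> List[str]:
    """Return the body of every fenced block tagged 'metta', whitespace-trimmed."""
    return [m.group(1).strip() for m in FENCE_RE.finditer(markdown)]


def to_draft_pairs(markdown: str, source: str) -> List[Dict[str, str]]:
    """Wrap each extracted snippet in a draft entry. Prompt and instruction are
    placeholders for a human annotator; 'source' records where the snippet came from."""
    return [
        {
            "prompt": f"TODO: write a question answered by snippet {i} of {source}",
            "instruction": "TODO",
            "response": snippet,
            "source": source,
        }
        for i, snippet in enumerate(extract_metta_snippets(markdown), start=1)
    ]


# Usage on a tiny in-memory document (a real script would read tutorial files).
sample_doc = (
    "Defining a function:\n"
    + TICKS + "metta\n(= (double $x) (* $x 2))\n!(double 3)\n" + TICKS + "\n"
    + "A Python block that should be ignored:\n"
    + TICKS + "python\nprint(1)\n" + TICKS + "\n"
)
drafts = to_draft_pairs(sample_doc, "tutorial.md")
print(len(drafts), drafts[0]["response"])
```

Extracting code first and writing prompts afterwards keeps every response grounded in real MeTTa from an identified resource, which matches the milestone's goal of documented data sources.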
Milestone 2: Build the core of the dataset by creating the first 7,000 prompt-instruct-response pairs from existing MeTTa resources and validating their correctness.
Deliverables: A corpus containing 7,000 validated prompt-instruct-response pairs. A validation report confirming the correctness of at least 95% of the dataset. Initial documentation describing the data sources, extraction methods, and validation process.
Budget: $21,000 USD
Success criteria: The dataset contains 7,000 prompt-instruct-response pairs in the defined JSON structure. Validation confirms that at least 95% of outputs are error-free and adhere to MeTTa standards. Initial documentation is reviewed and finalized.
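The 95% correctness criterion can be turned into a repeatable report along these lines. This is a sketch under stated assumptions: `structural_check` is a hypothetical baseline (required fields present and non-empty, balanced parentheses in the response), and in practice the `check` argument would be replaced by something stricter, such as evaluating each snippet with a MeTTa interpreter.

```python
from typing import Callable, Dict, List

Entry = Dict[str, str]
REQUIRED_FIELDS = ("prompt", "instruction", "response")  # assumed schema


def structural_check(entry: Entry) -> bool:
    """Hypothetical baseline check: required fields are non-empty strings and the
    parentheses in the MeTTa response balance. (Simplification: parentheses
    inside string literals are not treated specially.)"""
    for key in REQUIRED_FIELDS:
        value = entry.get(key)
        if not isinstance(value, str) or not value.strip():
            return False
    depth = 0
    for ch in entry["response"]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing parenthesis with no matching opener
                return False
    return depth == 0


def validation_report(
    entries: List[Entry],
    check: Callable[[Entry], bool] = structural_check,
    threshold: float = 0.95,
) -> Dict[str, object]:
    """Apply `check` to every entry and summarize the pass rate against the threshold."""
    failed = [i for i, entry in enumerate(entries) if not check(entry)]
    total = len(entries)
    pass_rate = (total - len(failed)) / total if total else 0.0
    return {
        "total": total,
        "failed_indices": failed,
        "pass_rate": pass_rate,
        "meets_threshold": pass_rate >= threshold,
    }


good = {"prompt": "Define identity.", "instruction": "Answer in MeTTa.", "response": "(= (id $x) $x)"}
bad = {"prompt": "Define identity.", "instruction": "", "response": "(= (id $x) $x"}
report = validation_report([good] * 19 + [bad])
print(report["pass_rate"], report["meets_threshold"])
```

Returning the indices of failing entries, not just the rate, gives reviewers a concrete list to fix before the milestone is signed off.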
Milestone 3: Finalize the dataset by synthesizing new data to reach 10,000 pairs, validating the entire corpus, and producing comprehensive documentation for release.
Deliverables: The complete 10,000-pair MeTTa corpus, including synthesized data for underrepresented features. A final validation report assessing corpus quality and diversity. Fully functional scripts for data generation and validation, released under the MIT License. Comprehensive final documentation, the final JSON file, and an optional CSV version of the corpus.
Budget: $7,000 USD
Success criteria: The corpus contains exactly 10,000 entries, validated with ≥ 95% correctness. Scripts and dataset are uploaded to a version-controlled repository. Final documentation is reviewed and accessible. The final JSON file is submitted and verified. An optional CSV file is generated and verified: if time permits within the four-month timeline, a CSV version will be provided alongside the JSONL file.
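Two finalization steps can be sketched with the standard library alone. `synthesis_quota` decides how many synthesized entries each feature tag should receive to bring the corpus to its target size, giving each new entry to the least-represented feature; this least-represented-first policy and the tags such as `pattern_matching` are hypothetical, not the project's stated method. `jsonl_to_csv` produces the optional CSV export, assuming the three-column layout shown.

```python
import csv
import heapq
import io
import json
from typing import Dict, Iterable, TextIO

CSV_FIELDS = ["prompt", "instruction", "response"]  # assumed column order


def synthesis_quota(counts: Dict[str, int], target_total: int) -> Dict[str, int]:
    """Allocate (target_total - current size) new entries across feature tags,
    each to the tag that is currently least represented (ties broken by tag name)."""
    remaining = target_total - sum(counts.values())
    if remaining < 0:
        raise ValueError("corpus already exceeds target_total")
    if remaining and not counts:
        raise ValueError("no feature tags to allocate to")
    quota = {tag: 0 for tag in counts}
    heap = [(count, tag) for tag, count in counts.items()]
    heapq.heapify(heap)
    for _ in range(remaining):
        count, tag = heapq.heappop(heap)
        quota[tag] += 1
        heapq.heappush(heap, (count + 1, tag))
    return quota


def jsonl_to_csv(jsonl_lines: Iterable[str], out: TextIO) -> int:
    """Write JSONL entries as CSV rows and return the number of rows written.
    Extra keys are dropped, missing keys become empty cells, and multi-line
    MeTTa responses are quoted by the csv module."""
    writer = csv.DictWriter(out, fieldnames=CSV_FIELDS, extrasaction="ignore")
    writer.writeheader()
    rows = 0
    for line in jsonl_lines:
        if line.strip():  # skip blank lines
            writer.writerow(json.loads(line))
            rows += 1
    return rows


quota = synthesis_quota({"pattern_matching": 5, "type_declarations": 1, "grounded_atoms": 2}, target_total=10)
print(quota)

buffer = io.StringIO()
jsonl = ['{"prompt": "p", "instruction": "i", "response": "(= (f) 1)\\n!(f)", "source": "x"}', ""]
written = jsonl_to_csv(jsonl, buffer)
round_trip = list(csv.DictReader(io.StringIO(buffer.getvalue())))
```

When writing to a real file instead of an in-memory buffer, open it with `newline=""`, as the `csv` module documentation recommends, so embedded newlines in MeTTa responses survive the round trip.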
© 2025 Deep Funding