Create Quality corpus for NL-to-MeTTa LLM

Yeabsira Derese
Project Owner

Overview

This proposal aims to develop a comprehensive and versatile MeTTa code corpus that serves as a resource for training and fine-tuning large language models (LLMs) within the Hyperon ecosystem. The corpus will contain diverse examples covering MeTTa-specific features, algorithmic implementations, and problem-solving scenarios, ensuring compatibility with LLM training needs while showcasing MeTTa's advantages in addressing complex reasoning tasks.

RFP Guidelines

Create corpus for NL-to-MeTTa LLM

Internal Proposal Review
  • Type SingularityNET RFP
  • Total RFP Funding $70,000 USD
  • Proposals 10
  • Awarded Projects n/a
SingularityNET
Aug. 13, 2024

Develop a MeTTa language corpus to enable the training or fine-tuning of an LLM and/or LoRAs aimed at supporting developers by providing a natural language coding assistant for the MeTTa language.

Proposal Description

Proposal Details Locked…

In order to protect this proposal from being copied, all details are hidden until the end of the submission period. Please come back later to see all details.

Proposal Video

Not Available Yet

Check back later during the Feedback & Selection period for the RFP that this proposal is applied to.

  • Total Milestones 4
  • Total Budget $35,000 USD
  • Last Updated 7 Dec 2024

Milestone 1 - Planning & Data Collection

Description

In this initial phase, the primary task will be to identify relevant repositories and sources that can provide the necessary data for the corpus. This will involve searching through public GitHub repositories, documentation, and other available resources. Additionally, scraping scripts will be developed or refined to automate data extraction from these repositories. The team will also begin executing the automated mining and reimplementation tasks, ensuring that quality data is collected. This phase will also focus on drafting problem-specific scenarios and common error-handling cases that reflect the language's features and paradigms. Finally, algorithmic and data structure problems will be selected to ensure comprehensive coverage of MeTTa’s capabilities.
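As an illustration of the automated mining step, the following is a minimal sketch in Python, assuming the identified repositories have already been cloned into one local directory and that MeTTa sources use the `.metta` file extension. The function name `collect_metta_files` and the record fields are hypothetical, not part of the proposal.

```python
from pathlib import Path


def collect_metta_files(root: str) -> list[dict]:
    """Gather every non-empty .metta file under a directory of cloned repos.

    Hypothetical helper: the ".metta" extension and the record layout
    ("source", "code") are assumptions made for this sketch.
    """
    records = []
    for path in sorted(Path(root).rglob("*.metta")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        if not text.strip():
            continue  # skip files that contain only whitespace
        records.append({
            # path relative to the root, so the originating repo stays visible
            "source": path.relative_to(root).as_posix(),
            "code": text,
        })
    return records
```

Keeping the relative path in each record makes it possible to trace an entry back to its repository later, for example when checking licenses or removing a source.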

Deliverables

A collection of repositories and data sources, working scraping scripts, initial sets of problem-specific scenarios, error-handling examples, and a list of algorithmic problems selected for inclusion.

Budget

$7,000 USD

Success Criterion

Successful identification of at least five high-quality repositories, completion of functional scraping scripts, and the creation of at least 100 problem-specific scenarios, algorithm-solution pairs, and error-handling examples.

Milestone 2 - Corpus Organization

Description

This milestone involves structuring and annotating the collected examples to prepare them for integration into the corpus. The collected MeTTa code will be organized by category, with detailed annotations explaining the logic and context of the code. Additionally, the team will prepare MeTTa code examples by reimplementing code from other programming languages, such as Haskell, Prolog, and Lisp, to add variety and robustness to the corpus. This will involve carefully rewriting the selected code snippets in MeTTa while maintaining accuracy and functionality.
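One possible shape for an annotated, categorized entry is sketched below in Python. The class `AnnotatedExample`, its field names, and the category labels are assumptions made for illustration; the proposal does not specify a schema. The embedded MeTTa snippet is an illustrative factorial definition and is not executed here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AnnotatedExample:
    """Hypothetical record for one corpus entry (field names are assumptions)."""
    instruction: str        # natural-language description of the task
    code: str               # MeTTa source text
    category: str           # e.g. "recursion", "pattern-matching"
    origin: str = "native"  # "native", or the language it was reimplemented from
    annotation: str = ""    # explanation of the logic and context


def group_by_category(examples):
    """Organize examples by their category label."""
    grouped = defaultdict(list)
    for example in examples:
        grouped[example.category].append(example)
    return dict(grouped)


factorial = AnnotatedExample(
    instruction="Define a recursive factorial function.",
    code="(= (factorial $n)\n   (if (== $n 0) 1 (* $n (factorial (- $n 1)))))\n",
    category="recursion",
    origin="haskell",
    annotation="Base case at 0; otherwise multiply $n by the result for $n - 1.",
)
by_category = group_by_category([factorial])
```

Recording the origin language alongside each reimplemented entry would let later reviewers compare the MeTTa version against its source.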

Deliverables

A structured and annotated collection of MeTTa code examples, including code reimplemented from repositories in other programming languages.

Budget

$12,000 USD

Success Criterion

Completion of at least 5,000 annotated examples, with successful reimplementations of code from Haskell, Prolog, and Lisp, ensuring the examples are accurate, well-documented, and properly categorized.

Milestone 3 - Scenario and Algorithmic Problem Solutions

Description

The third phase will focus on preparing scenario-based and error-handling MeTTa code, which is essential for ensuring that the corpus represents real-world use cases and challenges. This includes writing solutions for algorithmic problems in MeTTa, focusing on areas such as pattern matching, knowledge representation, and error handling. Furthermore, a script will be developed to organize the MeTTa files into a common format that aligns with the corpus standards. Using this script, all previously prepared MeTTa code will be standardized into a uniform format, facilitating easy access and integration into the training process.
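The standardization script could look roughly like the following Python sketch, which assumes the common format is JSON Lines with one instruction-code pair per line. The format choice, the field names, and the helpers `normalize_code` and `to_jsonl` are assumptions for illustration, not the proposal's stated design.

```python
import json


def normalize_code(code: str) -> str:
    """Normalize line endings and whitespace without touching the code itself."""
    lines = [line.rstrip() for line in code.replace("\r\n", "\n").split("\n")]
    # Drop leading and trailing blank lines.
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    return ("\n".join(lines) + "\n") if lines else ""


def to_jsonl(pairs) -> str:
    """Serialize (instruction, code) pairs as JSON Lines (assumed format)."""
    out = []
    for instruction, code in pairs:
        entry = {
            "instruction": instruction.strip(),
            "code": normalize_code(code),
        }
        out.append(json.dumps(entry, ensure_ascii=False))
    return "\n".join(out) + ("\n" if out else "")
```

A line-oriented format keeps each entry independent, so the corpus can be appended to, filtered, or split into training and evaluation sets without reparsing the whole file.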

Deliverables

A set of scenario-based MeTTa code, algorithmic problem solutions, and a script to organize and standardize the MeTTa code.

Budget

$10,000 USD

Success Criterion

Completion of at least 100 scenario-based and algorithmic problem solutions, successful development and execution of the script that organizes the corpus into the required format, and the addition of 3,000 further data entries to the corpus.

Milestone 4 - Final Deliverables

Description

In the final phase, comprehensive documentation will be written to detail the corpus creation process, including the steps taken for data collection, organization, and annotation. The documentation will also include guidelines for extending the corpus with additional data or scenarios in the future. The corpus will be validated by test users to ensure its accuracy and usability. A final review of the deliverables will be conducted to ensure consistency and quality across all materials. After validation and quality checks, the finalized corpus will be submitted along with the documentation.
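Automated checks could complement the test-user review. The Python sketch below flags empty fields, exact duplicates, and unbalanced parentheses in instruction-code entries. It assumes entries are dictionaries with `instruction` and `code` keys, that `;` starts a line comment, and that strings are double-quoted; the function names are hypothetical, and a parenthesis count is only a rough proxy for well-formed MeTTa.

```python
def parens_balanced(code: str) -> bool:
    """Rough well-formedness check: parentheses outside strings and
    ';' comments must balance. Not a full MeTTa parser."""
    depth = 0
    in_string = False
    for line in code.splitlines():
        i = 0
        while i < len(line):
            ch = line[i]
            if in_string:
                if ch == "\\":
                    i += 1  # skip the escaped character
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == ";":
                break  # rest of the line is treated as a comment
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return False
            i += 1
    return depth == 0 and not in_string


def validate(entries):
    """Return (index, problem) tuples for entries that fail a basic check."""
    problems = []
    seen = set()
    for idx, entry in enumerate(entries):
        instruction = entry.get("instruction", "").strip()
        code = entry.get("code", "").strip()
        if not instruction or not code:
            problems.append((idx, "empty field"))
            continue
        if not parens_balanced(code):
            problems.append((idx, "unbalanced parentheses"))
        key = (instruction, code)
        if key in seen:
            problems.append((idx, "duplicate"))
        seen.add(key)
    return problems
```

Checks like these only catch mechanical defects; whether a solution actually matches its instruction still needs human review or execution in a MeTTa interpreter.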

Deliverables

A complete and finalized corpus with full documentation, including validation feedback and any revisions based on test-user input.

Budget

$6,000 USD

Success Criterion

Successful submission of the finalized corpus with 10,000 instruction-code pair data entries.
