Building a MeTTa Corpus for NL-to-Code LLMs

Anthony Oliko
Project Owner

Overview

This proposal is to create a high-quality MeTTa corpus for training or fine-tuning a large language model (LLM) that translates natural language into MeTTa code. The corpus will consist of up to 10,000 diverse instruction-code pairs to support the development of an AI-powered coding assistant. Deliverables include the corpus, scripts, and documentation, so that the work is transparent and reproducible. The project aims to accelerate adoption of MeTTa, reduce its learning curve, and support AGI research in the Hyperon framework.

RFP Guidelines

Create corpus for NL-to-MeTTa LLM

Proposal Submission (4 days left)
  • Type SingularityNET RFP
  • Total RFP Funding $70,000 USD
  • Proposals 4
  • Awarded Projects n/a
SingularityNET
Aug. 13, 2024

Develop a MeTTa language corpus to enable the training or fine-tuning of an LLM and/or LoRAs aimed at supporting developers by providing a natural language coding assistant for the MeTTa language.

Proposal Description

Proposal Details Locked…

In order to protect this proposal from being copied, all details are hidden until the end of the submission period. Please come back later to see all details.

Proposal Video

Not Available Yet

Check back later during the Feedback & Selection period for the RFP that this proposal is applied to.

  • Total Milestones 4
  • Total Budget $30,000 USD
  • Last Updated 21 Nov 2024

Milestone 1 - Project Kickoff and Resource Collection

Description

The project begins with gathering and reviewing all available MeTTa resources, including official documentation, community contributions, GitHub repositories, and tutorials. Each resource will be evaluated for quality, relevance, and coverage of MeTTa features. A framework for data collection and formatting will also be defined, along with validation criteria, giving the corpus a structured foundation.

Time: Month 1

Deliverables

1. A comprehensive list of MeTTa resources categorized by type and relevance.
2. A detailed plan outlining the methods for data extraction, formatting, and validation.
3. Initial drafts of scripts and tools for automated resource extraction where applicable.
4. Weekly progress updates and a milestone completion report.
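As an illustration of what an early extraction script might look like, the sketch below pulls fenced MeTTa code blocks out of a Markdown document and tags each with its source. It is not part of the proposal: the function name `extract_metta_blocks`, the record fields, the sample path, and the assumption that sources label their code fences with `metta` are all hypothetical.

```python
import re

# Build the fence marker programmatically so this example does not contain
# a literal fence. Assumption: source documents tag MeTTa code as "metta".
FENCE_MARK = "`" * 3
FENCE = re.compile(
    re.escape(FENCE_MARK) + r"metta[ \t]*\n(.*?)" + re.escape(FENCE_MARK),
    re.DOTALL | re.IGNORECASE,
)


def extract_metta_blocks(markdown_text, source):
    """Return one record per non-empty fenced MeTTa block in a Markdown text."""
    records = []
    for index, match in enumerate(FENCE.finditer(markdown_text)):
        code = match.group(1).strip()
        if code:  # skip empty fences
            records.append({"source": source, "block": index, "code": code})
    return records


# A tiny in-memory document with one MeTTa block and one unrelated block.
sample = (
    "Intro text\n\n"
    + FENCE_MARK + "metta\n(= (double $x) (* 2 $x))\n!(double 21)\n" + FENCE_MARK + "\n\n"
    + "More text\n\n"
    + FENCE_MARK + "python\nprint('not metta')\n" + FENCE_MARK + "\n"
)

records = extract_metta_blocks(sample, "docs/tutorial.md")
for record in records:
    print(record["source"], record["block"])
    print(record["code"])
```

Keeping the source path on every record would let later milestones trace each corpus entry back to the resource list.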

Budget

$5,000 USD

Milestone 2 - Corpus Development and Synthesis

Description

This phase transforms raw resources into usable data. The extracted material will be formatted into instruction-output pairs, and coverage gaps will be filled with synthetic examples. Coverage will include all key features and functionality of MeTTa, with an emphasis on diverse, accurate, and comprehensive data.

Time: Month 2

Deliverables

1. A structured dataset of 5,000 validated instruction-output pairs.
2. New synthetic examples to fill coverage gaps.
3. Scripts for data formatting, generation, and validation.
4. Midpoint evaluation report to ensure quality and alignment with project goals.
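One possible shape for the formatting and validation scripts is sketched below: each pair becomes a JSON object, and only pairs that pass a cheap structural check are written to JSON Lines. The field names (`instruction`, `output`, `source`), the helper names, and the choice of JSONL are assumptions for illustration. The parenthesis check is only a sanity filter that ignores comments and escaped quotes; it is not a MeTTa parser, and real validation would presumably run the code through a MeTTa interpreter.

```python
import json


def parens_balanced(code):
    """Cheap structural check: parentheses outside string literals must match."""
    depth = 0
    in_string = False
    for ch in code:
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:  # a ")" appeared before its "("
                    return False
    return depth == 0 and not in_string


def make_pair(instruction, output, source):
    """Build one corpus record (hypothetical schema)."""
    return {"instruction": instruction.strip(), "output": output.strip(), "source": source}


def is_valid(pair):
    """A pair is kept only if both fields are non-empty and the code is balanced."""
    return bool(pair["instruction"]) and bool(pair["output"]) and parens_balanced(pair["output"])


def to_jsonl(pairs):
    """Serialize the valid pairs, one JSON object per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs if is_valid(p))


pairs = [
    make_pair("Define a function that doubles a number.", "(= (double $x) (* 2 $x))", "synthetic"),
    make_pair("Broken example with a missing parenthesis.", "(= (double $x) (* 2 $x)", "synthetic"),
]
jsonl = to_jsonl(pairs)
print(jsonl)
```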

Budget

$12,000 USD

Milestone 3 - Corpus Finalization and Quality Assurance

Description

The primary task here is to complete the corpus by expanding it to 10,000 validated pairs. Rigorous quality assurance processes will be implemented to ensure that the corpus meets the defined standards for accuracy, diversity, and usability. Feedback from stakeholders will be incorporated during this phase.

Time: Month 3

Deliverables

1. A finalized corpus with 10,000 high-quality, validated instruction-output pairs.
2. Comprehensive quality assurance reports detailing validation processes and outcomes.
3. Updated scripts/tools for corpus refinement and replication.
4. Weekly progress updates and milestone completion report.
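A quality assurance pass could include duplicate removal and a simple coverage count, as in the sketch below. The proposal does not specify its QA method, so everything here is an assumption: the `category` field, the normalization rules, and the report layout are hypothetical. Instructions are compared case-insensitively, while code is only whitespace-normalized because symbol case may be significant.

```python
import hashlib
import re
from collections import Counter


def collapse_ws(text):
    """Collapse runs of whitespace so trivially reformatted copies compare equal."""
    return re.sub(r"\s+", " ", text.strip())


def pair_key(pair):
    """Hash of the normalized instruction and code, used to detect duplicates."""
    joined = collapse_ws(pair["instruction"]).lower() + "\x00" + collapse_ws(pair["output"])
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(pairs):
    """Keep the first occurrence of each normalized pair, preserving order."""
    seen = set()
    unique = []
    for pair in pairs:
        key = pair_key(pair)
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique


def qa_report(pairs, unique):
    """Summarize how many pairs survived and how they spread across categories."""
    return {
        "total": len(pairs),
        "unique": len(unique),
        "duplicates_removed": len(pairs) - len(unique),
        "by_category": dict(Counter(p.get("category", "uncategorized") for p in unique)),
    }


pairs = [
    {"instruction": "Define a function that doubles a number.",
     "output": "(= (double $x) (* 2 $x))", "category": "functions"},
    {"instruction": "define a function that  doubles a number.",
     "output": "(= (double $x)   (* 2 $x))", "category": "functions"},
    {"instruction": "Add 1 and 2.", "output": "!(+ 1 2)", "category": "arithmetic"},
]
unique = deduplicate(pairs)
report = qa_report(pairs, unique)
print(report)
```

Exact-match hashing only catches trivial duplicates; measuring diversity in a stronger sense would need a similarity metric that this sketch does not attempt.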

Budget

$8,000 USD

Milestone 4 - Documentation and Delivery

Description

The final phase focuses on preparing and delivering the project outputs. Comprehensive documentation will be created, covering the corpus creation process, validation methods, known limitations, and guidelines for future use. All scripts, tools, and data will be organized, tested, and delivered as open-source resources.

Time: Month 4

Deliverables

1. Full documentation detailing the project, including data sources, methods, and use instructions.
2. All scripts and tools necessary for replicating or extending the corpus creation process.
3. Finalized corpus shared as an open-source deliverable.
4. Presentation to stakeholders summarizing project outcomes.
5. Final project report summarizing milestones, challenges, and future recommendations.
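To support reproducibility, a release could ship with a manifest that records a checksum and record count for each file. The sketch below shows one way to build such a manifest; the manifest layout, the file names, and the version string are hypothetical and not taken from the proposal. Files are passed as in-memory bytes to keep the example self-contained.

```python
import hashlib
import json


def build_manifest(files, version):
    """Describe each delivered file by name, size, SHA-256, and record count."""
    entries = []
    for name in sorted(files):
        data = files[name]
        entry = {
            "name": name,
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        if name.endswith(".jsonl"):
            # JSON Lines: one record per non-empty line.
            entry["records"] = sum(1 for line in data.splitlines() if line.strip())
        entries.append(entry)
    return {"version": version, "files": entries}


corpus = (
    b'{"instruction": "Add 1 and 2.", "output": "!(+ 1 2)"}\n'
    b'{"instruction": "Double 21.", "output": "!(* 2 21)"}\n'
)
files = {"corpus.jsonl": corpus, "README.md": b"# MeTTa corpus\n"}

manifest = build_manifest(files, "0.1.0")
print(json.dumps(manifest, indent=2))
```

Anyone re-running the corpus scripts could then compare their output hashes against the published manifest.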

Budget

$5,000 USD
