Prompt-Instruct-Response MeTTa CORPUS

Expert Rating 4.3
Project Owner: Remo Start

Overview

The methodology used to create a corpus for an LLM is as important as the quality of the dataset itself. We propose to create a corpus for MeTTa using the prompt-instruct-response approach, allowing the model to learn both the theoretical and practical aspects of the MeTTa programming language. The prompt-instruct-response approach is a dataset-construction technique in which each entry pairs a prompt and an instruction with a response, so the LLM learns not only from the response but also from the instruction and the prompt. As a result, a coding assistant trained on such a corpus better understands the context around each entry in the dataset. This methodology performs best where context, efficiency, and practicality are essential.
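For illustration, a single prompt-instruct-response entry could be represented as below. This is a minimal sketch, assuming a three-field schema (`prompt`, `instruction`, `response`) serialized as one JSON object per line; the proposal does not publish its exact schema, and the `PIRRecord` class and the sample MeTTa factorial snippet are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PIRRecord:
    """One prompt-instruct-response entry (hypothetical schema)."""
    prompt: str       # the developer's natural-language request
    instruction: str  # task framing and constraints given to the model
    response: str     # the expected answer, here a MeTTa snippet

    def to_jsonl(self) -> str:
        # One JSON object per line; embedded newlines are escaped by json.dumps.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_jsonl(cls, line: str) -> "PIRRecord":
        return cls(**json.loads(line))


example = PIRRecord(
    prompt="How do I define a recursive factorial function in MeTTa?",
    instruction="Write a MeTTa function `fact` and show how to evaluate it for 5.",
    response="(= (fact $n) (if (== $n 0) 1 (* $n (fact (- $n 1)))))\n!(fact 5)",
)

if __name__ == "__main__":
    print(example.to_jsonl())
```

Keeping the instruction separate from the prompt is what lets a model see both the request and the framing it was answered under, which is the point of the approach described above.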

RFP Guidelines

Create corpus for NL-to-MeTTa LLM

Complete & Awarded
  • Type SingularityNET RFP
  • Total RFP Funding $70,000 USD
  • Proposals 10
  • Awarded Projects 1
SingularityNET
Aug. 13, 2024

Develop a MeTTa language corpus to enable the training or fine-tuning of an LLM and/or LoRAs aimed at supporting developers by providing a natural language coding assistant for the MeTTa language.

Proposal Description

Our Team

Our team has over 20 years of combined experience in AI/ML, data engineering, and software development. We built the corpus for the Plutus LLM, a repository we used to build an LLM that helps developers write and debug Plutus code.

Company Name (if applicable)

Remostart

Open Source Licensing

MIT - Massachusetts Institute of Technology License

The use of the MIT License for this project ensures that the MeTTa corpus and accompanying scripts are open-source and freely available to the public. Under the MIT License:

  1. Permissive Licensing: Users are granted the right to use, modify, distribute, and integrate the corpus and scripts into their projects, including commercial applications, with minimal restrictions.

  2. Attribution Requirement: Users must retain the original copyright notice and license text in copies of the corpus and scripts, ensuring acknowledgment of the original creators.

  3. No Liability or Warranty: The license explicitly disclaims any warranty or liability, protecting the creators from legal responsibility for issues arising from the use of the corpus or scripts.

Proposal Video

Not Available Yet


  • Total Milestones: 3
  • Total Budget: $35,000 USD
  • Last Updated: 8 Dec 2024

Milestone 1 - Project Setup and Initial Design

Description

Establish the foundational framework for the project by finalizing the methodology, identifying data sources, and developing initial scripts for data extraction and formatting.

Deliverables

Project plan and timeline. List of identified MeTTa resources (e.g., tutorials, GitHub repositories, documentation) to be used as data sources. Initial data extraction and formatting scripts.

Budget

$7,000 USD

Success Criterion

Project plan and timeline approved by the team. At least three MeTTa resources identified and documented. At least ten samples of extracted and formatted data.
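A minimal sketch of the kind of extraction and formatting script this milestone describes, assuming MeTTa examples in the source documentation appear as Markdown code fences tagged `metta`. The function names and the draft-record fields (including `source`) are hypothetical; prompts and instructions are left empty for later authoring or synthesis.

```python
import re

# Built programmatically so this block does not contain a literal fence line.
FENCE = "`" * 3
# Matches Markdown fences tagged "metta" (assumed tagging convention) and
# captures everything up to the next closing fence.
METTA_FENCE = re.compile(FENCE + r"metta[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_metta_snippets(markdown: str) -> list[str]:
    """Return the bodies of all metta-tagged fenced blocks, whitespace-stripped."""
    return [m.group(1).strip() for m in METTA_FENCE.finditer(markdown)]


def to_draft_records(markdown: str, source: str) -> list[dict]:
    # Each snippet becomes a draft record; `source` keeps provenance so the
    # validation report can trace every entry back to its resource.
    return [
        {"prompt": "", "instruction": "", "response": snippet, "source": source}
        for snippet in extract_metta_snippets(markdown)
    ]


if __name__ == "__main__":
    doc = f"Intro text\n{FENCE}metta\n(= (double $x) (* 2 $x))\n{FENCE}\nMore text"
    for rec in to_draft_records(doc, source="tutorial.md"):
        print(rec)
```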

Milestone 2 - Corpus Development and Validation

Description

Build the core of the dataset by creating the first 7,000 prompt-instruct-response pairs from existing MeTTa resources and validating their correctness.

Deliverables

A corpus containing 7,000 validated prompt-instruct-response pairs. Validation report confirming the correctness of at least 95% of the dataset. Initial documentation describing data sources, extraction methods, and the validation process.

Budget

$21,000 USD

Success Criterion

Dataset contains 7,000 prompt-instruct-response pairs in the defined JSON structure. Validation confirms that ≥ 95% of responses are error-free and adhere to MeTTa standards. Initial documentation is reviewed and finalized.
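The ≥ 95% criterion reduces to a pass-rate computation over the corpus. In the sketch below, `parens_balanced` is only a hypothetical structural stand-in; a real validator would execute each response with a MeTTa interpreter (for example via the `hyperon` Python package), which is not shown here. The `check` parameter lets that stronger validator be swapped in without changing the threshold logic.

```python
from typing import Callable, Iterable


def parens_balanced(code: str) -> bool:
    """Hypothetical stand-in check: True if parentheses are balanced.
    Ignores string literals and comments, so it is only a rough filter."""
    depth = 0
    for ch in code:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing paren with no matching opener
                return False
    return depth == 0


def pass_rate(records: Iterable[dict],
              check: Callable[[str], bool] = parens_balanced) -> float:
    """Fraction of records whose `response` passes `check` (0.0 if empty)."""
    items = list(records)
    if not items:
        return 0.0
    passed = sum(1 for r in items if check(r["response"]))
    return passed / len(items)


def meets_threshold(records: Iterable[dict], threshold: float = 0.95,
                    check: Callable[[str], bool] = parens_balanced) -> bool:
    """Success criterion: at least `threshold` of responses pass validation."""
    return pass_rate(records, check) >= threshold
```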

Milestone 3 - Completion, Refinement, and Documentation

Description

Finalize the dataset by synthesizing new data to complete 10,000 pairs, validate the entire corpus, and produce comprehensive documentation for release.

Deliverables

Complete 10,000-pair MeTTa corpus, including synthesized data for underrepresented features. Final validation report confirming corpus quality and diversity. Fully functional scripts for data generation and validation, released under the MIT License. Comprehensive final documentation, the final JSON file, and an optional CSV version of the corpus.

Budget

$7,000 USD

Success Criterion

Corpus contains exactly 10,000 entries, validated with ≥ 95% correctness. Scripts and dataset are uploaded to a version-controlled repository. Final documentation is reviewed and accessible. The final JSON file is submitted and verified. A CSV version is optional: if time permits within the four-month timeline, it will be generated, verified, and delivered alongside the JSONL file.
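The final checks (exact entry count and optional CSV export) could be scripted as follows. This sketch assumes the corpus is stored as JSONL with the same hypothetical `prompt`/`instruction`/`response` fields; Python's standard `csv` module quotes any field containing a newline, so multi-line MeTTa responses survive the conversion.

```python
import csv
import io
import json

# Assumed schema; adjust to the corpus's actual field names.
FIELDS = ["prompt", "instruction", "response"]


def load_jsonl(text: str) -> list[dict]:
    """Parse JSONL text: one JSON object per non-empty line.
    Splits on "\n" only, since str.splitlines() would also split on
    characters such as U+2028 that may appear unescaped inside JSON strings."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def check_count(records: list[dict], expected: int = 10_000) -> None:
    """Raise if the corpus does not contain exactly `expected` entries."""
    if len(records) != expected:
        raise ValueError(f"expected {expected} entries, found {len(records)}")


def to_csv(records: list[dict]) -> str:
    """Render records as CSV text; keys outside FIELDS are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    jsonl_text = '{"prompt": "p", "instruction": "i", "response": "(+ 1 2)"}\n'
    records = load_jsonl(jsonl_text)
    check_count(records, expected=1)  # the real corpus would use the default 10,000
    print(to_csv(records))
```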


Expert Ratings

Reviews & Ratings

Group Expert Rating (Final)

Overall: 4.3

  • Feasibility 4.3
  • Desirability 4.3
  • Usefulness 4.0

Experts highly favored this proposal but ultimately selected another because they were not convinced that the team had a strong command of the MeTTa language.

  • Expert Review 1

    Overall: 5.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 5.0
    • Value for money 5.0
    Top proposal for this RFP

    Credible methodology with a focus on high-quality, validated MeTTa examples. Strong alignment with RFP goals, backed by a competent team and open-source principles. Overall, a promising and well-structured proposal with minor execution concerns.

  • Expert Review 2

    Overall: 5.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 5.0
    • Value for money 4.0
    It's a credible proposal by a team that has already done something similar for Plutus.

    While MeTTa will pose some challenges that Plutus didn't, due to the oddity of the language and the small corpus size, the team's experience will give them a head start...

  • Expert Review 3

    Overall: 3.0

    • Compliance with RFP requirements 3.0
    • Solution details and team expertise 3.0
    • Value for money 3.0

    The proposal is to use prompt engineering in the form of prompt-instruct-response. There is little additional discussion of their approach.
