Large MeTTa corpus for LLM fine-tuning

Seb Wiechers
Project Owner

Expert Rating

n/a

Overview

We propose to curate two datasets of (natural language <-> MeTTa) expression pairs: the *silver* dataset, consisting of 20,000 AI-generated, probabilistically verified pairs, and the *gold* dataset, consisting of 10,000 human-labeled, high-quality pairs. The proposed timeline is 4 months. Funding will cover a) compute costs, b) scoring of output pairs, and c) development of algorithms that estimate the probability that a prediction is correct. Our knowledge of the MeTTa language, NLP and linguistics, combined with our background in logic and AI and our real-world organizational experience, puts us in a strong position to have a compounding effect on the SNET ecosystem.

RFP Guidelines

Create corpus for NL-to-MeTTa LLM

Proposal Submission (4 days left)
  • Type SingularityNET RFP
  • Total RFP Funding $70,000 USD
  • Proposals 4
  • Awarded Projects n/a
SingularityNET
Aug. 13, 2024

Develop a MeTTa language corpus to enable the training or fine-tuning of an LLM and/or LoRAs aimed at supporting developers by providing a natural language coding assistant for the MeTTa language.

Proposal Description

Proposal Details Locked…

In order to protect this proposal from being copied, all details are hidden until the end of the submission period. Please come back later to see all details.

Proposal Video

Not Available Yet

Check back later during the Feedback & Selection period for the RFP that this proposal is applied to.

  • Total Milestones 4
  • Total Budget $27,000 USD
  • Last Updated 28 Nov 2024

Milestone 1 - Python data pipeline

Description

1 month: gathering the team, resources and tooling; defining the human labeling workflow; selecting appropriate datasets and preparing the data pipeline in Python (deliverable).

Deliverables

We will open-source a data pipeline that can be used to generate and curate (NL <-> MeTTa) pairs.
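As a rough illustration of what one curation step in such a pipeline might look like, the sketch below stores each pair with its tier and filters out malformed or low-confidence entries. Everything in it is an assumption made for illustration: the `Pair` record, the `curate` and `to_jsonl` helpers, the 0.8 confidence threshold, and the JSONL output format are hypothetical and are not taken from the proposal. The parenthesis check is deliberately naive and ignores parentheses inside string literals.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class Pair:
    """One (NL <-> MeTTa) pair. Field names are hypothetical."""
    nl: str                             # natural-language statement
    metta: str                          # candidate MeTTa expression
    tier: str                           # "silver" (AI-generated) or "gold" (human-labeled)
    confidence: Optional[float] = None  # estimated probability of correctness (silver only)


def balanced_parens(expr: str) -> bool:
    """Cheap well-formedness check: parentheses must nest and close."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def curate(pairs: List[Pair], min_confidence: float = 0.8) -> List[Pair]:
    """Keep well-formed pairs; silver pairs must also clear the threshold."""
    kept = []
    for p in pairs:
        if not balanced_parens(p.metta):
            continue
        if p.tier == "silver" and (p.confidence is None or p.confidence < min_confidence):
            continue
        kept.append(p)
    return kept


def to_jsonl(pairs: List[Pair]) -> str:
    """Serialize pairs as one JSON object per line."""
    return "\n".join(json.dumps(asdict(p)) for p in pairs)
```

Gold pairs bypass the confidence threshold in this sketch because a human label is taken to stand in for the score; a real pipeline would more likely call an actual MeTTa parser than count parentheses.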

Budget

$4,000 USD

Milestone 2 - 'Common Sense' verification algorithm

Description

2 months: small-scale testing of different generative approaches and development of a 'common sense' algorithm that checks parsed MeTTa expressions against the logical properties of the input statements (deliverable).

Deliverables

We will open-source an algorithm and approach that can be used to perform a 'common sense' test on generated MeTTa statements, provided the logical relation between the input expressions is known beforehand.
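One possible reading of such a 'common sense' test is sketched below: if two input statements are known in advance to be paraphrases or negations of each other, the generated expressions should mirror that relation. This is only an illustration under stated assumptions, not the proposed algorithm. The relation labels (`"equivalent"`, `"negation"`), the encoding of negation as `(not X)`, and the helper names `parse` and `consistent` are all hypothetical, and the tokenizer is a generic S-expression reader rather than a full MeTTa parser.

```python
def tokenize(source: str) -> list:
    """Split an S-expression into parentheses and symbols."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(source: str):
    """Parse one S-expression into nested tuples of symbol strings."""
    tokens = tokenize(source)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
            return tuple(items)
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result


def consistent(metta_a: str, metta_b: str, relation: str) -> bool:
    """Check that two generated expressions respect a known relation.

    Assumed relations (hypothetical labels):
      "equivalent": both inputs are paraphrases, so the parses must match.
      "negation":   one input negates the other, assumed encoded as (not X).
    """
    a, b = parse(metta_a), parse(metta_b)
    if relation == "equivalent":
        return a == b
    if relation == "negation":
        return b == ("not", a) or a == ("not", b)
    raise ValueError(f"unknown relation: {relation}")
```

For example, `consistent("(Human Socrates)", "(not (Human Socrates))", "negation")` returns `True`, while two paraphrases that yield `(Human Socrates)` and `(Mortal Socrates)` would fail the `"equivalent"` check and could be flagged for review.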

Budget

$6,000 USD

Milestone 3 - Halfway milestone

Description

3 months: start the human labeling process, while incrementally using the findings to produce the silver dataset.

Deliverables

By this time we expect to be able to deliver 2,000 gold pairs as well as 20,000 silver pairs.

Budget

$8,500 USD

Milestone 4 - Project delivery (10,000 gold pairs)

Description

4 months: delivery of the silver dataset (minimum 20,000 labeled pairs) and the gold dataset (10,000 pairs).

Deliverables

We finish the project by delivering the remaining 8,000 gold pairs.

Budget

$8,500 USD
