Seb Wiechers

Project Owner

Large MeTTa corpus for LLM fine-tuning

Type SingularityNET RFP
Funding Awarded $27,000 USD
RFP Guidelines Create corpus for NL-to-MeTTa LLM

Status

Overall Status
⏳ Contract Pending
Funding Transferred
$4,000 USD
Max Funding Amount
$27,000 USD

Project Tags:
Algorithmic/technical

Funding Schedule

Milestone Release 1	$4,000 USD	Transfer Complete	TBD
Milestone Release 2	$6,000 USD	Pending	TBD
Milestone Release 3	$8,500 USD	Pending	TBD
Milestone Release 4	$8,500 USD	Pending	TBD

Overview

We propose to curate two datasets of (natural language <-> MeTTa) expression pairs: the *silver* dataset, consisting of 20,000 AI-generated, probabilistically verified (NL <-> MeTTa) pairs, and the *gold* dataset, consisting of 10,000 human-labeled, high-quality pairs. The proposed timeline is 4 months. Funding will be used to cover a) compute costs, b) scoring of output pairs, and c) development of algorithms for estimating the probability of correct predictions. Our knowledge of the MeTTa language, NLP and linguistics, together with our background in logic and AI and our real-world organizational experience, positions us well to have a compounding effect on the SNET ecosystem.

RFP Guidelines

Complete & Awarded

Create corpus for NL-to-MeTTa LLM

Ended on:

8 Dec. 2024

Type SingularityNET RFP
Total RFP Funding $70,000 USD
Proposals 0
Awarded Projects 1

SingularityNET

Aug. 13, 2024

Develop a MeTTa language corpus to enable the training or fine-tuning of an LLM and/or LoRAs aimed at supporting developers by providing a natural language coding assistant for the MeTTa language.

Proposal Description

Company Name (if applicable)

Pearstop

Project details

We propose to curate two (natural language <-> MeTTa) expression datasets:

the silver dataset, consisting of 20,000 AI-generated, probabilistically verified (NL <-> MeTTa) pairs, and
the gold dataset, consisting of 10,000 human-verified, high-quality pairs.

The proposed timeline is 4 months.
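
As a rough illustration of how the two datasets could be told apart in practice, the sketch below models a single pair and assigns it to a tier. Everything in it is an assumption made for illustration: the `Pair` fields, the `assign_tier` helper, the 0.9 threshold and the example MeTTa strings are hypothetical and are not taken from the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pair:
    nl: str                                # natural-language statement
    metta: str                             # candidate MeTTa expression (illustrative only)
    confidence: float                      # estimated probability that the pair is correct
    human_verified: Optional[bool] = None  # None means no human has reviewed it yet


def assign_tier(pair: Pair, threshold: float = 0.9) -> str:
    """Return 'gold', 'silver' or 'rejected'; the threshold is a hypothetical value."""
    if pair.human_verified is True:
        return "gold"
    if pair.human_verified is False:
        return "rejected"
    # No human verdict: rely on the estimated probability of correctness.
    return "silver" if pair.confidence >= threshold else "rejected"


pairs = [
    Pair("Socrates is a human.", "(isa Socrates Human)", 0.97),
    Pair("Plato is a human.", "(isa Plato Human)", 0.55),
    Pair("Aristotle is a human.", "(isa Aristotle Human)", 0.80, human_verified=True),
]
tiers = [assign_tier(p) for p in pairs]
```

In this sketch a human verdict always overrides the automatic score, so rejected pairs stay available as negative examples rather than being discarded.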

Funding will be used to cover the expense of

a) compute costs
b) scoring output-pairs
c) developing algorithms for estimating the probability of correct predictions.
d) human capital
e) tooling

Our knowledge of the MeTTa language, NLP and linguistics, together with our background in logic and AI and our real-world organizational experience, positions us well to have a compounding effect on the SNET ecosystem.

Milestones

1 month: gathering the team, resources and tooling. Defining the human labeling workflow. Selecting appropriate datasets and preparing the data pipeline in Python (deliverable).
2 months: small-scale testing of different generative approaches and development of a 'common sense' algorithm that checks parsed MeTTa expressions against the logical properties of the input statements (deliverable).
3 months: start of the human labeling process, while incrementally using the findings to produce the silver dataset.
4 months: delivery of the silver dataset (minimum 20,000 labeled pairs) and the gold dataset (10,000 pairs).
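
The proposal does not specify how the 'common sense' check works. The following is only a minimal sketch of one possible building block, assuming that a candidate expression is a plain S-expression and that some upstream step (not shown) has extracted the symbols the natural-language input requires. The names `parse`, `symbols` and `passes_sanity_check` are hypothetical, and the tokenizer deliberately ignores string literals and comments.

```python
import re


def parse(expr: str):
    """Parse one S-expression into nested lists; raise ValueError if malformed."""
    # Tokens are parentheses or runs of non-space, non-parenthesis characters.
    tokens = re.findall(r"[()]|[^\s()]+", expr)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume the closing parenthesis
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


def symbols(tree) -> set:
    """Collect every bare symbol that appears anywhere in the parsed tree."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= symbols(child)
    return found


def passes_sanity_check(metta: str, required: set) -> bool:
    """True if the expression parses and mentions every required symbol."""
    try:
        tree = parse(metta)
    except ValueError:
        return False
    return required <= symbols(tree)


ok = passes_sanity_check("(isa Socrates Human)", {"Socrates", "Human"})
unbalanced = passes_sanity_check("(isa Socrates Human", {"Socrates"})
wrong_entity = passes_sanity_check("(isa Plato Human)", {"Socrates"})
```

A check like this only catches structural errors and missing entities; verifying logical properties such as negation or quantifier scope would need more than symbol coverage.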

We will deliver not only the correct NL <-> MeTTa pairs, but also the incorrectly labeled pairs accumulated in the process. These can serve as negative examples when defining a loss function for DNN approaches in future projects.
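
One way correct and incorrect pairs could be shipped together is a JSON Lines file with an explicit label, which a later training setup could read as positive and negative examples. This is a sketch under stated assumptions: the field names `nl`, `metta` and `label`, the helper names and the example strings are hypothetical, and the proposal does not specify a delivery format.

```python
import json


def to_jsonl(records) -> str:
    """Serialize (nl, metta, is_correct) triples, one JSON object per line."""
    lines = []
    for nl, metta, is_correct in records:
        row = {"nl": nl, "metta": metta, "label": int(is_correct)}
        lines.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(lines)


def from_jsonl(text: str):
    """Split serialized rows back into positive and negative examples."""
    positives, negatives = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        target = positives if row["label"] == 1 else negatives
        target.append((row["nl"], row["metta"]))
    return positives, negatives


records = [
    ("Socrates is a human.", "(isa Socrates Human)", True),
    ("Socrates is a human.", "(isa Human Socrates)", False),  # swapped arguments
]
positives, negatives = from_jsonl(to_jsonl(records))
```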

Open Source Licensing

MIT - Massachusetts Institute of Technology License

Default MIT Licence (code and dataset)

Activity Summary

Milestones: 4 total
Discussion: 0 total comments
Reviews: 4 total posted
Project Team: 1 total people

Group Expert Rating (Final)

Overall

5.0

Compliance with RFP requirements 5.0
Solution details and team expertise 5.0
Value for money 4.0

New reviews and ratings are disabled for Awarded Projects

Overall Community

4.3

from 4 reviews

Rating distribution (score: number of reviews)
5: 1
4: 2
3: 0
2: 0
1: 0

Feasibility

5

from 4 reviews

Viability

5

from 4 reviews

Desirability

4

from 4 reviews

Usefulness

0

from 4 reviews

4 ratings

Expert Review 1
Overall

4.0
- Compliance with RFP requirements 5.0
- Solution details and team expertise 4.0
- Value for money 0.0
Robust solution

Robust, cost-effective proposal offering dual datasets and reusable tools. Clear milestones and deliverables. Risks include reliance on AI-generated data, ambitious gold dataset timeline, and unsubstantiated team expertise.

Expert Review 2
Overall

4.0
- Compliance with RFP requirements 5.0
- Solution details and team expertise 4.0
- Value for money 0.0
It's a sensible proposal though light on some critical details

This proposal understands the size and nature of the task, and takes seriously the magnitude of the human labeling process ... a little more concreteness on how the human labeling and the synthesis will be done would have been better but at least the proposers are taking the nature of the task seriously...

Expert Review 3
Overall

5.0
- Compliance with RFP requirements 5.0
- Solution details and team expertise 4.0
- Value for money 0.0
A great idea using a mix of human-curation and ML tools with two datasets (Silver and Gold). I was really hoping for more detail of the processes proposed for the ML group.

Reviews and Ratings in Deep Funding are structured in 4 categories. This will ensure that the reviewer takes all these perspectives into account in their assessment and it will make it easier to compare different projects on their strengths and weaknesses.

Overall (Primary) This is an average of the 4 perspectives. At the start of this new process, we are assigning an equal weight to all categories, but over time we might change this and make some categories more important than others in the overall score. (This may even be done retroactively).

Feasibility (secondary) This represents the user's assessment of whether the proposed project is theoretically possible and if it is deemed feasible. E.g. A proposal for nuclear fission might be theoretically possible, but it doesn’t look very feasible in the context of Deep Funding.

Viability (secondary) This category is somewhat similar to Feasibility, but it interprets the feasibility against factors such as the size and experience of the team, the budget requested, and the estimated timelines. We could frame this as: “What is your level of confidence that this team will be able to complete this project and its milestones in a reasonable time, and successfully deploy it?” Examples:

A proposal that promises the development of a personal assistant that outperforms existing solutions might be feasible, but if there is no AI expertise in the team the viability rating might be low.
A proposal that promises a new Carbon Emission Compensation scheme might be technically feasible, but the viability could be estimated low due to challenges around market penetration and widespread adoption.

Desirability (secondary) Even if the project team succeeds in creating a product, there is the question of market fit. Is this a project that fulfills an actual need? Is there a lot of competition already? Are the USPs of the project sufficient to make a difference? Example:

Creating a translation service from, say, Spanish to English might be possible, but it's questionable whether such a service would be able to get a significant share of the market.

Usefulness (secondary) This is a crucial category that aligns with the main goal of the Deep Funding program. The question to be asked here is: “To what extent will this proposal help to grow the Decentralized AI Platform?” For proposals that develop or utilize an AI service on the platform, the question could be “How many API calls do we expect it to generate” (and how important / high-valued are these calls?). For a marketing proposal, the question could be “How large and well-aligned is the target audience?” Another question is related to how the budget is spent. Are the funds mainly used for value creation for the platform or on other things? Examples:

A metaverse project that spends 95% of its budget on the development of the game and only 5% on the development of an AI service for the platform might expect a low ‘usefulness’ rating here.

A marketing proposal that creates t-shirts for a local high school would get a lower ‘usefulness’ rating than a marketing proposal that has a viable plan for targeting highly esteemed universities in a scalable way.
An AI service that is fully dedicated to a single product does not take advantage of the purpose of the platform. If the same service is also offered to, and useful for, other parties, this should increase the ‘usefulness’ rating.

Total Milestones
4
Total Budget
$27,000 USD
Last Updated
10 Sep 2025

Milestone 1 - Python data pipeline

Expert rating criteria

Overall: The weighted average of the 4 perspectives.
Value for money: Each RFP defines a maximum allowed budget, but teams can differentiate their proposal by offering a solution with a lower budget or a wider scope.
Compliance with RFP requirements: This rating indicates compliance with 'Must haves' but also adaptation of 'Nice to haves' and non-functional requirements defined in the RFP.
Solution details and team expertise: RFPs will offer varying degrees of freedom. This rating indicates the quality of the team's specific solution ideas, the provided details, and the reviewer's confidence in the team's ability to execute.

Seb Wiechers

Project Owner

Project owner. Responsible for curating the team, defining workflows and requirements, and designing algorithms. Also responsible for upskilling the human labeling team to a working knowledge of the MeTTa language.