SingularityNET

RFP Owner

Create corpus for NL-to-MeTTa LLM

Create MeTTa corpus that can be used to train an LLM for English => MeTTa

Type SingularityNET RFP
Total RFP Funding $70,000 USD
Proposals 0
Awarded Projects 1

Overview

Est. Execution Time
⏱️ 4 Months
Proposal Winners
🏆 Multiple
Max Funding / Proposal
$35,000 USD

Short summary

Develop a MeTTa language corpus to enable the training or fine-tuning of an LLM and/or LoRAs aimed at supporting developers by providing a natural language coding assistant for the MeTTa language.

Main purpose

The purpose of this RFP is to create a MeTTa corpus that can be used to train or fine-tune a natural language-to-MeTTa LLM. This corpus will support the development of an AI-powered coding assistant that helps users generate accurate and functional MeTTa code, thereby lowering the barrier to entry for MeTTa and accelerating AGI development within the Hyperon framework.

Long description

Context and background:
SingularityNET Foundation, in collaboration with other partners such as the OpenCog Foundation and TrueAGI, is working toward a scalable implementation of the Hyperon AGI framework running on decentralized infrastructure, and toward the implementation of the PRIMUS cognitive architecture within this framework.

Hyperon and PRIMUS are complex systems involving multiple components, which need to demonstrate appropriate functionalities both individually and in combination. This RFP aims to address a portion of this overall need by funding the initial iteration of a significant component of PRIMUS within Hyperon: the development of a comprehensive corpus for NL-to-MeTTa language model training.

The purpose of this corpus is to enable the creation of an AI-powered MeTTa coding assistant, which will assist users in generating correct and functional MeTTa code. This coding assistant will play a crucial role in lowering the barrier to entry for MeTTa, thereby accelerating the broader AGI development within the Hyperon framework.

MeTTa is a multi-paradigm language for declarative and functional computations over knowledge metagraphs, designed specifically to meet the needs of Artificial General Intelligence (AGI). It is an innovative and relatively new language, so newcomers can face a learning curve. Although plenty of resources and tutorials are available, they may not cover every use case. A MeTTa coding assistant that can immediately help people build what they want would therefore be beneficial.

Materials such as the official documentation, tutorials, and GitHub repositories contain MeTTa programs, including those written by our community members. All of these are valuable resources for creating a MeTTa coding assistant, but they are scattered and often not in a ready-to-use format. In addition, because MeTTa is so new, their total volume may not be enough even to fine-tune an AI model. This RFP therefore aims to address this need by:

Identifying existing MeTTa resources and converting them to a format that is ready to use for creating a MeTTa coding assistant
Generating or synthesizing new data in the same format for the same purpose
Producing a corpus of up to 10,000 pairs that is sufficiently diverse and consistent with best practices
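
As an illustration of the conversion step above, the following is a minimal sketch of one possible way to turn commented MeTTa source into instruction-output pairs. It assumes that `;` comment lines placed directly above an expression describe that expression, which will not hold for every repository. The function name `extract_pairs` and the `instruction`/`output` keys are hypothetical and are not prescribed by this RFP.

```python
def extract_pairs(source: str) -> list[dict]:
    """Pair each run of ';' comment lines with the code block that follows it.

    Assumption (not from the RFP): a comment directly above an expression
    is a usable natural-language description of that expression.
    """
    pairs: list[dict] = []
    comment_lines: list[str] = []
    code_lines: list[str] = []

    def flush() -> None:
        # Keep a block only if it has both a description and some code.
        if comment_lines and code_lines:
            pairs.append({
                "instruction": " ".join(comment_lines),
                "output": "\n".join(code_lines),
            })
        comment_lines.clear()
        code_lines.clear()

    for raw in source.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        if not stripped:
            flush()                      # a blank line ends the current block
        elif stripped.startswith(";"):
            if code_lines:               # a comment after code starts a new block
                flush()
            comment_lines.append(stripped.lstrip(";").strip())
        else:
            code_lines.append(line)
    flush()
    return pairs


SAMPLE = """\
; Define a function that doubles a number
(= (double $x) (* 2 $x))

; Evaluate double on 21
!(double 21)

(uncommented expression)
"""

for pair in extract_pairs(SAMPLE):
    print(pair)
```

Blocks without a comment are dropped rather than guessed at; in practice they could be routed to the generation/synthesis step to receive a written instruction.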

Collaboration

This RFP will be followed by subsequent RFPs that make use of the MeTTa corpus to train or fine-tune an AI model as a coding assistant for MeTTa.

RFP expected outcomes

MeTTa corpus
- A MeTTa corpus that can be used to train or fine-tune an AI model as a MeTTa coding assistant
OSS code
- All code used to create the MeTTa corpus, so that others can run it and replicate the corpus creation process
Thorough documentation
- Provide comprehensive documentation detailing how the MeTTa corpus is gathered and/or generated/synthesized

Functional Requirements

Must have

MeTTa corpus containing instruction-output pairs where instructions are in natural language, and outputs are in valid, error-free MeTTa code.
When issues arise or additional details are beneficial or needed (e.g. code errors, explanations of how specific MeTTa code works, retrieval of useful functions for the code assistant), such instances should be recorded and well documented.
Comprehensive coverage of all features and functionalities that MeTTa offers.
Thorough documentation detailing the process of corpus creation, including data sources, methods for generation and validation, and known limitations.
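
To make the instruction-output requirement concrete, here is a hedged sketch of what a single corpus record and a cheap structural pre-check could look like. The record layout (`instruction`, `output`, `notes`) and the JSON Lines serialization are assumptions for illustration; the RFP does not fix a file format. The check assumes `;` line comments and double-quoted strings with backslash escapes, and it only catches unbalanced parentheses. It is not a substitute for running each output through a MeTTa interpreter.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CorpusRecord:
    instruction: str   # natural-language request
    output: str        # MeTTa code answering the request
    notes: str = ""    # optional: known issues, explanations, helper references


def parens_balanced(code: str) -> bool:
    """Return True if parentheses balance, ignoring strings and ';' comments."""
    depth = 0
    in_string = False
    in_comment = False
    escaped = False
    for ch in code:
        if in_comment:
            if ch == "\n":
                in_comment = False
        elif in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == ";":
            in_comment = True
        elif ch == '"':
            in_string = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a ')' appeared before its '('
                return False
    return depth == 0 and not in_string


def to_jsonl_line(record: CorpusRecord) -> str:
    """Serialize one record as a JSON line after basic sanity checks."""
    if not record.instruction.strip():
        raise ValueError("instruction must not be empty")
    if not parens_balanced(record.output):
        raise ValueError("output has unbalanced parentheses")
    return json.dumps(asdict(record), ensure_ascii=False)


record = CorpusRecord(
    instruction="Define a function that doubles a number.",
    output="(= (double $x) (* 2 $x))",
)
print(to_jsonl_line(record))
```

The `notes` field is one possible place to record the issues and additional details mentioned in the requirements, so that they travel with the pair they refer to.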

Should have

Code/scripts used for creating the MeTTa corpus, covering processes such as data extraction, conversion, generation, synthesis, evaluation, and validation.
Additional documentation that aids future users in training or fine-tuning their own MeTTa coding assistants, potentially including tutorials or examples.
Corpus should be up to 10,000 pairs, sufficiently diverse, and consistent with best practices
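
One small piece of such scripts could be a pass that removes duplicate pairs and enforces the 10,000-pair ceiling. The sketch below is an assumption about how that might be done, not a required method: it treats two pairs as duplicates when their instructions match after lowercasing and whitespace collapsing and their outputs match after whitespace collapsing. Real diversity checks would likely need more than exact-match deduplication. The names `dedupe_and_cap` and `MAX_PAIRS` are hypothetical.

```python
import re

MAX_PAIRS = 10_000  # upper bound on corpus size stated in this RFP


def _collapse_ws(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text.strip())


def dedupe_and_cap(pairs: list[dict], max_pairs: int = MAX_PAIRS) -> list[dict]:
    """Drop exact duplicates (after normalization) and keep at most max_pairs."""
    seen: set[tuple[str, str]] = set()
    kept: list[dict] = []
    for pair in pairs:
        if len(kept) >= max_pairs:
            break
        key = (_collapse_ws(pair["instruction"]).lower(), _collapse_ws(pair["output"]))
        if key in seen:
            continue
        seen.add(key)
        kept.append(pair)
    return kept


pairs = [
    {"instruction": "Double a number.", "output": "(= (double $x) (* 2 $x))"},
    {"instruction": "double a  number.", "output": "(= (double $x)  (* 2 $x))"},
    {"instruction": "Evaluate double on 21.", "output": "!(double 21)"},
]
print(len(dedupe_and_cap(pairs)))
```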

Could have

Roadmap for future updates to the corpus that encompass continuous development and new features within the MeTTa language.
Video explaining the corpus

Won't have

Integration with AI systems outside the intended scope of a MeTTa coding assistant.

Non-functional Requirements

Must have

The MeTTa corpus must be well-structured and organized to facilitate easy use in training or fine-tuning an LLM.
The final deliverable should include detailed, version-controlled documentation covering steps of the corpus creation process, ensuring transparency and reproducibility.
MeTTa scripts submitted as a part of the corpus should adhere to established coding standards and best practices.

Should have

Documentation should be accessible and understandable by a broad audience, potentially including non-experts who may wish to use or expand upon the work.

Could have

Suggestions for continuing and optimizing the corpus.

Main evaluation criteria

Alignment with requirements and objective

Does the proposal meet the requirements and advance the objectives of the RFP?

Pre-existing R&D

Has the team previously done similar or related research or development work in other platforms / languages / contexts?

Team competence

Does the team have relevant skills?

Cost

Does the proposal offer good value for money?

Timeline

Does the proposal include a set of clearly defined milestones?

Other resources

Hyperon and related AI platforms are evolving quickly! This is a bit of a moving target, but the internal SingularityNET team will be available for help and expert advice where needed. Also included:

MeTTa language website
SingularityNET technology links
Educational materials and resources for learning MeTTa
SingularityNET holds MeTTa study group calls every other week. Proposers are welcome to attend for support from our researchers and community.
Recurring Hyperon study group calls for the community are currently being planned. These will cover MOSES, ECAN, PLN, and other key components of the OpenCog and PRIMUS Hyperon cognitive architectures.
Access to the SingularityNET World Mattermost server, with a dedicated channel for discussion and support among the RFP-winning teams and SingularityNET resources.

RFP Status

Completed & Awarded

The community and public are invited to view the full proposals and give feedback. During this time the RFP committee will be conducting its formal selection process to award winning proposals.


1 Project

Large MeTTa corpus for LLM fine-tuning

Type SingularityNET RFP
Funding Awarded n/a
RFP Guidelines Create corpus for NL-to-MeTTa LLM

Seb Wiechers

Nov. 28, 2024

khellar

RFP Owner & Editor
