Expert Rating 3.3

Anthony Oliko

Project Owner

Building a MeTTa Corpus for NL-to-Code LLMs

Type: SingularityNET RFP
Funding Request: $30,000 USD
RFP Guidelines: Create corpus for NL-to-MeTTa LLM

Overview

This proposal aims to create a high-quality MeTTa corpus for training or fine-tuning a natural-language-to-MeTTa large language model (LLM). The corpus will contain up to 10,000 diverse instruction-code pairs and serve as a foundation for an AI-powered coding assistant. Along with the corpus, we’ll deliver all necessary scripts and clear documentation to ensure transparency and reproducibility. The project is intended to make MeTTa more accessible, ease its learning curve, and contribute to AGI research within the Hyperon framework. Our team is made up of members of the SingularityNET community who are deeply interested in the OpenCog Hyperon framework and its potential.

RFP Guidelines

Create corpus for NL-to-MeTTa LLM

Complete & Awarded

Type: SingularityNET RFP
Total RFP Funding: $70,000 USD
Proposals: 10
Awarded Projects: 1

SingularityNET

Aug. 13, 2024

Develop a MeTTa language corpus to enable the training or fine-tuning of an LLM and/or LoRAs aimed at supporting developers by providing a natural language coding assistant for the MeTTa language.

Proposal Description

Company Name (if applicable)

Trenches AI

Project details

This project aims to create a high-quality MeTTa corpus that can be used to train or fine-tune a natural-language-to-MeTTa large language model (LLM). The ultimate goal is an AI-powered coding assistant that helps users generate accurate, functional MeTTa code with ease.

MeTTa is a unique, multi-paradigm language specifically designed for declarative and functional computations over knowledge metagraphs. Its innovative approach is tailored for building Artificial General Intelligence (AGI) applications. However, because MeTTa is still new and complex, it can present a steep learning curve for developers, especially those just starting out. This project aims to bridge that gap by making it easier for people to learn and use MeTTa through the help of an intelligent, AI-driven coding assistant.

Objectives:

Build a Comprehensive Corpus:
- We’ll collect and convert existing MeTTa resources, including documentation, tutorials, and community contributions, into a standardized format.
- Where gaps exist, we’ll create new examples to ensure we cover all key features and use cases of MeTTa.
- The final corpus will contain up to 10,000 diverse instruction-output pairs to provide comprehensive and practical examples for the assistant.
Ensure Quality and Usability:
- Every piece of data in the corpus will be carefully validated to ensure it’s accurate, diverse, and follows MeTTa best practices.
- We’ll also develop scripts and processes that others can use to replicate or expand the corpus in the future.
Support Open Collaboration:
- Detailed, easy-to-follow documentation will accompany the corpus to ensure anyone can understand how it was created and how it can be used.
- All code, data, and resources will be shared as open-source contributions, empowering the broader community to build on this work.
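
As a rough illustration of what an instruction-output pair could look like, the sketch below serializes two records as JSON Lines. The field names (`instruction`, `output`, `source`), the `make_pair` helper, and the JSONL layout are assumptions made for this example; the proposal does not fix a schema. The two MeTTa snippets are small hand-written samples, not entries from the corpus.

```python
import json


def make_pair(instruction, output, source):
    """Build one hypothetical corpus record (the schema is an assumption)."""
    return {
        "instruction": instruction,  # natural-language request
        "output": output,            # MeTTa code answering the request
        "source": source,            # provenance tag, e.g. "docs" or "synthetic"
    }


def to_jsonl(pairs):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


EXAMPLES = [
    make_pair(
        "Define a function that adds two numbers.",
        "(= (add $x $y) (+ $x $y))",
        "synthetic",
    ),
    make_pair(
        "Evaluate the sum of 2 and 3.",
        "!(+ 2 3)",
        "synthetic",
    ),
]

if __name__ == "__main__":
    print(to_jsonl(EXAMPLES))
```

A line-oriented format like this is easy to append to, diff, and stream, which suits a corpus that is expected to grow over time; other layouts would work equally well.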

Impact:

This project is about more than just data—it’s about making MeTTa accessible to everyone. By creating an AI coding assistant powered by this corpus, we’re helping developers spend less time struggling with syntax and more time solving meaningful problems. This work will lower the barrier to entry for MeTTa, encouraging adoption and innovation within the Hyperon AGI framework.

Ultimately, this project supports the larger vision of SingularityNET, the OpenCog Foundation, and TrueAGI: to advance AGI research through decentralized, collaborative tools that empower individuals and teams alike.

Open Source Licensing

Apache License

This project will be released under the Apache License 2.0, a permissive open-source license that allows anyone to use, modify, and distribute the code and data, provided that appropriate credit is given to the original authors.

Components Outside This License

At this stage, there are no planned components or resources in the project that fall outside the scope of the Apache License 2.0. However, the project will rely on external tools and libraries (e.g., Python packages, data processing utilities) that may be governed by their respective licenses. In such cases:

- All dependencies and third-party resources will be clearly documented, along with their licenses.
- Care will be taken to ensure compatibility between the Apache License 2.0 and any third-party licenses.

Activity Summary

Milestones: 4
Discussion comments: 0
Reviews posted: 4
Project team: 2 people
Total budget: $30,000 USD
Last updated: 8 Dec 2024

Milestone 1 - Project Kickoff and Resource Collection

Description

The project begins with gathering and reviewing all available MeTTa resources, including official documentation, community contributions, GitHub repositories, and tutorials. This phase will also involve evaluating these resources for their quality, relevance, and coverage of MeTTa features. Additionally, a framework for data collection and formatting will be defined, and validation criteria will be established. This phase ensures a structured foundation for building the corpus.

Time: Month 1

Deliverables

1. A comprehensive list of MeTTa resources, categorized by type and relevance.
2. A detailed plan outlining the methods for data extraction, formatting, and validation.
3. Initial drafts of scripts and tools for automated resource extraction, where applicable.
4. Weekly progress updates and a milestone completion report.
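
One form the automated resource extraction could take is pulling MeTTa snippets out of Markdown documentation. The sketch below is a minimal version under stated assumptions: that source documents use fenced code blocks and that MeTTa blocks carry a `metta` language tag. The `extract_blocks` helper and the sample text are hypothetical and are not part of the proposal.

```python
import re

# Build the fence marker programmatically so this example can itself
# be embedded in a fenced block without ambiguity.
TICKS = "`" * 3

# A fenced block: opening marker, optional language tag, newline, body, closing marker.
FENCE = re.compile(re.escape(TICKS) + r"(\w*)\n(.*?)" + re.escape(TICKS), re.DOTALL)


def extract_blocks(markdown, lang="metta"):
    """Return the bodies of fenced code blocks tagged with `lang`."""
    return [
        body.strip()
        for tag, body in FENCE.findall(markdown)
        if tag.lower() == lang
    ]


SAMPLE = (
    "Intro text.\n"
    + TICKS + "metta\n(= (double $x) (* $x 2))\n" + TICKS + "\n"
    + "More text.\n"
    + TICKS + "python\nprint('hi')\n" + TICKS + "\n"
)

if __name__ == "__main__":
    for block in extract_blocks(SAMPLE):
        print(block)
```

Untagged or differently tagged blocks are skipped here; a real pipeline would probably need a manual review pass for snippets that lack a language tag.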

Budget

$5,000 USD

Success Criterion

1. Resource Compilation: A complete and well-organized list of MeTTa resources is compiled, categorized by type (e.g., documentation, tutorials, repositories) and assessed for relevance and coverage of key MeTTa features.
2. Framework Definition: A clear and actionable plan is developed, outlining the methods for data extraction, formatting, and validation, including measurable validation criteria.
3. Tool Development: Initial versions of scripts and tools for automated resource extraction are created, tested for functionality, and meet the requirements for scalability and accuracy.
4. Progress Tracking: Weekly updates are provided, detailing completed tasks, challenges encountered, and planned actions, ensuring transparency and accountability throughout the milestone.
5. Milestone Report: A comprehensive milestone completion report is delivered, summarizing achievements, insights, and any modifications to the project plan, demonstrating readiness to proceed to the next phase.

Milestone 2 - Corpus Development and Synthesis

Description

This phase focuses on transforming raw resources into usable data. The extracted material will be formatted into instruction-output pairs, with gaps addressed through the generation of synthetic examples. Coverage will include all key features and functionalities of MeTTa. The emphasis will be on creating diverse, accurate, and comprehensive data.

Time: Month 2

Deliverables

1. A structured dataset of 5,000 validated instruction-output pairs.
2. New synthetic examples to fill coverage gaps.
3. Scripts for data formatting, generation, and validation.
4. Midpoint evaluation report to ensure quality and alignment with project goals.
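
The proposal does not spell out its validation checks. As one plausible first-pass check, the sketch below tests that the parentheses in a MeTTa snippet balance, skipping string literals and `;` comments, and flags empty fields. The function names and the record layout (`instruction`, `output`) are assumptions for illustration. This is only a syntactic pre-filter; a fuller check would run each snippet through a MeTTa interpreter.

```python
def is_balanced(code):
    """Check that parentheses balance, ignoring string literals and ';' comments."""
    depth = 0
    in_string = False
    i = 0
    while i < len(code):
        ch = code[i]
        if in_string:
            if ch == "\\":
                i += 1              # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ";":
            # A comment runs to the end of the line.
            newline = code.find("\n", i)
            if newline == -1:
                break
            i = newline
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False        # closing paren with no matching opener
        i += 1
    return depth == 0 and not in_string


def validate(pair):
    """Return a list of problems found in one record (empty list means it passed)."""
    problems = []
    if not pair.get("instruction", "").strip():
        problems.append("empty instruction")
    output = pair.get("output", "")
    if not output.strip():
        problems.append("empty output")
    elif not is_balanced(output):
        problems.append("unbalanced parentheses")
    return problems


if __name__ == "__main__":
    print(validate({"instruction": "Add two numbers.", "output": "(+ 1 2"}))
```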

Budget

$12,000 USD

Success Criterion

1. Dataset Creation: A structured dataset of 5,000 validated instruction-output pairs is developed, demonstrating accuracy, diversity, and alignment with MeTTa’s core features and functionalities.
2. Synthetic Data Generation: New synthetic examples are created to address identified coverage gaps, ensuring comprehensive representation of MeTTa’s capabilities.
3. Tool Availability: Fully functional scripts for data formatting, synthetic generation, and validation are delivered, with thorough testing to confirm usability and reliability.
4. Quality Assurance: All data passes validation checks based on predefined criteria, ensuring high standards of correctness and relevance.
5. Midpoint Evaluation: A detailed evaluation report is submitted, assessing progress, highlighting achievements, identifying potential risks, and confirming alignment with project goals.

Milestone 3 - Corpus Finalization and Quality Assurance

Description

The primary task here is to complete the corpus by expanding it to 10,000 validated pairs. Rigorous quality assurance processes will be implemented to ensure that the corpus meets the defined standards for accuracy, diversity, and usability. Feedback from stakeholders will be incorporated during this phase.

Time: Month 3

Deliverables

1. A finalized corpus with 10,000 high-quality validated instruction-output pairs.
2. Comprehensive quality assurance reports detailing validation processes and outcomes.
3. Updated scripts/tools for corpus refinement and replication.
4. Weekly progress updates and milestone completion report.
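
Part of a quality assurance report could be computed automatically. The sketch below counts duplicate pairs and tallies the remaining records by provenance. The record fields (`instruction`, `output`, `source`) and the `qa_report` function are hypothetical, and whitespace-insensitive duplicate detection is only one of several reasonable definitions of a duplicate.

```python
import hashlib
from collections import Counter


def normalize(code):
    """Collapse whitespace so trivially reformatted snippets compare equal."""
    return " ".join(code.split())


def pair_key(pair):
    """Hash the normalized instruction and output to identify duplicates."""
    text = pair["instruction"].strip().lower() + "\x00" + normalize(pair["output"])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def qa_report(pairs):
    """Summarize corpus size, duplicate count, and source mix."""
    seen = set()
    unique = []
    for pair in pairs:
        key = pair_key(pair)
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return {
        "total": len(pairs),
        "unique": len(unique),
        "duplicates": len(pairs) - len(unique),
        "by_source": dict(Counter(p.get("source", "unknown") for p in unique)),
    }


SAMPLE = [
    {"instruction": "Add two numbers.", "output": "(+ 1 2)", "source": "docs"},
    {"instruction": "add two numbers.", "output": "(+  1   2)", "source": "docs"},
    {"instruction": "Double a number.", "output": "(* $x 2)", "source": "synthetic"},
]

if __name__ == "__main__":
    print(qa_report(SAMPLE))
```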

Budget

$8,000 USD

Success Criterion

1. Corpus Completion: A finalized corpus of 10,000 high-quality, validated instruction-output pairs is delivered, meeting predefined standards for accuracy, diversity, and usability.
2. Quality Assurance: Comprehensive quality assurance processes are conducted, with detailed reports documenting validation criteria, methodologies, and outcomes, ensuring the dataset’s reliability and robustness.
3. Tool Updates: Scripts and tools for corpus refinement and replication are updated and optimized for efficiency and scalability, with clear documentation for future use.
4. Stakeholder Feedback: Feedback from stakeholders is effectively integrated into the final corpus, addressing any concerns or suggestions to enhance its utility and relevance.
5. Progress Transparency: Weekly updates are provided, tracking milestones and ensuring alignment with project objectives, culminating in a thorough milestone completion report.

Milestone 4 - Documentation and Delivery

Description

The final phase focuses on preparing and delivering the project outputs. Comprehensive documentation will be created, covering the corpus creation process, validation methods, known limitations, and guidelines for future use. All scripts, tools, and data will be organized, tested, and delivered as open-source resources.

Time: Month 4

Deliverables

1. Full documentation detailing the project, including data sources, methods, and use instructions.
2. All scripts and tools necessary for replicating or extending the corpus creation process.
3. Finalized corpus shared as an open-source deliverable.
4. Presentation to stakeholders summarizing project outcomes.
5. Final project report summarizing milestones, challenges, and future recommendations.
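
The use instructions might include a short loading example for people who want to fine-tune on the corpus. The sketch below assumes the corpus is distributed as JSON Lines with `instruction` and `output` fields, which the proposal does not specify, and shows a deterministic train/validation split. The function names and the 10% validation fraction are illustrative choices only.

```python
import json
import random


def load_jsonl(text):
    """Parse JSON Lines text into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def split(pairs, val_fraction=0.1, seed=0):
    """Shuffle with a fixed seed and split off a validation set."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    if not shuffled:
        return [], []
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    # Ten placeholder records standing in for a real corpus file.
    text = "\n".join(
        json.dumps({"instruction": f"Task {i}", "output": f"(task {i})"})
        for i in range(10)
    )
    train, val = split(load_jsonl(text))
    print(len(train), len(val))
```

Fixing the seed keeps the split reproducible across runs, which matters when different fine-tuning experiments need to be compared on the same validation set.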

Budget

$5,000 USD

Success Criterion

1. Comprehensive Documentation: Detailed, user-friendly documentation is delivered, clearly describing data sources, corpus creation methods, validation processes, known limitations, and guidelines for replication or extension.
2. Script and Tool Delivery: All scripts and tools required for reproducing or extending the corpus creation process are finalized, thoroughly tested, and organized, ensuring they are functional and accessible as open-source resources.
3. Corpus Accessibility: The finalized corpus is published and made available as an open-source deliverable, meeting all requirements for usability and proper licensing.
4. Stakeholder Presentation: A well-structured presentation is conducted, effectively summarizing project goals, achievements, challenges, and the practical utility of deliverables.
5. Final Project Report: A comprehensive final report is submitted, detailing milestones achieved, obstacles encountered, solutions implemented, and recommendations for future work.

Expert Ratings

Reviews & Ratings

Group Expert Rating (Final)

Overall

3.3

Compliance with RFP requirements: 4.3
Solution details and team expertise: 3.3
Value for money: 3.0

Expert Review 1
Overall

4.0
- Compliance with RFP requirements 5.0
- Solution details and team expertise 3.0
- Value for money 0.0
Clear proposal

Clear and structured plan with strong deliverables and adherence to RFP goals. Unclear team background and potentially unrealistic timelines raise concerns.

Expert Review 2
Overall

3.0
- Compliance with RFP requirements 4.0
- Solution details and team expertise 3.0
- Value for money 0.0
It's a solid proposal but the tricky part (synthetic generation) is just briefly alluded to...

Not clear if the team has the expertise to make synthetic generation work here; simply fine-tuning models on the available MeTTa code seems not to work ...

Expert Review 3
Overall

3.0
- Compliance with RFP requirements 4.0
- Solution details and team expertise 3.0
- Value for money 0.0
Vague -- what processes does the proposer intend to use? Lack of relevant detail.

Overall: The weighted average of the 4 perspectives.
Value for money: Each RFP defines a maximum allowed budget, but teams can differentiate their proposal by offering a solution with a lower budget or a wider scope.
Compliance with RFP requirements: This rating indicates compliance to 'Must haves' but also adaptation of 'Nice to haves' and Non-functional requirements defined in the RFP.
Solution details and team expertise: RFPs will offer varying degrees of freedom. This rating indicates the quality of the team's specific solution ideas, the provided details, and the reviewer's confidence in the team's ability to execute.

Reviews and Ratings in Deep Funding are structured in 4 categories. This will ensure that the reviewer takes all these perspectives into account in their assessment and it will make it easier to compare different projects on their strengths and weaknesses.

Overall (Primary) This is an average of the 4 perspectives. At the start of this new process, we are assigning an equal weight to all categories, but over time we might change this and make some categories more important than others in the overall score. (This may even be done retroactively).

Feasibility (secondary) This represents the user's assessment of whether the proposed project is theoretically possible and if it is deemed feasible. E.g. A proposal for nuclear fission might be theoretically possible, but it doesn’t look very feasible in the context of Deep Funding.

Viability (secondary) This category is somewhat similar to Feasibility, but it interprets the feasibility against factors such as the size and experience of the team, the budget requested, and the estimated timelines. We could frame this as: “What is your level of confidence that this team will be able to complete this project and its milestones in a reasonable time, and successfully deploy it?” Examples:

- A proposal that promises the development of a personal assistant that outperforms existing solutions might be feasible, but if there is no AI expertise in the team the viability rating might be low.
- A proposal that promises a new Carbon Emission Compensation scheme might be technically feasible, but the viability could be estimated low due to challenges around market penetration and widespread adoption.

Desirability (secondary) Even if the project team succeeds in creating a product, there is the question of market fit. Is this a project that fulfills an actual need? Is there a lot of competition already? Are the USPs of the project sufficient to make a difference? Example:

- Creating a translation service from, say, Spanish to English might be possible, but it's questionable if such a service would be able to get a significant share of the market.

Usefulness (secondary) This is a crucial category that aligns with the main goal of the Deep Funding program. The question to be asked here is: “To what extent will this proposal help to grow the Decentralized AI Platform?” For proposals that develop or utilize an AI service on the platform, the question could be “How many API calls do we expect it to generate” (and how important / high-valued are these calls?). For a marketing proposal, the question could be “How large and well-aligned is the target audience?” Another question is related to how the budget is spent. Are the funds mainly used for value creation for the platform or on other things? Examples:

- A metaverse project that spends 95% of its budget on the development of the game and only 5% on the development of an AI service for the platform might expect a low ‘usefulness’ rating here.
- A marketing proposal that creates t-shirts for a local high school would get a lower ‘usefulness’ rating than a marketing proposal that has a viable plan for targeting highly esteemed universities in a scalable way.
- An AI service that is fully dedicated to a single product does not take advantage of the purpose of the platform. When the same service would be offered and useful for other parties, this should increase the ‘usefulness’ rating.

Anthony Oliko

Project Owner

Oversee the project and ensure alignment with objectives and timelines. Design and implement scripts/tools. Manage the generation of synthetic data to supplement existing resources. Produce documentation.

Account Pending

R & D

Account Pending