Benchmind

Prasad Kumkar
Project Owner


Expert Rating

n/a

Overview

Benchmind is a benchmarking suite for symbolic and neuro-symbolic reasoning in AGI systems built on Hyperon/MeTTa. It provides a standardized set of reasoning tasks—multi-hop QA, analogical reasoning, and hypothesis generation—along with datasets, evaluation metrics, and tooling to assess the effectiveness of knowledge graphs and reasoning agents. Fully integrated with MeTTa and MORK, Benchmind enables developers to stress-test and compare symbolic, probabilistic, and hybrid LLM-based reasoning strategies at scale. It establishes a shared foundation for measuring reasoning performance, uncovering bottlenecks, and driving progress toward robust, interpretable AGI.

RFP Guidelines

Advanced knowledge graph tooling for AGI systems

Complete & Awarded
  • Type: SingularityNET RFP
  • Total RFP Funding: $350,000 USD
  • Proposals: 39
  • Awarded Projects: 5
SingularityNET
Apr. 16, 2025

This RFP seeks the development of advanced tools and techniques for interfacing with, refining, and evaluating knowledge graphs that support reasoning in AGI systems. Projects may target any part of the graph lifecycle — from extraction to refinement to benchmarking — and should optionally support symbolic reasoning within the OpenCog Hyperon framework, including compatibility with the MeTTa language and MORK knowledge graph. Bids are expected to range from $10,000 - $200,000.

Proposal Description

Our Team

Prasad - 6+ years of engineering experience, with deep expertise in decentralized systems. Bachelor’s in CS Engineering.

Harsh - Oversees backend architecture and service design. 

Akash - Specializes in scalable backend services. Designs & implements core components for indexing, storage, and event processing pipelines.

Kartik - Delivers across backend and frontend layers. Works on API systems, dashboard integrations, and end-to-end feature development with a focus on reliability and consistency.

Company Name (if applicable)

Chainscore Labs

Project details

The Benchmind project proposes the creation of a robust, extensible, and purpose-built benchmarking suite to evaluate the reasoning capabilities of symbolic and neuro-symbolic AGI systems within the Hyperon framework. Designed specifically for the MeTTa language and MORK hypergraph backend, Benchmind will serve as a standard for measuring progress in reasoning-rich AGI by providing a suite of realistic cognitive tasks, datasets, metrics, and tooling. The project will offer quantitative evaluations across reasoning types — including multi-hop question answering, analogical reasoning, and hypothesis generation — and support diverse reasoning agents (symbolic, probabilistic, and LLM-augmented). Benchmind will accelerate research and development in AGI systems built atop Hyperon by making reasoning performance measurable, comparable, and improvable.

Introduction

As AGI development accelerates, symbolic knowledge representation and reasoning systems — such as OpenCog Hyperon’s Atomspace and the MeTTa language — are playing a central role in organizing and manipulating structured knowledge at scale. While much effort has been focused on building and maintaining these symbolic knowledge graphs (e.g., via MORK), there is currently no standardized, system-aligned framework for evaluating how well such graphs and their associated agents perform in actual reasoning tasks.

Existing KG benchmarks (e.g., for link prediction or triple classification) do not test the kind of advanced, compositional reasoning that AGI systems require. They also rarely align with the semantics or architecture of Hyperon, making them unsuitable for teams working in the MeTTa ecosystem.

This lack of a standard evaluation protocol:

  • Slows down iteration and progress in symbolic AGI tooling.

  • Obscures bottlenecks in reasoning accuracy, tractability, or coverage.

  • Prevents effective comparison of approaches across teams and methods.

Benchmind is designed to fill this gap.

Goals and Scope

Benchmind will develop:

  1. A suite of benchmark tasks for symbolic and neuro-symbolic AGI reasoning, designed around the MeTTa/MORK environment and covering a range of reasoning capabilities crucial to AGI development.

  2. Task categories and datasets that reflect real-world and AGI-relevant cognitive challenges:

    • Multi-hop factual and logical question answering.

    • Analogical reasoning and pattern completion.

    • Hypothesis generation and abductive reasoning.

  3. Evaluation metrics and tooling to automatically assess reasoning performance, path efficiency, accuracy, and robustness.

  4. Interfaces and APIs to plug in different reasoning agents, including symbolic engines (e.g., MeTTa queries, PLN), probabilistic solvers, and LLM-augmented modules.

  5. End-to-end pipeline allowing other developers to benchmark their knowledge graphs or agents within the Hyperon ecosystem using minimal setup.

Benchmind will act as a “reasoning stress-test” for symbolic AGI agents, offering interpretability, reproducibility, and diagnostic insight into the strengths and weaknesses of different approaches.

Technical Architecture

Benchmind consists of four primary components:

1. Task Library (Cognitive Benchmarks)

A modular task framework where each benchmark task is defined via:

  • A knowledge graph scenario or dataset (loaded into MORK).

  • A reasoning objective (e.g., answer a multi-hop query, complete a relational analogy, generate a plausible new link).

  • Ground-truth answers or expected outputs for evaluation.
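Concretely, such a task definition could be captured in a small record type. The sketch below is illustrative only; the class and field names are assumptions, not part of the proposal, and the real suite would load `graph_facts` into MORK rather than keep them in a Python list:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkTask:
    """One Benchmind task: a KG scenario, a reasoning objective, and ground truth."""
    task_id: str
    category: str                            # e.g. "multi_hop_qa", "analogy", "abduction"
    graph_facts: list                        # (subject, relation, object) triples for the KG
    objective: str                           # the question posed to the reasoning agent
    ground_truth: set                        # accepted answers, used by the evaluator

# A toy multi-hop task over two facts.
task = BenchmarkTask(
    task_id="qa-001",
    category="multi_hop_qa",
    graph_facts=[("aspirin", "inhibits", "cox1"), ("cox1", "produces", "thromboxane")],
    objective="What does aspirin indirectly affect?",
    ground_truth={"thromboxane"},
)
```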

Initial task categories:

  • Multi-hop QA: Answering questions that require traversing 2–5 relational hops in the graph.

  • Analogical reasoning: Given A:B::C:?, identify D such that the relation between C and D mirrors A and B.

  • Abductive/hypothesis generation: Given partial or incomplete facts, suggest plausible inferences that complete the knowledge context.

2. Reasoning Agent Interfaces

A flexible plug-in interface to test different types of reasoners:

  • Pure MeTTa logic-based agents.

  • Probabilistic inference engines (e.g., PLN).

  • LLM-augmented retrieval or ranking agents (LLM queries the KG via a prompt-aware interface).
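The plug-in boundary could be a minimal interface that every reasoner implements, whether symbolic, probabilistic, or LLM-backed. The sketch below is a hypothetical shape for that contract, not the proposed API; the baseline agent shown does no inference at all:

```python
from abc import ABC, abstractmethod

class ReasoningAgent(ABC):
    """Minimal contract a Benchmind harness might require of any reasoner."""

    @abstractmethod
    def answer(self, objective: str):
        """Return (candidate answers, reasoning trace) for a task objective."""

class LookupAgent(ReasoningAgent):
    """Trivial baseline: answers from a fixed fact table, with no inference."""
    def __init__(self, facts):
        self.facts = facts

    def answer(self, objective):
        # Match any fact whose subject is mentioned in the question.
        hits = {obj for subj, _rel, obj in self.facts if subj in objective}
        return hits, [f"matched {len(hits)} direct fact(s)"]

agent = LookupAgent([("aspirin", "inhibits", "cox1")])
answers, trace = agent.answer("What does aspirin inhibit?")
```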

3. Evaluation Engine

For each benchmark task, Benchmind will:

  • Track reasoning chains or paths taken.

  • Score correctness, path length, response time, and confidence.

  • Log failure cases for inspection.

Evaluation metrics will include:

  • Accuracy/precision/recall.

  • Average path length to solution.

  • Inference steps vs. brute force.

  • Coverage and robustness over time.
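The set-based metrics above reduce to comparisons between an agent's candidate answers and the ground truth. A minimal scoring helper, as an illustration of the idea rather than the proposed implementation:

```python
def score(predicted: set, truth: set) -> dict:
    """Precision/recall/F1 over answer sets, plus exact-match accuracy."""
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": 1.0 if predicted == truth else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# One correct answer plus one spurious answer: recall is perfect, precision suffers.
metrics = score({"thromboxane", "cox1"}, {"thromboxane"})
```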

4. Execution Environment

  • Fully integrated with MeTTa/MORK stack.

  • CLI interface and config-based task execution.

  • Containerized deployment for reproducibility.

Key Features and Deliverables

  • MeTTa-native benchmarks: Tasks will be formulated as MeTTa expressions, runnable directly by Hyperon agents.

  • MORK-optimized execution: Fast path lookup and atom matching using MORK’s zipper graph operations.

  • Hybrid reasoning support: Enable evaluation of agents that combine symbolic inference with neural language models.

  • Debugging & analytics: Visual or textual summaries of reasoning chains, failure types, and knowledge gaps.

  • Documentation & extensibility: Clear developer documentation, plus support for custom task and agent modules.

Workflow Example

  1. A developer loads a knowledge graph into MORK (e.g., biomedical or general-purpose).

  2. They choose a task set, e.g., “Analogical Reasoning.”

  3. They define or select a reasoning agent (e.g., a MeTTa query runner).

  4. Benchmind executes all tasks, logs results, and computes evaluation metrics.

  5. The developer receives a diagnostic report showing:

    • Task accuracy and failure cases.

    • Bottlenecks (e.g., missing graph links, inefficient traversals).

    • Comparisons to baseline solvers.
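The five steps above amount to a simple harness loop. A schematic version, with all names and the task/agent/scorer shapes hypothetical:

```python
def run_suite(tasks, agent, scorer):
    """Execute every task against one agent and aggregate a diagnostic report."""
    report = {"results": [], "failures": []}
    for task in tasks:
        predicted = agent(task["objective"])        # steps 3-4: agent answers the task
        metrics = scorer(predicted, task["truth"])  # step 4: score against ground truth
        report["results"].append((task["id"], metrics))
        if metrics["accuracy"] < 1.0:               # step 5: log failure cases
            report["failures"].append(task["id"])
    return report

tasks = [{"id": "qa-001", "objective": "2-hop question", "truth": {"thromboxane"}}]
agent = lambda objective: {"thromboxane"}                       # stand-in reasoner
scorer = lambda p, t: {"accuracy": 1.0 if p == t else 0.0}      # stand-in evaluator
report = run_suite(tasks, agent, scorer)
```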

Use Cases and Value

  • Benchmarking New Reasoners: Test new symbolic or hybrid inference engines under realistic AGI tasks.

  • KG Quality Testing: Compare performance of different KGs on the same task set — useful for graph builders.

  • Debugging AGI Agents: Identify why reasoning failed (missing links? ambiguous paths? bad confidence ranking?).

  • Community Comparison: Create a shared standard for reasoning performance in the Hyperon/MeTTa ecosystem.

Datasets and Ground Truth

Benchmind will include:

  • Curated mini-knowledge graphs with handcrafted reasoning challenges.

  • Synthetic datasets for stress testing.

  • Converted subsets of public graphs (e.g., ConceptNet, Wikidata) adapted to MORK/MeTTa.

All datasets will be open-source and extensible.
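Adapting a public triple store could be as simple as serializing each (subject, relation, object) triple as an S-expression atom. This is a rough sketch under the assumption that MeTTa-style atoms take a `(relation subject object)` form; the actual MORK loading format may differ:

```python
def triples_to_metta(triples):
    """Render (subject, relation, object) triples as S-expression atom strings."""
    return [f"({rel} {subj} {obj})" for subj, rel, obj in triples]

# Toy ConceptNet-like triples.
conceptnet_like = [("cat", "IsA", "animal"), ("cat", "CapableOf", "purr")]
print(triples_to_metta(conceptnet_like))
# → ['(IsA cat animal)', '(CapableOf cat purr)']
```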

Integration with MORK & MeTTa

Benchmind is explicitly built for:

  • MORK: Used as the execution backend for atomspace storage and traversal.

  • MeTTa: Tasks encoded as MeTTa expressions. Agents must return MeTTa-compatible output or logs.

This ensures deep, native integration rather than loose interfacing.

Competitive Edge

  • First standardized benchmark for reasoning in Hyperon/MeTTa.

  • Covers neuro-symbolic scenarios explicitly (e.g., hybrid LLM+KG agents).

  • Designed for scale: Supports small experiments and billion-atom KGs via MORK.

  • Community-aligned: Easily reusable by other RFP teams and Hyperon researchers.

Open Source Licensing

MIT - Massachusetts Institute of Technology License

Background & Experience

Chainscore Labs is a specialist Web3 R&D firm with deep expertise in blockchain infrastructure, distributed systems, and AI. Our team combines seasoned software engineers (Python, Rust, smart contracts) with AI researchers experienced in symbolic reasoning, probabilistic logic networks, and meta-programming. 

Proposal Video

Not Available Yet

Check back later during the Feedback & Selection period of the RFP that this proposal applies to.

  • Total Milestones: 3

  • Total Budget: $60,000 USD

  • Last Updated: 27 May 2025

Milestone 1 - Benchmark Design & Prototyping

Description

In this initial phase we will design the core structure and categories of the Benchmind benchmarking suite. This includes identifying AGI-relevant reasoning tasks (e.g., multi-hop QA, analogical reasoning, hypothesis generation) and defining benchmark formats, metrics, and evaluation criteria. We will analyze the cognitive capabilities needed for each task and design sample MeTTa-compatible benchmarks for each category. This milestone also includes technical planning for MORK integration and establishing baseline agents for comparative evaluation.

Deliverables

  • A technical specification document detailing benchmark categories, task types, input/output formats, and scoring metrics.

  • A prototype of 2–3 benchmark tasks for MeTTa/Hyperon agents, including test datasets and expected outputs.

  • Documentation of evaluation protocols and examples of reasoning queries.

  • A project roadmap and timeline breakdown for Milestones 2 and 3.

Budget

$12,000 USD

Success Criterion

  • At least three reasoning task categories are defined and prototyped.

  • All sample tasks are executable using MeTTa expressions and operate on a small-scale MORK-based knowledge graph.

  • The evaluation logic is demonstrably functional on prototype tasks, scoring outputs for accuracy and reasoning depth.

  • Internal testing confirms end-to-end flow from task definition to result scoring.

  • Documentation is clear, complete, and ready for integration with future milestones.

Milestone 2 - Core Suite Implementation & Integration

Description

This milestone focuses on the development of the core Benchmind benchmarking framework. We will build the task execution engine, scoring/evaluation logic, and modular interfaces for plugging in symbolic, probabilistic, and hybrid reasoning agents. Each task category (QA, analogies, hypothesis generation) will be expanded with real and synthetic datasets and configured for batch testing within the MORK environment. We will implement benchmarking support for MeTTa queries and symbolic workflows, optimizing graph interaction using MORK APIs for traversal and querying. Initial performance baselines will be gathered for standard agents to validate the framework.

Deliverables

  • A fully functional CLI-based benchmarking engine with support for running task suites and recording results.

  • 10+ finalized benchmark tasks across 3 categories, with ground-truth answers and scoring metrics.

  • Integration with the MORK hypergraph backend for fast reasoning task execution.

  • Interfaces for symbolic agents (e.g. MeTTa scripts) and hybrid agents (LLM+KG).

  • Evaluation logs capturing success/failure modes, reasoning paths, and diagnostic metrics.

  • Internal baseline benchmark runs (e.g. using MeTTa-only or hardcoded traversal agents).

Budget

$24,000 USD

Success Criterion

• The benchmarking engine runs reliably across all included task types. • Integration with MeTTa and MORK is confirmed — benchmark tasks can be loaded, queried, and scored via native interfaces. • Benchmind successfully logs execution traces and performance scores for each benchmarked agent. • Baseline agents (symbolic and/or LLM-augmented) are benchmarked on at least one full task per category. • Codebase is modular, documented, and ready for open-sourcing.

Milestone 3 - Finalization, Optimization & Public Release

Description

In this final phase we will complete the Benchmind suite with full optimization, documentation, and packaging for open-source release. This includes refining performance (query speed, memory usage), stress-testing against larger graphs in MORK, and ensuring stable support for hybrid agents (LLM + symbolic). We will containerize the project for easy deployment, write detailed usage guides, and validate the suite’s utility by benchmarking external or community agents. This milestone aims to deliver a robust, extensible, and production-ready benchmarking standard for symbolic AGI reasoning within the Hyperon ecosystem.

Deliverables

  • Full Benchmind codebase tested on large-scale graphs with MORK.

  • Containerized deployment setup (e.g. Docker) and CLI install scripts.

  • Finalized benchmark task sets, including datasets and evaluation configs.

  • Performance-tuned agent interfaces (symbolic, probabilistic, LLM-augmented).

  • Comprehensive documentation (user manual, dev guide, extension API).

  • Final benchmarking report with comparative evaluation of agents.

Budget

$24,000 USD

Success Criterion

  • Benchmind runs consistently across environments (local + containerized).

  • Final benchmarks execute on large MORK graphs (≥500k atoms) without failure.

  • Documentation enables external developers to run and extend the suite independently.

  • All benchmark categories show measurable reasoning outcomes across multiple agents.

  • Project is open-sourced, publicly accessible, and presented with a reproducible demo or walkthrough.


