Benchmind

Prasad Kumkar
Project Owner

Expert Rating: n/a

Overview

Benchmind is a benchmarking suite for symbolic and neuro-symbolic reasoning in AGI systems built on Hyperon/MeTTa. It provides a standardized set of reasoning tasks—multi-hop QA, analogical reasoning, and hypothesis generation—along with datasets, evaluation metrics, and tooling to assess the effectiveness of knowledge graphs and reasoning agents. Fully integrated with MeTTa and MORK, Benchmind enables developers to stress-test and compare symbolic, probabilistic, and hybrid LLM-based reasoning strategies at scale. It establishes a shared foundation for measuring reasoning performance, uncovering bottlenecks, and driving progress toward robust, interpretable AGI.
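
For illustration only, the core loop such a suite implies can be sketched in a few lines of Python; the function and field names below (run_benchmark, task_id, expected) are assumptions for this sketch, not Benchmind's actual API.

    # Conceptual end-to-end flow: task in, agent answer out, score recorded.
    # All names are illustrative assumptions, not the suite's published interface.
    def run_benchmark(tasks, agent):
        results = []
        for task in tasks:
            answer = agent(task["query"])  # agent is any callable: symbolic, LLM-based, or hybrid
            score = 1.0 if answer in task["expected"] else 0.0  # simplest possible metric
            results.append({"task_id": task["task_id"], "score": score})
        return results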

RFP Guidelines

Advanced knowledge graph tooling for AGI systems

Internal Proposal Review
  • Type: SingularityNET RFP
  • Total RFP Funding: $350,000 USD
  • Proposals: 40
  • Awarded Projects: n/a
SingularityNET
Apr. 16, 2025

This RFP seeks the development of advanced tools and techniques for interfacing with, refining, and evaluating knowledge graphs that support reasoning in AGI systems. Projects may target any part of the graph lifecycle — from extraction to refinement to benchmarking — and should optionally support symbolic reasoning within the OpenCog Hyperon framework, including compatibility with the MeTTa language and MORK knowledge graph. Bids are expected to range from $10,000 to $200,000.

Proposal Description

Proposal Details Locked…

In order to protect this proposal from being copied, all details are hidden until the end of the submission period. Please come back later to see all details.

Proposal Video

Not Available Yet

Check back later during the Feedback & Selection period for the RFP that this proposal is applied to.

  • Total Milestones: 3
  • Total Budget: $60,000 USD
  • Last Updated: 27 May 2025

Milestone 1 - Benchmark Design & Prototyping

Description

In this initial phase, we will design the core structure and categories of the Benchmind benchmarking suite. This includes identifying AGI-relevant reasoning tasks (e.g. multi-hop QA, analogical reasoning, hypothesis generation) and defining benchmark formats, metrics, and evaluation criteria. We will analyze the cognitive capabilities needed for each task and design sample MeTTa-compatible benchmarks for each category. This milestone also includes technical planning for MORK integration and establishing baseline agents for comparative evaluation.
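
As a sketch of what the Milestone 1 task specification could look like, the following Python dataclass captures the fields named above (categories, input/output formats, scoring metrics); the schema and field names are assumptions for illustration, not the finalized spec.

    # Hypothetical task-specification record; fields are illustrative only.
    from dataclasses import dataclass, field

    @dataclass
    class TaskSpec:
        task_id: str                  # e.g. "analogy-007"
        category: str                 # "multi-hop-qa", "analogy", or "hypothesis-generation"
        query_metta: str              # MeTTa expression the agent is asked to evaluate
        expected_outputs: list[str]   # ground-truth answers used for scoring
        scoring_metric: str = "exact-match"
        metadata: dict = field(default_factory=dict)  # e.g. required hops, source subgraph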

Deliverables

• A technical specification document detailing benchmark categories, task types, input/output formats, and scoring metrics.
• A prototype of 2–3 benchmark tasks for MeTTa/Hyperon agents, including test datasets and expected outputs.
• Documentation of evaluation protocols and examples of reasoning queries.
• A project roadmap and timeline breakdown for Milestones 2 and 3.

Budget

$12,000 USD

Success Criterion

• At least three reasoning task categories are defined and prototyped.
• All sample tasks are executable using MeTTa expressions and operate on a small-scale MORK-based knowledge graph.
• The evaluation logic is demonstrably functional on prototype tasks, scoring outputs for accuracy and reasoning depth (a toy scoring sketch follows this list).
• Internal testing confirms end-to-end flow from task definition to result scoring.
• Documentation is clear, complete, and ready for integration with future milestones.
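
A toy illustration of what "scoring outputs for accuracy and reasoning depth" could mean in practice, reusing the hypothetical TaskSpec fields sketched under Milestone 1; the weighting and the required_hops metadata key are assumptions.

    # Toy scorer: exact-match accuracy plus a ratio of observed to required reasoning depth.
    def score_output(task, answer, reasoning_steps):
        accuracy = 1.0 if answer in task.expected_outputs else 0.0
        required_depth = max(task.metadata.get("required_hops", 1), 1)
        depth_ratio = min(len(reasoning_steps) / required_depth, 1.0)
        return {"accuracy": accuracy, "depth": depth_ratio}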

Milestone 2 - Core Suite Implementation & Integration

Description

This milestone focuses on the development of the core Benchmind benchmarking framework. We will build the task execution engine, scoring/evaluation logic, and modular interfaces for plugging in symbolic, probabilistic, and hybrid reasoning agents. Each task category (QA, analogies, hypothesis generation) will be expanded with real and synthetic datasets and configured for batch testing within the MORK environment. We will implement benchmarking support for MeTTa queries and symbolic workflows, optimizing graph interaction using MORK APIs for traversal and querying. Initial performance baselines will be gathered for standard agents to validate the framework.
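
One plausible shape for the pluggable agent interface and batch execution loop described above; the class and method names (ReasoningAgent, answer, run_suite) are assumptions, not Benchmind's actual API.

    # Hypothetical pluggable-agent interface: symbolic, probabilistic, and hybrid agents
    # all expose the same answer() method so the engine can batch-test them uniformly.
    from abc import ABC, abstractmethod

    class ReasoningAgent(ABC):
        @abstractmethod
        def answer(self, query_metta: str) -> tuple[str, list[str]]:
            """Return (final answer, reasoning trace) for a MeTTa query."""

    class EchoAgent(ReasoningAgent):
        """Trivial baseline agent used only to exercise the harness."""
        def answer(self, query_metta):
            return query_metta, [query_metta]

    def run_suite(agent, tasks):
        for task in tasks:
            answer, trace = agent.answer(task.query_metta)
            yield task.task_id, answer, trace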

Deliverables

• A fully functional CLI-based benchmarking engine with support for running task suites and recording results (a hypothetical entry-point sketch follows this list).
• 10+ finalized benchmark tasks across 3 categories, with ground-truth answers and scoring metrics.
• Integration with the MORK hypergraph backend for fast reasoning task execution.
• Interfaces for symbolic agents (e.g. MeTTa scripts) and hybrid agents (LLM+KG).
• Evaluation logs capturing success/failure modes, reasoning paths, and diagnostic metrics.
• Internal baseline benchmark runs (e.g. using MeTTa-only or hardcoded traversal agents).
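
For illustration, a CLI entry point for such an engine might be structured as follows; the flag names and defaults are assumptions, not the deliverable's actual interface.

    # Hypothetical CLI wrapper around the benchmark runner; flags are illustrative only.
    import argparse
    import json

    def main():
        parser = argparse.ArgumentParser(prog="benchmind")
        parser.add_argument("--suite", required=True, help="path to a benchmark task file")
        parser.add_argument("--agent", default="baseline", help="name of the registered agent to evaluate")
        parser.add_argument("--out", default="results.jsonl", help="where to append one JSON record per task")
        args = parser.parse_args()
        with open(args.suite) as f:
            tasks = json.load(f)
        # ... look up the named agent, run the suite over `tasks`, and write records to args.out

    if __name__ == "__main__":
        main()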

Budget

$24,000 USD

Success Criterion

• The benchmarking engine runs reliably across all included task types.
• Integration with MeTTa and MORK is confirmed: benchmark tasks can be loaded, queried, and scored via native interfaces.
• Benchmind successfully logs execution traces and performance scores for each benchmarked agent.
• Baseline agents (symbolic and/or LLM-augmented) are benchmarked on at least one full task per category.
• Codebase is modular, documented, and ready for open-sourcing.

Milestone 3 - Finalization, Optimization & Public Release

Description

In this final phase, we will complete the Benchmind suite with full optimization, documentation, and packaging for open-source release. This includes refining performance (query speed, memory usage), stress-testing against larger graphs in MORK, and ensuring stable support for hybrid agents (LLM + symbolic). We will containerize the project for easy deployment, write detailed usage guides, and validate the suite’s utility by benchmarking external or community agents. This milestone aims to deliver a robust, extensible, and production-ready benchmarking standard for symbolic AGI reasoning within the Hyperon ecosystem.
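
A minimal sketch of the kind of per-task query-speed and memory instrumentation this phase could use, assuming the hypothetical agent interface from Milestone 2; note that tracemalloc only tracks allocations made in Python code, not inside native MORK structures.

    # Illustrative instrumentation: wall-clock time and peak Python-side memory per task.
    import time
    import tracemalloc

    def profile_task(agent, task):
        tracemalloc.start()
        start = time.perf_counter()
        answer, _trace = agent.answer(task.query_metta)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return {"task_id": task.task_id, "seconds": elapsed, "peak_bytes": peak, "answer": answer}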

Deliverables

• Full Benchmind codebase, tested on large-scale graphs with MORK.
• Containerized deployment setup (e.g. Docker) and CLI install scripts.
• Finalized benchmark task sets, including datasets and evaluation configs.
• Performance-tuned agent interfaces (symbolic, probabilistic, LLM-augmented).
• Comprehensive documentation (user manual, dev guide, extension API).
• Final benchmarking report with comparative evaluation of agents.

Budget

$24,000 USD

Success Criterion

• Benchmind runs consistently across environments (local + containerized).
• Final benchmarks execute on large MORK graphs (≥500k atoms) without failure.
• Documentation enables external developers to run and extend the suite independently.
• All benchmark categories show measurable reasoning outcomes across multiple agents.
• Project is open-sourced, publicly accessible, and presented with a reproducible demo or walkthrough.

Expert Ratings

Reviews & Ratings

    No Reviews Available
