
Prasad Kumkar
Project Owner: responsible for research direction, milestone planning, symbolic task design, and integration alignment with the Hyperon/MeTTa/MORK systems.
Benchmind is a benchmarking suite for symbolic and neuro-symbolic reasoning in AGI systems built on Hyperon/MeTTa. It provides a standardized set of reasoning tasks—multi-hop QA, analogical reasoning, and hypothesis generation—along with datasets, evaluation metrics, and tooling to assess the effectiveness of knowledge graphs and reasoning agents. Fully integrated with MeTTa and MORK, Benchmind enables developers to stress-test and compare symbolic, probabilistic, and hybrid LLM-based reasoning strategies at scale. It establishes a shared foundation for measuring reasoning performance, uncovering bottlenecks, and driving progress toward robust, interpretable AGI.
This RFP seeks the development of advanced tools and techniques for interfacing with, refining, and evaluating knowledge graphs that support reasoning in AGI systems. Projects may target any part of the graph lifecycle — from extraction to refinement to benchmarking — and are encouraged to support symbolic reasoning within the OpenCog Hyperon framework, including compatibility with the MeTTa language and the MORK knowledge graph. Bids are expected to range from $10,000 to $200,000.
In this initial phase we will design the core structure and categories of the Benchmind benchmarking suite. This includes identifying AGI-relevant reasoning tasks (e.g. multi-hop QA, analogical reasoning, hypothesis generation) and defining benchmark formats, metrics, and evaluation criteria. We will analyze the cognitive capabilities needed for each task and design sample MeTTa-compatible benchmarks for each category. This milestone also includes technical planning for MORK integration and establishing baseline agents for comparative evaluation.
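To make the intended benchmark format concrete, here is a minimal sketch of how a task record could be specified. The BenchmarkTask class, its field names, and the toy family-relations example are illustrative assumptions, not the finalized Benchmind specification.

```python
# Illustrative sketch of a possible Benchmind task record; field names and the
# multi-hop QA example are hypothetical, not the finalized specification.
from dataclasses import dataclass, field

@dataclass
class BenchmarkTask:
    task_id: str            # unique identifier, e.g. "multihop-qa-001"
    category: str           # "multi-hop-qa" | "analogy" | "hypothesis-generation"
    kb_atoms: list[str]     # MeTTa atoms that seed the knowledge graph for the task
    query: str              # MeTTa expression the agent must evaluate
    expected: list[str]     # ground-truth answers used by the scorer
    metrics: list[str] = field(default_factory=lambda: ["accuracy", "reasoning_depth"])

# Example: a two-hop question over a tiny family-relations graph.
sample_task = BenchmarkTask(
    task_id="multihop-qa-001",
    category="multi-hop-qa",
    kb_atoms=["(Parent Tom Bob)", "(Parent Bob Ann)"],
    query="!(match &self (Parent Tom $y) (match &self (Parent $y $z) $z))",
    expected=["Ann"],
)
```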
• A technical specification document detailing benchmark categories, task types, input/output formats, and scoring metrics.
• A prototype of 2–3 benchmark tasks for MeTTa/Hyperon agents, including test datasets and expected outputs.
• Documentation of evaluation protocols and examples of reasoning queries (a minimal query example follows this list).
• A project roadmap and timeline breakdown for Milestones 2 and 3.
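As an example of the kind of reasoning query the evaluation protocols would document, the sketch below executes the toy task above through the hyperon Python bindings (pip install hyperon). Only MeTTa() and run() are taken from the real package; the run_task helper is hypothetical.

```python
# Minimal sketch of executing the sample task with the hyperon Python bindings.
# Only MeTTa()/run() are assumed from the real package; the rest is illustrative.
from hyperon import MeTTa

def run_task(task) -> list[str]:
    metta = MeTTa()
    for atom in task.kb_atoms:          # load the task's small knowledge graph
        metta.run(atom)
    results = metta.run(task.query)     # evaluate the reasoning query
    # run() returns a list of result groups; flatten to printable answers
    return [str(atom) for group in results for atom in group]

print(run_task(sample_task))            # expected: ['Ann']
```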
$12,000 USD
• At least three reasoning task categories are defined and prototyped.
• All sample tasks are executable using MeTTa expressions and operate on a small-scale MORK-based knowledge graph.
• The evaluation logic is demonstrably functional on prototype tasks, scoring outputs for accuracy and reasoning depth (see the scoring sketch after this list).
• Internal testing confirms end-to-end flow from task definition to result scoring.
• Documentation is clear, complete, and ready for integration with future milestones.
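A minimal sketch of how the accuracy and reasoning-depth scoring mentioned above could work. Exact-match accuracy and a hop-count proxy for depth are assumptions for illustration, not Benchmind's defined metrics.

```python
# Hypothetical scorer for the two metrics named above. "Accuracy" is exact-match
# over expected answers; "reasoning depth" is approximated by counting nested
# match clauses in the query -- an illustrative proxy only.
def score(task, answers: list[str]) -> dict:
    expected, got = set(task.expected), set(answers)
    accuracy = len(expected & got) / len(expected) if expected else 0.0
    depth = task.query.count("(match")     # crude hop-count proxy
    return {"task_id": task.task_id, "accuracy": accuracy, "reasoning_depth": depth}

# Reuses sample_task and run_task from the earlier sketches.
print(score(sample_task, run_task(sample_task)))
# -> {'task_id': 'multihop-qa-001', 'accuracy': 1.0, 'reasoning_depth': 2}
```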
This milestone focuses on the development of the core Benchmind benchmarking framework. We will build the task execution engine, the scoring/evaluation logic, and modular interfaces for plugging in symbolic, probabilistic, and hybrid reasoning agents. Each task category (QA, analogies, hypothesis generation) will be expanded with real and synthetic datasets and configured for batch testing within the MORK environment. We will implement benchmarking support for MeTTa queries and symbolic workflows, optimizing graph interaction using MORK APIs for traversal and querying. Initial performance baselines will be gathered for standard agents to validate the framework.
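The modular agent interface described above might look something like the following sketch, with a purely symbolic baseline and a hybrid placeholder. Class and method names (ReasoningAgent, answer, llm_fn) are assumptions, not the actual Benchmind API, and MORK-backed execution is omitted here.

```python
# Sketch of a pluggable agent interface; names are assumptions, not Benchmind's API.
from abc import ABC, abstractmethod
from hyperon import MeTTa

class ReasoningAgent(ABC):
    """Anything that can answer a benchmark task over a loaded knowledge graph."""
    @abstractmethod
    def answer(self, task) -> list[str]: ...

class MeTTaAgent(ReasoningAgent):
    """Purely symbolic baseline: evaluate the task query directly in MeTTa."""
    def answer(self, task) -> list[str]:
        metta = MeTTa()
        for atom in task.kb_atoms:
            metta.run(atom)
        return [str(a) for group in metta.run(task.query) for a in group]

class HybridAgent(ReasoningAgent):
    """Placeholder for an LLM+KG agent: an LLM proposes a MeTTa query, which is
    then executed symbolically. The llm_fn callable is left abstract here."""
    def __init__(self, llm_fn):
        self.llm_fn = llm_fn
    def answer(self, task) -> list[str]:
        proposed_query = self.llm_fn(task)   # e.g. prompt an LLM for a MeTTa query
        metta = MeTTa()
        for atom in task.kb_atoms:
            metta.run(atom)
        return [str(a) for group in metta.run(proposed_query) for a in group]
```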
• A fully functional CLI-based benchmarking engine with support for running task suites and recording results (a rough CLI runner sketch follows this list).
• 10+ finalized benchmark tasks across 3 categories, with ground-truth answers and scoring metrics.
• Integration with the MORK hypergraph backend for fast reasoning task execution.
• Interfaces for symbolic agents (e.g. MeTTa scripts) and hybrid agents (LLM+KG).
• Evaluation logs capturing success/failure modes, reasoning paths, and diagnostic metrics.
• Internal baseline benchmark runs (e.g. using MeTTa-only or hardcoded traversal agents).
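A rough sketch of the CLI-based runner named in the deliverables, reusing the scorer and MeTTaAgent from the earlier sketches. The flag names and the JSON results format are illustrative assumptions, not the tool's actual interface.

```python
# Rough sketch of a CLI task-suite runner; reuses score(), MeTTaAgent, and
# sample_task from the earlier sketches. Flags and output format are illustrative.
import argparse, json, time

def run_suite(tasks, agent):
    records = []
    for task in tasks:
        start = time.perf_counter()
        answers = agent.answer(task)
        record = score(task, answers)
        record["latency_s"] = round(time.perf_counter() - start, 4)
        records.append(record)
    return records

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run a Benchmind task suite (sketch)")
    parser.add_argument("--out", default="results.json", help="where to write result records")
    args = parser.parse_args()
    results = run_suite([sample_task], MeTTaAgent())
    with open(args.out, "w") as f:
        json.dump(results, f, indent=2)
```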
$24,000 USD
• The benchmarking engine runs reliably across all included task types.
• Integration with MeTTa and MORK is confirmed — benchmark tasks can be loaded, queried, and scored via native interfaces.
• Benchmind successfully logs execution traces and performance scores for each benchmarked agent.
• Baseline agents (symbolic and/or LLM-augmented) are benchmarked on at least one full task per category.
• Codebase is modular, documented, and ready for open-sourcing.
In this final phase we will complete the Benchmind suite with full optimization, documentation, and packaging for open-source release. This includes refining performance (query speed, memory usage), stress-testing against larger graphs in MORK, and ensuring stable support for hybrid agents (LLM + symbolic). We will containerize the project for easy deployment, write detailed usage guides, and validate the suite's utility by benchmarking external or community agents. This milestone aims to deliver a robust, extensible, and production-ready benchmarking standard for symbolic AGI reasoning within the Hyperon ecosystem.
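To illustrate the shape of the planned stress tests, here is a toy timing sketch over a synthetic chain graph held in a plain MeTTa space. It is an assumption-level example only; real Benchmind stress tests would run against the MORK backend and much larger graphs.

```python
# Toy stress-test sketch: build a synthetic chain of Parent atoms in a plain
# MeTTa space and time a two-hop query. Illustrative only; real stress tests
# would target the MORK backend.
import time
from hyperon import MeTTa

def timed_two_hop(n_atoms: int = 10_000) -> float:
    metta = MeTTa()
    for i in range(n_atoms):                        # chain: p0 -> p1 -> ... -> pN
        metta.run(f"(Parent p{i} p{i + 1})")
    query = "!(match &self (Parent p0 $y) (match &self (Parent $y $z) $z))"
    start = time.perf_counter()
    metta.run(query)
    return time.perf_counter() - start

n = 10_000
print(f"two-hop query over a {n}-atom chain: {timed_two_hop(n):.3f}s")
```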
• Full Benchmind codebase tested on large-scale graphs with MORK.
• Containerized deployment setup (e.g. Docker) and CLI install scripts.
• Finalized benchmark task sets, including datasets and evaluation configs.
• Performance-tuned agent interfaces (symbolic, probabilistic, LLM-augmented).
• Comprehensive documentation (user manual, dev guide, extension API).
• Final benchmarking report with comparative evaluation of agents.
$24,000 USD
• Benchmind runs consistently across environments (local + containerized).
• Final benchmarks execute on large MORK graphs (≥500k atoms) without failure.
• Documentation enables external developers to run and extend the suite independently.
• All benchmark categories show measurable reasoning outcomes across multiple agents.
• Project is open-sourced, publicly accessible, and presented with a reproducible demo or walkthrough.