Knowledge Graph Construction from Raw Data

Seyed Mohammad
Project Owner


Overview

We propose a framework that converts streaming raw observations into a continuously refined knowledge graph for AGI use cases. Self-supervised models learn embeddings that are discretised into MeTTa atoms housed in the MORK hypergraph. A symbolic engine reasons over the graph while neural encoders continuously supply fresh concepts, forming a tight neuro-symbolic loop. Ongoing refinement prunes noise, resolves contradictions, and adds causal links, giving agents a live, compact, and explainable world model.
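
For illustration, a minimal sketch of this loop in Python, assuming a hypothetical encoder callable and a set of already-learned prototype vectors; the atom syntax shown is illustrative, not the final MORK schema:

    import numpy as np

    def observation_to_atom(observation, encode, prototypes, labels):
        """Ground one raw observation into a symbolic MeTTa atom (illustrative)."""
        emb = encode(observation)                         # self-supervised embedding
        dists = np.linalg.norm(prototypes - emb, axis=1)  # distance to each prototype
        concept = labels[int(np.argmin(dists))]           # discretise: nearest prototype
        return f"(Concept {concept})"                     # MeTTa s-expression for MORK

In the full system this grounding step runs continuously, so freshly encoded observations keep feeding new atoms to the symbolic engine.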

RFP Guidelines

Advanced knowledge graph tooling for AGI systems

Internal Proposal Review
  • Type: SingularityNET RFP
  • Total RFP Funding: $350,000 USD
  • Proposals: 40
  • Awarded Projects: n/a
SingularityNET
Apr. 16, 2025

This RFP seeks the development of advanced tools and techniques for interfacing with, refining, and evaluating knowledge graphs that support reasoning in AGI systems. Projects may target any part of the graph lifecycle, from extraction to refinement to benchmarking, and should optionally support symbolic reasoning within the OpenCog Hyperon framework, including compatibility with the MeTTa language and MORK knowledge graph. Bids are expected to range from $10,000 to $200,000.

Proposal Description

Proposal Details Locked…

In order to protect this proposal from being copied, all details are hidden until the end of the submission period. Please come back later to see all details.

Proposal Video

Not Available Yet

Check back later during the Feedback & Selection period for the RFP that this proposal is applied to.

  • Total Milestones: 3
  • Total Budget: $65,000 USD
  • Last Updated: 28 May 2025

Milestone 1 - Research Plan & Architecture Definition

Description

This phase establishes the foundation of the entire project by crystallizing requirements, data modalities, and high-level design. We will map out the end-to-end pipeline, from ingestion adapters through self-supervised encoders to graph population and reasoning loops, in a detailed architecture diagram. Risks, dependencies, and success metrics are identified, ensuring that all stakeholders share a common understanding of scope and deliverables.
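
As one possible way to make the architecture document concrete, the pipeline stages could be enumerated roughly as follows (Python); the stage names, inputs/outputs, and risk entries here are placeholders for what the plan will actually specify:

    from dataclasses import dataclass, field

    @dataclass
    class PipelineStage:
        name: str
        inputs: list[str]
        outputs: list[str]
        risks: list[str] = field(default_factory=list)

    # Illustrative decomposition; the real plan will refine names and risks.
    PIPELINE = [
        PipelineStage("ingestion", ["raw text/image/audio"], ["normalised records"]),
        PipelineStage("encoding", ["normalised records"], ["embeddings"],
                      risks=["modality drift"]),
        PipelineStage("grounding", ["embeddings"], ["MeTTa atoms"]),
        PipelineStage("population", ["MeTTa atoms"], ["MORK hypergraph"]),
        PipelineStage("reasoning", ["MORK hypergraph"], ["inferred/causal edges"]),
    ]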

Deliverables

A comprehensive research plan document (20–30 pages) will be delivered, including: (1) an annotated architecture diagram showing data flows and component interactions; (2) a prioritized backlog of technical tasks with estimated effort and risk ratings; and (3) a prototype configuration for running initial data ingestion and embedding experiments. This document will be presented in a review meeting with the RFP committee and iterated based on feedback. All materials will be shared in a public GitHub repository with version control and issue tracking enabled.
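
One possible shape for the prototype configuration mentioned in item (3); every key and value below is a placeholder to be settled during the review:

    # Hypothetical prototype configuration for the first ingestion/embedding runs.
    CONFIG = {
        "ingestion": {"sources": ["text", "image", "audio"], "batch_size": 64},
        "encoder":   {"model": "selfsup-v0", "embedding_dim": 512},
        "grounding": {"n_prototypes": 1024, "min_cluster_size": 10},
        "export":    {"target": "mork", "atom_namespace": "raw-kg"},
    }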

Budget

$15,000 USD

Success Criterion

The milestone is successful when the review committee formally approves the research plan with no major open issues (i.e., all “critical” or “high” risks are mitigated or have clear contingency plans). The backlog must cover ≥ 90% of required technical tasks, and the architecture diagram must receive sign-off from both the neural-model and symbolic-reasoning leads. Finally, the GitHub repository must be seeded with initial issues and milestones, demonstrating that the team is ready to begin prototype development.

Milestone 2 - Prototype Implementation & Preliminary Testing

Description

In this stage we build and assemble the core pipeline components: multimodal ingestion adapters, self-supervised encoders, embedding-to-symbol grounding, and basic MeTTa export into MORK. The prototype will process a controlled dataset (≈100 000 atoms) end-to-end and generate an initial knowledge graph with live streaming updates. We will also integrate a simple causal inference module to demonstrate generation of candidate “causes” relationships from temporal data.
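
A minimal sketch of the embedding-to-symbol grounding step, using k-means as one plausible discretisation choice; the MeTTa serialisation and symbol names are illustrative rather than the final schema:

    import numpy as np
    from sklearn.cluster import KMeans

    def ground_embeddings(embeddings: np.ndarray, n_symbols: int = 256):
        """Discretise embeddings into symbol ids and serialise them as MeTTa atoms."""
        km = KMeans(n_clusters=n_symbols, n_init=10).fit(embeddings)
        atoms = [f"(member obs{i} sym{label})"          # illustrative atom syntax
                 for i, label in enumerate(km.labels_)]
        return km.cluster_centers_, atoms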

Deliverables

A working codebase (Python + Rust) will be delivered in a Docker Compose package that, when launched, ingests sample text, image, and audio streams and outputs a browsable MORK hypergraph. We will provide a Jupyter notebook illustrating data ingestion, embedding visualization, node creation, and causal edge proposals. In addition, a preliminary benchmark report will show extraction F₁, prototype clustering purity, and update-latency metrics on the test dataset.
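
A sketch of how the report’s clustering-purity and update-latency numbers could be computed; pipeline_step is a hypothetical stand-in for the end-to-end update call:

    import time
    import numpy as np

    def clustering_purity(labels_true, labels_pred):
        """Purity: fraction of points that carry their cluster's majority label."""
        labels_true = np.asarray(labels_true)
        labels_pred = np.asarray(labels_pred)
        correct = sum(np.bincount(labels_true[labels_pred == c]).max()
                      for c in np.unique(labels_pred))
        return correct / len(labels_true)

    def update_latency_ms(pipeline_step, observation):
        """Wall-clock latency of a single streaming update, in milliseconds."""
        t0 = time.perf_counter()
        pipeline_step(observation)
        return (time.perf_counter() - t0) * 1000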

Budget

$30,000 USD

Success Criterion

This milestone is deemed complete when the prototype successfully ingests ≥ 80 % of test observations without errors, produces a knowledge graph of ≥ 100 000 nodes/edges, and maintains end-to-end latency below 250 ms per observation. The preliminary benchmarks must show F₁ ≥ 0.7 for concept extraction and clustering purity ≥ 0.65. Finally, the delivered notebook and Docker package must run out-of-the-box on a standard GPU-equipped machine following documented steps.

Milestone 3 - Full-System Delivery & Benchmark Validation

Description

The final phase delivers the scalable, production-ready framework capable of handling ≥ 10 million atoms with continuous streaming updates, refinement passes, and causal reasoning. We will implement advanced graph refinement strategies (alias merging, contradiction resolution, pruning) and integrate a robust causal inference engine that supports counterfactual queries. Full developer and user documentation, API references, and example applications will complete the system.
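
A toy sketch of one refinement pass over an in-memory graph; the graph representation, similarity table, and thresholds are all placeholders for the production design:

    def refine(graph, alias_sim, sim_threshold=0.9, min_support=2):
        """Merge aliased nodes, then prune weakly supported ones (illustrative).
        graph: node -> {"edges": set, "support": int}
        alias_sim: (node_a, node_b) -> embedding similarity in [0, 1]"""
        # Alias merging: collapse pairs whose similarity clears the threshold.
        for (a, b), sim in list(alias_sim.items()):
            if sim >= sim_threshold and a in graph and b in graph and a != b:
                graph[a]["edges"] |= graph[b]["edges"]
                graph[a]["support"] += graph[b]["support"]
                del graph[b]
        # Contradiction resolution (choosing between conflicting edges by
        # support) would slot in here; omitted in this sketch.
        # Pruning: drop nodes observed fewer than min_support times.
        for node in [n for n, d in graph.items() if d["support"] < min_support]:
            del graph[node]
        return graph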

Deliverables

We will provide: (1) the full codebase with CI/CD pipelines; (2) Docker images and Helm charts for Kubernetes deployment; (3) a suite of automated benchmarks (covering extraction accuracy, compression-coverage ratios, causal inference AUROC, and query latencies) run on large synthetic and real-world datasets; and (4) comprehensive documentation including a user guide, API reference, and tutorial workflows. All artifacts will be published under an open-source license in a public repository.
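
As a sketch of the benchmark harness’s shape, in Python; system.score_causal_edges and system.median_query_latency_ms are hypothetical APIs standing in for whatever interface the final suite exposes:

    from sklearn.metrics import roc_auc_score

    def run_benchmarks(system, dataset):
        """Collect headline numbers for one benchmark run (illustrative)."""
        y_true, scores = system.score_causal_edges(dataset)   # hypothetical API
        return {
            "causal_auroc": roc_auc_score(y_true, scores),    # causal-edge AUROC
            "query_latency_ms": system.median_query_latency_ms(dataset),
        }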

Budget

$20,000 USD

Success Criterion

The project is successful if the system processes 10 million+ atoms with streaming ingest throughput ≥ 5 000 atoms/s and average query latencies ≤ 100 ms. Benchmark results must meet or exceed: extraction F₁ ≥ 0.8, compression-coverage ≥ 0.5, and causal AUROC ≥ 0.7 on standard test suites. User acceptance is confirmed via a demo session with the RFP team, demonstrating deployment, end-to-end ingestion, and execution of representative reasoning tasks.
