Knowledge Graph Construction from Raw Data

Seyed Mohammad
Project Owner


Overview

We propose a framework that converts streaming raw observations into a continuously refined knowledge graph for AGI use cases. Self-supervised models learn embeddings that are discretised into MeTTa atoms housed in the MORK hypergraph. A symbolic engine reasons over the graph while neural encoders continuously supply fresh concepts, forming a tight neuro-symbolic loop. Ongoing refinement prunes noise, resolves contradictions, and adds causal links, giving agents a live, compact, and explainable world model.

RFP Guidelines

Advanced knowledge graph tooling for AGI systems

Complete & Awarded
  • Type: SingularityNET RFP
  • Total RFP Funding: $350,000 USD
  • Proposals: 39
  • Awarded Projects: 5
SingularityNET
Apr. 16, 2025

This RFP seeks the development of advanced tools and techniques for interfacing with, refining, and evaluating knowledge graphs that support reasoning in AGI systems. Projects may target any part of the graph lifecycle — from extraction to refinement to benchmarking — and should optionally support symbolic reasoning within the OpenCog Hyperon framework, including compatibility with the MeTTa language and MORK knowledge graph. Bids are expected to range from $10,000 to $200,000.

Proposal Description

Our Team

We are a recently graduated student team with complementary strengths in artificial intelligence and theoretical computer science. Our collaboration brings together deep expertise in applied AI/ML research and rigorous algorithmic thinking. We are driven by curiosity, technical excellence, and a shared goal of solving complex, real-world problems through interdisciplinary innovation.

Project details

We propose an open-source framework that transforms raw observations {X₁, …, Xₙ} (text, image, and audio streams) into a continuously evolving knowledge graph (KG) that supports causal, neuro-symbolic reasoning in the OpenCog Hyperon ecosystem. The system meets the RFP’s call for end-to-end tooling—from extraction through refinement to benchmarking—while remaining domain-agnostic and fully compatible with MeTTa and the MORK hypergraph backend.

An asynchronous ingestion layer normalises heterogeneous inputs into a common event format ⟨id, time, payload, meta⟩. Modular adapters handle UTF-8 text, RGB images, and 16-kHz audio. Each adapter performs light pre-processing (tokenisation, patching, spectrogramming) and forwards the result to its modality-specific encoder. The adapters expose a uniform gRPC interface so new modalities (e.g., time-series sensors) can be snapped in without recompiling the pipeline.
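As an illustration of the adapter contract, the Python sketch below shows the common event format and a minimal text adapter; the class and method names (Event, Adapter, to_event) are placeholders rather than the final API.

from dataclasses import dataclass, field
from typing import Any, Dict
import time
import uuid

@dataclass
class Event:
    """Common event format ⟨id, time, payload, meta⟩ produced by every adapter."""
    id: str
    time: float
    payload: Any                       # pre-processed payload (tokens, patches, spectrogram)
    meta: Dict[str, Any] = field(default_factory=dict)

class Adapter:
    """Base class for modality adapters; subclasses override preprocess()."""
    modality = "generic"

    def preprocess(self, raw: Any) -> Any:
        raise NotImplementedError

    def to_event(self, raw: Any, **meta) -> Event:
        return Event(id=str(uuid.uuid4()), time=time.time(),
                     payload=self.preprocess(raw),
                     meta={"modality": self.modality, **meta})

class TextAdapter(Adapter):
    modality = "text"
    def preprocess(self, raw: str) -> list:
        # Light tokenisation; a production adapter would call the text encoder's tokenizer.
        return raw.lower().split()

In production the same interface would be exposed over gRPC so a new adapter can be registered without touching the rest of the pipeline.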

Three self-supervised encoders—a Transformer-based model for text [1], a masked-token ViT for images [2], and a contrastive speech transformer for audio [3]—learn modality-specific embeddings without labels. The pre-text tasks (masked language modelling, masked patch prediction, future audio contrast) force each encoder to capture high-level semantics rather than surface patterns, delivering unit-normalised 768-D vectors zₖ. A joint projection head maps all embeddings into one aligned latent space, enabling cross-modal comparison via CLIP-style losses [4]. This architecture is agnostic to the specific self-supervised recipe, satisfying the requirement to remain flexible to future representation advances.
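To make the alignment step concrete, here is a hedged PyTorch sketch of the joint projection head and a CLIP-style symmetric contrastive loss [4]; dimensions and module names are illustrative, not the final design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Maps a 768-D modality embedding into the shared, unit-normalised latent space."""
    def __init__(self, in_dim=768, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.GELU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, z):
        return F.normalize(self.net(z), dim=-1)

def clip_style_loss(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired cross-modal embeddings."""
    logits = (z_a @ z_b.t()) / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))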

A lightweight, incremental clustering module discretises incoming embeddings. A new vector is merged into the nearest prototype if its cosine distance falls below an adaptive threshold τ, or seeded as a fresh concept node otherwise. Each accepted assignment generates a MeTTa S-expression:

(concept :id 8743 :type Entity :label "violin")
(contextLink 8743 9021 :relation usedIn :confidence 0.87)

where 9021 may be a latent “classical_music” node inferred from co-occurrence statistics. Edges are hyperedges when n-ary relations are detected (e.g., (play Person Instrument Location)). Provenance, timestamps, and encoder uncertainty are stored as atom-level fields, enabling later integrity checks.
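The sketch below shows, in simplified form, how the incremental clusterer could assign embeddings to prototypes and emit the corresponding MeTTa atoms; the fixed threshold and running-mean update stand in for the adaptive policy described above.

import numpy as np

class IncrementalClusterer:
    """Merges each embedding into the nearest prototype or seeds a new concept node."""
    def __init__(self, tau=0.25):
        self.tau = tau                  # cosine-distance threshold (adaptive in the real system)
        self.prototypes = []            # list of (concept_id, unit vector, count)
        self.next_id = 0

    def assign(self, z, label="unknown"):
        z = z / np.linalg.norm(z)
        if self.prototypes:
            dists = [1.0 - float(z @ p) for _, p, _ in self.prototypes]
            best = int(np.argmin(dists))
            if dists[best] < self.tau:
                cid, p, n = self.prototypes[best]
                merged = p * n + z
                merged /= np.linalg.norm(merged)      # keep the prototype unit-length
                self.prototypes[best] = (cid, merged, n + 1)
                return cid, None                      # merged into an existing concept
        cid = self.next_id
        self.next_id += 1
        self.prototypes.append((cid, z, 1))
        atom = f'(concept :id {cid} :type Entity :label "{label}")'
        return cid, atom                              # fresh concept plus its MeTTa atom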

The symbolic layer (PLN + ECAN) consumes the graph to perform logical inference, similarity-guided retrieval, and planning. It can issue graph queries back to the neural side—for instance, “find an embedding most similar to the prototype for cat but closer to water_terrain” to hypothesise fishing_cat. Conversely, the neural layer consults the graph as a semantic prior, biasing its predictions toward entities already grounded with high-confidence links. This bidirectional flow realises the cognitive synergy advocated in neuro-symbolic literature [5], while maintaining clear audit trails for every symbol introduced.
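Queries such as “most similar to cat but closer to water_terrain” reduce to simple arithmetic over concept prototypes; the sketch below is one possible realisation, with the prototype store passed in as a plain dictionary.

import numpy as np

def analogical_query(prototypes, anchor, attractor, alpha=0.3, k=5):
    """Return the k concepts nearest to the anchor prototype shifted toward the attractor."""
    target = (1 - alpha) * prototypes[anchor] + alpha * prototypes[attractor]
    target = target / np.linalg.norm(target)
    scores = {name: float(target @ (v / np.linalg.norm(v)))
              for name, v in prototypes.items() if name != anchor}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# e.g. analogical_query(protos, "cat", "water_terrain") might surface "fishing_cat"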

A dedicated stream engine ingests atoms on commodity hardware. Every Δt seconds, a refinement job executes three passes: (i) duplicate detection and alias merging via incremental disjoint-set union; (ii) contradiction spotting using rule-based SAT templates [6]; (iii) redundancy pruning guided by a compression-coverage objective similar to PG-T [7]. Obsolete atoms decay by exponential ageing unless refreshed by new evidence. This keeps the KG compact, semantically rich, and internally consistent as mandated by the RFP’s quality goals.
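Two of the refinement passes can be sketched compactly: alias merging with an incremental disjoint-set union, and exponential ageing of stale atoms. The half-life and the atom field names below are assumptions, not fixed design decisions.

import math
import time

class DSU:
    """Incremental disjoint-set union used for duplicate detection and alias merging."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def decayed_confidence(atom, half_life=86_400.0, now=None):
    """Exponential ageing: confidence halves every half_life seconds without fresh evidence."""
    now = now if now is not None else time.time()
    age = now - atom["last_seen"]          # assumed atom-level provenance field
    return atom["confidence"] * math.exp(-math.log(2) * age / half_life)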

Temporal edges carry lag metadata, enabling automatic extraction of candidate causal graphs. We implement an algorithm inspired by PCMCI+ [8] to propose candidate causal links, which the symbolic layer verifies through constraint-based testing. Counterfactual queries are answered by running abduction on the causal subgraph, followed by forward simulation using learned conditional distributions. Benchmarks will include TETRAD synthetic sets and MetaQA multi-hop causal subsets, reporting AUROC and average intervention score to satisfy the “evaluate KG utility for reasoning” requirement.
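As a deliberately simplified stand-in for the PCMCI+-inspired procedure [8], the sketch below scores lagged correlations between node activation series and proposes candidate causal edges for the symbolic layer to verify; the real module adds conditional-independence testing.

import numpy as np

def propose_causal_edges(series, max_lag=5, threshold=0.4):
    """series: dict mapping node_id to an equal-length 1-D activation time series.
    Returns (cause, effect, lag, score) candidates for constraint-based verification."""
    candidates = []
    for cause in series:
        for effect in series:
            if cause == effect:
                continue
            for lag in range(1, max_lag + 1):
                x = np.asarray(series[cause][:-lag], dtype=float)
                y = np.asarray(series[effect][lag:], dtype=float)
                if x.std() == 0 or y.std() == 0:
                    continue
                r = float(np.corrcoef(x, y)[0, 1])
                if abs(r) >= threshold:
                    candidates.append((cause, effect, lag, r))
    return candidates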

The entire knowledge‐graph layer is built on MeTTa and backed by the high-performance MORK engine, with a simple API (REST/JSON-LD) for integration. It’s delivered as a modular, containerized package for straightforward deployment and maintenance. The code is organized into clear neural and symbolic components under an open-source license, thoroughly tested, and optimized to meet the RFP’s performance and scalability expectations.
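A minimal FastAPI sketch of the kind of REST endpoint the package could expose is shown below; the route, payload fields, and the run_mork_query helper are illustrative placeholders, with the MORK call stubbed out.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="KG query API (illustrative)")

class Query(BaseModel):
    metta: str                  # a MeTTa pattern, e.g. '(contextLink $x 9021 :relation usedIn)'
    limit: int = 20

def run_mork_query(pattern: str, limit: int = 20):
    """Placeholder for the dispatch to the MORK backend; returns an empty result set here."""
    return []

@app.post("/query")
def query_graph(q: Query):
    results = run_mork_query(q.metta, limit=q.limit)
    return {"@context": "https://schema.org/", "results": results}   # JSON-LD-style envelope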

We will report: extraction F₁ on Wikidata subsets, compression-coverage ratio, contradiction resolution accuracy, causal inference AUROC, and average query latency. These metrics map directly to the RFP’s emphasis on compactness, integrity, reasoning support, and performance. Public leaderboards and Jupyter notebooks will accompany each release for third-party replication.

 

[1] Devlin J. et al., “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” NAACL-HLT 2019.
[2] He K. et al., “Masked Autoencoders Are Scalable Vision Learners,” CVPR 2022.
[3] Baevski A. et al., “Wav2Vec 2.0: A Framework for Self-Supervised Learning of Speech Representations,” NeurIPS 2020.
[4] Radford A. et al., “Learning Transferable Visual Models From Natural Language Supervision,” ICML 2021.
[5] Liu Z. et al., “Neuro-Symbolic Methods and Knowledge Graph Reasoning: A Survey,” AAAI 2023.
[6] Zhang J. et al., “Rule-Based Knowledge Graph Consistency Checking,” ISWC 2022.
[7] Bourhis P. et al., “Pruning Knowledge Graphs with Pattern-Graph Truncation,” WWW 2023.
[8] Runge J. et al., “Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets,” Science Advances, 2019.

Background & Experience

Seyed Mohammad Seyed Javadi is a Master’s student in Computer Science at York University, specializing in theoretical computer science. He has published in top-tier conferences such as IJCAI and ICALP, and is a National Gold Medalist in the Iranian Computer Olympiad. His strengths include competitive programming, algorithm design, and formal theoretical analysis.

Amirhossein Mohammadi is a Master’s student in Artificial Intelligence at York University with over three years of focused research experience in AI and machine learning. His work spans model development, deep learning, and applied machine learning systems. Amir has hands-on experience building AI solutions and is passionate about advancing practical applications of intelligent systems.

 



  • Total Milestones: 3
  • Total Budget: $65,000 USD
  • Last Updated: 28 May 2025

Milestone 1 - Research Plan & Architecture Definition

Description

This phase establishes the foundation of the entire project by crystallizing requirements, data modalities, and the high-level design. We will map out the end-to-end pipeline—from ingestion adapters through self-supervised encoders to graph population and reasoning loops—in a detailed architecture diagram. Risks, dependencies, and success metrics are identified, ensuring that all stakeholders share a common understanding of scope and deliverables.

Deliverables

A comprehensive research plan document (20–30 pages) will be delivered including: (1) an annotated architecture diagram showing data flows and component interactions; (2) a prioritized backlog of technical tasks with estimated effort and risk ratings; and (3) a prototype configuration for running initial data ingestion and embedding experiments. This document will be presented in a review meeting with the RFP committee and iterated based on feedback. All materials will be shared in a public GitHub repository with version control and issue tracking enabled.

Budget

$15,000 USD

Success Criterion

The milestone is successful when the review committee formally approves the research plan with no major open issues (i.e., all “critical” or “high” risks are mitigated or have clear contingency plans). The backlog must cover ≥ 90% of required technical tasks, and the architecture diagram must receive sign-off from both the neural-model and symbolic-reasoning leads. Finally, the GitHub repository must be populated with initial issues and milestones, demonstrating that the team is ready to begin prototype development.

Milestone 2 - Prototype Implementation & Preliminary Testing

Description

In this stage we build and assemble the core pipeline components: multimodal ingestion adapters, self-supervised encoders, embedding-to-symbol grounding, and basic MeTTa export into MORK. The prototype will process a controlled dataset (≈100 000 atoms) end-to-end and generate an initial knowledge graph with live streaming updates. We will also integrate a simple causal inference module to demonstrate generation of candidate “causes” relationships from temporal data.

Deliverables

A working codebase (Python + Rust) will be delivered as a Docker-Compose package that, when launched, ingests sample text, image, and audio streams and outputs a browsable MORK hypergraph. We will provide a Jupyter notebook illustrating data ingestion, embedding visualization, node creation, and causal edge proposals. In addition, a preliminary benchmark report will show extraction F₁, prototype clustering purity, and update latency metrics on the test dataset.

Budget

$30,000 USD

Success Criterion

This milestone is deemed complete when the prototype successfully ingests ≥ 80 % of test observations without errors, produces a knowledge graph of ≥ 100 000 nodes/edges, and maintains end-to-end latency below 250 ms per observation. The preliminary benchmarks must show F₁ ≥ 0.7 for concept extraction and clustering purity ≥ 0.65. Finally, the delivered notebook and Docker package must run out-of-the-box on a standard GPU-equipped machine following documented steps.

Milestone 3 - Full-System Delivery & Benchmark Validation

Description

The final phase delivers the scalable, production-ready framework capable of handling ≥ 10 million atoms with continuous streaming updates, refinement passes, and causal reasoning. We will implement advanced graph refinement strategies (alias merging, contradiction resolution, pruning) and integrate a robust causal inference engine that supports counterfactual queries. Full developer and user documentation, API references, and example applications will complete the system.

Deliverables

We will provide: (1) the full codebase with CI/CD pipelines; (2) Docker images and Helm charts for Kubernetes deployment; (3) a suite of automated benchmarks—covering extraction accuracy, compression-coverage ratios, causal inference AUROC, and query latencies—run on large synthetic and real-world datasets; and (4) comprehensive documentation including a user guide, API reference, and tutorial workflows. All artifacts will be published under an open-source license in a public repository.

Budget

$20,000 USD

Success Criterion

The project is successful if the system processes 10 million+ atoms with streaming ingest throughput ≥ 5 000 atoms/s and average query latencies ≤ 100 ms. Benchmark results must meet or exceed: extraction F₁ ≥ 0.8, compression-coverage ≥ 0.5, and causal AUROC ≥ 0.7 on standard test suites. User acceptance is confirmed via a demo session with the RFP team, demonstrating deployment, end-to-end ingestion, and execution of representative reasoning tasks.


