Refinement, Entity/Rels Extraction + Benchmarking

Ivan Reznikov
Project Owner

Overview

Our team is passionate about tackling one of the most critical challenges in AGI development: creating robust, reliable knowledge graphs that can truly support advanced reasoning. We're proposing a comprehensive set of tools and techniques to refine, extract, and benchmark knowledge graphs designed for AGI systems. What excites us most is the potential impact on scalability, noise reduction, and evaluation in neuro-symbolic AI frameworks, with particular attention to compatibility with OpenCog Hyperon, including MeTTa and MORK.

RFP Guidelines

Advanced knowledge graph tooling for AGI systems

Complete & Awarded
  • Type: SingularityNET RFP
  • Total RFP Funding: $350,000 USD
  • Proposals: 39
  • Awarded Projects: 5
SingularityNET
Apr. 16, 2025

This RFP seeks the development of advanced tools and techniques for interfacing with, refining, and evaluating knowledge graphs that support reasoning in AGI systems. Projects may target any part of the graph lifecycle (from extraction to refinement to benchmarking) and should optionally support symbolic reasoning within the OpenCog Hyperon framework, including compatibility with the MeTTa language and MORK knowledge graph. Bids are expected to range from $10,000 to $200,000.

Proposal Description

Our Team

We're a passionate group of experts who have worked together on complex graph problems for years:

  • 2 dedicated researchers with backgrounds in knowledge representation

  • 2 backend and 2 data engineers with extensive graph database experience

  • 2 data scientists who specialize in graph analytics

  • 1 graph scientist focused on optimization algorithms

  • 1.5 graph network experts (full-time equivalent)

Company Name (if applicable)

SciRenovation Labs

Project details

One of the biggest struggles we have faced when working with continuously evolving graph systems is that they accumulate noise over time. Artifacts that were important, or that were extracted into the earliest versions of the graph, are kept despite becoming obsolete, while the graph keeps growing with new nodes. Then there is the elephant in the room: duplicate nodes can appear whenever graphs are updated or merged. In our own studies, duplicates degraded reasoning capabilities almost more than anything else.
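To illustrate the duplicate-node problem, here is a minimal Python sketch of one naive baseline: group nodes whose labels match after normalization (case, accents, punctuation) and rewrite edges so each group collapses to a single canonical ID. The function names and the "smallest ID wins" rule are illustrative assumptions, not the tooling proposed here; practical deduplication would also need fuzzy and embedding-based matching.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def find_duplicate_groups(nodes: dict[str, str]) -> list[list[str]]:
    """Group node IDs whose normalized labels are identical.

    `nodes` maps node ID -> human-readable label. Only groups with more
    than one member are returned, sorted for stable output.
    """
    buckets: dict[str, list[str]] = defaultdict(list)
    for node_id, label in nodes.items():
        buckets[normalize_label(label)].append(node_id)
    return sorted(sorted(ids) for ids in buckets.values() if len(ids) > 1)


def merge_duplicates(edges, groups):
    """Point every duplicate at one canonical ID (the smallest in its group)."""
    canonical = {}
    for group in groups:
        for node_id in group:
            canonical[node_id] = group[0]
    # Using a set also removes edges that become identical after merging.
    merged = {(canonical.get(h, h), r, canonical.get(t, t)) for h, r, t in edges}
    return sorted(merged)
```

For example, "New York City", "new york city", and "New-York City" all normalize to the same key, so their edges collapse onto one node.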

The frustration of dealing with noise, inconsistency, and inefficiency drove us to develop this proposal. Our vision is to create open, extensible, and truly scalable tools that address the full lifecycle of knowledge graphs, making them more reliable and powerful for neuro-symbolic AI systems.

We're focusing on three key areas that we believe will make the most significant impact:

1. Scalable Knowledge Graph Refinement Process

We're determined to overcome the limitations of traditional filtering approaches such as TF-IDF and spectral sparsification. Our approach leverages state-of-the-art techniques to intelligently identify and remove redundant, low-value, or noisy nodes while remaining scalable and preserving high-value information.

We're particularly excited about:

  • Implementing knowledge distillation techniques that preserve essential information while dramatically reducing graph size
  • Developing anomaly detection methods that can identify inconsistencies even in complex, interconnected knowledge structures
  • Using LLMs as noise detectors: several methods have shown promising results in our preliminary tests, although LLMs on their own are not yet reliable or scalable enough
  • Other complementary techniques
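As a point of reference for what the techniques above aim to improve on, the sketch below shows a simple structural pruning baseline: score nodes with PageRank and drop the lowest-ranked fraction together with their edges. It is a hypothetical plain-Python illustration (the `prune_low_value` helper and its 15% default are assumptions), not the distillation or LLM-based methods listed above.

```python
def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph.

    Dangling nodes (no outgoing edges) spread their rank uniformly,
    so the scores always sum to 1.
    """
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def prune_low_value(nodes, edges, fraction=0.15):
    """Drop the lowest-ranked `fraction` of nodes and every edge touching them."""
    rank = pagerank(nodes, edges)
    k = int(len(nodes) * fraction)
    # Ties are broken by node name so the result is deterministic.
    to_drop = set(sorted(nodes, key=lambda v: (rank[v], v))[:k])
    kept_nodes = [v for v in nodes if v not in to_drop]
    kept_edges = [(s, d) for s, d in edges if s not in to_drop and d not in to_drop]
    return kept_nodes, kept_edges
```

A purely structural score like this cannot tell a rare but correct fact from noise, which is one reason the proposal looks beyond such filters.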

We're also exploring TransGAT embedding models to learn vector representations that help us spot anomalies and inconsistencies that traditional methods miss. We are also enthusiastic about the potential of positive contrastive learning approaches to enhance knowledge perception while reducing noise.
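TransGAT itself is too involved for a short example, so the sketch below swaps in a much simpler TransE-style score to show the general idea of embedding-based anomaly detection: a triple (h, r, t) is plausible when h + r is close to t, so triples whose distance is a statistical outlier are flagged for review. The embeddings are assumed to be pre-trained and passed in as plain lists; the z-score threshold of 2.0 is an arbitrary assumption.

```python
import math


def transe_score(h, r, t):
    """TransE distance ||h + r - t||_2; smaller means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def flag_anomalous_triples(triples, entity_emb, relation_emb, z_threshold=2.0):
    """Flag triples whose TransE distance has a z-score above `z_threshold`."""
    scores = [
        transe_score(entity_emb[h], relation_emb[r], entity_emb[t])
        for h, r, t in triples
    ]
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    if std == 0:
        return []  # all triples fit equally well; nothing stands out
    return [trip for trip, s in zip(triples, scores) if (s - mean) / std > z_threshold]
```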

For larger knowledge graphs with substantial noise, we're adapting Tri-embed Noise Detection (TrE-ND) to leverage multiple embedding models simultaneously; early results suggest this approach is both more accurate and more cost-efficient than previous methods.
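The published details of TrE-ND are not reproduced here. As a hedged sketch of the general idea of combining several embedding models, the code below assumes each model has already produced one noise score per triple, converts each model's scores to ranks, and flags triples that a minimum number of models place in their most suspicious fraction. The voting rule and parameter names are our assumptions.

```python
def rank_normalize(scores):
    """Map raw scores to [0, 1] by rank (0 = least suspicious, 1 = most)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for position, idx in enumerate(order):
        ranks[idx] = position / (len(scores) - 1) if len(scores) > 1 else 0.0
    return ranks


def ensemble_noise_flags(score_lists, top_fraction=0.1, min_votes=2):
    """Flag items that at least `min_votes` models rank in their top `top_fraction`.

    `score_lists` holds one list of noise scores per embedding model, all
    aligned to the same items (higher score = more likely noise).
    """
    n_items = len(score_lists[0])
    cutoff = 1.0 - top_fraction
    votes = [0] * n_items
    for scores in score_lists:
        for i, r in enumerate(rank_normalize(scores)):
            if r >= cutoff:
                votes[i] += 1
    return [i for i, v in enumerate(votes) if v >= min_votes]
```

Working on ranks rather than raw scores keeps models with very different score scales comparable.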

Additionally, we're investigating homophily-enhanced structure learning techniques to refine graph structure by adding missing links between similar entities while removing spurious connections.
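A minimal sketch of the homophily idea, assuming each node already has a feature or embedding vector: add an edge between nodes whose cosine similarity exceeds one threshold, and drop existing edges whose endpoints fall below another. Actual homophily-enhanced structure learning learns these decisions rather than applying fixed thresholds; the `refine_structure` helper and both threshold values are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def refine_structure(features, edges, add_threshold=0.95, remove_threshold=0.2):
    """Add links between highly similar nodes and drop links between dissimilar ones.

    `features` maps node -> vector; `edges` is a set of undirected edges
    stored as sorted (u, v) tuples.
    """
    # Remove spurious connections between dissimilar endpoints.
    refined = {e for e in edges if cosine(features[e[0]], features[e[1]]) >= remove_threshold}
    # Add missing links between very similar entities (quadratic; fine for a sketch).
    for u, v in combinations(sorted(features), 2):
        if cosine(features[u], features[v]) >= add_threshold:
            refined.add((u, v))
    return refined
```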

2. Advanced Entity & Relationship Extraction for Knowledge Graphs

We're frustrated by the current limitations of domain-specific entity extraction methods. Our goal is to create a truly domain-agnostic AI model capable of advanced entity and relationship extraction that can transform flat graph structures into rich ontological structures with directed, weighted relationships.

We're exploring:

  • Cutting-edge Named Entity Recognition (NER) techniques beyond traditional approaches

  • Semantic relationship classification that captures nuanced connections between entities

  • Graph Attention Networks (GATs) and domain-agnostic graph embeddings to enhance extraction quality

One of our most ambitious goals is to assign meaningful probabilities and confidence scores to nodes and relationships, using Bayesian networks and probabilistic soft logic, so that reasoning systems can weigh evidence and handle ambiguity intelligently. We're also developing low-dimensional embeddings (TransE, RotatE, GraphSAGE) to support vector-space similarity queries and nearest-neighbor reasoning.
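To make the probabilistic soft logic part concrete, the sketch below computes PSL-style rule penalties: truth values lie in [0, 1], conjunction uses the Łukasiewicz t-norm max(0, b1 + ... + bn - (n - 1)), and a weighted rule body -> head contributes weight * max(0, body - head). It covers only scoring a given assignment of truth values; the inference step that searches for values minimizing the total penalty is omitted, and the example predicates are made up.

```python
def lukasiewicz_and(*values):
    """Łukasiewicz t-norm, used by PSL for soft conjunction."""
    return max(0.0, sum(values) - (len(values) - 1))


def distance_to_satisfaction(body_values, head_value):
    """How far the soft rule body -> head is from being satisfied (0 = satisfied)."""
    return max(0.0, lukasiewicz_and(*body_values) - head_value)


def weighted_penalty(rules, truth):
    """Sum of weight * distance over grounded rules (a linear hinge loss).

    Each rule is (weight, [body atom names], head atom name); `truth` maps
    atom names to soft truth values in [0, 1].
    """
    total = 0.0
    for weight, body_atoms, head_atom in rules:
        body = [truth[a] for a in body_atoms]
        total += weight * distance_to_satisfaction(body, truth[head_atom])
    return total
```

With WorksAt(alice, acme) = 0.9, LocatedIn(acme, paris) = 0.8, and LivesIn(alice, paris) = 0.4, the rule body evaluates to 0.7, so a rule of weight 2.0 contributes a penalty of about 0.6, pushing inference toward a higher confidence for LivesIn(alice, paris).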

3. Benchmarks for Retrieval and Reasoning on Knowledge Graphs

We're tired of inadequate evaluation metrics that don't capture real-world performance. Our team is designing a comprehensive benchmark suite that evaluates the aspects that truly matter: speed, quality, and interpretability of graph-based retrieval for reasoning tasks relevant to LLMs and AGI systems.

Our benchmarks will measure:

  • Query latency across different graph sizes and complexities

  • Answer quality compared to ground truth and human expectations

  • Path interpretability to ensure reasoning is transparent and trustworthy

  • Graph coverage across multiple domains

  • Robustness to noise and incomplete information
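A minimal benchmark harness along these lines might time each query and compare the returned answers with ground truth. The sketch below assumes a query function that returns a set of answers and reports median and 95th-percentile latency, exact-match rate, and mean answer-level F1; path interpretability, coverage, and robustness would need additional, task-specific metrics. All names here are hypothetical.

```python
import math
import statistics
import time


def run_benchmark(query_fn, cases):
    """Time each query and score its answers against ground truth.

    `query_fn(question)` returns an iterable of answers; `cases` is a list
    of (question, expected_answer_set) pairs.
    """
    latencies_ms, exact, f1_scores = [], 0, []
    for question, expected in cases:
        start = time.perf_counter()
        predicted = set(query_fn(question))
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        exact += int(predicted == expected)
        overlap = len(predicted & expected)
        precision = overlap / len(predicted) if predicted else 0.0
        recall = overlap / len(expected) if expected else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall) if overlap else 0.0)
    latencies_ms.sort()
    # Nearest-rank 95th percentile.
    p95 = latencies_ms[max(0, math.ceil(0.95 * len(latencies_ms)) - 1)]
    return {
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": p95,
        "exact_match": exact / len(cases),
        "mean_f1": sum(f1_scores) / len(f1_scores),
    }
```

Running the same cases against graphs of increasing size gives the latency-versus-size curve described above.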

Beyond simple factual queries, we're focusing on complex reasoning tasks such as multi-hop question answering, analogical reasoning, and hypothesis generation, the kinds of tasks that distinguish truly intelligent systems from mere information retrievers.
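For multi-hop question answering, one simple way to make the reasoning path explicit is to search the graph for the shortest chain of relations linking a question entity to a candidate answer, then return that chain as the explanation. The breadth-first sketch below is an illustrative assumption, not the benchmark's reference solver; the example triples are made up.

```python
from collections import deque


def find_reasoning_path(triples, start, goal, max_hops=3):
    """Breadth-first search for the shortest relation path from start to goal.

    Returns a list of (head, relation, tail) hops, or None if no path of at
    most `max_hops` exists. The hops double as a human-readable explanation.
    """
    adjacency = {}
    for h, r, t in triples:
        adjacency.setdefault(h, []).append((r, t))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # do not expand beyond the hop budget
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None
```

For instance, with the triples (Marie Curie, born_in, Warsaw) and (Warsaw, capital_of, Poland), asking for a path from Marie Curie to Poland returns both hops, which a benchmark can check against a gold explanation path.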

We're particularly inspired by the Abstraction and Reasoning Corpus (ARC) benchmark and are developing similar rigorous evaluations that test human-like reasoning rather than simple pattern matching.

Our project naturally aligns with the Hyperon framework by providing essential tools that will enhance symbolic inference and reasoning. We're committed to exploring compatibility with the MORK system to boost real-time reasoning and retrieval capabilities.

References

  1. Zhang, X., & Sheng, V. S. (2024). Bridging the Gap: Representation Spaces in Neuro-Symbolic AI. ACM Computing Surveys, 37(4), 1–35.  

  2. Hossain, D., & Chen, J. Y. (2025). A Study on Neuro-Symbolic Artificial Intelligence: Healthcare Perspectives. Preprint.  

  3. Hofer, Marvin, Daniel Obraczka, Alieh Saeedi, Hanna Köpcke, and Erhard Rahm. "Construction of knowledge graphs: State and challenges." arXiv preprint arXiv:2302.11509 (2023).

  4. Dong, Na, Natthawut Kertkeidkachorn, Xin Liu, and Kiyoaki Shirai. "Refining Noisy Knowledge Graph with Large Language Models." In Proceedings of the Workshop on Generative AI and Knowledge Graphs (GenAIK), pp. 78-86. 2025.

  5. Liu, J., Zheng, T., Zhang, G., & Hao, Q. (2023). Graph-based Knowledge Distillation: A survey and experimental evaluation. arXiv preprint arXiv:2302.14643.  

  6. Paulheim, H. (2017). Knowledge graph refinement. Synthesis Lectures on the Semantic Web: Theory and Technology, 7(1), 1-134.  

  7. Zhu, Hongyin. "Node classification via semantic-structural attention-enhanced graph convolutional networks." arXiv preprint arXiv:2403.16033 (2024).

  8. Zhang, Xikun, Dongjin Song, Yixin Chen, and Dacheng Tao. "Topology-aware embedding memory for continual learning on expanding networks." In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 4326-4337. 2024.

  9. Tan, Zhen, Alimohammad Beigi, Song Wang, Ruocheng Guo, Amrita Bhattacharjee, Bohan Jiang, Mansooreh Karami, Jundong Li, Lu Cheng, and Huan Liu. "Large language models for data annotation: A survey." arXiv e-prints (2024): arXiv-2402.

  10. Wang, Jiahui, Kun Yue, and Liang Duan. "Models and techniques for domain relation extraction: a survey." Journal of Data Science and Intelligent Systems 1, no. 2 (2023): 65-82.

  11. Zhang, Qinggang, Junnan Dong, Hao Chen, Daochen Zha, Zailiang Yu, and Xiao Huang. "KnowGPT: Knowledge graph based prompting for large language models." Advances in Neural Information Processing Systems 37 (2024): 6052-6080.

  12. Zhao, Qi, Hongyu Yang, Qi Song, Xinwei Yao, and Xiangyang Li. "KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs." arXiv preprint arXiv:2502.12029 (2025).

  13. Hou, Kun, Jingyuan Li, Yingying Liu, Shiqi Sun, Haoliang Zhang, and Haiyang Jiang. "KG-EGV: A Framework for Question Answering with Integrated Knowledge Graphs and Large Language Models." Electronics 13, no. 23 (2024): 4835.

  14. Luo, Y., Zhang, Y., & Wang, L. (2023). Reasoning on Graphs: Faithful and Interpretable Reasoning with Large Language Models. arXiv preprint arXiv:2310.01061.  

  15. Shen, Tiesunlong, Jin Wang, Xuejie Zhang, and Erik Cambria. "Reasoning with Trees: Faithful Question Answering over Knowledge Graph." In Proceedings of the 31st International Conference on Computational Linguistics, pp. 3138-3157. 2025.  

  16. Goertzel, Ben, Vitaly Bogdanov, Michael Duncan, Deborah Duong, Zarathustra Goertzel, Jan Horlings, Matthew Ikle et al. "OpenCog Hyperon: A Framework for AGI at the Human Level and Beyond." arXiv preprint arXiv:2310.18318 (2023).

  17. Zhang, Ying, Zhiqiang Zhao, and Zhuo Feng. "A unified approach to scalable spectral sparsification of directed graphs." arXiv preprint arXiv:1812.04165 (2018).

  18. Zhuo, Jiaming, Yintong Lu, Hui Ning, Kun Fu, Dongxiao He, Chuan Wang, Yuanfang Guo, Zhen Wang, Xiaochun Cao, and Liang Yang. "Unified Graph Augmentations for Generalized Contrastive Learning on Graphs." Advances in Neural Information Processing Systems 37 (2024): 37473-37503.

  19. Lin, Yiqing, Jianheng Tang, Chenyi Zi, H. Vicky Zhao, Yuan Yao, and Jia Li. "UniGAD: Unifying Multi-level Graph Anomaly Detection." arXiv preprint arXiv:2411.06427 (2024).

  20. Ma, Tengfei, Yujie Chen, Wen Tao, Dashun Zheng, Xuan Lin, Cheong-Iao Pang, Yiping Liu et al. "Learning to denoise biomedical knowledge graph for robust molecular interaction prediction." IEEE Transactions on Knowledge and Data Engineering (2024).

  21. Yang, Zhisheng, and Li Li. "Knowledge graph-based recommendation with knowledge noise reduction and data augmentation." Applied Intelligence 54, no. 21 (2024): 10333-10359.

  22. Sun, Jiaqi, Yujia Zheng, Xinshuai Dong, Haoyue Dai, and Kun Zhang. "Type Information-Assisted Self-Supervised Knowledge Graph Denoising." arXiv preprint arXiv:2503.09916 (2025).

  23. Grant, John, and V. S. Subrahmanian. "Reasoning in inconsistent knowledge bases." IEEE Transactions on Knowledge and Data Engineering 7, no. 1 (1995): 177-189.

  24. Nentidis, Anastasios, Charilaos Akasiadis, Angelos Charalambidis, and Alexander Artikis. "Dealing with Inconsistency for Reasoning over Knowledge Graphs: A Survey." arXiv preprint arXiv:2502.19023 (2025).

Open Source Licensing

MIT - Massachusetts Institute of Technology License

Background & Experience

This proposal is one of five parts of a unified toolkit to be developed in parallel by a team of 14-16 developers and scientists. We have already successfully completed a DeepFunding grant.

Our team includes current employees of Yandex and Intel, former LinkedIn employees, 3 university lecturers, and 7 PhDs. We're proud of our 10+ presidential awards, several patents, and 30+ publications, including multiple technical books from a well-known publishing house.

If we are awarded funding for 4 out of 5 proposals, we are committed to developing the 5th one at no additional cost.

We believe it’s better to have one toolkit than a kit of tools that need to be duct-taped together.

We aim to deliver a robust, extensible, modular, production-ready ecosystem that can evolve with future RFPs, enabling seamless adoption, innovation, and collaboration. This approach will maximize the utility of knowledge graph fundamentals and pull in other innovative features and technologies from DeepFunding.

Proposal Video

Not Available Yet

  • Total Milestones: 4

  • Total Budget: $87,500 USD

  • Last Updated: 18 May 2025

Milestone 1 - Noise Reduction Framework Development

Description

The first milestone will deliver a functional prototype of our noise reduction framework for knowledge graphs. This framework will implement our novel approach that leverages LLM-based noise detection combined with knowledge distillation techniques to identify and remove redundant or low-value nodes. The prototype will demonstrate significant improvements in graph quality while maintaining essential information integrity across multiple test datasets.

Deliverables

- A working prototype implementation of the noise reduction framework with a documented API
- A technical report detailing the architecture, algorithms, and a comparative analysis against TF-IDF, spectral sparsification, and other approaches
- A demonstration of the framework on at least two distinct knowledge graph datasets with varying characteristics

Budget

$30,000 USD

Success Criterion

The framework will reduce graph size by 10-15% or more while maintaining at least 95% of the original graph's accuracy on standard knowledge graph completion tasks. Performance metrics will be documented across multiple domains to demonstrate domain-agnosticity.
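For clarity, the criterion can be checked with simple arithmetic. The sketch below reads "95% or higher accuracy compared to the original graph" as relative retention (refined accuracy divided by original accuracy); that reading, the helper name, and the numbers in the usage note are illustrative assumptions, not results.

```python
def check_milestone1(orig_triples, refined_triples, orig_accuracy, refined_accuracy,
                     min_reduction=0.10, min_retention=0.95):
    """Return (size reduction, accuracy retention, passed).

    Assumes the accuracy criterion means refined_accuracy / orig_accuracy >= 0.95.
    """
    reduction = 1.0 - refined_triples / orig_triples
    retention = refined_accuracy / orig_accuracy
    return reduction, retention, reduction >= min_reduction and retention >= min_retention
```

For example, shrinking a 1,000,000-triple graph to 870,000 triples is a 13% reduction; if a completion metric drops from 0.62 to 0.60, retention is about 96.8%, so the criterion would be met.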

Milestone 2 - Entity and Relationship Extraction System

Description

This milestone focuses on developing a domain-agnostic entity and relationship extraction system capable of forming hierarchies from flat lists and graphs, or of transforming flat textual data into rich ontological structures. The system will incorporate advanced Named Entity Recognition, semantic relationship classification, and other techniques to identify complex relationships between entities. We will develop methods to assign confidence scores to extracted relationships, enabling probabilistic reasoning over the knowledge graph.

Deliverables

- A deployable entity and relationship extraction system with comprehensive documentation
- An evaluation report comparing our system against state-of-the-art extraction methods

Budget

$25,000 USD

Success Criterion

The system will demonstrate the ability to extract entities and relationships across several distinct domains with F1 scores exceeding those of current state-of-the-art methods. It will also assign meaningful confidence scores that correlate with human judgment.

Milestone 3 - Benchmark Suite for Knowledge Graph Evaluation

Description

We will develop a comprehensive benchmark suite for evaluating knowledge graph quality, specifically designed for retrieval and reasoning tasks. This suite will measure multiple dimensions, including query latency, answer quality, path interpretability, and robustness to noise. The benchmark will include complex reasoning tasks that go beyond simple factual queries to assess reasoning capabilities more relevant to AGI systems.

Deliverables

- An open-source benchmark suite with documentation and an easy setup process
- A dataset collection spanning multiple domains for standardized evaluation
- A draft methodology paper describing benchmark design principles and evaluation metrics

Budget

$22,500 USD

Success Criterion

The benchmark will successfully differentiate performance between various knowledge graph approaches on complex reasoning tasks with statistical significance.
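One standard way to establish statistical significance between two approaches evaluated on the same tasks is a paired sign-flip permutation test on per-task score differences. The sketch below is a generic implementation under that assumption; the number of permutations and the fixed seed are arbitrary choices, not part of the proposed benchmark specification.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided paired sign-flip permutation test.

    Under the null hypothesis that systems A and B perform equally well,
    each per-task difference is equally likely to have either sign.
    Returns an approximate p-value (with the usual +1 correction).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)
```

A p-value below a pre-registered threshold (for example 0.05) would count as the benchmark separating the two approaches.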

Milestone 4 - Integration with MeTTa/MORK

Description

This milestone will deliver integration between our knowledge graph tools and MeTTa and MORK, compatible with the Hyperon framework.

Deliverables

- Integration libraries for connecting our knowledge graph tools with MeTTa and Hyperon
- Example applications demonstrating end-to-end reasoning capabilities
- A performance analysis comparing reasoning tasks before and after knowledge graph refinement
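As a hedged illustration of the simplest form such an integration library could take, the sketch below serializes triples into MeTTa expressions of the form `(relation head tail)`, with an optional `(confidence ...)` atom. This atom schema is our assumption for the example, not an established Hyperon or MORK convention. Labels are sanitized because spaces, parentheses, quotes, `;` (comments), and `$` (variables) have special meaning in MeTTa syntax.

```python
import re


def to_symbol(name: str) -> str:
    """Turn an arbitrary label into a MeTTa-friendly symbol (hypothetical scheme)."""
    symbol = re.sub(r'[\s()";$]+', "_", name.strip())
    return symbol or "_empty_"


def triples_to_metta(triples, confidences=None):
    """Render (head, relation, tail) triples as MeTTa source text, one atom per line.

    `confidences` optionally maps a triple to a score in [0, 1], emitted as a
    separate `(confidence <atom> <score>)` expression.
    """
    lines = []
    for h, r, t in triples:
        atom = f"({to_symbol(r)} {to_symbol(h)} {to_symbol(t)})"
        lines.append(atom)
        if confidences and (h, r, t) in confidences:
            lines.append(f"(confidence {atom} {confidences[(h, r, t)]})")
    return "\n".join(lines)
```

Loaded into a MeTTa space, a query such as `!(match &self (born_in $who Warsaw) $who)` would then be expected to return `Marie_Curie` for the triple (Marie Curie, born in, Warsaw).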

Budget

$10,000 USD

Success Criterion

The integration will enable faster query processing on complex reasoning tasks compared to unrefined knowledge graphs. We will demonstrate successful multi-hop reasoning on at least three use cases that were previously intractable due to knowledge graph noise or inefficiency.
