Reproducible KG Generation with Ontology Learning

Ivan Reznikov
Project Owner


Expert Rating

n/a

Overview

Our team is committed to solving one of the most significant challenges in knowledge representation for AGI: ensuring reproducibility in knowledge graphs generated from unstructured text data. We're proposing a comprehensive suite of open-source tools and methodologies to enforce consistency, reduce variability, and automate schema learning for LLM-generated knowledge graphs. What drives our enthusiasm is the potential to dramatically improve trust, reliability, and interoperability in neuro-symbolic AI systems - with particular attention to integration with OpenCog Hyperon, MeTTa, and the MORK backend for robust knowledge representation.

RFP Guidelines

Advanced knowledge graph tooling for AGI systems

Complete & Awarded
  • Type SingularityNET RFP
  • Total RFP Funding $350,000 USD
  • Proposals 39
  • Awarded Projects 5
SingularityNET
Apr. 16, 2025

This RFP seeks the development of advanced tools and techniques for interfacing with, refining, and evaluating knowledge graphs that support reasoning in AGI systems. Projects may target any part of the graph lifecycle — from extraction to refinement to benchmarking — and should optionally support symbolic reasoning within the OpenCog Hyperon framework, including compatibility with the MeTTa language and MORK knowledge graph. Bids are expected to range from $10,000 - $200,000.

Proposal Description

Our Team

We're a passionate group of experts who have worked together on complex graph problems for years:

  • 2 dedicated researchers with backgrounds in knowledge representation

  • 2 backend engineers and 1 data engineer with extensive graph database experience

  • 1 data scientist who specializes in graph analytics

  • 2 natural language specialists and 1 generative AI developer

  • 1 graph scientist focused on optimization algorithms

Company Name (if applicable)

SciRenovation Labs

Project details

Anyone working with LLM-based knowledge graph generation has encountered the frustration of inconsistent outputs, hallucinations, and structural variability. We've experienced these challenges firsthand, and they've motivated us to develop this proposal. Our vision is to create open, standardized, and truly reliable tools that address the entire reproducibility lifecycle of LLM-generated knowledge graphs - making them consistent, verifiable, and suitable for mission-critical AGI reasoning tasks.

We're focusing on three key areas that we believe will deliver the most significant impact:

1. Schema-Guided Knowledge Graph Generation Framework

We're determined to overcome the inherent variability in LLM outputs when extracting knowledge from text. Our approach leverages structured guidance through schemas and ontologies to enforce consistency and predictability in the generated graphs.

We're particularly excited about:

  • Developing intelligent schema extraction techniques that automatically learn domain structures from text corpora

  • Implementing pattern-based extraction frameworks that constrain LLMs to produce consistent entity-relationship structures

  • Creating template-based prompting strategies specifically designed to reduce hallucinations and ensure semantic correctness

  • Building reproducible pipelines that integrate schema validation at each extraction stage
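The schema-validation stage above can be sketched in a few lines. This is a minimal illustration, not our final API: the schema format, the `Triple` shape, and the relation signatures are toy assumptions standing in for a real ontology, and the "LLM output" is simulated.

```python
from dataclasses import dataclass

# Hypothetical minimal schema: allowed entity types plus a
# (domain type, range type) signature for each permitted relation.
SCHEMA = {
    "entity_types": {"Person", "Organization", "City"},
    "relations": {
        "works_for": ("Person", "Organization"),
        "based_in": ("Organization", "City"),
    },
}

@dataclass(frozen=True)
class Triple:
    subject: str
    subject_type: str
    relation: str
    object: str
    object_type: str

def validate(triple: Triple, schema: dict) -> bool:
    """Accept a triple only if its relation exists in the schema and
    the subject/object types match the relation's signature."""
    sig = schema["relations"].get(triple.relation)
    if sig is None:
        return False
    domain, range_ = sig
    return triple.subject_type == domain and triple.object_type == range_

# Simulated LLM output: one in-schema triple, one hallucinated relation.
raw_triples = [
    Triple("Ada", "Person", "works_for", "Acme", "Organization"),
    Triple("Acme", "Organization", "acquired_by", "Globex", "Organization"),
]
kept = [t for t in raw_triples if validate(t, SCHEMA)]
```

Filtering against the schema after every extraction stage is what keeps out-of-vocabulary relations (here `acquired_by`) from accumulating in the graph.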

We're also exploring the use of different guiding mechanisms based on domain complexity. For well-understood domains, we'll leverage existing ontologies, while for emerging domains, we're developing semi-automated approaches that combine LLM intelligence with human verification for schema creation.

For complex knowledge domains requiring fine-grained control, we're adapting ontology-grounded extraction approaches that have shown promising results in preliminary tests - early evaluations suggest this method improves triple extraction accuracy by over 40% compared to unconstrained approaches.

Additionally, we're investigating prompt engineering techniques specifically designed for knowledge extraction tasks, with special attention to reproducibility across different runs and LLM configurations.

2. Open-Source Schema Learning and Ontology Population Tools

We're frustrated by the current limitations of manual schema creation, which is time-consuming and requires deep domain expertise. Our goal is to create truly automated, open-source tools for schema learning and ontology population that can transform unstructured text into consistent knowledge structures.

We're exploring:

  • Novel NLP approaches for automated schema induction from domain-specific text

  • Hybrid methods that leverage both statistical analysis and LLM capabilities for ontology learning

  • Validation frameworks that ensure the semantic correctness of automatically generated schemas

  • Integration with existing open-source ontology management tools like Protégé and Apache Jena

One of our most ambitious goals is to develop a unified workflow for schema learning that combines the strengths of traditional NLP tools (like spaCy and NLTK) with the semantic understanding capabilities of LLMs - creating a pipeline that's both robust and adaptable across domains.

We're also developing specialized prompt engineering techniques that enable LLMs to generate competency questions, extract relations, and suggest properties for ontology development - with built-in validation mechanisms to ensure consistency and accuracy.
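The statistical half of this hybrid workflow can be illustrated with a frequency-based relation-induction pass. This is a deliberately stripped-down sketch: in practice the (subject, verb, object) observations would come from a dependency parser such as spaCy or from an LLM extraction pass, and the support threshold would be tuned per corpus.

```python
from collections import Counter

# Toy observations of (subject, verb, object) tuples extracted from a corpus.
observations = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "inflammation"),
    ("aspirin", "causes", "nausea"),
    ("paracetamol", "treats", "fever"),
]

def induce_relations(obs, min_support=2):
    """Propose candidate schema relations: verbs observed at least
    `min_support` times across the corpus."""
    counts = Counter(verb for _, verb, _ in obs)
    return {verb for verb, n in counts.items() if n >= min_support}

candidate_relations = induce_relations(observations)  # {"treats"}
```

Low-support verbs like `causes` are held back for LLM-assisted or human review rather than admitted into the schema automatically, which is where the semantic half of the pipeline takes over.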

3. Reproducibility Validation Framework and Benchmarks

We're concerned about the lack of standardized methods for evaluating reproducibility in knowledge graph generation. Our team is designing a comprehensive validation framework and benchmark suite that objectively measures consistency, accuracy, and structural fidelity across multiple runs of the generation process.

Our validation framework will measure:

  • Triple consistency across different generation runs with identical inputs

  • Structural similarity using graph isomorphism and embedding-based comparison techniques

  • Semantic equivalence through query-based evaluation and reasoning tasks

  • Resistance to variability from different LLM parameter settings and prompt variations

  • Consistency between extracted knowledge and source text
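The first metric, triple consistency across runs, reduces to a set-overlap computation. A minimal sketch (the Jaccard coefficient here is one reasonable choice of overlap measure, not necessarily the framework's final metric):

```python
def triple_jaccard(run_a: set, run_b: set) -> float:
    """Jaccard overlap between the triple sets produced by two
    generation runs on identical input. 1.0 = perfectly reproducible."""
    if not run_a and not run_b:
        return 1.0
    return len(run_a & run_b) / len(run_a | run_b)

run1 = {("aspirin", "treats", "headache"), ("aspirin", "causes", "nausea")}
run2 = {("aspirin", "treats", "headache"), ("aspirin", "treats", "pain")}
consistency = triple_jaccard(run1, run2)  # 1 shared / 3 distinct = 1/3
```

Structural and semantic metrics build on this baseline: embedding-based comparison relaxes exact string matching, and query-based evaluation checks whether both runs answer the same competency questions.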

Beyond simple structural comparisons, we're focusing on semantic evaluations that assess whether the knowledge graphs capture the same meaning even when surface representations differ. We're particularly inspired by recent advances in knowledge graph validation techniques and are developing similarly rigorous evaluations.

Our project naturally aligns with the Hyperon framework by providing essential reproducibility tools that will enhance the reliability of knowledge representations. We're committed to exploring integration with the MORK system to ensure consistent knowledge graph generation and validation within the OpenCog ecosystem.

References


  1. Kommineni, V. K., König-Ries, B., & Samuel, S. (2024). From human experts to machines: An LLM supported approach to ontology and knowledge graph construction. arXiv preprint arXiv:2403.08345.

  2. Brack, A., Hoppe, T., Auer, S., Schmachtenberg, M., Tuballa, R., & Lehmann, J. (2020). The open research knowledge graph: Towards a national infrastructure for research information. In Proceedings of the 19th International Semantic Web Conference (ISWC 2020) - Posters & Demonstrations Track (pp. 1-4).

  3. Nechakhin, V., Singh, N., & Jentzsch, A. (2024). Evaluating Large Language Models for Structured Science Summarization in the Open Research Knowledge Graph. In Proceedings of the 18th International Conference on Semantic Systems (SEMANTiCS) (pp. 1-8).

  4. Stardog. (2024). Enterprise AI Requires the Fusion of LLM and Knowledge Graph.

  5. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N.,... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).

  6. Feng, X., Wu, X., & Meng, H. (2024). Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema. arXiv preprint arXiv:2412.20942.

  7. Wang, Y., Zhang, M., & Wang, S. (2020). Efficient Knowledge Graph Validation via Cross-Graph Representation Learning. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (pp. 1933-1942).

  8. Boylan, J., Mangla, S., Thorn, D., Gholipour Ghalandari, D., Ghaffari, P., & Hokamp, C. (2024). KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction. arXiv preprint arXiv:2404.15923.

  9. Mayerhofer, N. (2024). Constructing Knowledge Graphs From Unstructured Text Using LLMs. Neo4j Blog.

  10. Kommineni, V. K., König-Ries, B., & Samuel, S. (2024). Towards the Automation of Knowledge Graph Construction Using Large Language Models. In Proceedings of the 3rd International Workshop on Natural Language Processing for Knowledge Graph Creation (part of 20th International Conference on Semantic Systems (SEMANTiCS 2024)) (Vol. 3874, pp. 19-34). CEUR-WS.

  11. Estuary. (n.d.). 15 Best Open Source Data Analytics Tools.

  12. Haas, R. (2024). kgw Documentation.

  13. Uschold, M., & Gruninger, M. (1996). Ontologies: Principles, methods, and applications. The Knowledge Engineering Review, 11(2), 93-136.

  14. Noy, N. F., & McGuinness, D. L. (2001). Ontology development 101: A guide to creating your first ontology. Stanford knowledge systems laboratory technical report KSL-01-05 and Stanford medical informatics technical report SMI-2001-0926.

  15. Horridge, M., Gonçalves, R. S., Nyulas, C. I., Tudorache, T., & Musen, M. A. (2018). WebProtégé 3.0: Collaborative OWL ontology engineering in the cloud. In ISWC (Posters & Demonstrations/Industry/BlueSky).

  16. Deepfunding.ai. (n.d.). Scalable MeTTa Knowledge Graphs.

  17. Broscheit, S., Ruffinelli, D., Kochsiek, A., Betz, P., & Gemulla, R. (2020). LibKGE: A knowledge graph embedding library for reproducible research. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 165-174).

  18. Hogg, D. (2024). llmgraph. GitHub repository.

Open Source Licensing

MIT - Massachusetts Institute of Technology License

Background & Experience

This proposal is one of five parts of a unified toolkit to be developed in parallel by a team of 14-16 developers and scientists. We have already successfully completed a DeepFunding grant.

Our team includes current employees of Yandex and Intel, former LinkedIn employees, 3 university lecturers, and 7 PhDs. We're proud of our 10+ presidential awards, several patents, and 30+ publications, including multiple technical books from a well-known publishing house.

If we are awarded funding for 4 out of 5 proposals, we are committed to developing the 5th one at no additional cost.

We believe it’s better to have one toolkit than a kit of tools that need to be duct-taped together.

We aim to deliver a robust, extensible, modular, production-ready ecosystem that can evolve with future RFPs, enabling seamless adoption, innovation, and collaboration. This approach will maximize the utility of knowledge graph fundamentals and pull in other innovative features and technologies from DeepFunding.

Proposal Video

Not Available Yet

Check back later during the Feedback & Selection period for the RFP to which this proposal is applied.

  • Total Milestones

    4

  • Total Budget

    $87,500 USD

  • Last Updated

    17 May 2025

Milestone 1 - Schema-Guided Knowledge Graph Generation Framework

Description

We will develop a comprehensive framework that enforces consistency in LLM-generated knowledge graphs through schema guidance. This framework will include intelligent schema extraction techniques, pattern-based extraction mechanisms, and template-based prompting strategies specifically designed to reduce hallucinations by enforcing more structured graphs.

Deliverables

  • A fully documented Python library implementing schema-guided extraction pipelines

  • A suite of optimized prompting templates demonstrating at least 40% improvement in extraction accuracy compared to unconstrained approaches

  • An evaluation report comparing the framework's performance across different domains, documenting consistency improvements and hallucination reduction

Budget

$25,000 USD

Success Criterion

The framework demonstrates consistent knowledge graph generation with less than 25% structural variability across multiple runs using identical inputs. Extraction accuracy improves by at least 25% compared to baseline unconstrained LLM approaches when validated against human-annotated ground truth data.

Milestone 2 - Schema Learning and Ontology Population Tools

Description

We will create automated tools for schema learning and ontology population that transform unstructured text into consistent knowledge structures with minimal human intervention. These tools will combine traditional NLP approaches with LLM capabilities to automatically induce domain schemas and populate ontologies from text corpora. The implementation will prioritize usability for domain experts without requiring specialized knowledge engineering skills.

Deliverables

  • A toolset for automated schema induction from domain-specific text with comprehensive API documentation

  • A hybrid ontology learning system integrating statistical NLP and LLM-based semantic understanding, supporting standard ontology formats

Budget

$27,500 USD

Success Criterion

The tools successfully generate viable domain schemas from text corpora with at least 80% precision when evaluated against expert-created schemas in three distinct domains. Ontology population accuracy reaches at least 65% for entity classification and relationship extraction when compared to gold standard datasets.

Milestone 3 - Reproducibility Validation Framework & Benchmarks

Description

We will establish a comprehensive validation framework and benchmark suite for evaluating reproducibility in knowledge graph generation. This framework will objectively measure consistency, accuracy, and structural fidelity across multiple runs of the generation process. The benchmarks will cover diverse domains and text types to ensure broad applicability and will include both structural and semantic evaluation metrics.

Deliverables

  • A validation framework implementing multiple graph comparison techniques (structural, embedding-based, and semantic equivalence)

  • A benchmark dataset comprising varied domains with gold standard knowledge graphs for reproducibility evaluation

  • A detailed methodology document and evaluation protocol for standardized testing of knowledge graph generation reproducibility

Budget

$27,500 USD

Success Criterion

The validation framework provides quantifiable metrics that correlate with human judgments of knowledge graph quality and consistency. The benchmark suite demonstrates discriminative power by effectively distinguishing between systems with different levels of reproducibility and consistency.

Milestone 4 - Integration with MeTTa/MORK Systems

Description

We will develop seamless integration between our reproducible knowledge graph generation tools and the MeTTa/MORK symbolic reasoning systems. The implementation will preserve semantic fidelity throughout the pipeline while ensuring compatibility with existing MeTTa/MORK workflows.
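The core of such an adapter is a translation from extracted triples into atoms the symbolic layer can load. A minimal sketch, with the caveat that the atom shape below is an assumption for illustration; the actual encoding would follow the conventions of the target MeTTa/MORK knowledge base:

```python
def triples_to_metta(triples):
    """Render (subject, relation, object) triples as S-expression atoms
    in a MeTTa-like surface syntax. The exact atom shape is a placeholder
    and would be adapted to the target MORK schema."""
    return [f"({relation} {subject} {obj})" for subject, relation, obj in triples]

atoms = triples_to_metta([
    ("aspirin", "treats", "headache"),
    ("aspirin", "causes", "nausea"),
])
# e.g. ["(treats aspirin headache)", "(causes aspirin nausea)"]
```

Because the upstream pipeline emits schema-validated triples, the adapter's job stays mechanical: no semantic decisions are made at this boundary, which is what preserves fidelity between extraction and reasoning.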

Deliverables

  • Integration adapters for direct incorporation of extracted knowledge into reasoning workflows

  • A demonstration system showcasing end-to-end knowledge extraction and reasoning across at least two complex domains

Budget

$7,500 USD

Success Criterion

Knowledge extracted via our framework can be successfully utilized in MeTTa/MORK reasoning tasks with equivalent semantic accuracy to manually encoded knowledge.
