Unsupervised Knowledge Graph Construction for AGI

Patrick Nercessian
Project Owner

Overview

The goal of this project is twofold: first, to create a framework that can construct and expand knowledge graphs on the Hyperon platform; and second, to utilize those knowledge graphs during large language model (LLM) inference to improve the factual grounding of responses. We plan to construct a knowledge graph from an initial knowledge base, such as Wikipedia, and expand it via an iterative, intelligent search into more detailed source materials, such as textbooks and research papers. The expanded graph will serve as a source of reliable truth to reduce the rate of LLM hallucinations during AGI reasoning tasks requiring multi-hop retrieval and other forms of semantically complex question answering.

RFP Guidelines

Advanced knowledge graph tooling for AGI systems

Complete & Awarded
  • Type: SingularityNET RFP
  • Total RFP Funding: $350,000 USD
  • Proposals: 39
  • Awarded Projects: 5
SingularityNET
Apr. 16, 2025

This RFP seeks the development of advanced tools and techniques for interfacing with, refining, and evaluating knowledge graphs that support reasoning in AGI systems. Projects may target any part of the graph lifecycle — from extraction to refinement to benchmarking — and should optionally support symbolic reasoning within the OpenCog Hyperon framework, including compatibility with the MeTTa language and MORK knowledge graph. Bids are expected to range from $10,000 - $200,000.

Proposal Description

Our Team

Vectorial consists of several software, AI, and ML engineers. For over 2 years, we have implemented advanced LLM and ML solutions (research, features, and fully fledged products) for our customers.

We are currently working on DeepFunding RFPs for “Evolving DNN Architectures” and “Utilize LLMs for modeling within MOSES” and have made considerable early progress. Our team has additional bandwidth and is excited to take on this RFP as well.

Company Name (if applicable)

Vectorial

Project details

We plan to begin with narrower domains, such as artificial intelligence and machine learning, to prepare for later expansion into broader categories. While the overall technique will remain domain-independent, during testing and initial development we will manage memory pressure and capacity constraints by using narrow example domains as proofs of concept.

For example, we can begin at the Wikipedia page for Machine Learning, using it to instantiate a representative node that serves as the initial anchor for a knowledge graph within MORK. We then create a series of edges annotated with perceived semantic relationships within the anchoring document (e.g. links to other Wikipedia pages that appear on the Machine Learning page). Information unrelated to machine learning can be filtered out by asking an LLM whether the content is relevant to machine learning. This process is performed recursively until all machine-learning-related information on Wikipedia is captured, as sketched below.
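
To make the construction loop concrete, the following Python sketch outlines the recursive crawl-and-filter step. It is illustrative only: the function names fetch_page_links and llm_is_relevant are hypothetical stubs, and networkx stands in for the MORK graph store that the actual implementation would target.

```python
# Illustrative sketch: a breadth-limited recursive crawl that anchors a graph at a
# seed page and adds edges for intra-wiki links an LLM judges relevant to the domain.
# fetch_page_links() and llm_is_relevant() are hypothetical stubs; in the proposed
# system the graph would live in MORK rather than in networkx.
from collections import deque
import networkx as nx

def fetch_page_links(title: str) -> list[str]:
    """Stub: return the titles linked from the given Wikipedia page."""
    raise NotImplementedError  # e.g. via the MediaWiki API or a Wikipedia dump

def llm_is_relevant(title: str, domain: str) -> bool:
    """Stub: ask an LLM whether a page is relevant to the target domain."""
    raise NotImplementedError  # e.g. a yes/no prompt to any chat-completion API

def build_domain_graph(anchor: str, domain: str, max_pages: int = 500) -> nx.DiGraph:
    graph = nx.DiGraph()
    graph.add_node(anchor, role="anchor")
    queue, seen = deque([anchor]), {anchor}
    while queue and len(seen) < max_pages:
        page = queue.popleft()
        for linked in fetch_page_links(page):
            if linked in seen or not llm_is_relevant(linked, domain):
                continue  # drop pages the LLM deems off-domain
            graph.add_edge(page, linked, relation="links_to")
            seen.add(linked)
            queue.append(linked)  # recurse until the domain is exhausted or capped
    return graph
```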

After this first stage, we can begin incorporating less structured knowledge bases, such as arXiv papers or textbook datasets. By chunking and vector-embedding these knowledge bases, we can use Agentic RAG (allowing LLMs to dynamically search for information) with ranked retrieval to find knowledge in the external data sources. All information identified as new will be dynamically added to the knowledge graph. We will use LLMs to determine whether the queried sources contain facts that are not yet present in the knowledge graph by allowing the LLM agent to “walk” the existing graph in search of that information. We intend to implement this walk in MeTTa so that the agent can programmatically query the knowledge representation database within MORK, and then use the reasoning capacity of the LLM to interpret relationships and identify missing facts. This is also the interface through which the LLM will update and reason with the knowledge graph.
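
A minimal sketch of this enrichment loop is shown below, with the embedding model, the fact-extraction prompt, and the triple format all stubbed out as assumptions. In the proposed system the graph walk would be issued as MeTTa queries against MORK; the networkx neighborhood lookup here is only a placeholder.

```python
# Minimal enrichment sketch: retrieve top-ranked chunks by embedding similarity,
# gather nearby graph facts, and ask an LLM which extracted triples are new.
# embed() and llm_extract_new_facts() are assumed stubs for illustration.
import numpy as np
import networkx as nx

def embed(texts: list[str]) -> np.ndarray:
    """Stub: return one embedding vector per text (any embedding model)."""
    raise NotImplementedError

def llm_extract_new_facts(chunk: str, known_facts: list[str]) -> list[tuple[str, str, str]]:
    """Stub: ask the LLM which (subject, relation, object) triples in the chunk
    are absent from the facts gathered during the graph walk."""
    raise NotImplementedError

def graph_walk_facts(graph: nx.DiGraph, seed: str, hops: int = 2) -> list[str]:
    """Collect edge 'facts' within a few hops of a seed node (stand-in for a MeTTa walk)."""
    nodes = nx.single_source_shortest_path_length(graph, seed, cutoff=hops)
    return [f"{u} {graph[u][v].get('relation', 'related_to')} {v}"
            for u in nodes for v in graph.successors(u)]

def enrich_graph(graph: nx.DiGraph, seed: str, chunks: list[str], top_k: int = 5) -> None:
    chunk_vecs = embed(chunks)
    query_vec = embed([seed])[0]
    scores = chunk_vecs @ query_vec  # ranked retrieval via inner-product similarity
    for idx in np.argsort(scores)[::-1][:top_k]:
        known = graph_walk_facts(graph, seed)
        for subj, rel, obj in llm_extract_new_facts(chunks[idx], known):
            graph.add_edge(subj, obj, relation=rel)  # dynamically add new knowledge
```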

Next, to demonstrate the effectiveness of Knowledge Graphs in improving factual grounding of LLM responses, we will conduct a series of experiments using common Q&A benchmarks, including the OpenAI SimpleQA benchmark, high-level standardized testing formats such as FE and GRE science exams, and multi-hop Q&A benchmarks. Our intention is to use benchmarks which cover a diverse set of topics and difficulty levels. We will compare performance across at least three model treatments: without access to a Knowledge Graph, with access to an initial Knowledge Graph, and with access to the complete Knowledge Graph enriched with external sources using our method. We plan to demonstrate the capability of our approach by generating a knowledge graph based on each topic in the benchmarks, and asking each LLM-knowledge-graph pair questions from its respective topic.
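
The comparison itself reduces to running the same question set under each treatment and scoring the answers. The sketch below assumes simple exact-match scoring and a dictionary of hypothetical answer functions; benchmarks such as SimpleQA apply their own grading protocols, which would replace the scorer here.

```python
# Sketch of the three-treatment comparison: each treatment is any callable mapping a
# question to a model answer, differing only in which knowledge graph (none, initial,
# or enriched) the model can access. Exact-match scoring is a simplification.
from typing import Callable, Mapping

def evaluate(answer_fn: Callable[[str], str], qa_pairs: list[tuple[str, str]]) -> float:
    correct = sum(answer_fn(q).strip().lower() == a.strip().lower() for q, a in qa_pairs)
    return correct / len(qa_pairs)

def compare_treatments(treatments: Mapping[str, Callable[[str], str]],
                       qa_pairs: list[tuple[str, str]]) -> dict[str, float]:
    # e.g. treatments = {"no_kg": ..., "initial_kg": ..., "enriched_kg": ...}
    return {name: evaluate(fn, qa_pairs) for name, fn in treatments.items()}
```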

Finally, we can perform reinforcement learning to improve the LLM’s ability to search the knowledge graph, rewarding successful fact retrieval and subsequent reasoning. We can perform additional experiments such as fine-tuning the LLM with SFT or LoRA for domain-specific improvements. We will also experiment with a multi-agent approach that repeats the process across multiple nondeterministic walks, aggregating the responses to achieve improved confidence in truthfulness by way of majority consensus.
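
The consensus step could look roughly like the following sketch, where answer_with_walk is a hypothetical stub for one nondeterministic walk-and-answer attempt (e.g. sampled at non-zero temperature).

```python
# Hypothetical sketch of the majority-consensus step: run several nondeterministic
# graph-walk-plus-answer attempts and return the most common normalized answer
# together with its vote share as a rough confidence score.
from collections import Counter

def answer_with_walk(question: str, seed: int) -> str:
    """Stub: one nondeterministic graph walk plus LLM answer attempt."""
    raise NotImplementedError

def consensus_answer(question: str, n_agents: int = 5) -> tuple[str, float]:
    answers = [answer_with_walk(question, seed) for seed in range(n_agents)]
    counts = Counter(a.strip().lower() for a in answers)
    best, votes = counts.most_common(1)[0]
    return best, votes / n_agents  # majority answer and its consensus confidence
```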

Open Source Licensing

MIT - Massachusetts Institute of Technology License

Background & Experience

Our expertise originates from several advanced degrees in Data Science, Computer Science, Mathematics, and Machine Learning. Our team also has working domain experience at companies such as AWS, Oracle, Home Depot, and Fortune 50 financial corporations. Team members have completed a series of relevant projects regarding document retrieval, factual grounding, named entity & relationship extraction, multi-hop question answering, and graph/topology algorithms including search, reachability, metagraph application, and network flow.

At Vectorial, our team has collaborated for over 2 years to build several AI agent systems for our clients using advanced generative AI techniques. We are well-acquainted with the state-of-the-art methods used to improve LLMs and plan to use this experience to create a powerful knowledge graph framework that will scale along with AGI systems on the Hyperon platform.

Proposal Video

Not Available Yet

  • Total Milestones: 5
  • Total Budget: $200,000 USD
  • Last Updated: 19 May 2025

Milestone 1 - Research Plan

Description

Submit a thorough research plan outlining and detailing the approach and work to be done.

Deliverables

  • Detailed research plan with review of current literature
  • Agile breakdown of tasks with proposed timelines
  • Scoping document with functional/non-functional requirements and framework design
  • Report of selected benchmarks/evaluative sets

Budget

$30,000 USD

Success Criterion

Completion of relevant documentation for scoping and project planning. The literature review prepares the team for deep understanding and manipulation of RFP-specific tooling and concepts, such as the MORK repository and current SOTA techniques for agentic knowledge graph interfacing. It should also cover the selection of applicable benchmark/test sets for evaluating the reasoning capacity of the new framework along multiple dimensions.

Milestone 2 - Initial Development

Description

Complete initial development of the knowledge graph framework, with the demonstrated ability to construct domain-specific knowledge graphs from a structured source and the ability for the LLM agent to traverse the graph to find necessary information. Where possible, we will integrate with MeTTa, MORK, and other Hyperon frameworks. Run initial benchmarks evaluating how our system improves on vanilla LLM systems.

Deliverables

  • Initial implementation of the graph creation framework
  • Initial implementation of the Knowledge Graph Inference Agent
  • Initial testing results and analysis against standard benchmarks

Budget

$40,000 USD

Success Criterion

Ability to generate knowledge graphs from a source concept node using an LLM agent; ability for an LLM agent to walk an existing knowledge graph; benchmarking and testing run without errors.

Milestone 3 - Extended Development

Description

Further development of the knowledge graph framework to include the ability to extend the knowledge graph with new information from unstructured sources. Where possible, we will integrate with MeTTa, MORK, and other Hyperon frameworks. Run the same benchmarks to evaluate how the system improves on Milestone 2.

Deliverables

  • Further implementation of the graph creation framework
  • Related testing results and analysis against standard benchmarks

Budget

$60,000 USD

Success Criterion

Ability to extend the knowledge graphs from Milestone 2 with new knowledge extracted from unstructured data sources using LLM agents; benchmarking and testing.

Milestone 4 - Additional Development and Experiments

Description

Run additional experiments on how fine-tuning, reinforcement learning, and multi-agent consensus improve capabilities. Evaluate on multiple benchmarks and perform ablations to determine the distinct improvements of each subsystem.

Deliverables

  • Expanded implementation
  • New benchmark results
  • Report including rationale for extended experiment prioritization or deferral, based on the previously conducted literature review

Budget

$40,000 USD

Success Criterion

Presentation of results of Milestone 2 experiments with augmented experimental groups

Milestone 5 - Final Report

Description

Submit all final materials as committed to in the grant proposal.

Deliverables

  • Final report with performance analysis
  • Code framework
  • Demonstration
  • Documentation

Budget

$30,000 USD

Success Criterion

The report communicates methods in sufficient detail that someone with the proper qualifications and resources could repeat all experiments. Explanations and presentation of results are thorough enough that anyone familiar with the relevant AI topics can comprehend them.
