PSL-Augmented Atomspace Knowledge Graphs

Mayank Kejriwal
Project Owner

Expert Rating

3.6

Overview

This project proposes using Probabilistic Soft Logic (PSL) to enrich Atomspace knowledge graphs (KGs) with probabilistically weighted links. PSL uses weighted first-order logic rules that define probabilistic dependencies between facts, and it is well suited to KG-centric tasks such as collective classification, entity resolution, and link prediction. By applying PSL in batch mode, the KG will evolve to contain more informative nodes and edges, which we will show empirically can support improved memory and retrieval in LLMs. We also aim to demonstrate that the PSL-enriched KG is a more accurate, probabilistically weighted alternative to graphRAG, and that it is efficient to compute.

RFP Guidelines

PLN guidance to LLMs

Complete & Awarded
  • Type SingularityNET RFP
  • Total RFP Funding $80,000 USD
  • Proposals 3
  • Awarded Projects n/a
SingularityNET
Oct. 4, 2024

This RFP seeks proposals to explore how Probabilistic Logic Networks (PLN) can be used to provide guidance to LLMs. We are particularly interested in applying PLN to develop an alternative to graphRAG for augmenting LLM memory using Atomspace knowledge graphs.

Proposal Description

Company Name (if applicable)

University of Southern California

Project details

Probabilistic Soft Logic (PSL) is a highly scalable, declarative framework used for probabilistic reasoning over large sets of relational data. It provides a foundation for reasoning over continuous, soft-truth values between 0 and 1, representing varying degrees of certainty in real-world information. Unlike traditional logic systems where truth values are binary (true or false), PSL assigns a "soft" truth value to each logical proposition. This capability allows PSL to naturally handle uncertainty, incomplete data, and noisy inputs. PSL is particularly well-suited for complex relational domains like knowledge graphs, entity resolution, and link prediction tasks.

PSL’s strength lies in its ability to combine the efficiency of convex optimization with the flexibility of first-order logic. A PSL model is composed of weighted, first-order logic rules that describe the relationships between different entities. These weights represent the importance or confidence of each rule, and during inference, PSL uses these weights to compute the most probable truth values for unknown relationships. Importantly, the inference process is highly scalable due to PSL’s convex optimization formulation, enabling it to process millions of facts efficiently. This makes PSL an excellent choice for KG enrichment, as it can handle the scale and complexity of real-world datasets, providing an efficient way to infer missing information and correct inconsistencies.
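The role of soft truth values and rule weights can be illustrated with a minimal sketch. It assumes the Lukasiewicz relaxation of the logical connectives, which is the relaxation commonly associated with PSL, and it does not use the PSL library itself. All function names and the example numbers are hypothetical and chosen only for illustration.

```python
def luk_and(a, b):
    """Lukasiewicz conjunction of two soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def luk_or(a, b):
    """Lukasiewicz disjunction of two soft truth values in [0, 1]."""
    return min(1.0, a + b)


def luk_not(a):
    """Lukasiewicz negation of a soft truth value in [0, 1]."""
    return 1.0 - a


def distance_to_satisfaction(body, head):
    """How far the rule body_1 & ... & body_n -> head is from being satisfied.

    body is a list of soft truth values, head is a single soft truth value.
    The result is 0 when the head is at least as true as the body.
    """
    body_truth = 1.0
    for value in body:
        body_truth = luk_and(body_truth, value)
    return max(0.0, body_truth - head)


def weighted_penalty(weight, body, head, squared=False):
    """Penalty a weighted rule contributes for one grounding (hinge loss)."""
    d = distance_to_satisfaction(body, head)
    return weight * (d * d if squared else d)


# Hypothetical grounding: friend(a, b) = 0.9, friend(b, c) = 0.8, friend(a, c) = 0.4
penalty = weighted_penalty(2.0, [0.9, 0.8], 0.4)
```

In this example the body has soft truth 0.7, the head has 0.4, so the rule is violated by 0.3 and a weight of 2.0 yields a penalty of about 0.6. Inference then amounts to choosing the unknown truth values that make the total penalty as small as possible.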

In this proposal, we will use PSL as a probabilistic logic framework to enhance Atomspace knowledge graphs using batch-mode PLN. By probabilistically enriching these graphs, we aim to improve the memory and reasoning capabilities of large language models (LLMs). This approach will demonstrate the utility and efficiency of PSL in improving the retrieval accuracy of graph-based memory systems, particularly in comparison to existing methods like graphRAG.

Before describing the specific steps, and detailing the corresponding milestones in the next section, we provide some details on why PSL is such an efficient framework, especially compared to alternatives such as Markov Logic Networks (MLNs). MLNs handle uncertainty with probabilistic rules over binary truth values, but they face significant computational challenges because inference in MLNs involves solving large combinatorial optimization problems. This often results in slow performance, especially on large datasets.

In contrast, PSL uses continuous-valued truth values rather than binary ones, allowing for more flexible reasoning under uncertainty. This continuous nature simplifies the optimization problem, transforming it into a convex optimization task rather than a combinatorial one, which makes it highly scalable and computationally efficient. PSL’s inference process, which relies on convex optimization, ensures that the system can efficiently handle millions of data points without the prohibitive complexity often encountered with MLNs.
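The contrast drawn above can be written out as a short formula. The notation is an assumption introduced here for illustration: $x$ denotes observed truth values, $y$ the $n$ unknown ones, $w_r \ge 0$ the weight of ground rule $r$, and $\ell_r$ a linear function whose positive part is that rule's distance to satisfaction.

```latex
% MAP inference in PSL (a hinge-loss Markov random field):
\hat{y} \;=\; \arg\min_{y \in [0,1]^n} \; \sum_{r=1}^{m} w_r \,
  \bigl( \max\{ 0,\; \ell_r(x, y) \} \bigr)^{p_r},
\qquad p_r \in \{1, 2\}.

% Each term is a hinge (or squared hinge) of a linear function, hence convex;
% a nonnegative weighted sum of convex terms over the box [0,1]^n is a
% convex program.

% MAP inference in an MLN searches over binary assignments instead:
\hat{y}_{\mathrm{MLN}} \;=\; \arg\max_{y \in \{0,1\}^n} \; \sum_{r=1}^{m} w_r \,
  \mathbf{1}\bigl[ \text{ground rule } r \text{ is satisfied by } (x, y) \bigr],
% which is a weighted MAX-SAT problem and therefore combinatorial.
```

The only structural difference is the domain of $y$ and the shape of the per-rule term, which is what turns a discrete search into a continuous convex minimization.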

Additionally, PSL allows for soft truth values to represent the varying confidence in each rule, making it particularly useful in real-world applications like knowledge graphs, where data can be noisy and incomplete. This flexibility, combined with the ability to define probabilistic dependencies between facts via first-order logic rules, makes PSL a perfect fit for enhancing LLM memory systems that rely on enriched, probabilistic knowledge graphs. Its scalability and efficiency allow for the processing of large, relational datasets, critical for systems like Atomspace that manage complex knowledge graphs.

The first step of the project involves setting up the architectural foundation for the integration of PSL into Atomspace within the OpenCog Hyperon framework. This stage will focus on designing the pipeline through which probabilistic reasoning will operate on Atomspace knowledge graphs. We will define how batch-mode processing of PLN will function within this architecture and how it interfaces with MeTTa for seamless integration into the larger system. The initial architecture will also outline how graphRAG comparison benchmarks will be set up to evaluate the system’s performance.

Key aspects include the definition of interfaces between PSL, Atomspace, and the MeTTa scripting language, ensuring compatibility with the OpenCog Hyperon framework. The architecture must meet functional requirements, such as enabling scalable batch processing, and non-functional requirements, such as handling moderate-sized datasets and remaining extensible for future improvements.

Next, we will focus on the development and implementation of the batch-mode PLN system, applying PSL to enrich Atomspace knowledge graphs. Using the rules and methods from the PSL framework, we will probabilistically enrich knowledge graphs, creating a more informative structure. This enriched graph will include nodes with weighted relations that support memory retrieval tasks in LLMs. By focusing on batch mode rather than real-time inference, the system will operate in the background to continuously improve the graph structure.
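As a rough illustration of what one batch enrichment pass might compute, the sketch below infers soft truth values for unobserved links under a single transitivity rule and merges the confident ones back into an edge list. It is a toy written in plain Python, not the proposed system: it does not call PSL, Atomspace, or MeTTa, and the rule, the weights, the threshold, and the names `infer_links` and `enriched_edges` are all assumptions made for the example. Inference is done with projected gradient descent on a sum of squared hinge losses.

```python
from itertools import permutations


def infer_links(observed, entities, rule_weight=1.0, prior_weight=0.1,
                steps=500, lr=0.05):
    """Toy MAP inference for one rule: rel(A,B) & rel(B,C) -> rel(A,C).

    observed maps (source, target) pairs to fixed soft truth values.
    Every unobserved ordered pair is a free variable in [0, 1].
    The objective sums squared hinge losses, so it is convex.
    """
    unknown = {p: 0.0 for p in permutations(entities, 2) if p not in observed}

    def value(pair):
        return observed[pair] if pair in observed else unknown[pair]

    triples = list(permutations(entities, 3))
    for _ in range(steps):
        grad = {p: 0.0 for p in unknown}
        for a, b, c in triples:
            ab, bc, ac = (a, b), (b, c), (a, c)
            # Distance to satisfaction under the Lukasiewicz relaxation.
            d = value(ab) + value(bc) - 1.0 - value(ac)
            if d <= 0.0:
                continue
            g = 2.0 * rule_weight * d
            for pair, sign in ((ab, 1.0), (bc, 1.0), (ac, -1.0)):
                if pair in grad:
                    grad[pair] += sign * g
        for pair in unknown:
            # Negative prior: a weak squared penalty pulling unknowns toward 0.
            grad[pair] += 2.0 * prior_weight * unknown[pair]
            # Gradient step followed by projection back onto [0, 1].
            unknown[pair] = min(1.0, max(0.0, unknown[pair] - lr * grad[pair]))
    return unknown


def enriched_edges(observed, inferred, threshold=0.5):
    """Merge observed and inferred links, keeping inferred ones above threshold."""
    edges = [(s, t, w) for (s, t), w in observed.items()]
    edges += [(s, t, w) for (s, t), w in inferred.items() if w >= threshold]
    return edges


# Hypothetical input: two observed links with confidences.
observed = {("a", "b"): 0.9, ("b", "c"): 0.8}
inferred = infer_links(observed, ["a", "b", "c"])
edges = enriched_edges(observed, inferred)
```

With these inputs the only link that gains support is ("a", "c"), whose truth value settles near 0.64 (the balance between the rule and the prior), so the enriched edge list contains three weighted links. A real pipeline would ground many rules at once and use a dedicated solver rather than this loop.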

The key challenge here is to ensure that the PSL-enriched knowledge graph can improve the retrieval accuracy of LLM memory while working within the constraints of Atomspace and the OpenCog Hyperon framework. The enriched graph will be benchmarked against traditional graphRAG systems, with the goal of demonstrating superior performance in retrieving relevant information.

In the third phase, we will test the PSL-enhanced batch-mode system to evaluate its performance across several key metrics. These tests will focus on reasoning accuracy, retrieval speed, and overall memory improvement in LLMs. The system’s probabilistic reasoning capabilities will be compared against traditional graphRAG systems, with a focus on improvements in retrieval and reasoning.

The tests will address both functional requirements, such as quantifiable improvements in LLM reasoning, and non-functional requirements, such as efficient processing of moderate-sized datasets. The results will be used to refine the system and guide further optimizations in subsequent milestones.
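To make the retrieval comparison concrete, a minimal sketch of two standard ranking metrics follows. The proposal does not name specific metrics, so precision at k and mean reciprocal rank are assumptions chosen for illustration, as are the function names and the example document identifiers.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Hypothetical results for two queries from one retrieval system.
runs = [
    (["d1", "d7", "d3"], {"d1", "d3"}),
    (["d7", "d1"], {"d1"}),
]
p3 = precision_at_k(runs[0][0], runs[0][1], 3)
mrr = mean_reciprocal_rank(runs)
```

Computing the same metrics over the same queries for the PSL-enriched graph and for a graphRAG baseline gives directly comparable numbers, to which significance tests can then be applied.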

After testing the batch-mode PSL system, we will conduct a comparative study between the PSL-enhanced knowledge graphs and traditional graphRAG methods. This milestone involves further optimization of the system, with a focus on improving probabilistic reasoning accuracy and the quality of memory retrieval in LLMs. Optimization will focus on fine-tuning the PSL rules, adjusting the weights assigned to different relations, and improving the overall system’s efficiency in batch-mode processing.

We will also explore hybrid approaches that may combine batch-mode processing with elements of real-time PLN queries, as suggested by the RFP. The goal is to identify the most efficient and effective strategy for leveraging PSL in LLM memory systems.

The final milestone involves completing the project by producing a fully functioning system, accompanied by comprehensive documentation. This will include user guides, technical documentation of the architecture, and instructions for reproducing the results and extending the system for future applications. We will also deliver visualizations and reports showcasing how PSL-enhanced memory and reasoning processes function in practice.

The documentation will highlight both functional and non-functional achievements, such as scalability, handling of moderate-sized datasets, and system extensibility. The final deliverable will be a ready-to-use system integrated with OpenCog Hyperon, capable of serving as a benchmark for future research in probabilistic logic and LLM memory enhancement.

The team proposed here is eminently qualified to complete these ambitious tasks within the stated budget and timeframe. Prof. Mayank Kejriwal, who is leading the project, is an expert on knowledge graphs, having applied them in DARPA-funded projects on problems ranging from human trafficking to financial fraud. These are difficult problems requiring advanced knowledge representation. Kejriwal has also presented at the AGI conference, and his recent papers apply LLMs to problems like healthcare and uncertainty estimation. His work has appeared in numerous press outlets like the BBC and CNN Indonesia, and has been published in prestigious journals like Nature Machine Intelligence. He also has extensive systems experience: he runs a company called ACI Solutions LLC, which provides software consulting services to major corporations by applying AGI principles to practical problems. He has taught advanced courses on text analytics and KGs, and has written four books on the subject. The PhD students and engineers in his group have applied experience in combining KGs and LLMs, allowing the team to get off to a fast start on the project. Resources available to Kejriwal's group at USC will also be an asset in implementing this project.

Open Source Licensing

MIT - Massachusetts Institute of Technology License

We will release all outputs of this work under a permissive MIT license, with full documentation. 

Links and references

Principal Investigator's (PI) group website: https://aicomplex.github.io/ 

PI's Substack: https://aiscientist.substack.com/ 

PI's MIT Press Textbook on knowledge graphs: https://mitpress.mit.edu/9780262045094/knowledge-graphs/ 

Primer on Probabilistic Soft Logic: https://psl.linqs.org/ 

Proposal Video

Not Available Yet

Check back later during the Feedback & Selection period for the RFP that this proposal is applied to.

  • Total Milestones

    5

  • Total Budget

    $75,000 USD

  • Last Updated

    24 Oct 2024

Milestone 1 - Project Setup and Initial Architecture Design

Description

Define the system architecture to integrate Probabilistic Soft Logic (PSL) into the Atomspace knowledge graph within the OpenCog Hyperon framework. This includes the interface between PLN and MeTTa for batch-mode processing, and compliance with functional and non-functional requirements, including architecture scalability and documentation needs.

Deliverables

Architecture design document detailing PSL integration with OpenCog Hyperon, MeTTa scripts, and batch-mode setup.

Budget

$10,000 USD

Milestone 2 - Batch Mode PLN Implementation on Atomspace

Description

Develop and implement the batch-mode PSL system to enrich Atomspace knowledge graphs, ensuring it supports graphRAG comparisons. The key focus is on enhancing knowledge graph structure, including probabilistically weighted nodes and edges, to improve LLM memory retrieval. PSL programs will be written to be compatible with Atomspace.

Deliverables

Working code for batch-mode PLN operations on Atomspace, integrated with MeTTa and tested for initial graphRAG comparisons.

Budget

$20,000 USD

Milestone 3 - System Testing and Performance Evaluation

Description

Test the PSL-enhanced batch-mode system for memory retrieval and reasoning accuracy, and compare its performance with traditional graphRAG. This includes satisfying non-functional requirements such as handling moderate-sized datasets and system performance in batch mode.

Deliverables

Testing report with empirical results comparing the PSL-enhanced system against graphRAG in reasoning and retrieval performance. The empirical results will be reported in a scientific manner, with complete performance profiles, robustness and sensitivity analyses, and statistical significance.

Budget

$15,000 USD

Milestone 4 - Comparative Study and System Optimization

Description

Conduct a detailed comparative study between PSL-enhanced knowledge graphs and graphRAG systems. Optimize PSL processes to improve reasoning accuracy, focusing on processing time, efficiency, and retrieval quality in line with non-functional requirements.

Deliverables

Comparative analysis report, performance tuning results, and final metrics of system optimization; additionally, the codebase will be refined and stress-tested much more extensively during this period.

Budget

$15,000 USD

Milestone 5 - Final Documentation and Deliverables

Description

Complete final system implementation with detailed documentation of PSL integration, MeTTa operations, and compliance with the OpenCog Hyperon architecture. Provide clear user and developer documentation demonstrating system scalability and reproducibility.

Deliverables

Final working system, comprehensive documentation, visualizations of PLN-enhanced memory and reasoning processes, and final presentation of outcomes.

Budget

$15,000 USD

Expert Ratings

Reviews & Ratings

Group Expert Rating (Final)

Overall

3.6

  • Compliance with RFP requirements 3.3
  • Solution details and team expertise 3.3
  • Value for money 3.8
  • Expert Review 1

    Overall

    1.0

    • Compliance with RFP requirements 1.0
    • Solution details and team expertise 1.0
    • Value for money 0.0
    Completely out of scope! Not even wants to use PLN!

    NOT Recommend! Mayank proposes to use his own Probabilistic Soft Logic (PSL) to enrich Atomspace knowledge graphs (KGs) with probabilistically weighted links. Not to say it does not have good ideas; however, it is completely out of scope for this particular RFP. He does not understand the magnitude of PLN being a part of a complex semantic reasoning system and thus proposes using PSL instead.

  • Expert Review 2

    Overall

    1.0

    • Compliance with RFP requirements 1.0
    • Solution details and team expertise 2.0
    • Value for money 0.0
    PSL not capturing crucial PLN design features

    Atomspace entries often cannot easily be complemented with weighted probabilistic links unless it is already PLN knowledge embedded in the atomspace or at least some other form of knowledge with involved probability values. Overall, the approach does not seem to be thought through with technical details about PLN in mind, and furthermore seems to aim for a single-valued uncertainty measure in Probabilistic Soft Logic (PSL) which is contrary to the PLN design. Also PSL is outside of the scope of this RFP. Additionally, the claim is not convincing that the outcome of this 75K USD project will be more accurate than Microsoft's graphRAG.

  • Expert Review 3

    Overall

    5.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 5.0
    • Value for money 0.0
    Very good proposal, PLN is somewhat replaced by PSL, but I don't see a problem with that.

    The proposal is well written, detailed and gives a sense of confidence on the part of the authors. They suggest using PSL as the central inference engine. Even though my view on that sort of logic is that it is inadequate for purely data-driven learning compared to PLN or NAL, in that context (helping LLMs to reason), I think it is fine, because the underlying model is obtained by an LLM which already knows abstract patterns about the environment, as opposed to a situation where, say, an agent has to extract these patterns solely from observed data (where PLN or NAL shine). A few times the authors try to relate that to PLN; I frankly view that as optional, as the proposal is already ambitious enough. PLN is going through some overhaul anyway, so the timing might not be ideal. A question I have is how Hyperon is going to be leveraged if inference is delegated to an auxiliary system. I suppose though, by representing knowledge in the common language MeTTa, including what is inferred by PSL, it opens the door for synergies with the rest of Hyperon (PLN included), which by itself constitutes an important lever.

  • Expert Review 4

    Overall

    5.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 5.0
    • Value for money 0.0
    A well written up proposal describing an apparently sensible technical approach via someone with appropriate background

    PSL looks like a viable approach to scalably leveraging uncertain KGs to guide LLMs, more incisively than GraphRAG but more simple and obvious to scale than logical chainers. Nice one. And the proposer's background looks strong.

  • Expert Review 5

    Overall

    5.0

    • Compliance with RFP requirements 4.0
    • Solution details and team expertise 5.0
    • Value for money 0.0

    An interesting proposal to enrich Atomspace knowledge graphs with Probabilistic Soft Logic using batch-mode PLN. Since PSL can quickly solve inference problems, based upon Markov random fields, using convex optimization, it seems it could be used for initializing graph probabilities for certain sets of problems.

  • Expert Review 6

    Overall

    5.0

    • Compliance with RFP requirements 4.0
    • Solution details and team expertise 5.0
    • Value for money 0.0
    Strong proposal introducing probabilistic soft logic

    Strong proposal. Compelling use of PSL for KG enrichment, scalable reasoning, and benchmarking vs. graphRAG. Addresses key goals of improving LLM memory and retrieval and aligns sufficiently with the PLN focus of the RFP. Good clear milestones.
