Plantagenet: A Guidance System for LLMs

Luke Mahoney (MLabs)
Project Owner

Expert Rating: 4.0
Overview

Probabilistic Logic Networks (PLNs) define logical relationships and support induction over those relationships. MeTTa, a programming language with a strong inherent ability to express such relationships in its Atomspaces, is well suited to representing these networks. We propose a scheme in which PLNs guide large language models (LLMs), alleviating the "hallucinatory" overconfidence of LLMs and steering them through explanations of their reasoning, while maintaining a grasp of the inherent uncertainties in the train of inference. We propose to use chain-of-thought prompting, exemplar few-shot learning, and hybrid LLM engineering to accomplish this guidance.

RFP Guidelines

PLN guidance to LLMs

Complete & Awarded
  • Type SingularityNET RFP
  • Total RFP Funding $80,000 USD
  • Proposals 3
  • Awarded Projects n/a
SingularityNET
Oct. 4, 2024

This RFP seeks proposals to explore how Probabilistic Logic Networks (PLN) can be used to provide guidance to LLMs. We are particularly interested in applying PLN to develop an alternative to graphRAG for augmenting LLM memory using Atomspace knowledge graphs.

Proposal Description

Company Name (if applicable)

MLabs LTD

Project details

Large Language Models (LLMs) offer shockingly realistic text completion and conversational abilities. One serious weakness of these models, however, is that they seem to lose track of relationships between the pieces of information fed to them. The reason lies in the way context is currently handled: every time a new query is sent to the LLM, the entire conversation up to that point is also fed into the model so that the response can include the context of what has already been said. LLMs, however, have a limit to the amount of information they can accept at any given time before they cease to be able to collate all of the data, and the answers begin to lack cohesion.

This problem is further exacerbated by the statistical nature of auto-regressive text generation. Although the contextual properties of the training data are adequately captured by the LLM, the generation process is a probabilistically weighted random walk in token space. As the generated text gets longer, the expected deviation from the thread of reasoning inevitably increases.

While LLMs are sometimes shown to be "reasoning" about their query when producing output, the process is based on data, not rules. As such, it can at best be thought of as informal reasoning: the kind of reasoning based on experience and intuition.

To provide LLMs with the kind of formal reasoning used by PLNs, we can take several approaches. The most complicated is a tightly coupled LLM/PLN hybrid. While this is a laudable long-term goal, we currently see this approach as practically unsound (the gulf between symbolic reasoning and sub-symbolic language modeling is perhaps too large for this project to bridge) and unnecessarily complex in its data and computational needs.

Instead, we follow the work of Brown et al. and propose a one-shot or few-shot approach that enables the PLN to guide the LLM through carefully engineered prompts. The original prompt-engineering work focused on carefully constructed queries, but in their 2022 paper Wei et al. demonstrated that LLMs prompted with chain-of-thought (CoT) exemplars also exhibit few-shot improvements in their apparent ability to reason.

Our proposal is to evaluate the use of query-engineering and CoT exemplar engineering as a means of having PLNs written in MeTTa guide the text generation of LLMs. As such, our work naturally falls into three main parts (sketched in code after this list):

  • use the PLN to generate the reasoning steps, and prompt the LLM to explain each step using query-engineering
  • use the PLN to generate a set of exemplars for CoT few-shot learning, and have the LLM repeat and explain its own reasoning
  • hybrid of CoT few-shot learning and query-engineering, where the LLM is predisposed to output the intended explanations, but is also prompted along the way
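
To make the three modes concrete, here is a minimal Python sketch of how each one could assemble prompts. Everything here is illustrative: `pln_steps` stands in for reasoning steps already extracted from a MeTTa PLN and rendered to text, and the prompt wording is not final.

```python
# Minimal sketch of the three guidance modes. All names are illustrative;
# `pln_steps` are PLN reasoning steps already rendered to text.

def stepwise_prompts(puzzle: str, pln_steps: list[str]) -> list[str]:
    """Approach 1: one engineered query per PLN reasoning step."""
    return [
        f"Puzzle:\n{puzzle}\n\nReasoning step {i + 1}: {step}\n"
        "Explain in plain language why this step follows."
        for i, step in enumerate(pln_steps)
    ]

def cot_exemplar_prompt(puzzle: str, exemplars: list[tuple[str, str]]) -> str:
    """Approach 2: few-shot CoT prompt built from PLN-derived exemplars."""
    shots = "\n\n".join(f"Q: {q}\nA: Let's reason step by step. {a}"
                        for q, a in exemplars)
    return f"{shots}\n\nQ: {puzzle}\nA: Let's reason step by step."

def hybrid_prompts(puzzle, pln_steps, exemplars):
    """Approach 3: exemplars predispose the LLM; per-step queries steer it."""
    return ([cot_exemplar_prompt(puzzle, exemplars)]
            + stepwise_prompts(puzzle, pln_steps))
```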

We would like to demonstrate how well these approaches work across a number of domains, where each domain has its own set of axioms and relationships and we can control both the complexity and novelty of the domain. For this we will need PLN Atomspaces for each domain. Some of the standard reasoning tests (such as the GSM8K math reasoning test of Cobbe et al.) have a suitably simple domain, but rely heavily on the ability of the LLM to parse the questions. Here we are also interested in the ability of the PLN/LLM system to generate useful conclusions from a body of evidence without the answer-form being explicitly identified.

We therefore suggest that we focus, initially at least, on logic grid puzzles. Logic grid puzzles are well suited to logical reasoning, and they are often deliberately constructed so that the correct line of reasoning is non-obvious to humans, as well as taxing for LLMs (see Tyagi et al., for example).
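
To make the domain concrete, here is a toy logic grid instance in Python, solved by brute force over permutations. The names and clues are invented for illustration; a PLN would instead derive the assignment through explicit inference steps, which is exactly what we want to surface to the LLM.

```python
from itertools import permutations

# A toy 3x3 logic grid puzzle: match each person to a drink.
# Names and clues are invented for illustration only.
people = ["alice", "bob", "carol"]
drinks = ["tea", "coffee", "juice"]
clues = [
    lambda a: a["alice"] != "coffee",   # Alice avoids coffee
    lambda a: a["bob"] == "juice",      # Bob drinks juice
]

solutions = [
    dict(zip(people, perm))
    for perm in permutations(drinks)
    if all(clue(dict(zip(people, perm))) for clue in clues)
]
print(solutions)  # [{'alice': 'tea', 'bob': 'juice', 'carol': 'coffee'}]
```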

Having determined the best method for using PLNs to guide LLMs in reasoning heavy tasks, we will use our findings to test the system on a suitable benchmark which is quite different from the logic-grid puzzles used for the main investigation. Since the domain of the benchmark will require a suitable Atomspace we will decide on the benchmark after a period of investigation.

All software, datasets, and findings will be made available to the SingularityNET community through our milestone deliverables, and we will collect our insights together and produce a research paper detailing our findings.

Team

Nathaniel Lane, Senior Data Scientist - Project leader and AI Investigator

Nathaniel Lane graduated summa cum laude from Colorado School of Mines with a degree in Computer Science. He went on to get a Master’s in Computer Science from Montana State University by researching how neural networks can be used to predict whether a given peptide has anti-cancer properties. Since then, he has worked at MLabs on a variety of projects, including “Memory-augmented LLMs: Retrieving Information using a Gated Encoder LLM (RIGEL)” from RFP round 3. Nate will lead the project and take responsibility for many of the technical tasks.

Banashree Sarma, AI Engineer - Project Scientist and LLM Engineer

Dr Sarma received her Doctorate in Data Science and Artificial Intelligence from the Indian Institute of Technology in Madras, India. She specializes in natural language processing and reinforcement learning. She has built context embedding models, and LLMs for a number of languages, and is currently a member of the NLP team at MLabs AI, as well as handling our reinforcement learning activities. She will assist Nate in the prompt engineering.

Farseen CK, Senior Software Architect - Project Scientist

Farseen is an accomplished Software Architect, known for his expertise in robust, and scalable software design. He possesses strong analytical skills in state space analysis, graph topology analysis, and multi-level Monte Carlo analysis. Farseen has worked closely with Nate on Rigel, and has a deep understanding of LLMs. He will assist Nate with certain aspects of the LLM engineering.

Mark Bedworth PhD FSS, Chief Scientist and AI Visionary - Project Consultant

Dr Bedworth has over 40 years experience at the forefront of AI. He co-founded the International Society for Information Fusion, and served as its Vice President in its second year. Two of his patents have been acquired by Apple, and form a core element of the Siri speech recognition system. He is currently co-founder and Chief Scientist at MLabs AI, leading the team developing radically new approaches to Deep Learning, self-directed knowledge acquisition and Artificial General Intelligence. He will act as a sounding board and assist Nate in the week-by-week technical aspects of the project.

Dzmitry Shuiski, Assistant Data Scientist - Project Scientist

Dzmitry is an experienced software engineer with a deep passion for data in all its guises. With a strong foundation in a number of programming languages, he excels at building sophisticated algorithms, leveraging functional programming techniques, and deploying complex data models to solve real-world problems. Coming from a background of iOS App Development, he is adept at leveraging cutting-edge tools and frameworks to consistently create data-driven solutions that drive innovation. He is also pursuing a degree in Software Engineering and Management at Graz University of Technology to deepen his expertise in core engineering disciplines. He will assist Nate in the day-to-day data science of the project.

Challenges

Integration between PLNs and LLMs is a research frontier that has so far seen little exploration. As such, we expect to encounter some technical challenges in this project, which we describe in the following paragraphs.

The first is converting logic grid puzzles into Atomspaces. MeTTa is a new and exciting language, but it does not yet have many example programs. Therefore, while the team has some experience, we will need to spend some time becoming more proficient in MeTTa before we can really dive into the project. Once we have generated the puzzles in an easy-to-parse format, we will need to establish a suitable benchmark dataset ourselves and convert it into an Atomspace.

We will also have to take care in how we engineer our prompts. Because LLMs can only accept prompts of a limited size, the information must be fed to the LLM succinctly while still leaving room for the query we actually want answered. We will also need to ensure the prompt is fashioned so that the relevant information is clearly laid out for the LLM and the LLM can explain the logic behind its answers. A budget-aware assembly routine is sketched below.
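
The following sketch illustrates the constraint using a crude characters-per-token heuristic; a real implementation would use the target model's tokenizer, and `facts` are assumed to be pre-sorted by relevance.

```python
# Hypothetical budget-aware prompt assembly; 4 chars/token is a rough
# heuristic, not a property of any particular model or tokenizer.
def fit_facts(facts: list[str], query: str, max_tokens: int = 4096) -> str:
    budget = max_tokens * 4 - len(query)   # crude character budget
    kept, used = [], 0
    for fact in facts:                     # assumed sorted by relevance
        if used + len(fact) + 1 > budget:
            break
        kept.append(fact)
        used += len(fact) + 1
    return "\n".join(kept) + "\n\n" + query
```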

Finally, we will need to establish exemplar puzzle-query pairs that help show the strengths and shortcomings of the system. These will include not only examples where a plain LLM struggles but our system succeeds, but also ones that show the limitations of Plantagenet. This will help indicate directions for future research.

We believe that these challenges are somewhat mitigated by the seniority of the selected team, who between them have many decades of experience in Machine Learning R&D. Furthermore we intend to hold regular internal troubleshooting workshops, as well as capitalizing on the wider AI community from SingularityNET. Overall we feel these challenges present a minor risk to our suggested milestones, which we are confident we can overcome within time and to budget.

Open Source Licensing

MIT - Massachusetts Institute of Technology License



  • Total Milestones

    6

  • Total Budget

    $80,000 USD

  • Last Updated

    6 Dec 2024

Milestone 1 - PLN Atomspaces for Logic Grid Puzzles

Description

In this first milestone we will write a generator for logic grid puzzles which assembles them into standardized descriptions and corresponding PLN Atomspaces. Logic grid puzzles are a simple-to-understand problem domain, are easy to vary in complexity, and can be solved using PLNs. To baseline the PLN part of the guidance process we will create a dataset of logic grid puzzles and check that the associated PLNs are able to solve them. A generator sketch follows.
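
A minimal sketch of such a generator is below, assuming a MeTTa-like surface syntax for the emitted atoms; the atom names (`Evaluation`, `Not`, `drinks`) are placeholders rather than a fixed PLN schema, and a real generator must also verify that the emitted clues pin down a unique solution.

```python
import random

# Sketch: generate a random person-to-drink assignment and emit clue
# atoms in a MeTTa-like syntax. Atom vocabulary is illustrative only.
def generate_puzzle(people, drinks, seed=0):
    rng = random.Random(seed)
    solution = dict(zip(people, rng.sample(drinks, len(drinks))))
    atoms = []
    for person, drink in solution.items():
        if rng.random() < 0.5:             # positive clue
            atoms.append(f"(Evaluation drinks {person} {drink})")
        else:                              # negative clue about another drink
            other = rng.choice([d for d in drinks if d != drink])
            atoms.append(f"(Not (Evaluation drinks {person} {other}))")
    # NB: these clues may underdetermine the puzzle; a real generator
    # re-checks that exactly one solution remains.
    return solution, atoms

solution, atoms = generate_puzzle(["alice", "bob", "carol"],
                                  ["tea", "coffee", "juice"])
print("\n".join(atoms))
```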

Deliverables

Software to generate the logic grid puzzles and turn them into PLN Atomspaces. A dataset of such puzzles and associated Atomspaces will be made available for others to work with. Experimental results showing that PLNs can solve such puzzles and that the reasoning steps can be adequately captured.

Budget

$8,000 USD

Success Criterion

1. The program is able to generate logic grid puzzles
2. The program is able to convert them into MeTTa Atomspaces

Milestone 2 - Baseline graphRAG Output for Reasoning Tasks

Description

In this milestone we will feed the logic grid puzzles from the dataset produced in Milestone 1 to graphRAG. We will then use the original problem statement of the puzzle to prompt a Llama instance augmented by graphRAG. We are interested in determining:

  • at what complexity the retrieval-augmented LLM is able to solve the puzzle
  • whether the LLM combined with the graphRAG instance generates any reasoning steps in reaching its answer

The behavior of the LLM during these tests will form the baseline for our forthcoming prompt-engineering work. During this experimental phase we will settle on a set of metrics which can be objectively recovered from the query-response sequences. These will include, but are not limited to: solution accuracy, solution efficiency, elucidation of optimal reasoning steps, and propensity to make inappropriate steps. A sketch of how such metrics might be computed from logged runs follows.
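
The sketch below shows one way these metrics could be computed from a logged run; the field names (`llm_steps`, `gold_steps`, and so on) are illustrative, not a committed log format.

```python
# Hypothetical scoring of one logged run against the PLN's gold solution.
def score_run(run: dict) -> dict:
    steps = run["llm_steps"]               # reasoning steps parsed from output
    gold = set(run["gold_steps"])          # optimal steps from the PLN
    return {
        "accuracy": run["answer"] == run["gold_answer"],
        "efficiency": len(gold) / max(len(steps), 1),  # 1.0 = no wasted steps
        "step_recall": len(set(steps) & gold) / len(gold),
        "bad_steps": sum(s not in gold for s in steps),
    }
```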

Deliverables

The raw graph store of graphRAG, the raw query and output data from the LLMs (including the retrieved data from graphRAG), and a summary of our findings and insights regarding which presentation approaches yield the best results.

Budget

$8,000 USD

Success Criterion

1. We can feed the raw text of the logic grid puzzles into graphRAG
2. We can augment queries to the LLM with data from graphRAG
3. We are able to establish a baseline against which to compare all other experiments

Milestone 3 - Chain-of-Thought Prompting for LLM Guidance

Description

We will use the PLNs to generate a line of reasoning and develop an automatic process for engineering chain-of-thought prompts for the LLM. We will experiment with the following methods for giving the LLM information from the PLN:

  • use the output of the PLN directly as input to the LLM
  • transliterate the output of the PLN into a prompt using a set of transformation rules

The LLM is prompted at each stage by the requisite step in the reasoning process. In this set of tasks we are trying to discover the optimum lingua franca for communicating PLN reasoning to LLMs. The second method is sketched below.
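
For the second method, a rule-based transliterator could be as simple as the following sketch; the atom patterns and English templates are our own illustration, not an established PLN surface syntax.

```python
import re

# Hypothetical transformation rules from MeTTa-like atoms to English.
RULES = [
    (re.compile(r"\(Not \(Evaluation (\w+) (\w+) (\w+)\)\)"),
     r"It is not the case that \2 \1 \3."),
    (re.compile(r"\(Evaluation (\w+) (\w+) (\w+)\)"),
     r"\2 \1 \3."),
]

def transliterate(atom: str) -> str:
    for pattern, template in RULES:        # first matching rule wins
        if pattern.fullmatch(atom):
            return pattern.sub(template, atom)
    return atom                            # fall back to raw atom text

print(transliterate("(Evaluation drinks alice tea)"))
# -> "alice drinks tea."
```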

Deliverables

The raw query and output data from the LLMs, and a summary of our findings and insights regarding which presentation approaches yield the best results.

Budget

$19,000 USD

Success Criterion

The LLM is able to generate meaningful responses based on the query augmented with data from the PLN

Milestone 4 - Few-Shot Exemplar Prompting for LLM Guidance

Description

In this milestone we will use the PLN to generate examples of reasoning using the logic grid Atomspaces. These exemplars are fed to the LLM as example query/answer pairs, and the context is used to allow the LLM to perform few-shot learning on the domain. The LLM is then presented with the test puzzle and its own line of reasoning is examined. Exemplar construction is sketched below.
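
In the sketch below, `pln_solve` and `render_step` are stand-ins for the MeTTa PLN chainer and a step-to-text renderer (perhaps the transliterator from Milestone 3), and the puzzle encoding is illustrative.

```python
# Hypothetical exemplar builder: pair each puzzle statement with a
# PLN-derived worked solution rendered to text.
def build_exemplars(puzzles, pln_solve, render_step, k=3):
    """puzzles: list of (statement, atoms) pairs."""
    exemplars = []
    for statement, atoms in puzzles[:k]:
        steps = pln_solve(atoms)           # ordered inference steps
        worked = " ".join(render_step(s) for s in steps)
        exemplars.append((statement, worked))
    return exemplars
```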

Deliverables

The raw query and output data from the LLMs, and a summary of our findings and insights regarding which presentation approaches yield the best results.

Budget

$18,000 USD

Success Criterion

1. We are able to generate solid exemplars: some that are simple, some that are tricky, and some that yield surprising results from the LLM (i.e., it struggles with an easy prompt or succeeds with a difficult one)
2. We are able to determine how well the LLM was able to explain its own reasoning (to be discussed at length in the paper in Milestone 6)

Milestone 5 - Hybrid CoT/Exemplar Prompting for LLM Guidance

Description

We make a hybrid of the exemplar and chain-of-thought approaches by giving the LLM a set of exemplars for each step in the chain-of-thought. Here we are determining a good balance between the few-shot learning from exemplars and the step-by-step prompting from chain-of-thought. Our aim is to provide an interface definition enabling PLNs to provide guidance to LLMs. One possible session protocol is sketched below.
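
One way the hybrid session could run is sketched below, reusing `cot_exemplar_prompt` from the earlier sketch; `llm` is a stand-in for any text-completion callable, and the turn structure is illustrative.

```python
# Hypothetical hybrid protocol: exemplars set the expected answer style
# up front, then each PLN step is posed as its own prompting turn.
def hybrid_session(llm, puzzle, exemplars, pln_steps):
    history = cot_exemplar_prompt(puzzle, exemplars)
    transcript = []
    for step in pln_steps:
        history += f"\nNext, justify this step: {step}\nA:"
        reply = llm(history)               # LLM explains the single step
        history += " " + reply
        transcript.append((step, reply))
    return transcript
```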

Deliverables

The raw query and output data from the LLMs, and a summary of our findings and insights regarding which presentation approaches yield the best results.

Budget

$19,000 USD

Success Criterion

We are able to determine which approach yielded the best results

Milestone 6 - Research Paper with Findings

Description

We will examine the standard benchmarks for LLM reasoning tests and determine which (if any) correspond to existing (or easy-to-develop) Atomspaces for PLNs. Once we have settled on a benchmark we will evaluate the CoT and exemplar approaches to LLM guidance. We will collate all of our findings and assemble them into a research paper for dissemination to the SingularityNET community. We will further explore potential research directions for CoT, exemplar, and hybrid prompt engineering for PLN guidance of LLMs, and perform a preliminary survey of how PLNs and LLMs might be more tightly integrated. Furthermore, we will recommend ways in which PLNs could be integrated into systems beyond LLMs. We expect this final task to be predominantly paper-based research rather than experimental.

Deliverables

Benchmark dataset, domain Atomspace, and research paper.

Budget

$8,000 USD

Success Criterion

We are able to analyze our results and synthesize meaningful conclusions that help lay the groundwork for future AGI research


Expert Ratings

Group Expert Rating (Final)

Overall

4.0

  • Feasibility 4.2
  • Desirability 3.7
  • Usefulness 4.2
  • Expert Review 1

    Overall

    2.0

    • Compliance with RFP requirements 2.0
    • Solution details and team expertise 2.0
    • Value for money 2.0
    At least they understand what PLN and MeTTa are! However, they admittedly propose a shallow approach: a one-shot or few-shot approach which enables the PLN to guide the LLM through carefully engineered LLM prompts.

    Not recommended. Mahoney represents a team from MLabs, which I could not find any information about, and Nathaniel Lane, a Senior Data Scientist, will be leading the project. This is the best proposal of the three, because at least these people recognize the magnitude of the PLN & LLM integration and admit that tight integration would require considerably more time and budget. Reading the proposal gives me the impression that Nathaniel has knowledge of LLMs and how they work, and recognizes what PLN represents on a very, very basic level. This is perfectly fine, since we cannot require outside people to know the details of OpenCog. However, the proposal admittedly notes that a tightly coupled LLM/PLN hybrid (what we look for) is the most complicated approach and is not a feasible solution because it is "practically unsound", especially within the length/cost of the project. My main reason for the 2-star feedback is that the team proposes a shallow approach of using PLN for prompt engineering and creating an Atomspace for each independent domain, which will lack the cross-domain knowledge the RFP aims for. Also, the team is unsure about benchmarking and testing domains and is going to decide about use cases once the work gives promising results. Additionally, the team's proficiency in MeTTa and familiarity with OpenCog is at the bare minimum, if present at all, and it will take a couple of months to gain some MeTTa proficiency and PLN understanding.

  • Expert Review 2

    Overall

    3.0

    • Compliance with RFP requirements 3.0
    • Solution details and team expertise 3.0
    • Value for money 3.0
    LLM explanations grounded in PLN reasoning

    With this approach the LLM can act as an explanation generator that stays close to the PLN inference results. Hence this approach is sound in principle, even though the proposal does not go into crucial details, such as how to obtain PLN knowledge and embed it in a way that the LLM can access it and refer to relevant entries.

  • Expert Review 3

    Overall

    5.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 4.0
    • Value for money 5.0
    Solid proposal, using logical-grid puzzle as test cases sounds good to me.

    The proposal is well written and sensible. It is obvious the team has a good grasp of the task at hand. As currently formulated the project would rely on a readily available version of PLN, which I'm sorry to say is not the case at the moment (which is not to say it couldn't come soon, but it is not a guarantee). However, a general-purpose chainer is available and, I believe, is sufficiently mature to be used with an adequate logic for the aforementioned logic-grid puzzle test cases. Thus, I believe that the current state of PLN is not an obstacle to the realizability of this proposal.

  • Expert Review 4

    Overall

    4.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 4.0
    • Value for money 5.0
    It's a solid and sensible approach -- basically using PLN inferences to help guide/train LLM chains of thought and inferences

    The approach described totally makes sense and fits the requirements of the RFP. It's not terribly adventurous and seems almost straightforward to try out, but that's not necessarily a bad thing if it works.... One is tempted to add more feedback though, wherein the LLM guides PLN and vice versa, so maybe the two can synergetically evolve a collective superior form of inference?

  • Expert Review 5

    Overall

    5.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 5.0
    • Value for money 5.0

    A solid, well-thought-through, and methodical approach in which the submitters first provide a concise analysis of the weaknesses of current LLM models. From there they suggest the use of query-engineering and Chain-of-Thought exemplar engineering to directly address those weaknesses. Clear understanding of the problem leads to clear solutions. While less ambitious than other ideas, there is a high likelihood their proposal will lead to clear, solid improvements in LLM results.

  • Expert Review 6

    Overall

    5.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 4.0
    • Value for money 5.0
    Strong proposal directly focused on PLN

    Very solid proposal. Focuses directly on PLN-guided reasoning for LLMs, aligning well with the RFP. Innovative use of chain-of-thought prompts and PLN integration for addressing LLM reasoning and hallucination issues. Ambitious but acknowledges risks (e.g., MeTTa proficiency and benchmark development). Clearer focus on Atomspace enrichment would strengthen the proposal.
