LLM + Genetic Optimizing Multi Objective Functions

Expert Rating 4.0
Luke Mahoney (MLabs)
Project Owner

Overview

MOSES can solve genetic programming problems as well as optimize the search towards a solution, being able to find patterns within search spaces, which it can then exploit. However, it requires a single-objective scoring function, even for problems that prefer, or indeed need, multiple objectives. Constructing such a function is a non-trivial task with unpredictable effects on the search. We will use LLMs to combine multiple objective functions into a single objective function, using the circuit synthesis problem as a test bed. We will demonstrate the capabilities of this approach, first on general EAs, then using MOSES specifically, evaluating behavior on two different problem instances.

RFP Guidelines

Utilize LLMs for modeling within MOSES

Complete & Awarded
  • Type SingularityNET RFP
  • Total RFP Funding $150,000 USD
  • Proposals 10
  • Awarded Projects 1
SingularityNET
Oct. 9, 2024

This RFP invites proposals to explore the integration of LLMs into the MOSES evolutionary algorithm. Researchers can pursue one of several approaches, including generation modeling, fitness function learning, fitness estimation, investigation into domain-independent “cognitively motivated” fitness functions, or propose new innovative ways to leverage LLMs to enhance MOSES's capabilities within the OpenCog Hyperon framework.

Proposal Description

Company Name (if applicable)

MLabs LTD

Project details

MOSES is a capable framework for solving problems using genetic programming. More broadly, it can also optimize the search towards a solution itself, by finding consistencies or patterns within the search space, which it can then exploit. However, this power comes with a significant limitation: MOSES assumes a single-objective scoring function, with a tie-breaker based on representation size. While in theory any problem can admit a single scoring function, in practice we usually have multiple objectives, often on different scales, with different (or even unknown) minima, and reconciling them into a single score is often difficult, if not outright impossible. Other EAs have been designed to avoid this problem by allowing first-class use of multiple objective (and scoring) functions, but it is not clear how to integrate these into MOSES.

Theoretically, converting a multi-objective problem (with multiple scoring functions) into a single-objective, single-scored problem is possible. However, several problems immediately become apparent:

  1. When collapsing multiple scores into a single score, we must decide the weight by which any given score contributes relative to the others. This is typically not known a priori, as, for example, some objectives may be mutually exclusive with others. This is the weighting problem.
  2. Some scoring functions output discrete values (natural numbers or integers), while others output continuous values. It is not clear how these two can be combined into a single function, which must choose to return one or the other kind of value. This is the continuity problem.
  3. Some scoring functions naturally have bounds on their possible values, while others may be effectively unbounded, or their bounds are not known (indeed, the problem may in part involve discovering or approximating them). Thus, it is not clear how to account for this in any combined score. This is the bounds problem.
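The naive fix, a fixed weighted sum, makes all three problems concrete at once: the weights must be guessed (weighting), normalization forces discrete and continuous scores onto one scale (continuity), and the normalization bounds may themselves be unknown (bounds). The sketch below is illustrative only; the weights and bounds are invented for the example:

```python
def weighted_sum_score(scores, weights, bounds):
    """Collapse multiple objective scores into one value in [0, 1].

    scores:  raw objective values (discrete or continuous)
    weights: relative importance of each objective (the weighting problem:
             rarely known a priori)
    bounds:  (lo, hi) per objective, used for normalization (the bounds
             problem: hi may be unknown, or only a guess)
    """
    total = 0.0
    for s, w, (lo, hi) in zip(scores, weights, bounds):
        # Normalization maps discrete and continuous scores onto a common
        # scale -- papering over the continuity problem, not solving it.
        norm = (s - lo) / (hi - lo)
        total += w * norm
    return total / sum(weights)

# Example: (agreement errors, size, depth), with guessed weights and bounds.
score = weighted_sum_score(
    scores=[3, 40, 12],
    weights=[10.0, 1.0, 1.0],                # hypothetical weights
    bounds=[(0, 65536), (0, 200), (0, 50)],  # guessed maxima
)
```

Every choice made here (weights, bounds, linear combination) silently shapes the search, which is exactly the difficulty the proposal aims to sidestep.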

To give an example, consider the problem of circuit synthesis: given a tabular representation of (usually) boolean inputs and outputs, and a set of usable logic gates, construct a circuit out of these gates, such that:

  • Given any input, the output of the circuit agrees with the table
  • The number of gates in the circuit (its size) is minimized; and
  • The largest number of gates between an input and an output (its depth) is minimized.
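All three objectives are cheap to compute for any concrete circuit. As an illustration (not the genotype the project will actually adopt), a NAND-only circuit can be represented as a list of gate input pairs and scored for agreement, size, and depth:

```python
def nand(a, b):
    return 1 - (a & b)

def evaluate(circuit, inputs):
    """Evaluate a NAND-only circuit.

    circuit: list of (i, j) gates; an index k < len(inputs) refers to the
             k-th input pin, larger indices to earlier gates' outputs.
    Returns all signal values (inputs followed by gate outputs).
    """
    signals = list(inputs)
    for i, j in circuit:
        signals.append(nand(signals[i], signals[j]))
    return signals

def size(circuit):
    return len(circuit)

def depth(circuit, n_inputs):
    # Longest gate chain from any input to any gate output.
    d = [0] * n_inputs
    for i, j in circuit:
        d.append(max(d[i], d[j]) + 1)
    return max(d[n_inputs:], default=0)

# XOR built from four NAND gates (the standard construction).
xor = [(0, 1), (0, 2), (1, 2), (3, 4)]
table = {(a, b): a ^ b for a in (0, 1) for b in (0, 1)}
agreement_errors = sum(
    evaluate(xor, [a, b])[-1] != out for (a, b), out in table.items()
)
```

Here the XOR circuit agrees on all four rows, has size 4 and depth 3; the tension between minimizing size and minimizing depth is what makes the problem genuinely multi-objective.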

This is a suitable problem for MOSES to solve: it is essentially a genetic programming problem, but without the complexities imposed by the target languages more typical of such problems (such as a Lisp dialect). Furthermore, it is known to be difficult: even in the simplest case, finding an optimal solution is known to be NP-hard, but not NP-complete, which makes a meta-heuristic like MOSES useful. However, we immediately observe two of the three problems we just discussed:

  • While agreement must take priority, it is likely that size and depth are (at least somewhat) at cross-purposes. How do we decide which to give weight to?
  • Agreement has a natural minimum (that is, 0, meaning 'all rows agree'), but while both size and depth have minima, we don't know what they are even in isolation (indeed, even finding this out can be a hard task).

Thus, before any attempt to solve this problem with MOSES, we are forced to reckon with these considerations, with very little guidance as to how to proceed, in order to construct something as fundamental as a scoring function. Worse, even if we do find a way to combine these scores, we have no way of knowing whether that combination will give us good outcomes.

These problems do not impact only MOSES. Even in more general EAs, where multi-objective approaches are available, they are not without cost. Managing multiple objectives adds significant complexity to any approach, and it is not clear if there is an upper limit to the number of objectives being considered before issues of efficiency, or usefulness, arise. Single-objective EAs, however, are both simpler and better-understood, and should scale up without problem. At the same time, for almost any problem where a meta-heuristic approach can be deployed, we can find many different objectives we are interested in. This means that if such upper limits exist, we are likely to find them, if not now, then in the future.

We propose to attack the problem of converting multi-objective (and multi-scored) problems into single-objective, single-scored ones, using LLMs. We see the final application of our work as part of MOSES, but will test it in a more general setting first, to ensure that our approach works and gives genuine improvements. In essence, our approach will, when given a set of objective functions, produce a single objective function, with a fixed range and scale, combining these. We plan for the LLM in question to act as the objective function in this case: when given a genotype as input, the LLM will produce a score.

In particular, our approach will have the following goals:

  • The fitness-LLM must ensure good diversity in solutions relative to the individual fitness functions (generality)
  • The fitness-LLM must not require retraining to solve a different instance of the same problem (transferability)

To ensure those goals are met, we will initially assess the fitness-LLM in the context of a more basic EA, specifically by comparing an approach using the fitness-LLM together with a single-objective EA against a multi-objective EA using as much of the same approach (genotype, operators, etc) as possible. As a test bed, we will use the problem of circuit synthesis, using NAND gates only. This problem represents a good test bed:

  • It is a natural fit for genetic programming, which is what MOSES is intended for;
  • It is a useful, and known difficult, problem;
  • Due to its complexity, many objectives can be considered for it; and
  • Problem instances, and their solutions, vary considerably, which forces our approach to be general.

Our objectives will be:

  • Size (number of gates used)
  • Depth (largest number of gates between any input and any output)

To ensure a reasonable set of tests, we will attempt to synthesize circuits for each of these:

  • 16-bit population count
  • 16-bit find-first-set

We chose these problems because they represent, respectively, a problem where depth is likely to be low but size high, and one where depth is likely to be high but size low. We will train one fitness-LLM on each of these problems, and then compare them against each other by using the population count fitness-LLM on the find-first-set problem and vice versa, to see how they perform against dedicated fitness-LLMs. Throughout, we will also note how diverse the final population produced by our EAs is relative to the individual objectives. This will show the effectiveness of our approach, as well as its transferability, without the complexities of MOSES.
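For concreteness, the input-output tables for both problem instances can be generated programmatically. A small sketch (the 1-based find-first-set convention, returning 0 for a zero input, is our assumption; the proposal does not fix one):

```python
def popcount_table(bits):
    """Input-output table for n-bit population count (number of set bits)."""
    return {x: bin(x).count("1") for x in range(2 ** bits)}

def ffs_table(bits):
    """Input-output table for n-bit find-first-set: 1-based index of the
    lowest set bit, or 0 when no bit is set (one common convention)."""
    return {x: (x & -x).bit_length() for x in range(2 ** bits)}

# 4-bit examples for brevity; the 16-bit tables have 65,536 rows each.
pc = popcount_table(4)
ff = ffs_table(4)
```

Note that the 16-bit tables are small enough to evaluate agreement exhaustively, so the agreement objective needs no sampling.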

After this, we will then integrate our approach into MOSES, and attempt to solve the same problem instances for circuit synthesis in MOSES. We will compare our fitness-LLM against agreement only in MOSES, and see how the solutions compare with each other.

Team

Koz Ross - Project Leader

Koz Ross is a seasoned Data Engineer and Software Developer. With a background in both software engineering and academia in evolutionary computation and data structures, Koz combines familiarity with current literature on the topic of EAs with an understanding of how these can be implemented, and engineered, appropriately. He has also led several projects in a range of areas, ranging from eDSLs to libraries for code generation to a whole compiler, making him familiar with the organizational principles and practices required to organize and run this project to successful completion. He will be responsible for the overall direction of the project, as well as providing expert advice on EAs.

Marcus Fernandes, AI Engineer - Senior Project AI Engineer

Marcus is a versatile and highly skilled AI Engineer and Software Developer with a unique foundation in chemical engineering. His expertise spans scientific computing, blockchain technology, and full-stack development, with strong proficiency in systems modeling and numerical algorithms.

Banashree Sarma, AI Engineer - Project Scientist and LLM Engineer

Dr Sarma received her Doctorate in Data Science and Artificial Intelligence from the Indian Institute of Technology Madras, India. She specializes in natural language processing and reinforcement learning. She has built context embedding models and LLMs for a number of languages, and is currently a member of the NLP team at MLabs AI, as well as handling our reinforcement learning activities. She will be responsible for the fitness-LLMs built during the project, and will assist in insight discovery.

Mark Bedworth PhD FSS, Chief Scientist and AI Visionary - Project Consultant

Dr Bedworth has over 40 years' experience at the forefront of AI. His early work included the development of Boltzmann machines alongside Nobel Laureate Geoff Hinton, and the development of several novel forms of neural network. His work on Bayesian probability moderation mitigated the problems inherent in the veto effect, for which he received worldwide recognition in the information fusion community. He co-founded the International Society for Information Fusion, and served as its Vice President in its second year. He is currently co-founder and Chief Scientist at MLabs AI, leading the team developing radically new approaches to deep learning, self-directed knowledge acquisition, and Artificial General Intelligence.

Challenges

The solution outlined in this project is novel; moreover, while LLMs are adept at capturing the statistical nuances of complex relationships, they can be prone to so-called hallucination. However, the risk inherent in this new technical approach is largely mitigated by our team of senior experts. MLabs AI has some world-class AI experts on whose experience we can draw if technical blockages are encountered.

Open Source Licensing

Apache License

Links and references

Website


https://www.mlabs.city/


Proposal Video

Not Available Yet


  • Total Milestones

    5

  • Total Budget

    $80,000 USD

  • Last Updated

    6 Dec 2024

Milestone 1 - EA Component Design and Implementation

Description

We will design the necessary components for an EA to solve the circuit synthesis problem. This includes:

  1. A genotype representation for synthesized circuits
  2. Appropriate genetic operators (mutation and crossover) for the genotype representation in (1)
  3. Fitness functions measuring each objective

Furthermore, we will need to determine both a single-objective and a multi-objective selection strategy for our future use. As part of this work, we will also do some preliminary testing of the resulting approaches to see if they are suitable, and adjust them if they are not. We will also define measures of genotypic diversity (a dissimilarity measure between genotypes) and phenotypic diversity (a dissimilarity measure between tuples of objective scores). Lastly, we will implement a multi-objective EA for solving the circuit synthesis problem using the components selected above, as well as generating appropriate input data for our two chosen problem instances:

  • 16-bit population count
  • 16-bit find-first-set

Implementations of the dissimilarity measures described above will be part of this milestone.
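The dissimilarity measures could take many forms; one plausible sketch uses Euclidean distance for phenotypes and a Hamming-style gate comparison for genotypes. Both are hypothetical choices pending the actual Milestone 1 design:

```python
import math

def phenotypic_dissimilarity(scores_a, scores_b):
    """Euclidean distance between tuples of objective scores -- a true
    distance metric, as the success criteria ask for."""
    return math.dist(scores_a, scores_b)

def genotypic_dissimilarity(gates_a, gates_b):
    """Hamming-style distance between the gate lists of two circuits,
    padded to the longer length with a sentinel. Hypothetical: the real
    measure depends on the genotype eventually chosen."""
    pad = max(len(gates_a), len(gates_b))
    a = list(gates_a) + [None] * (pad - len(gates_a))
    b = list(gates_b) + [None] * (pad - len(gates_b))
    return sum(x != y for x, y in zip(a, b))

# Phenotypes as (agreement, size, depth) tuples; genotypes as gate lists.
d_pheno = phenotypic_dissimilarity((0, 40, 12), (0, 38, 15))
d_geno = genotypic_dissimilarity([(0, 1), (0, 2)], [(0, 1), (1, 2), (3, 4)])
```

Both sketches are genuine metrics (symmetric, zero only on identical arguments, triangle inequality), matching the success criterion below.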

Deliverables

1. Design for a genotype representation for synthesized circuits, as well as genetic operators for this genotype
2. Choices for a single-objective and a multi-objective selection strategy, designed to combine with (1) to make two EAs: one single-objective, the other multi-objective
3. Implementation and preliminary testing of the multi-objective EA described by (1) and (2)
4. Data (in the form of input-output tables) describing the problem instances for circuit synthesis for 16-bit population count and 16-bit find-first-set
5. Measures for genotypic dissimilarity and phenotypic dissimilarity based on the choices in (1), and implementations thereof

Budget

$15,000 USD

Success Criterion

1. Genetic operators for the design are fair (in the EA sense).
2. Single-objective and multi-objective selection strategies are based on current literature and designs.
3. The implementation is a stand-alone executable that can run on problem instances.
4. Measures for genotypic and phenotypic dissimilarity are distance metrics, or as close to this as possible.

Milestone 2 - Multi-objective Benchmarking and Data Generation

Description

We will run the multi-objective EA designed and implemented in Milestone 1 on the input data for our chosen problem instances (also from Milestone 1) to produce synthesized circuits for each of these. We will track the following information for each run:

1. The best overall solution produced for each objective
2. The number of generations required to obtain the solutions in (1)
3. The population size required to obtain the solutions in (1)
4. The genotypic diversity in the final population (how dissimilar the synthesized circuits are, using the dissimilarity measures we designed and implemented in Milestone 1)
5. The phenotypic diversity in the final population (what ranges of values for the objective functions we have, using the dissimilarity measures we designed and implemented in Milestone 1)
6. Overall running time and memory use
7. PRNG seed(s) used

Furthermore, we will collect every generated genotype (both initial population and modified) from each run, along with their scores for each objective function. These will be collected together as data sets suitable for LLM training, with one such data set per problem instance we are interested in.

Deliverables

1. At least one run using the multi-objective EA designed in Milestone 1 for each of the circuit synthesis problem instances, together with the necessary tracked information as per the description above
2. Two data sets, one per problem instance, containing every generated solution from all runs along with their scores per objective, suitable for training an LLM

Budget

$15,000 USD

Success Criterion

1. The data sets for each problem instance show good diversity (that is, the dissimilarity measures show a large range). 2. The tracked information exists for a large number of runs, with consistently good, but diverse, outcomes.

Milestone 3 - Fitness LLMs

Description

Using the data sets created in Milestone 2, we will train LLMs designed to act as single-objective scoring functions, combining the scoring functions used in the multi-objective EA designed and implemented in Milestone 1. These LLMs will not be generative; instead, we will modify an existing open-source LLM (such as LLaMA) so that the output is a single value in the range 0 to 1. We will train one fitness-LLM for each of the problem instances (and data sets), using whatever architecture, fine-tuning methods, and hyperparameters produce the best results. We will also implement some means of using these trained fitness-LLMs as scoring functions in the single-objective EA designed and implemented in Milestone 1.
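As a rough illustration of the intended interface (not the actual model), the fitness-LLM replaces a generative head with a scalar regression output squashed into [0, 1]; the genotype serialization format is hypothetical, and a dummy callable stands in for the fine-tuned transformer:

```python
import math

def genotype_to_prompt(circuit):
    """Serialize a circuit genotype into the token sequence the fitness-LLM
    would consume. The format is a hypothetical placeholder."""
    return " ".join(f"G{i},{j}" for i, j in circuit)

def to_unit_interval(raw):
    """Squash the model's scalar regression output into [0, 1], the fixed
    range specified for the combined scoring function."""
    return 1.0 / (1.0 + math.exp(-raw))

class FitnessLLM:
    """Stand-in for a fine-tuned open-source LLM whose generation head is
    replaced by a scalar regression head. `model` is any callable from
    prompt string to raw score; in the real system it would be the
    adapted transformer."""
    def __init__(self, model):
        self.model = model

    def score(self, circuit):
        return to_unit_interval(self.model(genotype_to_prompt(circuit)))

# Dummy model standing in for the trained transformer: penalizes gate count.
dummy = FitnessLLM(lambda prompt: -0.1 * prompt.count("G"))
s = dummy.score([(0, 1), (0, 2), (1, 2), (3, 4)])
```

The single-objective EA from Milestone 1 then only ever sees `score`, keeping the EA machinery unchanged regardless of which fitness-LLM is plugged in.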

Deliverables

1. A trained LLM designed to act as a single-objective scoring function, based on the data sets generated in Milestone 2; one LLM will be trained per data set
2. Implementation of the LLMs from (1) as scoring functions in the single-objective EA designed and implemented in Milestone 1

Budget

$25,000 USD

Success Criterion

Each fitness LLM can be used as an objective function from the Milestone 1 executable.

Milestone 4 - Single-objective EA Benchmarking and Comparison

Description

Using the fitness-LLMs trained and adapted for use as scoring functions in Milestone 3, we will run the single-objective EA designed and implemented in Milestone 1. We will run the 16-bit popcount problem instance using the fitness-LLM trained on the 16-bit popcount data from Milestone 2, and analogously for 16-bit find-first-set. We will use the recorded parameters (population count and generation count) from Milestone 2, and compare the results between the Milestone 2 runs on the same problem instances and the runs using our combined approach. We will note the same information per run as in Milestone 2. Subsequently, we will run the single-objective EA from Milestone 1 again on each of the two problem instances, but using the fitness-LLM designed for the other problem: thus, we will try to solve 16-bit popcount with the fitness-LLM for find-first-set, and vice versa. As before, we will use the same recorded parameters from Milestone 2, and compare each result with its corresponding one using the correct fitness-LLM. Lastly, we will compare the outcomes of the multi-objective EA and our fitness-LLM-using single-objective EA, to see whether we can match or improve on the performance of the multi-objective approach.

Deliverables

1. For each run of the multi-objective EA from Milestone 2 for the 16-bit popcount and 16-bit find-first-set problem instances, a matching run of the single-objective EA from Milestone 1 using the corresponding fitness-LLM from Milestone 3
2. Records of the same information for (1) as for the Milestone 2 runs
3. For each run from (1), a matching run of the single-objective EA from Milestone 1 using the popcount fitness-LLM for find-first-set, and vice versa
4. A comparison of all outcomes from (1) to (3) with their Milestone 2 counterparts

Budget

$15,000 USD

Success Criterion

1. For at least one of the problems, the single-objective EA with a fitness-LLM outperforms the multi-objective EA on the same problem. 2. For at least one problem, the single-objective EA is not sensitive to the fitness-LLM used (that is, using the fitness-LLM for the other problem doesn't impact the quality of its results).

Milestone 5 - MOSES Integration and Benchmarking

Description

Using the work from Milestones 1 through 4, we will integrate the fitness-LLM solution into MOSES and attempt to solve both instances of the circuit synthesis problem used throughout. We will compare the outcomes in MOSES of using only single objectives (agreement with size as a tiebreaker, then agreement with depth as a tiebreaker) against each of our fitness-LLMs, using the same criteria as in Milestone 2.

Deliverables

1. An implementation of the best overall fitness-LLM from Milestone 4 in MOSES as a scoring function
2. At least one run of MOSES using the single objective of agreement on the circuit synthesis problem instances, using size and depth as tiebreakers (six combinations in all)
3. For each run from (2), a corresponding run on the same problem instance using the combined implementation from (1), keeping other parameters the same
4. A comparison of the results from (2) and (3) using the Milestone 2 criteria

Budget

$10,000 USD

Success Criterion

The fitness-LLM-using version of MOSES out-performs MOSES without the fitness LLM on at least one problem instance.


Expert Ratings

Reviews & Ratings

Group Expert Rating (Final)

Overall

4.0

  • Feasibility 4.7
  • Desirability 3.3
  • Usefulness 4.7
  • Expert Review 1

    Overall

    4.0

    • Compliance with RFP requirements 4.0
    • Solution details and team expertise 3.0
    • Value for money 4.0
    It's an innovative use of LLMs within MOSES, though highly specialized to a particular technical problem.

    Of note: this doesn't really leverage the knowledge in large foundation models; rather, it leverages transformers trained on data from candidate solutions to a specific optimization problem. It's still quite interesting, though, as an exploration of transformers as an alternative to, say, Bayes nets or decision trees as a way of doing the EDA modeling in MOSES.

  • Expert Review 2

    Overall

    4.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 4.0
    • Value for money 5.0
    Solid proposal with a solid team behind it. Somewhat narrowed to multi-objective optimization.

    Although the proposal is narrowed down to handling multi-objective optimization, assisted by LLMs in order to build the multi-objective to single-objective aggregator, it is solid, and the team behind it looks solid too. I wish the team knew that MOSES actually handles multi-objective optimization to some extent, but it's probably not documented well enough for them to be aware of it. Anyway, as they mention, they do not need MOSES to be ready in order to start the work, as they can begin their research on other evolutionary methods; I view that as a plus. They have a concrete use case (circuit synthesis), which I think is a plus as well.

  • Expert Review 3

    Overall

    4.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 3.0
    • Value for money 5.0

    Intriguing and solid approach to multi-objective optimization with a strong team.
