Multi-objective EAs for LLM multiparameter tuning

Expert Rating 4.0
Luke Mahoney (MLabs)
Project Owner

Overview

Tuning hyperparameters when building an LLM, or any DNN, is a difficult task: they are inter-dependent with each other, with the data, and with the desired learning outcomes, and even determining acceptable values is difficult, with little to no guidance available. The process is thus tedious, repetitive, and prone to failure. To solve this problem, we will design, implement, and test an EA-based solution to the hyperparameter tuning problem, built to be extensible to various choices of hyperparameter(s) and measurable objectives. To demonstrate its efficacy, we will use NanoGPT as a test bed, tuning context window size and vocabulary size, and measuring efficacy by comparing loss.

RFP Guidelines

Evolutionary algorithms for training transformers and other DNNs

Complete & Awarded
  • Type SingularityNET RFP
  • Total RFP Funding $40,000 USD
  • Proposals 8
  • Awarded Projects 1
SingularityNET
Aug. 12, 2024

Explore and demonstrate the use of evolutionary methods (EMs) for training various DNNs including transformer networks. Such exploration could include using EMs to determine model node weights, and/or using EMs to evolve DNN/LLM architectures. Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is an example of one very promising evolutionary method among others.

Proposal Description

Company Name (if applicable)

MLabs LTD

Project details

When building an LLM (or indeed, any DNN), one important task is the tuning of hyperparameters: settings that define and configure the learning process and the architecture of the network itself. These range from choices about how data is presented (for example, context window size) to decisions about the structure of the network itself (for example, vocabulary size). These parameters are inter-dependent, both with each other and with the data and the desired learning outcomes, and determining their optimal (or even satisfactory) values is a difficult process, with little to guide the data scientist. Often, they must be tuned by trial and error, a process which is tedious and repetitive, especially when the number of hyperparameters under consideration is large, as is often the case with transformer-based architectures generally, and LLMs specifically.

The difficulty of this problem stems from the complex relationships between hyperparameters: adjusting any one of them can have unpredictable effects. Furthermore, while certain natural bounds exist (for example, vocabulary size is ultimately limited by the size of the token dictionary of the input data), these can be uselessly large, giving little idea of where to begin exploring. While we can certainly assess the outcome of any given choice of hyperparameter(s), we have no easy way of knowing which to tune, or in which direction. To make matters worse, we do not always want to straightforwardly minimize or maximize a given hyperparameter: continuing the vocabulary size example, there likely exists an 'ideal' or 'good enough' size, which is neither the largest possible (as this would include many tokens that are not relevant or useful) nor the smallest possible (as this would discard too many relevant or useful tokens). All of this makes it challenging to approach the problem systematically.

In situations like ours, evolutionary algorithms (EAs) are an often-used metaheuristic. In particular, we want a multi-objective EA: not only do we want the ability to tune as many (or as few) hyperparameters as we like, but we may also assess the outcome of one hyperparameter, or set of hyperparameters, differently from another. Together with suitable mutation and crossover operations, this would allow not only the automation of the search for better hyperparameter combinations and settings, but also a more directed search than pure trial and error. Lastly, the generic nature of multi-objective EAs, and their ability to balance any number of objectives, makes them suitable regardless of what kind of tuning is desired, giving data scientists flexibility in their use.
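
To make this concrete, here is a minimal sketch of a multi-objective EA loop using Pareto dominance for selection (all names and operator choices here are illustrative assumptions, not our final design):

```python
import random

# A candidate's fitness is a tuple of objectives, all to be minimized
# (for example, validation loss and training cost).

def dominates(f1, f2):
    """True if fitness f1 Pareto-dominates f2: no worse on every
    objective, and strictly better on at least one."""
    return (all(a <= b for a, b in zip(f1, f2))
            and any(a < b for a, b in zip(f1, f2)))

def pareto_front(evaluated):
    """Keep the (genotype, fitness) pairs that no other pair dominates."""
    return [(g, f) for g, f in evaluated
            if not any(dominates(f2, f) for _, f2 in evaluated if f2 is not f)]

def evolve(population, evaluate, mutate, crossover, generations):
    evaluated = [(g, evaluate(g)) for g in population]
    for _ in range(generations):
        parents = pareto_front(evaluated)
        children = [mutate(crossover(random.choice(parents)[0],
                                     random.choice(parents)[0]))
                    for _ in range(len(population))]
        evaluated = parents + [(g, evaluate(g)) for g in children]
    return pareto_front(evaluated)
```

Selecting by Pareto dominance is what lets the EA balance several objectives at once, without collapsing them into a single weighted score.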

At the same time, however, not just any multi-objective EA will do for this problem. In particular, any successful approach must address the following two problems:

  1. As every evaluation of the fitness function requires building, and training, a neural network, we must not do this unless absolutely necessary (the rebuilding problem);
  2. As the cost of the fitness function is thus high, we must make use of parallel resources as part of the EA's execution as far as possible (the parallelism problem).

Many, if not most, 'off-the-shelf' EAs, multi-objective or not, fall afoul of one or both of these problems: the general assumption is that fitness function invocation is cheap, and most EA designs function like simulations, creating a long critical path that resists parallelism. Thus, a careful, problem-specific design is needed here: one which avoids both of these problems, yet retains enough flexibility and power to tune any number of hyperparameters while balancing a range of fitness measures, to suit whichever application a given neural network should be trained for.
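
As an illustration of how both problems might be mitigated (a sketch only, assuming genotypes are hashable tuples and that a training run can execute in a worker process), fitness results can be memoized so that no hyperparameter combination is ever retrained, and each generation's unseen genotypes can be dispatched concurrently:

```python
from concurrent.futures import ProcessPoolExecutor

_fitness_cache = {}  # genotype (a hashable tuple) -> fitness

def evaluate_population(genotypes, train_and_measure, max_workers=4):
    """Evaluate one generation, retraining only genotypes not yet seen."""
    unseen = [g for g in set(genotypes) if g not in _fitness_cache]
    # Each training run is independent, so unseen genotypes can be
    # evaluated in parallel worker processes.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        for g, fitness in zip(unseen, pool.map(train_and_measure, unseen)):
            _fitness_cache[g] = fitness
    return [_fitness_cache[g] for g in genotypes]
```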

We will design, implement, and test an EA-based solution to the hyperparameter tuning problem, extensible to suit various choices of hyperparameter(s) and measurable objectives relating to these. Our solution will avoid the rebuilding and parallelism problems, and will be 'blind' to exactly what is being used to train the neural network whose hyperparameters are being tuned. To demonstrate our solution's efficacy, we will use NanoGPT as a test bed, with the combined works of Shakespeare as input. As a benchmark, we will use two hyperparameters:

  • Context window size; and
  • Vocabulary size.
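
As an illustration of what a genotype for these two hyperparameters might look like, together with mutation and crossover operations over it, the sketch below uses a pair of bounded integers (NanoGPT calls the context window size block_size; the bounds and mutation step are placeholder assumptions, not tuned values):

```python
import random

# Placeholder bounds; vocabulary size is ultimately capped by the
# token dictionary of the input data.
BOUNDS = (("block_size", 32, 1024),    # context window size
          ("vocab_size", 256, 50304))  # vocabulary size

def random_genotype():
    return tuple(random.randint(lo, hi) for _, lo, hi in BOUNDS)

def mutate(genotype, step=0.1):
    """Perturb one gene by up to +/- `step` of its range, clamped to bounds."""
    genes = list(genotype)
    i = random.randrange(len(genes))
    _, lo, hi = BOUNDS[i]
    width = int(step * (hi - lo))
    genes[i] = max(lo, min(hi, genes[i] + random.randint(-width, width)))
    return tuple(genes)

def crossover(g1, g2):
    """Uniform crossover: each gene comes from either parent."""
    return tuple(random.choice(pair) for pair in zip(g1, g2))
```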

We will measure the efficacy of the tuning by comparing loss, which we aim to minimize. We will also use these measurements to improve the performance of our solution, focusing in particular on better utilization of parallel resources, fewer rebuilds of the neural network architecture, improving the quality of the outcome with fewer generations, and lastly, any overall runtime or memory use improvements. We will demonstrate the efficacy of these improvements by comparing against the measurements mentioned previously.

Team

Koz Ross, Data Engineer - Project Leader

Koz Ross is a seasoned Data Engineer and Software Developer. With a background in software engineering as well as academic research in evolutionary computation and data structures, Koz combines familiarity with the current literature on EAs with an understanding of how these can be implemented, and engineered, appropriately. He has also led several projects, ranging from eDSLs to code-generation libraries to a whole compiler, making him familiar with the organizational principles and practices required to run this project to successful completion. He will be responsible for the overall direction of the project, as well as providing expert advice on EAs.

Gergely Szabo, Data Engineer - Project Scientist

Gergely Szabo is a seasoned Data Engineer and Software Developer. He is an experienced team leader and avid contributor to open-source libraries. With a robust background in software engineering, Gergely blends his passion for data with practical expertise in creating scalable, efficient, and maintainable solutions. He is a versatile team player and an adept communicator, making him a valuable asset to the team. He will assist Koz in all aspects of data engineering.

Jared Pon, Senior Software Engineer - Project Software Architect

Jared Pon is a skilled Software Developer with deep expertise in compiler development, including tools for translating algebraic data types across languages and designing algorithms for type inference, lambda lifting, and instruction set generation. He is proficient in multiple software languages and has a strong command of backend development, database design, and performance optimization. With a rigorous academic background in Computer Science and Mathematics, Jared combines a passion for innovation with exceptional problem-solving skills, making him a versatile and impactful member of the team.

Banashree Sarma, AI Engineer - Project LLM Engineer

Dr. Sarma received her Doctorate in Data Science and Artificial Intelligence from the Indian Institute of Technology Madras, India. She specializes in natural language processing and reinforcement learning. She has built context embedding models and LLMs for a number of languages, and is currently a member of the NLP team at MLabs AI, as well as handling our reinforcement learning activities. She will be responsible for coordinating the LLMs built during the project.

References

https://github.com/karpathy/nanoGPT

Open Source Licensing

Apache License

Links and references

Cost: $12.5K

Website

https://www.mlabs.city/

Proposal Video

Not Available Yet

  • Total Milestones

    4

  • Total Budget

    $40,000 USD

  • Last Updated

    5 Dec 2024

Milestone 1 - Design

Description

We will investigate the literature on EAs to produce implementable designs (or choices) for the following:

  • A multi-objective evolutionary selection strategy;
  • A suitable genotype representation for the hyperparameters we aim to tune;
  • Suitable evolutionary operations (mutation and crossover);
  • An approach to avoid the rebuilding problem; and
  • An approach to avoid the parallelism problem.

We will then create a combined implementable design for an EA using the above designs or choices.

Deliverables

  1. A design or choice for a genotype representation for context window size and vocabulary size.
  2. A design or choice for evolutionary operations (mutation and crossover) for the genotype representation in 1.
  3. A design or choice for a multi-objective evolutionary selection operation.
  4. A strategy for avoiding the rebuilding problem.
  5. A strategy or approach for parallelising the EA.
  6. An overall design combining 1-5 in a single approach.

Budget

$8,800 USD

Success Criterion

  1. A combined design for an EA, with appropriate genotype, evolutionary operations, multi-objective selection, and strategies for handling parallelization and rebuilding.
  2. All parts of the design from 1 are specified in a document, with an emphasis on implementation considerations.
  3. The evolutionary operations are fair (in the EA sense).
  4. All design choices are grounded in existing EA work, particularly the multi-objective optimization function.

Milestone 2 - Development

Description

Using the design from Milestone 1, we will implement an executable program based on that design. This executable will metaheuristically search for choices of context window size and vocabulary size, using NanoGPT and the combined works of Shakespeare as the data set. As part of this work, we will design an IPC-like interface allowing our executable to call NanoGPT with specific parameters and receive loss information in return. Our implementation must allow easy modification of the population size and the number of generations to run, to serve future benchmarking.
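
A minimal sketch of what this interface could look like follows. The block_size and out_dir overrides and the Shakespeare config are real NanoGPT options; the loss.json file is an assumption about our wrapper (we would patch train.py to write its final losses there), and tuning vocab_size would additionally require re-tokenizing the data set, which the wrapper would handle:

```python
import json
import pathlib
import subprocess
import tempfile

def train_and_measure(block_size, vocab_size):
    """Train NanoGPT once with the given hyperparameters; return val loss."""
    out = pathlib.Path(tempfile.mkdtemp())
    subprocess.run(
        ["python", "train.py", "config/train_shakespeare_char.py",
         f"--block_size={block_size}", f"--out_dir={out}"],
        check=True,
    )
    # Assumption: our patched train.py dumps its final losses to loss.json.
    return json.loads((out / "loss.json").read_text())["val_loss"]
```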

Deliverables

  1. A working implementation of the Milestone 1 design.
  2. A wrapper or testing rig for NanoGPT that allows making requests from outside the program to train a model given our choice of hyperparameters, returning loss information.
  3. The implementation from 1 must have easily modifiable population size and generation counts, such as via a CLI option.

Budget

$14,300 USD

Success Criterion

  1. The Milestone 1 design is implemented successfully as an executable, and can be run on problem instances.
  2. The testing rig for NanoGPT responds correctly to a structured request from outside its own process, returning the loss information in a structured form.
  3. The implementation from 1 can be configured (such as via CLI or config file) to modify population size and generations without requiring a rebuild or code changes.

Milestone 3 - Testing and Benchmarking

Description

We will use the executable and modified NanoGPT from Milestone 2 to test and benchmark our solution. In particular, we will verify the following:

  • How well we optimize loss relative to the default NanoGPT parameters on the same data set;
  • How long each run of our executable takes given the same number of generations;
  • How much improvement we gain by adding more generations to our EA;
  • How much improvement we gain by increasing the population size of our EA;
  • How much parallel resource utilization we gain on a typical run; and
  • How many times we require a rebuild, and how many of these (if any) are redundant.
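
As an illustration of how the rebuild measurements could be gathered (a sketch; all names are hypothetical), the training call can be wrapped so that every rebuild is logged, with any repeat build of an already-seen genotype counted as redundant:

```python
from collections import Counter

rebuild_log = Counter()  # genotype -> number of times it was (re)built

def instrumented_train(genotype, train_fn):
    """Wrap the training call so every rebuild is recorded."""
    rebuild_log[genotype] += 1
    return train_fn(genotype)

def rebuild_stats():
    total = sum(rebuild_log.values())
    # Any build beyond the first for a given genotype was redundant.
    return {"total": total, "redundant": total - len(rebuild_log)}
```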

Deliverables

  1. Comparison of how well the Milestone 2 executable's tuned hyperparameters optimize loss relative to NanoGPT defaults.
  2. A benchmark of the execution time of the Milestone 2 executable given a fixed population size and generation count.
  3. Degree of improvement in hyperparameter tuning at different numbers of generations for the Milestone 2 approach.
  4. Degree of improvement in hyperparameter tuning with different population sizes for the Milestone 2 approach.
  5. Measurement of utilization of parallel resources when running the Milestone 2 executable.
  6. Measurement of neural network rebuilds when running the Milestone 2 executable, as well as how many of these are technically redundant.

Budget

$4,400 USD

Success Criterion

  1. The Milestone 2 executable out-performs NanoGPT defaults.
  2. Parallel saturation of at least 4 cores is obtained.
  3. Rebuilds are sublinear, or at least linear with an expected factor of less than 1.

Milestone 4 - Optimization

Description

Using the benchmarks and measurements gathered in Milestone 3, we will attempt to improve the performance of the Milestone 2 executable. We will prioritize improvements in the following order:

  1. Better utilization of available parallel resources.
  2. Fewer rebuilds of the neural network architecture, with particular emphasis on eliminating redundant rebuilds.
  3. Improving the quality of the outcome with fewer generations.
  4. Improving the overall runtime or memory use more generally.

We will then implement an optimized version of the Milestone 2 executable and compare it against the same benchmarks used to measure the Milestone 2 executable in Milestone 3.

Deliverables

  1. An optimized version of the Milestone 2 executable, on the basis of the Milestone 3 measurements and benchmarks, given the priorities specified above.
  2. Comparisons between the Milestone 2 executable and its optimized version from 1, using the Milestone 3 benchmarks. These must show improvements in the optimized version relative to the Milestone 3 baselines.

Budget

$12,500 USD

Success Criterion

The optimized executable out-performs its Milestone 2 equivalent on at least one of the criteria specified above.


Expert Ratings

Reviews & Ratings

Group Expert Rating (Final)

Overall

4.0

  • Feasibility 4.0
  • Desirability 4.0
  • Usefulness 4.8
  • Expert Review 1

    Overall

    5.0

    • Compliance with RFP requirements 4.0
    • Solution details and team expertise 5.0
    • Value for money 5.0
    Strong and well-scoped

While lacking integration into Hyperon, the result should be a valuable free-standing experiment. The choice of nanoGPT as a target, sitting between toy problems and no-way-to-run-locally scale, is commendable. It does not include CMA-ES, despite it being applicable to their experiment.

  • Expert Review 2

    Overall

    4.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 5.0
    • Value for money 4.0
    Strong proposal

    Strong, technically robust plan to solve the hyperparameter tuning challenge using multi-objective EAs. It aligns with the RFP’s goals and has potential for broader application in LLM and DNN development. However, it could use a clearer connection to AGI frameworks like Hyperon.

  • Expert Review 3

    Overall

    3.0

    • Compliance with RFP requirements 2.0
    • Solution details and team expertise 4.0
    • Value for money 5.0
    They are solving a somewhat different problem than requested, though an interesting one

They are proposing to solve hyperparameter tuning using evolutionary algorithms, rather than to actually do the NN learning using evolutionary algorithms. This is a decent idea, but different from what the RFP asks for, and in itself is easy to test (the bulk of the work here is setting up the framework, not dealing with the evolutionary parameter tuning itself).

  • Expert Review 4

    Overall

    4.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 4.0
    • Value for money 5.0

    Solid detailed proposal covering the RFP and displaying a good understanding of evolutionary algorithms and hyper parameter tuning. Provides a good task breakdown and has a strong team. Could be more ambitious.
