Hierarchical reward functions

Tofara Moyo
Project Owner

Expert Rating

3.0

Overview

A critic will output scalar values, ordered in stages, that serve as rewards for the agent. The agent's reward is not the global reward R but r, where r = R*s + R and s is the critic's output. The critic is trained solely on R. The equation is designed to induce a two-step hierarchy of needs in the agent, since s and R are correlated: to maximize r, the agent must take actions that optimize s in a way that ultimately optimizes R. A large s with a small R yields a smaller r than a large R with a small s, but r is highest when both are large. Our initial experiments show that this process works. A two-stage network would use r = R*(s1*s2 + s1) + R.
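The two formulas above can be checked numerically with a minimal Python sketch. The function name hierarchical_reward is hypothetical, and extending the nesting pattern beyond two stages is an assumption rather than something the proposal states. The sketch relies on the factorizations r = R*(1 + s) and r = R*(1 + s1*(1 + s2)), which are algebraically the same as the expressions given above.

```python
def hierarchical_reward(global_reward, stage_values):
    """Combine the global reward R with critic stage outputs s1, s2, ...

    One stage:  r = R*s + R            = R * (1 + s)
    Two stages: r = R*(s1*s2 + s1) + R = R * (1 + s1 * (1 + s2))
    More than two stages follow the same nesting (an assumption, not from the proposal).
    """
    nested = 1.0
    # Build the nested factor from the innermost stage outwards.
    for s in reversed(stage_values):
        nested = 1.0 + s * nested
    return global_reward * nested


# One-stage case: matches r = R*s + R.
R, s = 2.0, 0.5
print(hierarchical_reward(R, [s]), R * s + R)

# Two-stage case: matches r = R*(s1*s2 + s1) + R.
s1, s2 = 0.5, 0.5
print(hierarchical_reward(R, [s1, s2]), R * (s1 * s2 + s1) + R)

# Ordering claim for positive values: large R with small s beats small R with large s.
print(hierarchical_reward(0.9, [0.1]) > hierarchical_reward(0.1, [0.9]))
```

With positive values, the last comparison illustrates why the agent cannot maximize r by raising s alone: R multiplies the whole expression, so a small R caps r no matter how large s is.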

RFP Guidelines

Develop a framework for AGI motivation systems

Complete & Awarded
  • Type SingularityNET RFP
  • Total RFP Funding $40,000 USD
  • Proposals 12
  • Awarded Projects 2
SingularityNET
Aug. 13, 2024

Develop a modular and extensible framework for integrating various motivational systems into AGI architectures, supporting both human-like and alien digital intelligences. This could be done as a highly detailed and precise specification, or as a relatively simple software prototype with suggestions for generalization and extension.

Proposal Description

Project details

https://arxiv.org/abs/2412.00044

This paper presents our current results for the concept. We plan to align it more closely with human values by providing as input a text description of the tasks the agent is performing during a simulation (for example, game instructions), with a transformer conditioning the input.

Eventually we will freeze the critic and use it as is, except that a text description of all the core values and modules we would like the agent to possess will be input into the critic. This should induce the reward function to reward such behaviour and condition the agent to act in ways that are predictable, if required, or that match and align with human values.
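As a rough illustration of the idea of a frozen, text-conditioned critic, here is a toy Python sketch. Everything in it is an assumption made for illustration: the class FrozenTextConditionedCritic, the hashed bag-of-words embed_text (a stand-in for the transformer encoder), the linear-plus-sigmoid form of the critic, and the example weights and text. It is not the implementation from the paper; it only shows how a stage value s conditioned on a values description could feed into r = R*s + R.

```python
import hashlib
import math


def embed_text(text, dim=8):
    """Hypothetical stand-in for a transformer encoder: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Stable hash so the embedding is identical on every run.
        bucket = hashlib.sha256(word.encode("utf-8")).digest()[0] % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class FrozenTextConditionedCritic:
    """Toy critic whose weights are fixed (training is assumed to be finished)."""

    def __init__(self, state_weights, text_weights, bias=0.0):
        # Tuples are immutable, mirroring the "frozen" critic.
        self._state_weights = tuple(state_weights)
        self._text_weights = tuple(text_weights)
        self._bias = float(bias)

    def stage_value(self, state, values_text):
        """Return a stage value s for a state, conditioned on a text description."""
        text_vec = embed_text(values_text, dim=len(self._text_weights))
        z = self._bias
        z += sum(w * x for w, x in zip(self._state_weights, state))
        z += sum(w * x for w, x in zip(self._text_weights, text_vec))
        # Squashing s into (0, 1) is an assumption, not stated in the proposal.
        return 1.0 / (1.0 + math.exp(-z))


# Illustrative weights only; a real critic would have learned them from R.
critic = FrozenTextConditionedCritic(
    state_weights=[0.5, -0.25],
    text_weights=[0.1 * i for i in range(8)],
)
s = critic.stage_value([1.0, 2.0], "be honest and avoid harm")
R = 1.0
r = R * s + R  # the agent's reward, as in the Overview
print(s, r)
```

Because the critic's weights never change, only the text description changes what s rewards, which is the conditioning mechanism this section describes.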

Open Source Licensing

GNU GPL - GNU General Public License

Links and references

https://arxiv.org/abs/2412.00044

Proposal Video

Not Available Yet

Check back later during the Feedback & Selection period for the RFP that this proposal is applied to.

  • Total Milestones 3
  • Total Budget $30,000 USD
  • Last Updated 5 Dec 2024

Milestone 1 - Created graph version of hierarchy

Description

We will have created a more complex version of the critic that outputs a graph of values rather than a vector of scalars.

Deliverables

Code that is debugged and ready for training

Budget

$10,000 USD

Success Criterion

Code tested on a small dataset

Milestone 2 - Tested algorithm

Description

We will have tested the algorithm in many scenarios to see how it performs, benchmarking it against the simpler version.

Deliverables

Table of data showing the results of the experiments

Budget

$10,000 USD

Success Criterion

Published results

Milestone 3 - Fine-tuned

Description

We will have fine-tuned the algorithm's hyperparameters, carried out extensive testing, and produced reports.

Deliverables

Reports showing the performance of our algorithm

Budget

$10,000 USD

Success Criterion

Reports

Expert Ratings

Reviews & Ratings

Group Expert Rating (Final)

Overall

3.0

  • Feasibility 2.5
  • Desirability 3.3
  • Usefulness 3.8
  • Expert Review 1

    Overall

    1.0

    • Compliance with RFP requirements 1.0
    • Solution details and team expertise 1.0
    • Value for money 1.0
    Simplistic reward function approach

    Hierarchical reward approaches are possible, but there are far more sophisticated instances of them in the literature than the particular simplistic example proposed. Furthermore, no details are provided on how to extend this to a motivation system for Hyperon/PRIMUS, hence this project misses the RFP topic.

  • Expert Review 2

    Overall

    3.0

    • Compliance with RFP requirements 3.0
    • Solution details and team expertise 4.0
    • Value for money 4.0
    This is a very interesting proposal that doesn't quite do what the RFP requests, but maybe we can work with the proposer in some other way?

    I like the idea of experimenting with hierarchical reward functions, and it could be interesting to look at in the context of our Mind Children robotics project, for example, once we have a simulation world for it up and running. However, I don't think hierarchical reward functions are an adequate way to think about the overall motivation framework of an AGI system, which is the mandate here. The author's other papers on arXiv show him to be a fascinating, out-there creative thinker with some real technical facility.

  • Expert Review 3

    Overall

    4.0

    • Compliance with RFP requirements 3.0
    • Solution details and team expertise 4.0
    • Value for money 5.0

    An interesting approach to creating hierarchical reward structures via actor-critic models with good initial results. I would have liked to have seen more thought and detail going into the milestones and the proposal does not cover the RFP completely.

  • Expert Review 4

    Overall

    4.0

    • Compliance with RFP requirements 3.0
    • Solution details and team expertise 0.0
    • Value for money 5.0
    Essential component for motivational frameworks

    The proposal aims to enhance an embryonic, but fundamental approach to inducing hierarchical reward structures in artificial agents based on Markov decision-making and tested on the pendulum with excellent results. The approach has the potential to become an essential component of motivational frameworks given that hierarchical reward structures are the "engine" of motivation, a critical "brick" in the structure on which motivational frameworks for various forms of intelligence can be built. While not addressing the full spectrum of the Call, it is of higher value than other, more "thin air" but comprehensive proposals, given its clarity and potential for seamless implementation. 
