Tofara Moyo
Project Owner: writing code, implementing training, gathering results, and producing reports
We will have a critic output scalar values, ordered in stages, that serve as rewards for the agent. The agent's reward will not be the global reward R but a shaped reward r, where r = R*s + R and s is the critic's output. The critic will be trained solely on R. This equation is designed to bring about a two-step hierarchy of needs in the agent. Since s and R are correlated, to obtain maximal values of r the agent must perform actions that optimize s in such a way that they ultimately optimize R. Because r = R*s + R = R*(1 + s), a large s with a small R gives a smaller r than a large R with a small s, and r is largest when both are large. Our initial experiments show that this process works. A two-stage network would use r = R*(s1*s2 + s1) + R.
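The reward equations above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's implementation: the function name `staged_reward` is hypothetical, and extending the nesting beyond two stages is an assumption extrapolated from the one-stage and two-stage formulas given in the text.

```python
from typing import Sequence


def staged_reward(R: float, stages: Sequence[float]) -> float:
    """Shaped reward built from the global reward R and critic outputs.

    One stage  [s]      -> R*s + R
    Two stages [s1, s2] -> R*(s1*s2 + s1) + R
    More than two stages follow the same nesting (an assumption,
    not something stated in the proposal).
    """
    # Evaluate 1 + s1*(1 + s2*(1 + ...)) from the innermost stage outward.
    factor = 1.0
    for s in reversed(stages):
        factor = 1.0 + s * factor
    return R * factor


if __name__ == "__main__":
    # Example values in [0, 1], chosen only for illustration.
    print("small R, large s:", staged_reward(0.1, [0.9]))
    print("large R, small s:", staged_reward(0.9, [0.1]))
    print("both large:      ", staged_reward(0.9, [0.9]))
    print("two stages:      ", staged_reward(0.9, [0.9, 0.9]))
```

With these example values, the case with a large R and a small s scores higher than the case with a small R and a large s, and the highest one-stage reward comes from both being large, which matches the ordering described above.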
Develop a modular and extensible framework for integrating various motivational systems into AGI architectures, supporting both human-like and alien digital intelligences. This could be done as a highly detailed and precise specification, or as a relatively simple software prototype with suggestions for generalization and extension.
In order to protect this proposal from being copied, all details are hidden until the end of the submission period. Please come back later to see all details.
We will have created a more complex version of the critic that outputs a graph of values rather than a vector of scalars.
Code that is debugged and ready for training
$10,000 USD
Tested code on a small dataset
We will have tested the algorithm in many scenarios to see how it performs, benchmarking it against the simpler version.
Table of data showing the results of the experiments
$10,000 USD
Published results
We will have fine-tuned the algorithm's hyperparameters, carried out extensive testing, and produced reports.
Reports showing the performance of our algorithm
$10,000 USD
Reports
© 2025 Deep Funding