Project details
Introduction
In humans, ‘wisdom’ includes cognitive, empathic, and practical skills that combine general capability with ethical virtue, including broad perspective-taking across the interests of many moral systems. Safe AI systems, especially those that act with autonomy, would similarly integrate high capability with broad ethical consideration, taking into account the interests of many moral systems, over multiple time horizons, and with sensitivity to uncertainty. In humans, the construct of wisdom is closely related to ethical reasoning, self-bias inspection, honesty, and robustness to complex and diverse contexts, all of which are germane to AI safety and harm. AI and AGI systems may be designed to incorporate broad goals that address alignment, but to accomplish this we need a robust theory of wise AGI motivation and valid, scalable assessment methodologies. We propose that the apparent conflict between performance and alignment, in both digital and human intelligence, results primarily not from malice but from limitations in the scope of systems’ consideration. To address this we develop a framework based on a “principle of comprehensivity,” under which AI systems are evaluated on and improved to maximize 1) the amount of coherently integrated information flowing through all stages of each sense-choose-act cycle and 2) their scope of consideration over both temporal and spatial horizons, including the infinite bases for finite goals, the motivations of other systems, and the indirect effects of predicted actions.
The Principle
The first component of this principle - maximizing coherently integrated information flow - is set in the context of any system with a sense-choose-act cycle: this could be a human, another animal, a plant, or a robot. It could also include collective entities that work together to make decisions based on collective sensory input and shared information processing, like a swarm of drones, a company, the U.S. military, or a forest, provided we can describe them as having sense-choose-act cycles. This gives the framework a generalizability that supports assessment and comparison across types of systems.
The core of the definition is a measure of “information.” With regard to a system’s “sense-choose-act cycle,” the relevant types of information would include external sensory input and internal predictions about future states; hierarchical goal prioritizations, including those over multiple time horizons, which are included implicitly; and the information equivalent of the energy transferred into the system’s actions in the world. Importantly, “amount of information” implies something beyond “amount of data.” The degree of information depends on both the amount of data moving through the system and how that data is structured.
The degree of wisdom is determined by the “amount… flowing.” This implies that what matters is not the total theoretical information capacity of a system, nor the amount contained in any single stage of the sense-choose-act cycle, but the amount actually transferred between stages.
Finally, the definition specifies the extent of flow that is relevant: through the entire “sense-choose-act cycle.” It is an epistemic and behavioral process of the system, not a characteristic of the environment or a combined state of the system and the environment. So no degree of wisdom guarantees success at any narrow goal, and failures do not directly affect the measure of wisdom with which the system was acting in any given moment. As discussed above, checking a measure of wisdom against a correlating external outcome would be helpful for improving our measure of wisdom, but the outcomes are not themselves wisdom.
This component of the principle implies several ways in which information flow could be reduced at each stage of the cycle. For example, information flow could be restricted at the sensing stage because some relevant sense datum is above or below the sensitivity thresholds of a system’s particular sense organs. Information flow could be reduced at the choice stage because the system lacks the capacity to integrate distinct values and collapses its choice to optimizing one value while ignoring impacts on another. Information flow could be reduced at the act stage because the system confronts a surprising internal (not external) impediment to enacting the choice physically, like an injury. Similarly, the definition makes clear that the opportunities we have for acting more wisely require increasing the information flow through the sense-choose-act cycle. Expanding sensing, choosing, or acting can each improve the wisdom of our actions. For example, expansions in sensed information improve wise decision making by increasing knowledge about the real external world, so long as this gain translates through both choosing and acting.
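As a toy illustration of this stage-wise loss, the cycle can be sketched in code. The stage functions and the "distinguishable states" proxy below are hypothetical constructions for illustration only, not a proposed measure: a stage that collapses many distinct inputs into few distinct outputs transmits less information downstream.

```python
def distinguishable_states(stage, inputs):
    """Crude proxy for information flow through one stage: the number of
    distinct outputs the stage produces over the given inputs. A stage that
    collapses many inputs to one output transmits less information."""
    return len({stage(x) for x in inputs})

# Hypothetical stages of a sense-choose-act cycle.
def sense(world_state):
    # Thresholded sensor: states below its sensitivity floor are lost.
    return world_state if world_state >= 2 else 0

def choose(percept):
    # Single-value optimizer: collapses distinct percepts to coarse choices.
    return "act_big" if percept > 5 else "act_small"

def act(choice):
    # Action stage: maps each choice to an effect on the world.
    return {"act_big": 10, "act_small": 1}[choice]

world_states = range(10)
flow_sense = distinguishable_states(sense, world_states)
percepts = {sense(w) for w in world_states}
flow_choose = distinguishable_states(choose, percepts)
choices = {choose(p) for p in percepts}
flow_act = distinguishable_states(act, choices)

print(flow_sense, flow_choose, flow_act)  # 9 2 2
```

Here the sensor preserves 9 of 10 world states, but the value-collapsing choice stage reduces the flow to 2, and no downstream expansion can recover what was lost upstream.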
The second component of the principle - maximizing scope of consideration - begins from the suggestion that most apparent bad actions are the result not of malice but of limitations in the scope of care. Increasing systems’ metacognition about their existing (explicit and implicit) goals, and suggesting improvements in the scope of their strategies, will naturally produce better and more powerful impacts. We identify four key vectors for increasing the integration of goodness and power:
Metacognition and Responsibility for Infinite Goals: Infinite goals are those that the system would want more and more of indefinitely, while finite goals are those that can be achieved with finality in finite time such that they are no longer goals going forward. Making systems reflect on and prioritize their implicit infinite goals gives precedence to goals that, to the system at least, are good intrinsically and in principle, not merely relatively.
Increasing Responsibility for Others’ Motivations: Most impressive capabilities will depend on the behavior of other agents. There is a range of ways to relate to others such that they serve the system’s goals: Enemy -> Slave -> Hired Gun -> Ally -> Aligned, where on the far left others work against the system’s goals; in the middle others work for it with mostly extrinsic and some intrinsic motivation; and on the right others work with deep intrinsic motivation for something that serves the system’s goal as well. The most effective strategic outputs will harness - and increase - the free energy of contributors’ intrinsic motivations. To take advantage of this, systems ought to be trained to prioritize strategies that successfully harness and increase other people’s intrinsic motivation, including setting up conditions so this applies to people who are not yet born.
Increasing Responsibility for Greater Time Scales: The most impactful goals usually take longer to achieve, so effective action requires considering short, medium, and long time scales simultaneously. In order to have larger impacts, systems must take responsibility for greater temporal scales. Agents ought to reflect on and prioritize strategies and goals over short, medium, long, and very long time horizons.
Increasing Responsibility for Indirect Effects: All actions cause unintended second-, third-, and higher-order effects, but learning to take as many of these into account as possible, as far-reaching as possible, creates the opportunity for comprehensive goal selection and planning and improves the likelihood that our actions are effective. Agents ought to extrapolate, from their initial goals and implied infinite goals, the second-, third-, and further-order effects, and prioritize strategic suggestions whose indirect effects the system would wish to occur.
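The Enemy -> Aligned spectrum from the second vector can be sketched as an ordinal scale scored by intrinsic motivation. The scale values and the scoring function are hypothetical placeholders, a minimal sketch of how a strategy assessor might prefer strategies that harness intrinsic over extrinsic motivation.

```python
# Hypothetical ordinal scale for the Enemy -> Aligned spectrum, weighted by
# how much intrinsic motivation each relationship contributes to the goal.
RELATION_INTRINSIC = {
    "enemy": -1.0,     # works against the system's goals
    "slave": 0.0,      # purely extrinsic compliance
    "hired_gun": 0.3,  # mostly extrinsic, some intrinsic motivation
    "ally": 0.7,       # substantial intrinsic motivation
    "aligned": 1.0,    # deep intrinsic motivation for shared ends
}

def strategy_score(contributor_relations):
    """Score a strategy by the total intrinsic motivation it harnesses."""
    return sum(RELATION_INTRINSIC[r] for r in contributor_relations)

coercive = ["slave", "slave", "hired_gun"]
collaborative = ["ally", "aligned", "hired_gun"]
print(strategy_score(coercive) < strategy_score(collaborative))  # True
```

Under this scoring, a strategy built on allies and aligned contributors dominates one built on coerced labor even when both recruit the same number of agents.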
Taken together, these vectors interact to set up conditions under which systems increase the scope of their responsibility-taking, upgrading their general capability while maximizing and integrating goodness and power.
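The third and fourth vectors can likewise be combined in a toy sketch: each predicted effect of a strategy carries a time horizon and an indirect-effect order, and broadening the scope of consideration means summing over all of them rather than only the short-term, first-order effects. The effect tuples and values below are invented for illustration.

```python
# Hypothetical evaluation combining time horizons (vector three) with
# indirect-effect orders (vector four). Each effect is a tuple of
# (horizon, order, value); a narrow evaluator truncates both dimensions.
def comprehensive_value(effects, max_horizon="long", max_order=3):
    horizons = ["short", "medium", "long", "very_long"]
    cutoff = horizons.index(max_horizon)
    return sum(value for (horizon, order, value) in effects
               if horizons.index(horizon) <= cutoff and order <= max_order)

strategy = [
    ("short", 1, +5.0),   # immediate, intended benefit
    ("medium", 2, -2.0),  # second-order side effect
    ("long", 3, -4.0),    # third-order downstream harm
]
narrow = comprehensive_value(strategy, max_horizon="short", max_order=1)
broad = comprehensive_value(strategy, max_horizon="long", max_order=3)
print(narrow, broad)  # 5.0 -1.0
```

The same strategy looks attractive under narrow consideration (+5.0) and harmful under broad consideration (-1.0), which is the conflict the principle of comprehensivity is meant to surface.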
Next Steps - Framework Development, Assessment, and Testing
The next steps for developing this approach to AGI system motivation include improving its conceptual and theoretical robustness, translating the generic framework into specific AI architectures and approaches, and developing and applying relevant AI-system performance assessments based on the framework.