OPTIMIZATION AS DECISION-MAKING BASIS
Comments and questions prompted by the decision-making chapter YIN AND YANG OF DECISION MAKING suggest that many readers equate decision-making with drawing up an action plan. In reality, these are different concepts.
The traditional "good old-fashioned AI" approach is that the AI system loops through a two-step process: make a plan to achieve a goal and then execute it. The plan usually implies the existence of a criterion for achieving the goal. Without such a criterion, the plan must be formed outside the AGI system and transferred to it for execution.
If the criterion for achieving the goal is used to form a plan, then forming the plan is itself an action, and the system must decide to create it. If the plan is developed externally, the system must decide whether to carry it out. In either case, a decision is required before the plan starts: at that point, the plan either has not yet begun to be implemented or does not yet exist and still has to be formed.
However, the presence of a criterion for achieving a goal does not mean that it is possible to draw up a step-by-step plan to achieve this goal. If the goal is to win a game of chess or to catch a mouse (a fly, a rocket, and so on), this is quite obvious. But even in non-game situations (when no one deliberately opposes the achievement of the goal), external factors that do not depend on the system require continuous analysis of the situation and a decision on whether to continue the plan or interrupt it.
Another aspect is also essential: an AGI uses an internal system of motivation, which means that intentions can change at any time due to a change in the current situation (seeing a restaurant while hungry, we cancel the plan to go to the gym). A change of intentions cancels the current plan (if any) and requires the formation of a new one.
Thus, making a decision is a choice of the action that should be performed at a given moment. An action can be, in particular, drawing up an action plan.
In a natural environment, each potential action can have several variants of the resulting situation. In game-like cases, the possible consequences are based on potential responses. A set of possible situations after performing an action is formed in a natural environment as a combination of possible outcomes of action and predicted changes in the environment that do not depend on the system's activity.
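The combination described above can be illustrated with a minimal sketch. It assumes that the outcomes of the action and the changes in the environment are independent and that each is described by a probability; the function name `possible_situations` and the door/rain example are hypothetical and serve only as an illustration.

```python
from itertools import product

def possible_situations(action_outcomes, environment_changes):
    """Combine each action outcome with each independent environment change.

    Both arguments map a label to a probability. Independence of the two
    sources of uncertainty is an assumption of this sketch.
    """
    return {
        (outcome, change): p_outcome * p_change
        for (outcome, p_outcome), (change, p_change)
        in product(action_outcomes.items(), environment_changes.items())
    }

# Hypothetical example: trying to open a door while the weather may change.
situations = possible_situations(
    {"door_opens": 0.9, "door_stuck": 0.1},
    {"rain_starts": 0.25, "stays_dry": 0.75},
)
for situation, probability in situations.items():
    print(situation, round(probability, 3))
```

Two outcomes of the action and two possible changes in the environment already give four situations to consider after a single action, which is why the forecast tree grows quickly.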
Accordingly, a realistic plan is not a strictly predetermined sequence of actions but a tree-like algorithm of actions: after an action is performed, the result is evaluated, and the following action is selected depending on it. It is obvious (especially in game situations) that the horizon of such planning (the maximum number of steps in a sequence of actions) is limited to a small number of steps by limited memory, limited computational resources, and the limited time available for drawing up the plan (the plan must be drawn up quickly enough that the situation does not change significantly in the meantime).
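One possible way to represent such a tree-like plan is sketched below. The names `PlanNode`, `follow`, and `depth`, as well as the mouse-catching example, are hypothetical; the sketch only shows the structure "action, then one sub-plan per expected outcome".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class PlanNode:
    """An action plus one sub-plan for every expected outcome of that action."""
    action: str
    # Expected outcome -> next node; None means the plan ends on this branch.
    branches: Dict[str, Optional["PlanNode"]] = field(default_factory=dict)

def follow(node: PlanNode, outcome: str) -> Tuple[bool, Optional[PlanNode]]:
    """Return (was the outcome expected, sub-plan to continue with)."""
    if outcome not in node.branches:
        return False, None  # outside the expected range: the plan must be redrawn
    return True, node.branches[outcome]

def depth(node: Optional[PlanNode]) -> int:
    """Planning horizon: the longest chain of actions in the tree."""
    if node is None:
        return 0
    return 1 + max((depth(child) for child in node.branches.values()), default=0)

# Hypothetical two-step plan for catching a mouse.
plan = PlanNode("approach", {
    "mouse_still": PlanNode("pounce", {"caught": None, "missed": None}),
    "mouse_runs": PlanNode("chase"),
})
print(depth(plan))
```

The tree branches on outcomes, not on actions: for each expected outcome the plan already holds the selected next action.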
A rational way to implement such planning in a natural operating environment is "rolling planning":
based on the available experience, a tree-like forecast is formed (DECISION-MAKING FUNDAMENTALS), starting from the current state, with a horizon of N steps (each tree path is a sequence of "situation - action - situation - action ...")
the optimization problem of choosing the optimal chain of actions is solved, as described below
the first step of the N-step plan is executed
the new actual situation is evaluated: if the new situation corresponds to one of those expected under the plan and the motivation has not changed, then the rest of the plan, with a horizon of N-1 steps, is extended to N* > N-1 steps (depending on the available resources); if the result is outside the expected range or the intention has changed, the current plan is canceled, and a new one is drawn up from scratch.
That is, after each step, the situation is evaluated, and the plan is either extended or altered entirely.
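The rolling planning loop can be sketched on a toy one-dimensional world. Everything specific here is an assumption made for illustration: the functions `forecast`, `criterion`, `solve`, and `rolling_planning`, the 0.8/0.2 probabilities, and the use of the expected criterion value as the quantity being optimized. For brevity, the sketch re-solves the N-step problem from the actual state after every step instead of extending the remaining plan, and it does not model changes of motivation.

```python
ACTIONS = ("left", "stay", "right")
STEP = {"left": -1, "stay": 0, "right": 1}

def forecast(state, action):
    """Forecast from experience (assumed): a move succeeds with probability 0.8."""
    if action == "stay":
        return [(1.0, state)]
    return [(0.8, state + STEP[action]), (0.2, state)]

def criterion(state):
    """Situation evaluation from the motivation module (assumed): be close to 5."""
    return -abs(state - 5)

def solve(state, horizon):
    """Return (expected criterion value, best first action) over the forecast tree."""
    if horizon == 0:
        return criterion(state), None
    best_value, best_action = float("-inf"), None
    for action in ACTIONS:
        value = sum(p * solve(s, horizon - 1)[0] for p, s in forecast(state, action))
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action

def rolling_planning(state, horizon, steps, environment):
    """Plan N steps ahead, execute only the first step, evaluate, repeat."""
    trajectory, surprises = [state], 0
    for _ in range(steps):
        _, action = solve(state, horizon)
        expected = {s for _, s in forecast(state, action)}
        state = environment(state, action)  # the actual result of the action
        if state not in expected:
            surprises += 1  # here the old plan would be canceled entirely
        trajectory.append(state)
    return trajectory, surprises

# An environment in which every move happens to succeed.
trajectory, surprises = rolling_planning(0, 3, 10, lambda s, a: s + STEP[a])
print(trajectory, surprises)
```

In a fuller implementation the sub-tree below the observed situation would be kept and extended, and only an unexpected result or a change of intention would trigger planning from scratch.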
The planning horizon is not, in general, a predefined constant. In particular, it may depend on how quickly the system is required to act (in an emergency, the plan may degenerate to a single step) and on how predictable the situation is (the more possible outcomes each action has, the smaller the planning horizon, because computational resources are limited).
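The dependence of the horizon on urgency and predictability can be illustrated with a rough sketch. It assumes that the time available for planning is expressed as a budget of forecast-tree nodes that can be evaluated and that the tree grows by a factor of (number of actions) x (number of outcomes per action) at every step. The function `planning_horizon` and its parameters are hypothetical.

```python
def planning_horizon(node_budget, n_actions, n_outcomes, max_horizon=20):
    """Largest horizon N (at least 1) whose deepest tree level fits the budget.

    node_budget  assumed proxy for the time available before action is required
    n_actions    number of actions considered in each situation
    n_outcomes   number of possible outcomes per action (unpredictability)
    """
    branching = n_actions * n_outcomes
    if branching <= 1:
        return max_horizon  # a fully predictable single chain of actions
    horizon = 1
    while horizon < max_horizon and branching ** (horizon + 1) <= node_budget:
        horizon += 1
    return horizon

print(planning_horizon(1000, 3, 2))  # moderately predictable situation
print(planning_horizon(1000, 3, 5))  # less predictable situation
print(planning_horizon(5, 3, 2))     # emergency: almost no time to plan
```

With the same budget, more possible outcomes per action give a shorter horizon, and a very small budget reduces the plan to a single step.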
The AGI system turns frequently performed sequences of actions into composite actions, each of which is a fixed sequence of actions. Composite actions, in turn, can be elements of other composite actions. Accordingly, a small planning horizon measured in composite actions can correspond to an arbitrarily large planning horizon measured in elementary actions. Therefore, the plan can consist of an unlimited number of elementary steps.
The execution of the composite step of the plan, of course, can be interrupted at any of the elementary steps if the actual situation goes beyond what was expected.
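A minimal sketch of composite actions and their interruptible execution is given below. The library of composite actions, the names `expand` and `execute`, and the `situation_ok` callback (standing in for the comparison of the actual situation with the expected one) are all hypothetical.

```python
# Hypothetical library: composite action -> fixed sequence of (sub)actions.
COMPOSITES = {
    "breakfast": ["make_tea", "eat"],
    "make_tea": ["boil_water", "steep", "pour"],
    "boil_water": ["fill_kettle", "switch_on", "wait"],
}

def expand(action, library):
    """Recursively unfold a composite action into elementary actions."""
    if action not in library:
        return [action]  # already elementary
    steps = []
    for part in library[action]:
        steps.extend(expand(part, library))
    return steps

def execute(plan, library, situation_ok):
    """Run a plan of composite steps; stop at the first elementary step
    before which the situation is no longer as expected."""
    done = []
    for action in plan:
        for step in expand(action, library):
            if not situation_ok(step):
                return done, False  # interrupted in the middle of a composite step
            done.append(step)
    return done, True

print(expand("breakfast", COMPOSITES))
print(execute(["breakfast"], COMPOSITES, lambda step: step != "steep"))
```

Here a plan with a horizon of one composite step unfolds into six elementary steps, and execution can still stop after the third of them.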
An essential aspect is that an explicitly formulated goal (and the corresponding criteria for achieving it) is not a mandatory element of planning in the approach described above. In the classical approach to planning, a specific task/goal is first formulated. Then, based on the goal description, conditions are formed that make it possible to determine whether the goal has been achieved. In the case of the rolling planning described above, it is sufficient to have a current criterion for evaluating and comparing situations, formed by the motivation module (ARCHITECTURE); such a criterion can be built without explicitly expressing any specific goal. If a goal is formulated explicitly, it must be converted (interpreted) into a set of conditions for achieving the goal, which play the role of constraints in the optimization problem, while the criterion formed by the motivation module is still maximized. In this case, we obtain the optimization problem of achieving the goal in the best possible way (with minimum cost, minimum time, and so on).
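The difference between planning without and with an explicit goal can be shown in a small sketch. The candidate outcomes, the `criterion` (remaining energy), and the `goal` condition are hypothetical; the function `choose` simply maximizes the criterion over the outcomes that satisfy the goal conditions, if any are given.

```python
def choose(outcomes, criterion, goal=None):
    """Maximize the criterion; an explicit goal acts only as a constraint."""
    feasible = [o for o in outcomes if goal is None or goal(o)]
    if not feasible:
        return None  # no forecast outcome satisfies the goal conditions
    return max(feasible, key=criterion)

# Hypothetical forecast outcomes of three candidate chains of actions.
outcomes = [
    {"plan": "rest", "energy": 9, "at_gym": False},
    {"plan": "walk_to_gym", "energy": 6, "at_gym": True},
    {"plan": "run_to_gym", "energy": 3, "at_gym": True},
]

def criterion(outcome):
    """Assumed output of the motivation module: prefer more remaining energy."""
    return outcome["energy"]

def goal(outcome):
    """Explicit goal converted into a condition: be at the gym."""
    return outcome["at_gym"]

print(choose(outcomes, criterion)["plan"])        # no explicit goal
print(choose(outcomes, criterion, goal)["plan"])  # goal as a constraint
```

Without a goal the best-evaluated situation wins; with the goal, the same criterion selects the cheapest way of achieving it.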
In the case of a game situation, a minimax optimization problem is solved under the principles of game theory.
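For reference, a textbook minimax over a small game tree can be sketched as follows. The tree and its leaf values are hypothetical; leaves hold the criterion value for the system, and the levels alternate between the system (maximizing) and the opponent (minimizing).

```python
def minimax(node, maximizing=True):
    """Value of a game tree given as nested lists with numeric leaves."""
    if isinstance(node, (int, float)):
        return node  # leaf: evaluation of the resulting situation
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def best_move(tree):
    """Index of the system's move with the best guaranteed value."""
    replies = [minimax(child, maximizing=False) for child in tree]
    return replies.index(max(replies))

# Three possible moves of the system, each with two opponent responses.
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree), best_move(tree))
```

The move leading to the leaf with value 9 is not chosen, because the opponent's response would reduce its result to 2.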
Decision-making is about what action should be taken at the moment.
An action plan is a tree-like algorithm for choosing the following action depending on the outcome of the previous one.
The situation is evaluated, and the plan is extended or reformed after each step.
The planning horizon, measured by the length of the plan's chain of actions, cannot be large because the resources available for planning are limited. Still, the planning horizon, measured by the length of the sequence of elementary actions, is potentially unlimited thanks to the formation and use of composite actions.
Planning comes down to a combination of forecasting and solving the optimization problem.
Planning does not always require an explicit formulation of a task/goal; planning becomes a constrained optimization problem if the goal is formulated explicitly.