Retrospective articles on the development of AI/AGI describe it as a series of failed attempts to find the single right approach to the problem. In reality, every technique that has been tried corresponds to some approach used by the human intellect, and "full-size" AGI requires using all of these approaches as parts of one system. Optimization, search algorithms, rule systems with logical inference, associative memory based on neural networks, and statistical data analysis are not alternatives but components of future AGI systems.
Does this mean that the only remaining problem on the way to full-fledged AGI is the sensible integration of already developed and tested approaches? Alas, no: two "components" of intelligence have not yet been developed to the required extent. Not surprisingly, both are optional for building narrow AI systems.
The first of these components is the system's ability, during operation, to autonomously detect unknown objects and processes in the environment by processing the stream of sensor data, and to use those detections to create new concepts. A dynamic description of the observed part of the surrounding world (the scene) is continuously maintained; it includes both already-known objects that the AGI system can identify and unknown ones, which can extend the set of known objects by giving rise to new concepts.
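As a rough illustration, here is a minimal sketch of such a scene description, one that holds both identified and still-unidentified entities and promotes a persistent unknown into a new concept. All names and the persistence threshold are hypothetical, not taken from any particular system:

```python
# Hypothetical sketch: a dynamic scene description in which entities
# without a matching concept can eventually become new concepts.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Concept:
    name: str                      # label of a known object class

@dataclass
class SceneEntity:
    track_id: int                  # stable identity across frames
    concept: Optional[Concept]     # None while the entity is unidentified
    frames_observed: int = 0

@dataclass
class Scene:
    entities: dict = field(default_factory=dict)       # track_id -> SceneEntity
    known_concepts: list = field(default_factory=list)

    def update(self, track_id: int, matched: Optional[Concept]) -> None:
        """Refresh one tracked entity with the result of identification."""
        ent = self.entities.setdefault(track_id, SceneEntity(track_id, None))
        ent.frames_observed += 1
        ent.concept = ent.concept or matched
        # An unknown that persists long enough becomes a new concept,
        # autonomously extending the system's vocabulary (threshold is arbitrary).
        if ent.concept is None and ent.frames_observed > 30:
            ent.concept = Concept(name=f"novel_{track_id}")
            self.known_concepts.append(ent.concept)
```

The key design point is that unknown entities are first-class members of the scene, rather than recognition failures to be discarded.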
Narrow AI has no such capability: all of its concepts, and their connection to sensor data, are formed with some degree of human participation and are limited to concepts already available to the people involved in the corresponding knowledge-transfer process.
One of the obstacles to implementing this component is that the dominant approach to analyzing the current scene/situation takes, as its primary operation, the extraction of features and/or segments from sensor data describing the observed scene at a single moment in time. As a consequence, the dynamics of a situation can be tracked only by analyzing changes in the parameters of features computable from a single snapshot; what is needed instead are features that reflect the dynamics of the observed situation from the outset, as in the sketch below.
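A minimal sketch of the distinction, assuming grayscale frames as 2-D NumPy arrays: a temporal brightness gradient is computed over a short window of frames and is therefore dynamic by construction, unlike a per-frame feature whose changes must be compared after the fact.

```python
# Hypothetical example: a feature that encodes dynamics from the outset.
import numpy as np

def temporal_gradient(frames: list) -> np.ndarray:
    """Per-pixel rate of brightness change across consecutive frames."""
    stack = np.stack(frames).astype(float)      # shape: (T, H, W)
    return np.gradient(stack, axis=0).mean(axis=0)

# A static per-frame feature (e.g., mean brightness) loses the motion
# signal that the temporal gradient retains:
frames = [np.random.rand(64, 64) for _ in range(5)]
motion_map = temporal_gradient(frames)          # nonzero where brightness changes over time
```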
In addition, the traditionally used set of features is too narrow. For example, a person easily detects "cloud"/"ghost" objects: irregular shapes without clear contours and with smoothly varying colors, which cannot be captured by a traditional feature set.
The two fundamental problems, however, are these: first, segments and features reflect the specifics of the sensor data set (a two-dimensional array of pixels, for example) rather than logical objects; second, natural objects are continuous in space and time, in contrast to the brightness/color field represented by a set of pixels/voxels. The result is that objects can be detected only by comparing them with those known from the training data set - that is, objects outside the training set cannot be caught at all.
Solving the problem requires a paradigm shift: the visible scene must be divided into objects according to some principle, instead of assessing the similarity of a scene fragment to objects from the training set. A similar paradigm change is already taking place in the analysis of cause-and-effect relationships, where correct results are obtained not by primitive correlation but by forming hypotheses and testing them. At the "philosophical" level, this comes down to the principle of the presence of an invariant: the moment the invariant is violated is an event reflecting a cause-and-effect relationship.
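A minimal sketch of that principle (the names and the toy data are hypothetical): a hypothesis proposes a quantity that should stay constant, and the time step at which it stops holding is flagged as the event.

```python
# Hypothetical example: invariant violation as a causal event detector.
from typing import Callable, Optional, Sequence

def first_violation(states: Sequence[dict],
                    invariant: Callable[[dict], float],
                    tol: float = 1e-6) -> Optional[int]:
    """Return the index of the first state where the hypothesized
    invariant departs from its initial value by more than tol."""
    baseline = invariant(states[0])
    for t, s in enumerate(states):
        if abs(invariant(s) - baseline) > tol:
            return t          # the event: a candidate cause acted here
    return None               # the hypothesis survived the test

# Hypothesis: total momentum of two carts is conserved until an external
# push occurs; the violation index localizes the push in time.
states = [{"p1": 1.0, "p2": -1.0}, {"p1": 1.2, "p2": -1.2},
          {"p1": 2.0, "p2": -1.2}]                    # push at t = 2
event = first_violation(states, lambda s: s["p1"] + s["p2"])  # -> 2
```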
The same principle allows us to divide the visible scene into objects, each of which has its own invariant, distinct from the invariants of the other things in the scene. At first glance this looks like a variant of traditional segmentation, but in reality it is radically different. In good old segmentation, the parameters of segments are calculated or specified in advance; here, a formula for the specific invariant that distinguishes an object from its environment must be searched for or constructed. This radically increases the computational complexity of scene analysis (as with finding the correct hypothesis in a cause-and-effect study), but it is partly compensated by the continuity of the physical world: the search for objects in the visible scene does not need to be repeated for every data frame if the movement of candidate objects is tracked.
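One way such a search might look in miniature, with an entirely hypothetical candidate set: try simple candidate formulas over a tracked region's measurement history and keep the one that stays constant for the region while varying for the surrounding background.

```python
# Hypothetical sketch: constructing an object's invariant by search over
# candidate formulas, rather than computing fixed segment parameters.
import numpy as np

CANDIDATES = {
    "area":     lambda m: m["area"],
    "mean_hue": lambda m: m["mean_hue"],
    "aspect":   lambda m: m["width"] / m["height"],
}

def pick_invariant(region_history: list, bg_history: list) -> str:
    """Choose the candidate with low variance over the region's history
    and high variance over the background's history."""
    def var(history, f):
        return float(np.var([f(m) for m in history]))
    scores = {name: var(bg_history, f) - var(region_history, f)
              for name, f in CANDIDATES.items()}
    return max(scores, key=scores.get)
```

In a real system the candidate set would be open-ended (hence the complexity the text mentions), and tracking would let the chosen invariant be reused across frames instead of re-searched.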
The second missing "off-the-shelf" component is the autonomous formation of intentions based on the mission (the purpose or "profession") of the AGI system and the current situation. Intentions, i.e., useful/desired results, are the basis for choosing subsequent actions.
This component is not required in narrow AI systems, since the relevant information comes from outside. For example, a car navigation system receives from the user where to go, whether to get there as quickly or as economically as possible, and so on.
Forming intentions directly from the voluminous data characterizing the current situation is complex and irrational; a more reasonable way is to use integral assessments of a few vital aspects - the degree of danger, the pace of change in the situation, the level of available resources, the presence or absence of damage, etc. Such integral assessments are direct analogs of human and animal feelings, and they already exist in their simplest form in many technical systems equipped with diagnostic and/or automatic control systems.
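A minimal sketch of the idea (all thresholds, names, and the priority order are hypothetical): intentions are selected from a handful of integral assessments rather than from the raw situation data itself - the technical analog of "feelings".

```python
# Hypothetical example: forming an intention from integral assessments.
from dataclasses import dataclass

@dataclass
class Assessments:
    danger: float       # 0..1, integrated threat level
    pace: float         # 0..1, how fast the situation is changing
    resources: float    # 0..1, remaining energy/supplies
    damage: float       # 0..1, accumulated damage

def form_intention(mission: str, a: Assessments) -> str:
    """Pick a desired result from the mission and the assessments."""
    if a.damage > 0.7 or a.danger > 0.8:
        return "reach_safety"           # survival overrides the mission
    if a.resources < 0.2:
        return "replenish_resources"
    if a.pace > 0.6:
        return "reassess_situation"     # the world is changing too fast
    return f"advance_mission:{mission}"

intent = form_intention("survey_area", Assessments(0.1, 0.2, 0.9, 0.0))
# -> "advance_mission:survey_area"
```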