The development of AI has long been based on a paradigm illustrated by the diagram below: a system that implements artificial intelligence receives some information at its input, processes it intelligently, and forms the output; the processing relies on a body of knowledge that can be replenished along the way.
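As a rough illustration only, the sketch below captures this paradigm in plain Python: input comes in, is processed against a knowledge base, and produces an output, while the knowledge base can be extended on the fly. The rule-as-callable representation and all names are assumptions made for illustration.

```python
# Minimal sketch of the paradigm in the diagram: input -> processing that uses a
# knowledge base -> output, with the knowledge base replenishable along the way.
# The rule representation (plain callables) is an illustrative assumption.

from typing import Any, Callable, Optional

Rule = Callable[[Any], Optional[Any]]

class ClassicAISystem:
    def __init__(self, rules: list[Rule]) -> None:
        self.knowledge: list[Rule] = list(rules)   # initially laid-down knowledge

    def learn(self, rule: Rule) -> None:
        self.knowledge.append(rule)                # knowledge replenished along the way

    def process(self, observation: Any) -> Optional[Any]:
        for rule in self.knowledge:                # "intelligent" processing step
            output = rule(observation)
            if output is not None:
                return output
        return None

system = ClassicAISystem([lambda obs: "castle" if obs == "king_safe" else None])
print(system.process("king_safe"))  # -> "castle"
```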
For several decades, the input consisted of small, well-structured data (for example, a description of the position and/or movement of pieces on a chessboard), interpreted according to rules and knowledge laid down in advance. Interpreting the input was not a separate task requiring significant computing resources or significant software-engineering effort. Almost all computational resources and programmer effort were directed at the inherently complex analysis of a relatively small amount of well-structured data.
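To make the "small and well-structured" point concrete: a complete chess position in standard FEN notation occupies only a few dozen bytes and can be interpreted by trivial fixed rules, so interpretation itself costs essentially nothing. The helper below is purely illustrative.

```python
# Illustration of "small and well-structured input data": a chess position in FEN
# notation fits in well under a hundred bytes and is interpreted by fixed rules,
# so no separate interpretation stage is needed before the actual analysis.

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

def piece_count(fen: str) -> int:
    """Count pieces on the board directly from the structured description."""
    board_part = fen.split()[0]
    return sum(1 for ch in board_part if ch.isalpha())

print(len(START_FEN.encode()))   # 56 bytes for the whole position
print(piece_count(START_FEN))    # 32 pieces
```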
The situation changed dramatically when it came to developing real-time AI systems intended to operate in the natural environment - robot control systems, car autopilots, and the like. Data streams from video cameras and microphones are many orders of magnitude larger than the textual data of chess or Go problems, and they are numeric rather than textual. Interpreting sensory data became a separate, non-trivial task: converting a gigantic stream of numbers into a reasonably compact, structured description of the situation that is amenable to logical analysis with good reactivity (an answer is needed before the situation changes and decisions cease to be adequate to it).
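A back-of-the-envelope calculation (with assumed, purely illustrative numbers for the camera and for the size of a situation description) shows the scale of the gap between the raw sensory stream and the compact description the logical stage needs:

```python
# Back-of-the-envelope comparison (purely illustrative numbers) of the raw sensory
# stream against the structured description that the logical stage actually needs.

width, height, channels, fps = 1920, 1080, 3, 30        # one uncompressed RGB camera
raw_bytes_per_s = width * height * channels * fps        # ~187 MB/s of raw numbers

objects_in_scene = 50          # assumed size of a compact situation description
bytes_per_object = 64          # id, class, position, velocity, etc.
updates_per_s = 10             # how often the description must be refreshed
structured_bytes_per_s = objects_in_scene * bytes_per_object * updates_per_s

print(f"raw stream:        {raw_bytes_per_s / 1e6:.0f} MB/s")
print(f"structured stream: {structured_bytes_per_s / 1e3:.0f} KB/s")
print(f"reduction factor:  ~{raw_bytes_per_s / structured_bytes_per_s:,.0f}x")
```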
Even though AI systems were originally based on the analysis of human mental activity, this stage of the process remained outside the attention of both developers and neuroscientists. When the need to implement it arose, it turned out that this stage was akin to the invisible underwater part of an iceberg: its presence was not apparent to many developers, and neuroscientists could not offer practical approaches to its implementation. This is not surprising, since in humans this stage is carried out subconsciously, is inaccessible to introspection, and is extremely difficult to study experimentally.
Two aspects of this data-processing stage are essential: first, it depends on the specifics of the environment and, accordingly, on the set of sensors; second, it does not require mutable data-processing algorithms - in other words, there is no need to provide for self-learning.
The first aspect is important from the point of view of AGI theory: this component of an AGI system inevitably depends on the environment in which the AGI-controlled system must function and on the capabilities required of it.
The second aspect means that "classic" programming approaches based on hard-coded algorithms can be used.
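A minimal sketch of what such a hard-coded, non-learning preprocessing step might look like; the threshold and the notion of a "bright cell" are illustrative assumptions, not a prescription:

```python
# A deliberately simple, hard-coded preprocessing step: no trainable parameters,
# just a fixed algorithm chosen for a known sensor and environment.
# The threshold value and the "bright cell" notion are illustrative assumptions.

def extract_bright_cells(frame: list[list[int]], threshold: int = 200) -> list[tuple[int, int]]:
    """Return (row, col) coordinates of cells brighter than a fixed threshold."""
    return [
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value > threshold
    ]

frame = [
    [  0,  10, 250],
    [  5, 240, 245],
    [  0,   0,  12],
]
print(extract_bright_cells(frame))  # [(0, 2), (1, 1), (1, 2)]
```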
As a result, the data flow diagram of an AGI system takes the following form (the flow intensities are purely illustrative and are expressed in bytes per second):
The role of the "underwater part" of the system comes down to converting a gigantic stream of numerical data into a stream of structured information describing a situation model that, on the one hand, is amenable to analysis at the intelligent-analysis stage and, on the other hand, is complete enough to provide the required functionality of the AGI system as a whole.
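What such "structured information describing a situation model" might look like is sketched below; every field name is an assumption made for illustration, since the real content depends on the environment, the sensors, and the purpose of the system.

```python
# A sketch of a "structured situation model": compact enough for logical analysis,
# yet complete enough for the system's purpose. All field names are assumptions.

from dataclasses import dataclass, field

@dataclass
class DetectedObject:
    object_id: int
    category: str                      # e.g. "pedestrian", "vehicle", "obstacle"
    position: tuple[float, float]      # metres, in the agent's frame of reference
    velocity: tuple[float, float]      # metres per second

@dataclass
class SituationModel:
    timestamp: float                   # seconds since start
    objects: list[DetectedObject] = field(default_factory=list)

    def nearest(self) -> DetectedObject | None:
        """The kind of cheap query the intelligent stage can afford to run."""
        return min(self.objects,
                   key=lambda o: o.position[0] ** 2 + o.position[1] ** 2,
                   default=None)
```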
Thus, the division of labor in the system is that one component "distills" the data, taking into account the specifics of the environment and the purpose of the system, while the second performs intellectual analysis of the "distilled" data and accumulates knowledge.
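The sketch below shows one possible shape of this division of labor, with the interfaces and names chosen purely for illustration: a hard-coded "distiller" and a knowledge-accumulating analyzer, connected by a compact situation description.

```python
# Sketch of the division of labor: a hard-coded "distiller" turns raw sensor data
# into a situation description, and a separate intelligent component analyses it
# and accumulates knowledge. Interfaces and names are illustrative assumptions.

class SensoryDistiller:
    """Environment- and sensor-specific; fixed algorithms, no self-learning."""
    def distill(self, raw_frame):
        # ... fixed signal processing would go here ...
        return {"objects": [], "timestamp": 0.0}   # compact situation description

class IntelligentAnalyzer:
    """General-purpose; analyses distilled data and accumulates knowledge."""
    def __init__(self):
        self.knowledge = []

    def decide(self, situation):
        # ... reasoning over the situation model using accumulated knowledge ...
        self.knowledge.append(situation)           # knowledge is replenished
        return "no_action"

def control_loop(sensor_frames):
    distiller, analyzer = SensoryDistiller(), IntelligentAnalyzer()
    for frame in sensor_frames:
        situation = distiller.distill(frame)       # huge stream -> compact model
        yield analyzer.decide(situation)           # compact model -> decision

print(list(control_loop([b"raw-frame-1", b"raw-frame-2"])))  # ['no_action', 'no_action']
```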
One of the factors that makes developing an AGI system so complex is that implementing the second component requires the presence of the first, whose development is itself far from trivial.
Re: a "separate non-trivial task of converting a gigantic stream of numbers into a reasonably compact structured description", this is called subitizing. It is an operation that can be performed by imposing a known or discovered structural pattern on an aggregation of objects. It can be implemented in computers at a level that far exceeds human capability.