Suppose we want to design a way to detect unknown objects in an observable environment. It makes sense to begin by restating our expectations, which come from everyday human experience, in engineering language. The success of the development depends critically on how adequately we translate from ordinary language, which rests on the general principles of how vision and the brain function, into engineering language, which operates with formal concepts.
Translating the phrase "appearance of an unknown object" into engineering language requires formalizing all three terms of this phrase.
"Appearance"
Appearance does not mean that the object did not exist before that moment - it may not have existed, or it could have existed but been hidden behind other objects, or it could simply have been too far away and, therefore, beyond the range of detection. Equally important, an object can be in the field of view and still remain unnoticed for some time, until attention is paid to it. There are usually many objects of different sizes in the field of view, and any analysis of the observed set takes time. As a result, the appearance of an object in engineering language means that attention was paid to the object. An object may cease to be visible before attention is paid to it; most small objects fall into this category, forming a background that is not separated into a set of objects.
It is also essential that "appearance" means the dynamism of the observed situation, and this, in turn, means that the main principle of attention control is the detection of changes in the observed picture.
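The idea that attention is driven by changes in the observed picture can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists of floats; the names `changed_pixels` and `attention_target` and the threshold value are hypothetical choices made for this illustration, not part of the approach described here. Plain frame differencing is only the simplest possible stand-in for change detection.

```python
def changed_pixels(prev_frame, next_frame, threshold=0.1):
    """Return (row, col) positions whose intensity changed by more than threshold."""
    changed = []
    for r, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame)):
        for c, (p, n) in enumerate(zip(prev_row, next_row)):
            if abs(n - p) > threshold:
                changed.append((r, c))
    return changed


def attention_target(changed):
    """Return the centroid of the changed pixels, or None if nothing changed."""
    if not changed:
        return None
    count = len(changed)
    return (sum(r for r, _ in changed) / count,
            sum(c for _, c in changed) / count)


# Assumed toy data: an empty 8x8 frame, then a 2x2 patch "appears".
prev_frame = [[0.0] * 8 for _ in range(8)]
next_frame = [row[:] for row in prev_frame]
for r in (2, 3):
    for c in (5, 6):
        next_frame[r][c] = 1.0

changed = changed_pixels(prev_frame, next_frame)
target = attention_target(changed)  # where attention would be directed
```

In this sketch an unchanged scene yields no attention target at all, which mirrors the statement that an object "appears" only when attention is paid to it.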
"Unknown"
"Unknown" means the object has been noticed, but it does not look similar to any already known objects. The reason, however, is that it is visible only partially, is unusually oriented, is specifically illuminated, etc. In further observation, it may be discovered to be similar to something known. That is, unknown means "not yet identified." Identification is a process that takes time, so even when an object can be identified, any object is "unknown" for some time. Unidentified objects are candidates for generating a new concept and, accordingly, inclusion in the set of known ones. Formally, this is replacing the category of an object "unknown" with belonging to a new type of object (type = concept). Note that a "type change" is also possible when an object is identified - when the type is clarified or an identification error is corrected.
The most important point is that the ability to detect unknown objects is a necessary condition for self-learning (as opposed to knowledge transfer from one information owner to another). It also makes it possible to respond immediately to the appearance of an unknown object, without which adequate behavior is impossible in many situations.
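The life cycle of an "unknown" object (noticed, then either identified, re-typed, or turned into a new concept) can be sketched as a small data structure. Everything here is an assumption made for illustration: the class names `TrackedObject` and `ConceptStore`, the string label `"unknown"`, and the idea that some external matcher supplies a candidate type or `None`.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UNKNOWN = "unknown"  # hypothetical label for "not yet identified"


@dataclass
class TrackedObject:
    object_id: int
    type_name: str = UNKNOWN          # every noticed object starts as unknown
    history: List[str] = field(default_factory=list)

    def set_type(self, new_type: str) -> None:
        """Record a type change (identification, clarification, or correction)."""
        self.history.append(self.type_name)
        self.type_name = new_type


class ConceptStore:
    def __init__(self, known):
        self.known = set(known)

    def identify(self, obj: TrackedObject, candidate: Optional[str]) -> str:
        """Apply a candidate type from an assumed external matcher, if it is known."""
        if candidate is not None and candidate in self.known:
            obj.set_type(candidate)
        return obj.type_name

    def create_concept(self, obj: TrackedObject, name: str) -> None:
        """Turn an unidentified object into the first instance of a new type."""
        self.known.add(name)
        obj.set_type(name)


store = ConceptStore({"cat", "dog"})
a = TrackedObject(1)
first = store.identify(a, None)     # no match yet: stays unknown
second = store.identify(a, "cat")   # identified
third = store.identify(a, "dog")    # identification error corrected
b = TrackedObject(2)
store.create_concept(b, "drone")    # new concept joins the known set
```

The `history` list is kept only to make the "type change" visible; a real system would decide for itself what to remember.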
"Object"
How a person describes an observed scene as a set of objects located in a certain way relative to each other seems evident until "childish questions" arise. For example, it is clear to everyone that in the landscapes below, the observer will see dunes and clouds:
"Childish questions" regarding dunes and clouds: Where does one dune end and another begin? How many individual clouds are visible?
When asked what is shown in this picture:
we may get the answer "a set of gears, levers, etc." or "a mechanical watch."
In engineering language, this means the following:
[A] It is required to use a principle (or set of principles) that allows one to distinguish between observed objects (which is not the same thing as objects on some single static image!).
[B] If a specific combination of objects, considering their relative position, corresponds to a particular concept, then this composite object can describe the observed instead of listing its components.
Accordingly, there are "atomic" objects and composite objects. A composite object is a structure of components, each of which can itself be atomic or composite, so the whole forms a hierarchy. And if so, then:
[C] It is necessary to use a principle (or a set of principles) that allows several objects to be considered components of a composite object.
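The hierarchy of atomic and composite objects implied by [B] and [C] can be sketched as a recursive data structure. This is only an illustration of the structure, not of the principle that decides which objects belong together; the class `SceneObject`, its fields, and the `describe` method are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneObject:
    name: str
    position: Tuple[float, float] = (0.0, 0.0)   # assumed 2D position in the scene
    components: List["SceneObject"] = field(default_factory=list)

    def is_atomic(self) -> bool:
        return not self.components

    def describe(self, depth: int = 0) -> List[str]:
        """Name the object itself (depth 0) or list its parts down to `depth` levels."""
        if depth == 0 or self.is_atomic():
            return [self.name]
        names: List[str] = []
        for part in self.components:
            names.extend(part.describe(depth - 1))
        return names


gear_train = SceneObject("gear train", (2.0, 0.0), [
    SceneObject("gear", (2.0, 0.0)),
    SceneObject("gear", (2.5, 0.0)),
])
watch = SceneObject("mechanical watch", (0.0, 0.0), [
    SceneObject("gear", (0.0, 0.0)),
    SceneObject("lever", (1.0, 0.0)),
    gear_train,
])

as_concept = watch.describe(0)   # ['mechanical watch']
as_parts = watch.describe(1)     # ['gear', 'lever', 'gear train']
as_atoms = watch.describe(2)     # ['gear', 'lever', 'gear', 'gear']
```

The same scene thus admits both answers, "a mechanical watch" and "a set of gears, levers, etc.", depending on the level of the hierarchy at which it is described.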
Why are we talking about principles and not about rules for various types of objects? Because the absence of universal principles immediately leads to the loss of the ability to detect hitherto unknown objects, that is, to the preservation of the perception-concept gap.
Both of the mentioned principles - the principle of delineating objects and the principle of combining several objects into a composite super-object - are determined by the properties of the actual physical world when it comes to an AGI-controlled system in our natural world, that is, robots, autopilots, security systems, Mars rovers, etc. In the virtual world (chess, Go, etc.), these principles will be radically different.
This circumstance is essential: since the specifics of the environment dictate the principles, all proper AGI systems can and should use the same principles as people (or principles compatible with them). The importance of this becomes clear if we remember how children learn to name objects: the teacher shows an object and names it; the result matches the expected one if (and only if, in the language of mathematics) both the child and the teacher single out this object from the environment in the same way. This sameness of selection is ensured by the use of the same principles, which correspond to the physical environment and were developed in evolution. For the development of AGI, it is essential that people use innate principles for identifying objects in the observed environment: this implies that hard-coded algorithms are sufficient for detecting unknown objects.
[D] Detection of arbitrary unknown objects can be implemented using hard-coded algorithms.
Most material objects occupy a specific region of space that either does not change its shape or changes it smoothly and slowly. The boundaries of an object can be either sharp or fuzzy; coloring, surface texture, and illumination can vary without changing the object's essence (although they can affect the course of the detection process).
Sensory visual information is two-dimensional, while material objects are three-dimensional and located in three-dimensional space. Consequently, the relative position of the projections of 3D objects in a 2D frame changes both when the objects themselves move and when the camera moves. Observing changes in the relative position of 2D projections in the frame provides information that makes it possible to single out individual objects in the observed scene (as well as to estimate the distance to them). This is precisely what enables us to formulate a general principle of detecting objects in the observed environment, regardless of whether the type of object is known to the observer. The critical word here is "observation": the general principle of detecting objects requires analyzing the dynamics of changes in two-dimensional sensory data:
[E] It is required to observe (as opposed to merely "seeing"), that is, to analyze the process of change in two-dimensional sensory data.
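How changes in 2D projections can separate objects and indicate distance may be illustrated with a toy calculation. The sketch assumes an idealized pinhole camera that translates sideways, static scene points, and a one-dimensional image coordinate; this model, the function names, and the sample points are assumptions chosen for illustration and are not taken from the text. Under this model a point at depth `z` shifts in the image by `-focal * camera_dx / z`, so points on the same object (at similar depth) shift together, and nearer objects shift more.

```python
def project(x, z, focal=1.0):
    """Pinhole projection of a point at lateral offset x and depth z."""
    return focal * x / z


def image_shift(x, z, camera_dx, focal=1.0):
    """Change of the projection when the camera moves sideways by camera_dx."""
    return project(x - camera_dx, z, focal) - project(x, z, focal)


def depth_from_shift(shift, camera_dx, focal=1.0):
    """Invert the shift formula to estimate depth (static point assumed)."""
    return focal * abs(camera_dx) / abs(shift)


def group_by_shift(shifts, tolerance=1e-6):
    """Greedily group point indices whose image shifts agree within tolerance."""
    groups = []
    for i, s in enumerate(shifts):
        for g in groups:
            if abs(shifts[g[0]] - s) <= tolerance:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups


# Assumed scene: two points on a near object (z = 2), two on a far one (z = 10).
points = [(0.0, 2.0), (0.5, 2.0), (1.0, 10.0), (1.5, 10.0)]
camera_dx = 0.1

shifts = [image_shift(x, z, camera_dx) for x, z in points]
groups = group_by_shift(shifts)                      # points that move together
near_depth = depth_from_shift(shifts[0], camera_dx)  # close to 2.0
far_depth = depth_from_shift(shifts[2], camera_dx)   # close to 10.0
```

Note that a single static frame of the same four points carries no such grouping information; it appears only when two frames are compared, which is the sense of "observing" in [E].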
More details are in the next chapter.