Those who are interested in artificial intelligence are probably familiar with what is called the concept grounding problem. It is about how to connect formal concepts, presented in one form or another in the knowledge set of an AI system, with real-world objects, the relationships between these objects, events, actions, etc. The purpose of grounding is to create a "bridge" between the flow of quantitative (numerical) data from sensors and the AI system's concepts. In the case of natural human intelligence, this "bridge" looks simple, natural, and self-evident in the example of teaching children: someone older points to an object and names it. Thus a new concept with its name is formed, associated with a thing in the real world. However, closer analysis leads to the conclusion that things are far from being so straightforward.
First, children learn the simplest operations with objects before they learn to talk.
Second, animals, like humans, have their own mental set of concepts and use it to solve various kinds of problems (for example, extracting food with sticks, wires, pebbles, etc.), despite lacking the ability to convey knowledge in a language.
And finally, the most "deadly" argument: in the original scheme, an adult who already has a connection between a mental concept and an object of the material world forms the same association in a child. But where did the adult get this connection? From his parents? And where did they get it? Obviously, this scheme lacks a starting point at which the relationship between an object and a concept is first formed, so that the connection can later be transferred from one subject to another.
Forming the connection between an environmental object and a mental concept therefore requires the ability to create such a connection autonomously.
Such a connection can then be transferred to another subject in the learning process, or it can forever remain the "personal asset" of the subject that created it.
In turn, this means that a mental concept grounded in something can only be formed if the subject is able to form it autonomously.
So subjects and AI systems that claim to be intelligent must be able to divide the observed part of the environment into a set of objects. At its core, this is the task of constructing a structured description of the current situation, which was the subject of the chapter "AGI: STRUCTURING THE OBSERVABLE. HOW TO DETECT UNKNOWN THINGS AND EXPLAIN IT".
Accordingly, when an adult shows a child a specific object and names it, this is not the formation of an association between an environmental object and a mental concept, but the formation of an agreement about which symbol (sound, word, pictogram, gesture, etc.) will henceforth be used by both parties to refer to a concept that already exists in both subjects and has been formed by each of them autonomously.
The process of detecting/distinguishing environmental objects is based on natural laws. It can be innate and immutable, or it can rely on an inherent primary algorithm/mechanism that is enhanced through learning.
Of course, new concepts can also be formed by operating on existing concepts; the relevant operations include generalization, analogy, induction, deduction, abduction, and so on. At this stage, however, there is no need to associate the new concept with a specific object of the environment.
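As a toy illustration (the attribute-set representation of concepts below is only an assumption made for the example, not a claim about how concepts are actually stored), a new and more general concept can be built purely from existing ones, without pointing at any object in the environment:

```python
# Minimal sketch: forming a new concept by generalizing existing ones.
# Illustrative assumption: a concept is represented as a set of attributes.

def generalize(concepts):
    """Return the attributes shared by all given concepts as a new, more general concept."""
    return set.intersection(*concepts.values())

existing = {
    "sparrow": {"has_wings", "flies", "lays_eggs", "small"},
    "eagle":   {"has_wings", "flies", "lays_eggs", "large"},
    "swallow": {"has_wings", "flies", "lays_eggs", "small"},
}

# The new, more general concept ("bird-like thing") is derived purely from
# existing concepts, with no reference to a specific environmental object.
print(generalize(existing))   # {'has_wings', 'flies', 'lays_eggs'} (in some order)
```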
As mentioned earlier, the algorithm for distinguishing objects in the observed part of the environment has two main features. First, it is combinatorial in essence; second, it is based on the physical and geometric laws of the environment.
Combinatoriality implies a specific limit on the complexity of objects that a subject (a natural or artificial intelligence) can detect with the available computational means in an acceptable time. This is one of the reasons for the difference in intellectual abilities between humans and animals (and between different animal species). The same factor explains why analyzing a visible scene becomes harder for a human as the number of simultaneously observed objects grows.
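A small numeric illustration of this combinatorial limit (this is not the detection algorithm itself, just arithmetic): the number of ways to group N elementary observations into disjoint candidate objects is the Bell number B(N), which grows super-exponentially, so the available computing budget quickly bounds the complexity a subject can handle.

```python
# Count the ways to partition N elementary observations into disjoint groups
# ("candidate objects"): the Bell number B(N).

from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def bell(n):
    """Bell number: number of partitions of an n-element set."""
    if n == 0:
        return 1
    return sum(comb(n - 1, k) * bell(k) for k in range(n))

for n in (5, 10, 15, 20):
    print(f"{n} observed elements -> {bell(n):,} possible groupings")
# 5 -> 52; 10 -> 115,975; 15 -> 1,382,958,545; 20 -> 51,724,158,235,372
```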
Any "true" AI system must provide autonomous learning capability. In turn, autonomous learning is only possible if there is an ability to divide the visible situation into components-objects, including those that have never been encountered before and whose identification the system is not trained to. Thus, the ability to detect unknown objects, isolating them from the other observed part of the environment, is an essential element of AGI and any system that claims the adjective "intelligent".
An important aspect is that the internal descriptions of the same object formed by different subjects differ to some extent. Accordingly, the same symbol (word, gesture, hieroglyph, etc.) can be interpreted somewhat differently by different humans/systems, which is what we observe in reality. Communication between subjects can be used to "synchronize" internal descriptions.
I think this is actually the starting point for any theory or practice of true artificial intelligence. Any system or algorithm (or set of algorithms) for object detection and the related concept formation must take elemental sensory data, essentially time and location data for real objects, as input and build up a hierarchy of 'objects' by composition. I'm focusing on entropy as the fundamental discriminator for all data (signals). My approach is to look for detectable persistence in time and place together with stability of the relationships between data points. I started this as a technique for natural language understanding, but realized it is fundamental and general to all sensory signals.
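A rough sketch of what this could look like for a one-dimensional sensor stream (the windowing, binning, and threshold below are illustrative assumptions, not the actual technique):

```python
# Sketch: low-entropy windows of a sensor stream as candidates for a persistent "object".

from collections import Counter
from math import log2
import random

def shannon_entropy(values, bins=8):
    """Entropy (bits) of a window of readings after coarse binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = Counter(int((v - lo) / width) for v in values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

def persistent_segments(signal, window=20, threshold=1.0):
    """Windows whose entropy stays below a threshold: candidates for a persistent object."""
    return [
        (start, start + window)
        for start in range(0, len(signal) - window + 1, window)
        if shannon_entropy(signal[start:start + window]) < threshold
    ]

rng = random.Random(0)
noisy = [rng.uniform(0, 100) for _ in range(40)]   # high-entropy stretch
stable = [50.0] * 40                               # low-entropy, persistent stretch
print(persistent_segments(noisy + stable))
# expected: only the windows covering the stable stretch, i.e. [(40, 60), (60, 80)]
```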
It should be mentioned that the same topic appears in mathematical logic, in model theory [1, 2], where we have (a) structures and (b) theories with concepts, relations, and constants, both primary and defined. Concept grounding may then be treated as the evaluation of a concept's definition on a particular structure.
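As a toy example (with made-up names), grounding a defined concept then amounts to computing its extension on a particular finite structure:

```python
# Minimal sketch of "grounding as evaluating a concept definition on a structure"
# in the model-theoretic sense [1, 2]; the domain and relation are illustrative.

domain = {"ann", "bob", "cat", "dan"}
parent = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}   # primary relation

def grandparent(x, structure=parent):
    """Defined concept: grandparent(x) := exists y, z such that parent(x, y) and parent(y, z)."""
    return any((x, y) in structure and (y, z) in structure
               for y in domain for z in domain)

# "Grounding" the defined concept = computing its extension on this particular structure.
print({x for x in domain if grandparent(x)})   # {'ann', 'bob'}
```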
So the main question may be: what kind of structures does AGI create internally, and what kind of manipulation (knowledge processing) does it perform on them?
It seems the main feature of the mind is the processing of colored 3D figures, particularly in motion.
[1] https://en.wikipedia.org/wiki/Model_theory
[2] https://plato.stanford.edu/entries/model-theory/