As a rule, AGI architecture schemes include components responsible for pattern search, that is, for finding certain repeating logical objects or situations in information received from the outside. In many cases, pattern search is the first element in the chain of operations that allows an AI system to learn autonomously. Note the essential point: pattern search is not an end in itself but a supposed tool for detecting previously unknown logical or natural objects, as a function of autonomous learning about the real world.
Pattern search rests implicitly on the following inference: if we are looking for patterns, and a pattern is something that repeats, then searching for repeating pieces of data will give us exactly what we want to find. As we have noted many times (see OBVIOUS CHOICE), what remains implicit is the main reason things do not go as expected.
When we implement a search for patterns on data from the natural environment that has not been preprocessed by humans, we immediately encounter a problem: patterns are found, but they turn out to be completely different from the patterns that we need and expected.
For example, when trying to find patterns representing objects unknown to the system in a stream of visual data (a sequence of video frames), the first pattern found is the background landscape: it repeats across successive frames without changes. Moving objects, which represent individual objects, and which are precisely what interests us if we are talking about objects unknown in advance, have a changeable appearance and repeat far less often. In the "second echelon", an enormous number of small-sized patterns are found, representing corner points, spots, and fragments of contours with different orientations; again, not at all the objects that were expected to be seen as specific patterns corresponding to them. No less unexpected situations occur when analyzing non-visual information, which will be illustrated below.
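As a rough illustration of this effect, the following sketch runs a naive repeatability count over synthetic frames (all data and parameter choices here are hypothetical, using only NumPy): static background patches dominate the repetition statistics, while the moving object contributes only rarely repeating patches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "video": a fixed background with one small object moving across frames.
H, W, N_FRAMES, PATCH = 32, 32, 20, 4
background = rng.integers(0, 4, size=(H, W))          # coarse, static texture
frames = []
for t in range(N_FRAMES):
    frame = background.copy()
    x = 2 + t                                          # the object drifts one pixel per frame
    frame[10:14, x:x + 4] = 9                          # the "object" we actually care about
    frames.append(frame)

# Naive repeatability criterion: count how often each PATCH x PATCH patch recurs.
counts = {}
for frame in frames:
    for i in range(0, H - PATCH + 1, PATCH):
        for j in range(0, W - PATCH + 1, PATCH):
            key = frame[i:i + PATCH, j:j + PATCH].tobytes()
            counts[key] = counts.get(key, 0) + 1

top = sorted(counts.values(), reverse=True)[:5]
print("most frequent patch counts:", top)   # static background patches dominate
print("total distinct patches:", len(counts))
```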
It turns out that the problem of finding the required patterns is not finding some patterns but discarding the majority of them, which we do not need and which are essentially informational garbage/noise. Clearly, such separation requires not a repeatability criterion but some other criterion. Successful attempts to construct such alternative criteria immediately reveal that once we have such a criterion, no other criterion, including the original "obvious" criterion of repeatability, is required.
Patterns do repeat, but finding useful patterns with repeatability as the sole search principle is unrealistic when it comes to raw data from the natural environment. Useful human-made data preprocessing is, in effect, the implicit use of an alternative criterion that ensures we find what we are looking for.
As a consequence of the above, it is clear why pattern search works well when textual information is analyzed: the text was prepared by a human (or programmatically, using algorithms written by people), and because of this there is no data from the natural environment that needs to be thrown away.
Let us illustrate this with a primitive text model. Three text strings, two of them meaningful and the third a random sequence of letters, are "mixed" into one sequence of characters. Finding useful information in the resulting sequence is extremely difficult; but if the characters of the three source strings have different colors, and we do not ignore this information but, on the contrary, use it as an alternative criterion, the task becomes much simpler:
Using an algorithm for searching for patterns in a sufficiently long mixed sequence of characters while ignoring color will certainly yield a particular set of patterns, but they will not carry any useful meaning (unless we are talking about data packaging); using the same algorithm on the subsequence of characters of each color will yield quite useful dictionaries of lexemes for both non-random subsequences.
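A minimal sketch of this toy model (hypothetical strings and parameter choices; plain Python, with the "color" of each character carried as a tag):

```python
import random
from collections import Counter

random.seed(1)

# Two meaningful strings and one random string (hypothetical toy data).
meaningful_a = "the cat sat on the mat the cat ran " * 4
meaningful_b = "red green blue red green blue red " * 4
noise = "".join(random.choice("abcdefghijklmnopqrstuvwxyz ") for _ in range(140))

# "Mix" the three sources into one stream by random interleaving.
# Order inside each source is preserved; each character keeps a color tag.
pools = {"A": list(meaningful_a), "B": list(meaningful_b), "N": list(noise)}
tagged = []
while any(pools.values()):
    tag = random.choice([t for t, chars in pools.items() if chars])
    tagged.append((pools[tag].pop(0), tag))

def ngram_counts(chars, n=3):
    """The naive repeatability criterion: count recurring n-grams."""
    s = "".join(chars)
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

# 1) Ignore color: some patterns are found, but they carry no useful meaning.
mixed = [c for c, _ in tagged]
print("mixed stream:", ngram_counts(mixed).most_common(5))

# 2) Use color as the alternative criterion: group by tag, then count.
for tag in ("A", "B", "N"):
    subseq = [c for c, t in tagged if t == tag]
    print(f"color {tag}    :", ngram_counts(subseq).most_common(5))
```

Here the color tag plays the role of the alternative criterion: it does not look for repetition at all, yet it is exactly what makes the repetition-based search useful.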
The model task described above demonstrates the specifics of the real task of searching for patterns in a sequence of events. The absence of the information represented in the model by the color of the symbols turns the problem into a combinatorial one, whose computational complexity grows catastrophically as the number of event types increases.
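To get a feel for the scale (an illustrative count, not a claim from the model above): without an additional criterion, every sequence of length n over k event types is a candidate pattern, so the candidate space alone grows as k^n.

```python
# Illustrative only: size of the candidate-pattern space for k event types and length n.
for k in (10, 100, 1000):        # number of event types
    for n in (3, 5, 8):          # pattern length
        print(f"k={k:4d}  n={n}  candidate patterns ~ {float(k**n):.2e}")
```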
SUMMATION:
To search for patterns in data from a natural environment, it is essential to find those features of the data that allow us to find what we need using not repeatability but some alternative principle, one that takes into account the specifics of the environment and the set of sensors used. Solving this problem of finding a usable principle requires intellectual effort (or millions of years of natural selection, in the case of the principles implemented in animal brains). It is reasonable to begin the search for a helpful search criterion by formulating a principle for assessing the degree of usefulness of specific patterns.