In the previous chapter, AGI: OBJECT TRACKING, the principle of object tracking was described. It makes it possible to deal with previously unknown objects, which fundamentally distinguishes this approach from the widely used method of identifying objects in each new position using neural networks. The neural-network approach not only rules out the tracking of previously unknown objects but also requires a massive amount of training data to identify objects in different orientations and objects partially hidden behind other objects. Tracking based on the continuity of object motion in the real world requires no prior training and copes with other problems of the identification-based neural-network approach. The price to pay is the greater complexity of the algorithms involved, which does not necessarily mean that more computational resources are required.
First of all, the tracking of complex objects in the described approach is reduced to the tracking of elementary ("atomic") objects, which, as far as possible, are combined (assembled) into more complex ones using the technique described in the chapter AGI: STRUCTURING THE OBSERVABLE. These elementary objects naturally fall into zero-dimensional "point" objects, one-dimensional contours, and two-dimensional "spots".
Contours are the most informative for natural vision (see the chapter ARTIFICIAL GENERAL VISION). However, not all visible objects have crisp contours; those that do not still have a specific distribution of attributes (brightness, color, etc.) within a two-dimensional area, which is what "spot" tracking uses.
In turn, objects of tiny angular size allow us to speak neither of a contour nor of a brightness/color distribution; they are characterized only by position, that is, by coordinates.
The principle of tracking point (zero-dimensional) objects is apparent: on two consecutive frames, the positions of the same object are close to each other and determine the object's displacement. Naturally, tracking objects of each type has its own specifics.
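As a minimal sketch of this principle (my illustration, not the author's implementation), point objects can be paired across two frames by simple proximity; the assumed parameter `max_disp` bounds how far an object can plausibly move in one frame interval:

```python
import numpy as np

# Hypothetical sketch: match point objects between two consecutive frames
# by proximity; the nearest point within `max_disp` is taken as the same
# object, and the difference of positions is its displacement.
def match_points(prev_pts, curr_pts, max_disp):
    matches = []  # (index in prev frame, index in curr frame, displacement)
    for i, p in enumerate(prev_pts):
        dists = np.linalg.norm(curr_pts - p, axis=1)
        j = int(np.argmin(dists))
        if dists[j] <= max_disp:
            matches.append((i, j, curr_pts[j] - p))
    return matches

prev_pts = np.array([[0.0, 0.0], [10.0, 10.0]])
curr_pts = np.array([[1.0, 0.0], [10.0, 11.0], [50.0, 50.0]])
matches = match_points(prev_pts, curr_pts, max_disp=2.0)
# each tracked point is paired with its nearest plausible successor;
# the far-away point at (50, 50) is treated as a new, unmatched object
```

A real tracker would also have to resolve ambiguous matches when several points fall within `max_disp` of each other; the sketch just takes the nearest.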
Tracking continuous objects, that is, one-dimensional contours and two-dimensional "spots", is much more complex, since for them movement means not only translation but also rotation and scaling. In addition, although continuous in nature, they are in practice represented by discrete data: the contour by a sequence of points along it, and the "spot" by a set of points in a two-dimensional area whose attribute values (brightness, color, etc.) are extracted from the data of the current frame.
The desired translation, rotation, and scaling form the transformation parameters. Applying the transformation to each of these points gives the point's new position on the next frame, where the attributes of all points must be the same as on the previous frame. Formally, this defines a system of equations in which the transformation parameters are the unknowns. In practice, the matter comes down to minimizing the integral error over the set of points, since exact equality of the corresponding attributes on two consecutive frames is unattainable due to noise in the data, the sampling of all kinds of data, and the fact that the actual attribute values also vary somewhat from frame to frame. Overall, the process is similar to the one used to stabilize images in videos; the difference is that it involves not the entire frame but only the fragment corresponding to the tracked "spot".
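This minimization can be sketched as a least-squares fit. The example below is my illustration under simplifying assumptions: the "spot" brightness is modeled by an analytic Gaussian on both frames (real code would sample pixel values from frame data), and the true motion is a pure translation of (2, 1):

```python
import numpy as np
from scipy.optimize import least_squares

# Sketch: recover translation, rotation, and scale of a "spot" by minimizing
# the brightness mismatch over a set of sample points. The Gaussian "spot"
# is synthetic; in practice values would come from the frame's pixel data.

def transform(pts, params):
    tx, ty, theta, s = params
    c, sn = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -sn], [sn, c]])
    return s * pts @ rot.T + np.array([tx, ty])

def spot(pts, center, sigma=3.0):
    d2 = np.sum((pts - np.asarray(center)) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma**2))

grid = np.stack(np.meshgrid(np.arange(24, 41.0), np.arange(24, 41.0)),
                axis=-1).reshape(-1, 2)
prev_vals = spot(grid, [32.0, 32.0])                     # "spot" on frame t

def residual(p):                                         # mismatch on frame t+1
    return spot(transform(grid, p), [34.0, 33.0]) - prev_vals

fit = least_squares(residual, x0=[0.0, 0.0, 0.0, 1.0])
# fit.x holds the transformation that maps the spot of frame t onto frame t+1
```

The per-point residuals play the role of the "system of equations" from the text, and the solver's sum of squares is the integral error being minimized.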
The essential aspect is how the "spot" area is initially formed and how the "spot" changes during tracking. In addition to changing attribute values (color/brightness), the tracked object can be gradually covered by other objects or leave the frame, or, vice versa, emerge from behind another object or from the edge of the frame, thereby increasing its area. The description of the corresponding algorithms is beyond the scope of this chapter.
For one-dimensional elementary objects, i.e., contours, the situation is similar, but instead of requiring that attribute values match, we require that the contour points of one frame fall on the line of the same contour in the other. The difficulty here is that the sequences of points forming the contour line are produced by the contour-detection algorithm independently on each frame, so there is no point-to-point correspondence between the same contour on two adjacent frames (and the number of points on the contour can differ from frame to frame).
In any case, using motion continuity for tracking implies a sufficiently high frame rate, so that objects change position, orientation, and size little from frame to frame, and sufficient computing power to perform the described operations within the frame arrival interval. At the same time, each object can be tracked independently of the others. Accordingly, the process as a whole parallelizes well, and the maximum number of trackable objects grows with the available computing resources. From the description of the approach, it is also apparent that it does not require much RAM or disk memory.
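The independence of per-object tracking makes the per-frame update embarrassingly parallel. A minimal sketch, where `track_one` is a hypothetical stand-in for any of the per-object steps above (point matching, spot fitting, contour ICP):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-object tracking step; in a real tracker this would run
# the point/contour/spot update for one object against the new frame.
def track_one(obj, frame):
    return {"id": obj["id"], "pos": obj["pos"] + frame["shift"]}

# Since objects are tracked independently, one frame's update is a plain
# parallel map over the tracked objects.
def track_frame(objects, frame, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda o: track_one(o, frame), objects))

objects = [{"id": i, "pos": float(i)} for i in range(8)]
updated = track_frame(objects, frame={"shift": 1.0})
```

`pool.map` preserves input order, so results line up with the object list; scaling to more objects is a matter of adding workers (or processes/GPU lanes for CPU-bound numeric work).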
The above concerns the situation where tracked objects are assumed to rotate only around the viewing direction; rotation around axes in the frame plane is interpreted as a smooth change of the tracked object. With sufficient computing resources, the set of transformation parameters can be expanded to include rotations around the axes in the frame plane, bringing the total number of parameters to six. Finally, it is possible to use a transformation that takes the distance to the object into account; the projection of the three-dimensional scene onto the two-dimensional frame then becomes perspective.
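The six-parameter model with a perspective projection can be sketched as follows (my illustration; the focal length `f` in pixels is an assumed camera parameter):

```python
import numpy as np

# Sketch: six-parameter motion model (three rotations + three translations)
# followed by a perspective projection onto the frame.
def rot_matrix(rx, ry, rz):
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(points3d, params, f=100.0):
    rx, ry, rz, tx, ty, tz = params
    cam = points3d @ rot_matrix(rx, ry, rz).T + np.array([tx, ty, tz])
    return f * cam[:, :2] / cam[:, 2:3]   # perspective divide by depth

patch = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
uv = project(patch, params=[0, 0, 0, 0, 0, 10.0])
# → [[0, 0], [10, 0]] : a unit offset at depth 10 spans 10 pixels at f = 100
```

Fitting these six parameters proceeds by the same integral-error minimization as in the planar case, only with the projection replacing the similarity transform.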
Those interested in the algorithms' details can contact us for detailed information and assistance.