The specifics of AI/AGI systems and the challenges of computational parallelization dictate two main goals: simplifying development and making more efficient use of the available computing resources.
Let's start the discussion with an everyday analogy. Imagine a reader who comes to the library for a specific book and finds that it is currently checked out to another reader. This is analogous to the situation in which a thread in our system tries to access shared data and finds that access is currently impossible. What are the reader's options? Here are the analogs of what traditional synchronization tools implement using mutexes and locks:
sit on a bench and wait for a message from the librarian that the book you need is available;
stand next to the librarian and ask every five minutes if the required book has already been returned.
When it comes to a reader in a library, this sounds anecdotal, doesn't it? Yet these are exactly the two options found in the various implementations of the standard mutex. Either the thread is put into an inactive state and the processor switches to other threads (a "context switch"), or the thread requests data access again and again at very short intervals (a "spinlock"). Both options lead to unproductive use of computing resources.
Let us return, however, to the reader in the library. Obviously, a real person will not sit idle waiting for the chance to get the book, but will find something useful to do in the meantime. Exactly the same approach is possible in multithreaded programming. It requires that a thread have a set of possible tasks to perform, just as with people: if one thing cannot be done at the moment, we do another. As a result, the rational organization of computation looks as shown in the diagram:
The efficiency of using the available computing resources can be increased if, instead of the relatively inefficient synchronization options implemented by the standard toolset (std::mutex and its derivatives), we use an approach that avoids both spinlocks and context switches. This becomes possible by exploiting a specific feature of AGI/AI systems, namely the presence of many parallel logical processes: instead of passively waiting for access to protected data, a thread can switch to executing whichever process is currently able to proceed.
The principle "if this work cannot be done now, do another" can be implemented on top of the low-level compare-and-exchange operation (https://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange), used as a universal synchronization mechanism; the same operation also underlies the standard mutex and its derivatives.
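A minimal sketch of how compare-and-exchange can serve as a non-blocking "try to acquire" primitive; the names here (`Gate`, `run_or_defer`) are illustrative, not the actual API of the described library:

```cpp
#include <atomic>

// A toy gate built directly on compare-and-exchange. try_acquire() never
// blocks: it either takes the gate or immediately reports failure, so the
// calling thread can go do another task instead of waiting.
class Gate {
  std::atomic<bool> busy{false};
public:
  bool try_acquire() {
    bool expected = false;
    // Atomically: if busy == false, set it to true and return true;
    // otherwise leave it unchanged and return false.
    return busy.compare_exchange_strong(expected, true);
  }
  void release() { busy.store(false); }
};

// Usage pattern: "if this work cannot be done now, do another".
template<typename Task, typename Fallback>
void run_or_defer(Gate& gate, Task task, Fallback fallback) {
  if (gate.try_acquire()) {
    task();          // access to the protected resource granted
    gate.release();
  } else {
    fallback();      // resource busy: do some other useful work instead
  }
}
```

The thread never sleeps and never spins: a failed acquisition costs one atomic operation, after which the thread immediately proceeds to other work.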
Templated wrappers for data and logical processes save the programmer from the explicit use of resource access synchronization operations; the pool of permanently active threads is represented by a "ready-to-use" class. Referred wrappers also allow us to collect statistical information about processes and/or monitor the course of events in a running system without modifying the code that implements the essence of logical processes; standard tools do not provide such an opportunity.
Templated wrappers turn any class/structure into shared data with simultaneous read-only access and exclusive access for modification, and they implement the necessary synchronization rules in full: once a modification request is received, subsequent read requests are rejected, and the write request itself is granted only after all active read requests have completed. As mentioned in the first part, the C++ standard does not stipulate such requirements, which is why the same code behaves differently on different platforms; the parallelization tools described here guarantee identical behavior on all platforms because they are built on the well-defined compare-and-exchange operation. In addition, modifying shared data while in read-only mode is excluded, since the implementation lets the compiler detect any attempt to do so. Conventional use of the tools of the described approach also excludes the occurrence of a deadlock.
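The access rules can be illustrated by the following sketch (not the author's actual implementation; for brevity it omits the write-priority refinement by which a pending write rejects new reads). A single atomic word encodes the lock state, and readers receive a `const` reference so the compiler rejects modification attempts:

```cpp
#include <atomic>

// Illustrative shared-data wrapper. state == -1: a writer is active;
// state == 0: free; state == N > 0: N concurrent readers.
template<typename T>
class Shared {
  T data;
  std::atomic<int> state{0};
public:
  // Read access: allowed while no writer is active. The callback receives
  // `const T&`, so any attempt to modify fails at compile time.
  template<typename F>
  bool read(F f) {
    int s = state.load();
    do {
      if (s < 0) return false;                      // writer active: reject
    } while (!state.compare_exchange_weak(s, s + 1)); // join the readers
    f(static_cast<const T&>(data));
    state.fetch_sub(1);                              // leave the readers
    return true;
  }
  // Write access: granted only when there are no readers and no writer.
  template<typename F>
  bool write(F f) {
    int expected = 0;
    if (!state.compare_exchange_strong(expected, -1)) return false;
    f(data);
    state.store(0);
    return true;
  }
};
```

Both `read` and `write` are non-blocking: a rejected request returns `false` immediately, and the thread moves on to another logical process.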
To demonstrate that code using the standard synchronization tools behaves differently on different platforms, a demo application has been developed that is guaranteed to create a deadlock at some random time after the start:
https://github.com/mrabchevskiy/deadlock
In this code, access to shared data is protected by the standard shared_lock/unique_lock; the entire source fits on one page, but even so, detecting the possibility of a deadlock, and finding the cause of an actual one, is far from trivial. Naturally, in large-scale real AI/AGI systems this is even harder. The demo application behaves differently on different platforms: under GCC/Linux the deadlock is detected, a message is issued, and execution stops; under Visual Studio/Windows the deadlock reveals itself only by the cessation of console output, and the program continues to "run" without actually doing anything until the user forcibly stops it.
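The classic ingredient of such deadlocks is inconsistent lock ordering. The repository's actual code differs; the following is only a minimal illustration of the pattern (each function is safe on its own, but when the two run concurrently, each thread can end up holding its first lock while waiting forever for the other's):

```cpp
#include <mutex>
#include <shared_mutex>

std::shared_mutex a, b;

// Path 1: locks a, then b.
bool reader_path() {
  std::shared_lock lock_a(a);
  std::shared_lock lock_b(b);  // may wait here while writer_path holds b
  // ... read shared data ...
  return true;
}

// Path 2: locks b, then a -- the opposite order. If each thread has taken
// its first lock, neither can take its second: a deadlock.
bool writer_path() {
  std::unique_lock lock_b(b);
  std::unique_lock lock_a(a);  // may wait here while reader_path holds a
  // ... modify shared data ...
  return true;
}
```

Nothing in this code is rejected by the compiler, and nothing in the C++ standard specifies what must happen when the deadlock occurs, which is exactly why the observed behavior differs between platforms.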
In general, the described approach is based on the representation of the AI/AGI system as a combination of a set of logical processes, a set of shared data objects, and several permanently functioning threads, as described in the chapter
HIDDEN PART OF THE INTELLIGENCE "ICEBERG".
A logical process has its own local data. Variants of a logical process range from relatively fast actions repeated in their entirety on each call to more complex sequences of steps, of which exactly one is performed per call. In the latter case, it is natural to implement the logical process as a state machine (or as a coroutine, once the appropriate future versions of the C++ standard are adopted). The wrapper protects against execution while the process is already active (called by one of the active threads), using the same technique as the shared-data wrapper.
If the current step requires access to shared data, an attempt is made to obtain the appropriate access; if this is not possible, execution is simply aborted by a "return," without switching to the next step (next state), if there is more than one.
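The state-machine pattern with non-blocking access requests can be sketched as follows. The `try_read`/`try_write` callbacks stand in for the wrapper's access requests; all names here are illustrative assumptions, not the actual API:

```cpp
#include <functional>

// Illustrative logical process as a state machine: each call to step()
// performs exactly one step, or nothing at all if shared data is busy.
class Process {
public:
  enum class State { Collect, Compute, Publish, Done };
private:
  State state = State::Collect;
  int local = 0;  // the process's own local data
public:
  State current() const { return state; }

  // When an access request is denied we return WITHOUT advancing `state`,
  // so the same step is retried on a later call by whichever thread picks
  // this process up next.
  void step(const std::function<bool(int&)>& try_read,
            const std::function<bool(int)>& try_write) {
    switch (state) {
      case State::Collect:
        if (!try_read(local)) return;   // shared data busy: do nothing now
        state = State::Compute;
        return;
      case State::Compute:
        local *= 2;                     // purely local work, always possible
        state = State::Publish;
        return;
      case State::Publish:
        if (!try_write(local)) return;  // shared data busy: retry later
        state = State::Done;
        return;
      case State::Done:
        return;
    }
  }
};
```

A denied access request costs the calling thread almost nothing: it simply returns and proceeds to the next logical process in its task set.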
Implementation details and the test application's description are the next chapter's subject.
Thus, the specifics of the described approach are:
The essential logic of the processes is separated from the code that ensures correct parallelization
A built-in ability to collect/monitor statistics of the synchronization processes is provided
Conventional use of the tools eliminates deadlocks
The number of threads and the frequency of context switches are reduced, increasing the efficiency of using the available resources
The number of error-prone aspects of synchronization encoding is reduced.