The chapter before last, CAUSALITY DISCOVERING. HOW STATISTICS CAN HELP, posed the task of finding causal relationships in observational data on four factors.
An alternative approach, described earlier in the chapter CAUSE AND EFFECT RELATIONSHIPS II, searches for a Boolean function that describes the causal relationships.
Applied to both of our data sets, this approach yields the same functional relationship between the observed variables V1, V2, V3, and V4:
V2 = V1 and ( V3 or V4 )
This corresponds to an electrical circuit in which V1, V3, and V4 are switch states and V2 is the state of a lamp: switch V1 is wired in series with switches V3 and V4, which are connected in parallel with each other.
The data sets, and hence the correlation coefficients, differ only because the factors that play the role of causes take their values with different frequencies, that is, the three switches are on and off with different frequencies. This difference in the observation statistics has nothing to do with the essence of the causal relationship.
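The effect of switch frequencies on correlation can be checked with a small calculation. The sketch below is an illustration only: it assumes the three switches are independent, and the on-probabilities in the two scenarios are hypothetical values, not the data sets from the earlier chapter. The function name `exact_correlations` is likewise introduced here for illustration. It computes the exact Pearson correlation of the lamp V2 with each switch by enumerating all eight switch states.

```python
from itertools import product
from math import sqrt


def lamp(v1, v3, v4):
    """The causal relationship from the text: V2 = V1 and (V3 or V4)."""
    return int(v1 and (v3 or v4))


def exact_correlations(p1, p3, p4):
    """Pearson correlation of V2 with V1, V3, V4.

    Assumption: the switches are independent and are on with
    probabilities p1, p3, p4 (each strictly between 0 and 1).
    """
    probs = (p1, p3, p4)
    # Joint distribution over the 8 switch states: (weight, bits, lamp state).
    states = []
    for bits in product((0, 1), repeat=3):
        weight = 1.0
        for bit, p in zip(bits, probs):
            weight *= p if bit else 1.0 - p
        states.append((weight, bits, lamp(*bits)))

    e_y = sum(w * y for w, _, y in states)   # P(lamp is on)
    var_y = e_y * (1.0 - e_y)                # variance of a 0/1 variable

    result = {}
    for i, name in enumerate(("V1", "V3", "V4")):
        e_x = probs[i]
        e_xy = sum(w * bits[i] * y for w, bits, y in states)
        cov = e_xy - e_x * e_y
        result[name] = cov / sqrt(e_x * (1.0 - e_x) * var_y)
    return result


# Two hypothetical observation regimes for the same circuit.
balanced = exact_correlations(0.5, 0.5, 0.5)
skewed = exact_correlations(0.9, 0.1, 0.1)
print("balanced:", balanced)
print("skewed:  ", skewed)
```

Under these assumed frequencies the ranking of the correlations reverses: V1 correlates most strongly with the lamp in the balanced regime, while V3 and V4 do in the skewed one, although the circuit itself is unchanged.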
This example demonstrates several significant circumstances:
Statistical methods usually rely on a priori assumptions about the role of the experimental factors. The non-statistical approach requires no information other than the observation results.
Statistical data can be significantly different for the same system under study if the observation conditions differ.
The non-statistical approach allows us to correctly divide the observed values into consequences and causal factors (and also to weed out factors that do not affect the result in any way, if any are present).
Non-statistical methods for searching for causal relationships require minimal observational data. They have an explicit criterion for the sufficiency of the amount of data: as long as there is not enough data, more than one function can describe the causal relationships, so all of these functions remain hypotheses. As the volume of observations grows, hypotheses that contradict the data are discarded until only the one desired dependence remains.
The sufficiency of a relatively small number of observations comes at the price of high computational complexity, since non-statistical methods are combinatorial. However, in most cases accumulating observational data is more expensive than computation. In real-time artificial intelligence systems, needing only a few observations means that cause-and-effect relationships, including the causes of possible undesirable (dangerous) situations, can be identified in time.
When the factors that are causes are not covered by observation for one reason or another, the opportunity to find a causal relationship by constructing an appropriate function is lost. Outwardly, this looks like a non-deterministic dependence of the result on those factors that did fall into the set of observables. This creates the false impression that probabilistic statistical methods can correctly describe such a causal relationship, and in some cases the corresponding results are indeed helpful. However, as our example shows, the statistical dependencies in this case are correct only as long as the unobserved, unaccounted factors retain a constant value (or a constant probability distribution). Since they are unobservable, there is no way to guarantee such constancy. Moreover, the observation mode itself may play the role of an unaccounted factor. This is illustrated by the following example, which may look anecdotal:
In a room where the light switches on automatically when people are present, the statistics of light and darkness depend radically on whether the observer is inside or outside the room.
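The room example can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions: the presence sensor is assumed to detect the observer like any other person, and the occupancy log is hypothetical data invented for the example. The names `light_on` and `observed_light_frequency` are introduced here and do not come from the earlier chapters.

```python
def light_on(people_in_room):
    """Automatic switching: the light is on whenever anyone is in the room."""
    return people_in_room > 0


def observed_light_frequency(occupancy_log, observer_inside):
    """Fraction of observations in which the observer sees the light on.

    occupancy_log lists the number of OTHER people in the room at each
    observation time. Assumption: an observer standing inside the room is
    detected by the sensor exactly like any other person.
    """
    extra = 1 if observer_inside else 0
    readings = [light_on(n + extra) for n in occupancy_log]
    return sum(readings) / len(readings)


# Hypothetical occupancy at ten observation times.
log = [0, 0, 1, 0, 2, 0, 0, 1, 0, 0]

print("observer outside:", observed_light_frequency(log, observer_inside=False))
print("observer inside: ", observed_light_frequency(log, observer_inside=True))
```

With the same occupancy log, the outside observer records the light on in 3 of 10 observations, while the inside observer never sees the room dark. The observer's position acts as an unaccounted factor that changes the statistics without changing the rule that controls the light.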
Implementing non-statistical approaches to discovering causal relationships is challenging in terms of programming; if necessary, we can assist in adapting such methods to a particular system.
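To give a feel for the combinatorial search and for the sufficiency criterion described above, here is a minimal brute-force sketch. It is not the method from CAUSE AND EFFECT RELATIONSHIPS II, only an illustration under simplifying assumptions: V2 is taken to be the effect in advance, every Boolean function of V1, V3, V4 is treated as a hypothesis, and hypotheses that contradict any observation are discarded. The helper names are hypothetical.

```python
from itertools import product

# All combinations of the candidate causes (V1, V3, V4).
INPUTS = list(product((0, 1), repeat=3))


def consistent_hypotheses(observations):
    """Return every truth table that agrees with all observations.

    Each observation is a tuple (V1, V2, V3, V4). A hypothesis is a dict
    mapping (V1, V3, V4) to a predicted V2. With 3 inputs there are
    2**8 = 256 hypotheses; in general 2**(2**n), which is why the search
    is combinatorial.
    """
    survivors = []
    for outputs in product((0, 1), repeat=len(INPUTS)):
        table = dict(zip(INPUTS, outputs))
        if all(table[(v1, v3, v4)] == v2 for v1, v2, v3, v4 in observations):
            survivors.append(table)
    return survivors


def target(v1, v3, v4):
    """The relationship from the text, used here to generate observations."""
    return int(v1 and (v3 or v4))


# One observation per distinct switch combination, in the order (V1, V2, V3, V4).
all_obs = [(v1, target(v1, v3, v4), v3, v4) for v1, v3, v4 in INPUTS]

# Number of surviving hypotheses after 0, 1, ..., 8 observations.
counts = [len(consistent_hypotheses(all_obs[:k])) for k in range(len(all_obs) + 1)]
print(counts)
```

Each new distinct switch combination halves the set of surviving hypotheses, from 256 down to a single function once all eight combinations have been observed. More than one survivor signals that the data are not yet sufficient. A practical implementation would also have to decide which variable is the effect and prune irrelevant factors, which this sketch does not attempt.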