Back in November (time flies!) I promised to go through the IC* algorithm as presented by J. Pearl in his book Causality, while keeping in mind that to us this algorithm is really a set of operational definitions. That is, it doesn’t let us discover causal relations (as the name of the field of causal discovery would have us believe, implying that their existence and nature is a given); rather, it forms the basis for defining a bunch of different types of causal relations. How these definitions mesh with our preconceived notions of causality is for us to work out in detail later on. Here I will follow Sect. 2.6, p. 52 of Causality [1]. You are warmly invited to read the book for yourself even if you find the following convincing: water from the source tends to be cleaner.
The IC* algorithm takes as input a set of variables or, more formally, a sampled joint distribution. This means you acquired data about M instances (which could be people, cars, planets, instants in time) by measuring N variables (smoking habits, number of sisters, activity from radon gas in their basement), and you think of those M points in N-dimensional space as a sample from the joint distribution of the N variables. Another way to put this is that all you have to work with are patterns of statistical association between variables, as measured on a finite sample. The output is a “marked pattern”: a partially directed acyclic graph with four types of edges. Here each variable is a node in the graph, and the presence of an edge between two variables indicates a relation of some sort.
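To fix ideas, here is what the input could look like in Python. The generating process is invented purely for illustration (C is a common effect of A and B), and the dictionary-of-arrays format is just my choice for the sketches that follow.

```python
import numpy as np

# Hypothetical sample: M = 1000 instances, N = 3 measured variables.
# Each array holds one variable; row i across the arrays is one instance,
# treated as an i.i.d. draw from the (unknown) joint distribution.
rng = np.random.default_rng(0)
M = 1000
a = rng.normal(size=M)                 # e.g. smoking habits
b = rng.normal(size=M)                 # e.g. radon activity
c = a + b + 0.5 * rng.normal(size=M)   # a common effect of the two
data = {"A": a, "B": b, "C": c}        # the "sampled joint distribution"
```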
The types of relation are as follows (in Spirtes 1993 notation, which in my opinion is better than Pearl’s own):
A -> B: A definitely causes B, also called a genuine cause later in the book
A o-> B: either A causes B or some other latent (that is, unobserved) factor causes both, also called a potential cause later in the book
A <-> B: some other latent factor causes both A and B, also called a spurious association later in the book
A o-o B: either A causes B, or B causes A, or some other latent factor causes both
I obviously omitted A <- B and A <-o B, which can be obtained simply by switching the places of A and B. In this notation the presence of an empty circle at the end of an edge indicates that an arrowhead may or may not be there: A o-> B means that either A -> B or A <-> B. Also note that A <-> B does not mean here that A co-causes B in a dynamic loop or something of the sort: it is shorthand for A <- L -> B, where L is a latent variable, that is, a variable that we did not include in our analysis [2].
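Since the four steps below manipulate these edges, it helps to have a concrete representation in mind. Here is a minimal sketch of one: each edge stores a mark per endpoint, with "tail", "arrow" and "circle" standing for -, > and o in the Spirtes notation. All names here are my own, not from the book.

```python
# A "marked pattern" as a dictionary: each edge is keyed by its pair of
# nodes and stores the endpoint mark at each node's end of the edge.
edges = {
    frozenset({"A", "C"}): {"A": "circle", "C": "circle"},  # A o-o C
    frozenset({"B", "C"}): {"B": "circle", "C": "arrow"},   # B o-> C
}

def mark(edges, x, y):
    """Endpoint mark at y's end of the x-y edge, or None if non-adjacent."""
    e = edges.get(frozenset({x, y}))
    return e[y] if e else None
```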
The algorithm works as follows. First, for each pair of variables A and B, it looks for a set [3] of other variables SAB such that conditioning [4] on SAB makes A and B independent. Of course SAB can be the empty set, meaning that A and B are unconditionally independent. If no set SAB is found, A and B are linked with a (for now) undirected link, i.e. A o-o B. The rationale behind this step is that if A and B are associated no matter what we control for (that is, no matter the context we are considering), then they should have a link of some kind, either causal or spurious. The edge marked as o-o is the most general kind of link.
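Here is a rough sketch of this first step, under assumptions that are entirely mine: I treat the variables as linear-Gaussian and approximate conditional independence by a near-zero partial correlation, with an arbitrary threshold eps. A real implementation would use a proper statistical test.

```python
import itertools
import numpy as np

def partial_corr(data, a, b, cond):
    """Correlation between a and b after linearly regressing out cond."""
    x, y = data[a].astype(float), data[b].astype(float)
    if cond:
        Z = np.column_stack([data[v] for v in cond] + [np.ones(len(x))])
        x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
        y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(x, y)[0, 1]

def find_sep_set(data, a, b, rest, eps=0.05):
    """Search the subsets of rest for an SAB that makes a and b look
    independent; None means no separating set was found."""
    for k in range(len(rest) + 1):
        for s in itertools.combinations(rest, k):
            if abs(partial_corr(data, a, b, list(s))) < eps:
                return set(s)
    return None

def step1(data):
    """Link pairs with no separating set by o-o edges; remember the
    separating sets of the pairs that end up non-adjacent."""
    variables = sorted(data)
    edges, sep = {}, {}
    for a, b in itertools.combinations(variables, 2):
        rest = [v for v in variables if v not in (a, b)]
        s = find_sep_set(data, a, b, rest)
        if s is None:
            edges[frozenset({a, b})] = {a: "circle", b: "circle"}  # A o-o B
        else:
            sep[frozenset({a, b})] = s
    return edges, sep
```

On the toy sample above, this should link A with C and B with C, while the pair A, B gets the empty set as its separating set.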
Second, for each pair of non-adjacent variables, that is, variables that are not joined by a direct link after carrying out the previous step, that also have at least one common neighbour C (that is, A o-o C o-o B), check if C is in SAB, i.e. check whether A and B can be made conditionally independent by conditioning on C. Note that since A and B are not neighbours (A o-o B does not hold), a set SAB exists, and we can check whether C is in that set or not. If C is in SAB we move on to the next common neighbour or to the next pair of variables; otherwise we change the edges from the original A o-o C o-o B to A o-> C <-o B. What is the intuition for this? If C is a common neighbour of A and B, it is related to each of them in every context: we found a persistent association. Now if C is in SAB, controlling for C (possibly together with additional variables) makes A and B independent: this suggests that C either mediates a causal effect from A to B or from B to A, or that C is a common cause of A and B. If we rule this out (because C is not in SAB), we are left with two scenarios: either latent variables are responsible for a spurious association between A and C and between B and C, or C is a common effect of A and B. This is exactly what A o-> C <-o B means.
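In code, this second step could look as follows; it just adds arrowheads at C’s end of both edges, leaving the marks at A’s and B’s ends untouched. On the toy sample this is the step that orients A o-> C <-o B, since the empty separating set of A and B does not contain C.

```python
def step2(edges, sep):
    """For each non-adjacent pair a, b with a common neighbour c:
    if c is not in SAB, put arrowheads at c's ends of both edges."""
    nodes = sorted({v for e in edges for v in e})
    for a, b in itertools.combinations(nodes, 2):
        ab = frozenset({a, b})
        if ab in edges:
            continue                      # only non-adjacent pairs qualify
        for c in nodes:
            if c in (a, b):
                continue
            ac, bc = frozenset({a, c}), frozenset({b, c})
            if ac in edges and bc in edges and c not in sep.get(ab, set()):
                edges[ac][c] = "arrow"    # now a o-> c ...
                edges[bc][c] = "arrow"    # ... and c <-o b
    return edges
```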
Third, for A and B non-adjacent, turn any triple A o-> C o-o B or A o-> C o-> B into A o-> C -> B. As I understand it [5], this is because the previous step did not find that A o-> C <-o B, so C is neither spuriously associated with B (i.e. C <-> B) nor caused by B (C <- B). The only option left is that C causes B, hence C -> B. This is continued recursively, so that any triple C -> B o-o D produced by the application of this rule is also turned into C -> B -> D.
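A sketch of this third step, reusing the edge representation from above. The while loop is my way of implementing the recursive part; it terminates because each pass can only replace circles with tails and arrowheads, never the other way around.

```python
def step3(edges):
    """For non-adjacent a, b with a common neighbour c: if a *-> c and
    the c-b edge has a circle at c's end, orient that edge as c -> b.
    Repeat until nothing changes, which covers the recursive case."""
    nodes = sorted({v for e in edges for v in e})
    changed = True
    while changed:
        changed = False
        for a, c, b in itertools.permutations(nodes, 3):
            ac, cb, ab = frozenset({a, c}), frozenset({c, b}), frozenset({a, b})
            if ab in edges or ac not in edges or cb not in edges:
                continue
            if edges[ac][c] == "arrow" and edges[cb][c] == "circle":
                edges[cb][c] = "tail"     # c -> b: plain tail at c ...
                edges[cb][b] = "arrow"    # ... arrowhead at b
                changed = True
    return edges
```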
Fourth, if A o-o B or A <-o B and there is a directed path of causal links through other variables, A -> S -> I -> N -> O -> … -> B, then turn the edge between A and B into A o-> B or A <-> B, respectively. This makes sure the graph is acyclic: the arrowhead at B is forced to agree with the direction of the existing path.
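And a sketch of the last step, where I take a directed path to mean any chain of fully oriented edges (tail at one end, arrowhead at the other); the helper names are again mine.

```python
def step4(edges):
    """If the a-b edge has a circle at b's end and there is a directed
    path from a to b through other variables, turn that circle into an
    arrowhead (A o-o B becomes A o-> B, A <-o B becomes A <-> B)."""
    children = {}      # x -> set of y with a fully oriented edge x -> y
    for e, m in edges.items():
        x, y = tuple(e)
        if m[x] == "tail" and m[y] == "arrow":
            children.setdefault(x, set()).add(y)
        if m[y] == "tail" and m[x] == "arrow":
            children.setdefault(y, set()).add(x)

    def reaches(src, dst):
        """Depth-first search along the fully oriented edges."""
        seen, stack = set(), [src]
        while stack:
            v = stack.pop()
            if v == dst:
                return True
            if v not in seen:
                seen.add(v)
                stack.extend(children.get(v, ()))
        return False

    for e, m in edges.items():
        a, b = tuple(e)
        for src, dst in ((a, b), (b, a)):
            if m[dst] == "circle" and reaches(src, dst):
                m[dst] = "arrow"
    return edges
```

Chaining the sketches on the toy sample from the top of the post:

```python
edges, sep = step1(data)
edges = step4(step3(step2(edges, sep)))
print(edges)   # expect A o-> C <-o B and no A-B edge (up to test noise)
```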
That’s it. In the final graph, any two variables linked by an edge stand in the causal relation (or set of possible causal relations) represented by the type of edge they have been assigned.
[1] 2000 edition or 2009 edition? I don’t really know, but the terminus post quem from the bibliography looks like 1999. Did the 2009 edition expand the bibliography?
[2] Clearly if A and B both evolve with time and we introduce A0, B0, A1, B1, …, At, Bt, then it may be that A1 <- A0 -> B1 and A1 <- B0 -> B1, with the previous states playing the role of the latent confounder. This is the case if A and B are the position and momentum of a classical particle moving in a potential: discretizing Hamilton’s equations gives A1 = A0 + (B0/m) dt and B1 = B0 - V'(A0) dt, so each new state depends on both previous states. The notation A <-> B is however more general than that.
[3] Just any set? Or a minimal set? This is not specified in the original text. Needs more digging.
[4] Conditioning on SAB means that we control for the variables in SAB. In practice this means adding them to a regression as covariates, matching on them, etc.
[5] Just like ChatGPT, I can make mistakes. Consider checking important information.