It’s been about a month. The main reason for this hiatus is the upcoming Machine Learning for Physical Sciences NeurIPS workshop, to be held in New Orleans on Dec. 15. Oh, and the raccoons. They dug a hole in our roof and my landlord has not yet managed to have them removed. They are so cute, but they are also a rabies hazard. Hopefully they will be safely relocated to some nearby forest this week.
But back to the workshop. I got three papers accepted, two as first author. The first of these is called Causa prima: cosmology meets causal discovery for the first time. One of the referees, who nonetheless recommended acceptance, pointed out that the title is not quite accurate: the paper is not about cosmology but about galaxy evolution. So here, I fixed it for you (though not in the actual paper).
Causality research in machine learning (of which causal discovery is a sub-branch) is a pretty hot field at the moment, especially if you consider the recent explosion of work on causal representation learning. Simply put, the goal of causal discovery is to learn a causal structure (what causes what) on a set of variables in a data-driven way. If you are familiar with representing causal relations between variables as directed acyclic graphs (DAGs), that’s exactly what we want to learn, ideally: a DAG describing the causal relations between our variables.
If you are not familiar with DAGs, here’s one:
(coffee) -> (stained teeth) <- (smoking) -> (respiratory disease)
The little arrow -> means that drinking coffee causes stained teeth, and the other little arrow <- means that smoking also causes stained teeth; moreover, smoking causes respiratory disease too. Arrows point from cause to effect. This causal structure sounds reasonable to us (even before looking at a dataset about the correlates of smoking habits) because we can leverage quite a few implicit assumptions about reality. The real challenge is to learn a diagram like this from a dataset without relying on prior knowledge[1].
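For concreteness, the little diagram above can also be written down as a data structure. Here is a minimal sketch using a plain Python adjacency dict (the variable names are mine, purely for illustration):

```python
# The DAG above, as an adjacency dict: each key maps to the list of
# variables it directly causes (arrows point from key to list entries).
dag = {
    "coffee": ["stained_teeth"],
    "smoking": ["stained_teeth", "respiratory_disease"],
    "stained_teeth": [],
    "respiratory_disease": [],
}

def parents(graph, node):
    """Direct causes of `node`: every variable with an arrow into it."""
    return sorted(cause for cause, effects in graph.items() if node in effects)

print(parents(dag, "stained_teeth"))  # -> ['coffee', 'smoking']
```

Stained teeth has two parents here: it is a common effect (a collider), which will matter in a moment.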
One family of approaches to this problem relies on testing pairs of variables for statistical independence. For the sake of the argument, let us assume that smokers are as likely as anyone else to drink coffee; that is, if I happened to learn that you smoke, I would gain no additional information about whether you drink coffee or not. Thus coffee and smoking are independent: they have no statistical association. We can take this lack of association as an indication of a lack of causal relation: coffee does not cause smoking, nor does smoking cause coffee. So no arrow between them. Note that this is an assumption: in general, two causes could cancel out perfectly, so that their common effect displays no association with either of them, even though this is unlikely.
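A back-of-the-envelope version of this kind of check, with made-up probabilities, might look like the following sketch: simulate smoking and coffee as independent coin flips and verify that learning one tells you nothing about the other.

```python
import random

random.seed(0)
n = 100_000

# Smoking and coffee drinking simulated as independent coin flips;
# the base rates (30% and 60%) are made up for illustration.
smoking = [random.random() < 0.3 for _ in range(n)]
coffee = [random.random() < 0.6 for _ in range(n)]

def p_coffee_given(smokes):
    """Empirical P(coffee | smoking == smokes)."""
    group = [c for s, c in zip(smoking, coffee) if s == smokes]
    return sum(group) / len(group)

# If the two are independent, learning that someone smokes should not
# change the probability that they drink coffee: both numbers ~0.6.
print(p_coffee_given(True), p_coffee_given(False))
```

A real independence test would attach a p-value to the difference between those two conditional frequencies (e.g. a chi-squared test on the contingency table), but the idea is the same.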
Things get more interesting when we control for stained teeth, for instance by testing for independence between smoking and coffee among people who have stained teeth and, separately, among people who do not. Now, if smoking increases the probability of having stained teeth and coffee does too, we would expect smoking and coffee to show a negative association among those who have stained teeth, as there are virtually no people who neither drink coffee nor smoke but have stained teeth[2]: this is Berkson’s paradox. Upon observing this we may conclude that (coffee) -> (stained teeth) <- (smoking).
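Berkson’s paradox is easy to see in a standalone simulation (again, all probabilities are made up): smoking and coffee are generated independently, yet once we condition on the common effect, a negative association appears.

```python
import random

random.seed(0)
n = 100_000

# Smoking and coffee are independent; both cause stained teeth with
# probability 0.8 (numbers made up, and smokers never quit, as in [2]).
smoking = [random.random() < 0.3 for _ in range(n)]
coffee = [random.random() < 0.6 for _ in range(n)]
stained = [(s and random.random() < 0.8) or (c and random.random() < 0.8)
           for s, c in zip(smoking, coffee)]

def p_smoking_given_coffee(drinks, teeth_stained):
    """Empirical P(smoking | coffee == drinks, stained == teeth_stained)."""
    group = [s for s, c, t in zip(smoking, coffee, stained)
             if c == drinks and t == teeth_stained]
    return sum(group) / len(group)

# Among people with stained teeth, drinking coffee "explains away" the
# stains, lowering the probability of smoking: a negative association
# induced by conditioning on a common effect (Berkson's paradox).
print(p_smoking_given_coffee(True, True), p_smoking_given_coffee(False, True))
```

In this toy model the effect is extreme: among non-coffee-drinkers with stained teeth, essentially everyone smokes, because nothing else could have stained their teeth.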
By now you probably get the idea of the spirit of independence-based causal discovery. Here is a great textbook if you want to read more about it. I do not want to go much further in explaining how this approach to causal discovery works because, to be honest, I have not yet understood causality to my satisfaction, even though I have posted a lot about my journey into causal land.
There are mainly two ways of not understanding something: one where you believe you understood but you have not (misunderstanding), and one where you have the clear, nagging feeling that you didn’t get it: the pieces of the puzzle did not fully fall into place[3]. But why didn’t they? Typically there is something else stuck there that prevents the pieces from aligning correctly. As a seven-year-old I checked out a book on internal combustion engines from my school’s library. As you may expect, at the time I knew absolutely nothing about mechanics, and I lacked the notion of inertia. But I was far from an empty vessel: by that age most kids have a fully developed intuition of how mechanics works, an intuition formed in the sublunar world, where friction is ubiquitous. A world where stuff does not move unless it is being actively pushed. It turns out that it is impossible to understand the intake, compression and exhaust phases of the four-stroke engine cycle if you believe that the piston must be pushed at all times in order to move. The fact that it keeps moving, and can do useful work, even when it is not being actively pushed by expanding gases (as it is in the combustion phase) just didn’t compute, so I gave up. Seven-year-old me did not have to face the added difficulty of not knowing whether the contents of the book on internal combustion engines were true or not: I believed back then that it was impossible for a printed book to contain falsehood. But with causality (and anything that comes up in research, really) I no longer have this luxury.
Luckily, in-depth understanding is not a prerequisite for research. To quote Galileo: Io stimo più il trovar un vero, benché di cosa leggiera, che 'l disputar lungamente delle massime questioni senza conseguir verità nissuna (I esteem the discovery of truth in even the slightest matters to be of more value than the prolonged dispute of grand issues without achieving any truth). Where you draw the line is ultimately a judgment call.
So let’s march onwards. In the causa prima paper we took a state-of-the-art dataset (thanks Ben) on supermassive black holes and their host galaxies and fed it to two causal discovery methods: the Peter-Clark (PC) algorithm and the Fast Causal Inference (FCI) algorithm.
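To give a flavor of what PC does under the hood (this is not the pipeline from the paper, just a toy sketch of PC’s first phase, the skeleton search, with made-up Gaussian data and a crude partial-correlation cutoff in place of a proper conditional independence test):

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Toy linear model with chain structure X -> Y -> Z: X and Z are
# correlated, but become independent once we condition on Y.
X = rng.normal(size=n)
Y = 0.8 * X + rng.normal(size=n)
Z = 0.8 * Y + rng.normal(size=n)
data = {"X": X, "Y": Y, "Z": Z}

def partial_corr(a, b, controls):
    """Correlation between a and b after regressing out `controls`."""
    def residual(v):
        if not controls:
            return v - v.mean()
        M = np.column_stack([np.ones(len(v))] + [data[c] for c in controls])
        beta, *_ = np.linalg.lstsq(M, v, rcond=None)
        return v - M @ beta
    ra, rb = residual(data[a]), residual(data[b])
    return float(np.corrcoef(ra, rb)[0, 1])

# PC's skeleton phase, grossly simplified: start from the complete
# undirected graph and delete an edge whenever the pair looks
# independent given some subset of the remaining variables.
threshold = 0.05  # crude cutoff instead of a real statistical test
edges = set(combinations(sorted(data), 2))
for a, b in sorted(edges):
    others = [v for v in data if v not in (a, b)]
    subsets = [[]] + [[o] for o in others]
    if any(abs(partial_corr(a, b, s)) < threshold for s in subsets):
        edges.discard((a, b))

print(sorted(edges))  # skeleton: X - Y and Y - Z survive, X - Z does not
```

The real algorithm then orients edges using collider patterns like the stained-teeth example above; FCI additionally accounts for possible unobserved confounders, which is why we ran both on the black hole dataset.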
Supermassive black holes (like the one in the famous picture) are found at the center of most galaxies, but only a relatively small set of them has a reliable mass measurement. There is a long-standing chicken-and-egg debate on whether supermassive black holes shape their host galaxy, that is (black hole) -> (galaxy properties), or the other way around, i.e. (black hole) <- (galaxy properties). This is not a purely data-driven debate either: there is a lot of theory about black hole accretion, galaxy formation, and how the two interact. The word co-evolution is frequently heard in this context.
Looks like a good open problem to apply causal discovery to, right?
[…to be continued…]
[1] Even though, strictly speaking, no knowledge is really ‘prior’. Everything we know we learned from data, either as individuals or as a species through brain structures hard-coded by evolution.
[2] Clearly one could have smoked enough to get stained teeth and then quit, so we might count them among the people who have stained teeth but neither smoke nor drink coffee. This suggests that the variable smoking should be split into smoking yesterday and smoking today. Agreeing on what exactly is to be considered a variable is important. Here I keep it simple by assuming that smokers never quit.
[3] The first one is better than the second in some cases: making simplified models of reality (or even of someone else’s theory) is an exercise in deliberate, controlled misunderstanding! The feeling of ‘knowing what you should do’, even if unjustified, is empowering. On the other hand, realising that the pieces of the puzzle are not fitting together can be paralysing.