Both reinforcement learning (RL) and causal inference are indispensable parts of machine learning, and each plays an essential role in artificial intelligence. What originally motivated me to integrate the two is the recent development of machine learning in healthcare and medicine. Throughout their history, human beings have lived with disease and fought against it, relentlessly pursuing health. In recent decades, the rapid growth of machine learning has driven revolutionary developments in healthcare: some artificial intelligence systems have approached or even surpassed human experts in cancer classification [1], cancer detection [8], diabetic retinopathy detection [3], and diagnosis of blinding retinal diseases [7]. Benefiting greatly from the continuing explosion in computing power, artificial intelligence (AI) will, without doubt, help reshape the future of medicine. Just imagine this scenario: in the future, everyone has a personally customized AI doctor on their desk that records all of their medical history since birth. On the basis of such individual medical data, diseases could be anticipated and prevented in advance, or at least cured in time, which could substantially increase human life expectancy.
However, the approaches currently applied with success to the medical problems mentioned above are based solely on association rather than causation. In statistics, it is well known that association does not imply causation [10,12]. Reichenbach [14] formalized the relation between association and causation in his famous Principle of Common Cause: if two random variables X and Y are statistically dependent, then one of the following causal explanations must hold: a) X causes Y; b) Y causes X; c) there exists a random variable Z that is a common cause of both X and Y. Causation therefore goes a step beyond association, capturing the more fundamental relations between variables. The central task of causal inference is to uncover the cause-effect relations between variables. Understanding the causal structure of a system would equip us with the following abilities: (1) predicting what would happen if some variables were intervened on; (2) estimating the effect of confounding factors that affect both an intervention and its outcome [9]; (3) predicting the outcomes of cases that have never been observed before. If we view treatments in medicine as interventions and treatment effects as outcomes (e.g., understanding the effect of medications on a patient’s health, estimating the effect of unobserved confounders affecting both treatment and a patient’s general well-being, evaluating the survival rates of different treatments for a patient’s disease), these abilities are exactly what healthcare requires, yet most existing methods do not possess them. That is why causality plays a pivotal role in developing genuinely intelligent algorithms for healthcare.
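Case c) can be made concrete with a small simulation. The sketch below uses a hypothetical structural causal model chosen purely for illustration (it is not taken from the cited works): a hidden variable Z drives both X and Y, and X has no effect on Y. Observationally X and Y are correlated; under the intervention do(X), which sets X independently of Z, the correlation disappears. Only the Python standard library is assumed.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def sample(n, intervene_on_x=False, seed=0):
    """Draw n samples from a hypothetical SCM Z -> X, Z -> Y (no X -> Y edge).

    With intervene_on_x=True, X is set by do(X := noise), which cuts the
    Z -> X edge while leaving the mechanism that generates Y untouched.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)          # hidden common cause
        if intervene_on_x:
            x = rng.gauss(0, 1)      # do(X): X no longer depends on Z
        else:
            x = z + rng.gauss(0, 1)  # X := Z + noise
        y = z + rng.gauss(0, 1)      # Y := Z + noise (X plays no role)
        xs.append(x)
        ys.append(y)
    return xs, ys


obs_corr = pearson(*sample(20000))
int_corr = pearson(*sample(20000, intervene_on_x=True))
print(f"observational corr(X, Y)  = {obs_corr:.2f}")
print(f"interventional corr(X, Y) = {int_corr:.2f}")
```

With 20,000 samples the observational correlation should land near its theoretical value of 0.5 (Cov(X, Y) = 1, Var(X) = Var(Y) = 2), and the interventional one near zero. Association alone cannot distinguish the three cases, whereas intervening on X rules out case a) in this model.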
A natural way to implement the concept of intervention from causal inference is through the concept of action in RL [17,2]. More specifically, an agent observes how the state of the environment changes and obtains immediate rewards (outcomes) by taking different actions (interventions) depending on its current state. The goal of the agent, however, is to maximize the expected accumulated reward, which means that RL by itself does not perform causal inference. Causal inference can therefore, in turn, help RL learn value functions or policies more efficiently and effectively by inferring causal relations between states, or between states and actions, for instance to reduce the state or action space and to handle confounders. Causality and RL are thus complementary, and integrating them from the causal perspective can enhance both.
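To illustrate the analogy between actions and interventions, here is a minimal epsilon-greedy bandit sketch, a single-state special case of RL. It assumes a hypothetical three-action environment with Bernoulli rewards (the probabilities in `TRUE_MEANS` are made up), and epsilon-greedy is simply one standard exploration rule, not something the text prescribes. Because the agent chooses its own actions, each running value estimates E[R | do(A = a)]; yet the agent only learns which action pays off, not why.

```python
import random

# Hypothetical environment: mean reward of each action, unknown to the agent.
TRUE_MEANS = [0.2, 0.5, 0.8]


def pull(action, rng):
    """Bernoulli reward for the chosen action (the agent's intervention)."""
    return 1.0 if rng.random() < TRUE_MEANS[action] else 0.0


def epsilon_greedy(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_actions = len(TRUE_MEANS)
    counts = [0] * n_actions
    values = [0.0] * n_actions  # running estimate of E[R | do(A = a)]
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)                        # explore
        else:
            a = max(range(n_actions), key=lambda i: values[i])  # exploit
        r = pull(a, rng)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental sample mean
        total += r
    return values, counts, total / steps


values, counts, avg_reward = epsilon_greedy()
print("estimated action values:", [round(v, 2) for v in values])
print("pulls per action:", counts)
print("average reward:", round(avg_reward, 3))
```

The agent ends up favoring the action with the highest mean reward, but nothing in its value table says which variables produce that reward; that structural knowledge is what causal inference would add.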
We therefore propose to integrate causal inference into RL, enabling RL to infer causal effects from data in complicated real-world medical problems. Taking advantage of both, we can estimate the true effect of treatments on a patient’s health in the presence of unobserved confounders, and further find the optimal treatment strategy while interacting with the patient. Take sepsis as an example. Sepsis is a life-threatening condition that arises when the body’s response to infection causes injury to its own tissues and organs [18]; it is a leading cause of mortality in intensive care units and costs hospitals billions annually [13]. When learning sepsis treatment strategies, RL usually treats the measured physiological parameters, including demographics, lab values, vital signs, and intake/output events [6,13], as states that guide subsequent treatment and dosage decisions. However, this process will almost inevitably involve some unobserved confounding factors that significantly affect the treatment strategy, and these are difficult to handle within the current RL framework. Fortunately, we can leverage causal inference to cope with this issue: the effect of potential hidden confounders on both treatment and patients’ health can be evaluated, and the treatment strategy can then be adjusted accordingly.
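The danger of unobserved confounding in logged treatment data can be shown with a toy calculation. Every number below is hypothetical and not clinical data: an unobserved severity U makes clinicians more likely to treat (T = 1) and also lowers survival (S = 1). The observational contrast P(S | T = 1) - P(S | T = 0) then makes the treatment look harmful, while the interventional contrast P(S | do(T = 1)) - P(S | do(T = 0)) shows that it helps. Note that `interventional` uses the true mechanism and the distribution of U, which a real agent would not have; it stands in for what a randomized experiment would measure.

```python
# Hypothetical numbers for illustration only; not clinical data.
P_SEVERE = 0.5              # P(U = 1), where U is unobserved severity
P_TREAT = {0: 0.2, 1: 0.8}  # logging policy P(T = 1 | U): sicker -> treated more
P_SURVIVE = {               # outcome mechanism P(S = 1 | T, U)
    (0, 0): 0.8, (1, 0): 0.9,  # mild patients: treatment adds 0.1
    (0, 1): 0.3, (1, 1): 0.4,  # severe patients: treatment adds 0.1
}


def p_u(u):
    return P_SEVERE if u == 1 else 1 - P_SEVERE


def p_t_given_u(t, u):
    return P_TREAT[u] if t == 1 else 1 - P_TREAT[u]


def observational(t):
    """P(S = 1 | T = t) in logged data, where U influenced the treatment."""
    joint = sum(p_u(u) * p_t_given_u(t, u) * P_SURVIVE[(t, u)] for u in (0, 1))
    marginal = sum(p_u(u) * p_t_given_u(t, u) for u in (0, 1))
    return joint / marginal


def interventional(t):
    """P(S = 1 | do(T = t)): T is set for everyone, so U keeps its prior."""
    return sum(p_u(u) * P_SURVIVE[(t, u)] for u in (0, 1))


naive = observational(1) - observational(0)
causal = interventional(1) - interventional(0)
print(f"naive difference: {naive:+.2f}")  # -0.20: treatment looks harmful
print(f"causal effect:    {causal:+.2f}")  # +0.10: treatment actually helps
```

Since U is unobserved, a purely associational learner fitted to such logs would learn the wrong sign for the treatment effect; this is the kind of bias that causal inference is meant to detect and correct.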
As a matter of fact, looking back at the history of science, human beings have always progressed in a way similar to causal reinforcement learning (Causal RL): they summarize rules or experience from their interaction with nature and then exploit them to adapt better in the next exploration. Causal RL mimics exactly this behavior, learning causal effects as an agent interacts with the environment and then optimizing its policy based on the learned causal relations.
I highlight this analogy to emphasize the importance of Causal RL, which, I believe, will become an indispensable part of Artificial General Intelligence (AGI), with great potential applications not only in healthcare and medicine but also in all other RL scenarios. Compared to RL, Causal RL has two obvious advantages inherited from causal inference: data efficiency and minimal change. It is widely acknowledged that RL algorithms are data-hungry. In contrast, Causal RL is far less driven by data, because a causal graph is the most stable kind of structure, consisting of “must-have” relations instead of the “nice-to-have” relations found in an associational graph. In other words, causal relations, as long as they exist, are not altered by the data, no matter how much data we collect. From the perspective of causal reasoning, once causal structures are known, we can answer a large number of interventional and counterfactual questions with few or no experiments, which significantly reduces our reliance on data. For instance, if some causal knowledge about actions is provided in advance or can be learned from initial experiments, the action space could be narrowed exponentially. The other fascinating property is minimal change, by which I mean that only a minimal set of (conditional) distributions changes when the environment or domain shifts. From the causal viewpoint, assuming invariance of conditionals makes sense if the conditionals represent causal mechanisms [4,15,10]. Intuitively, causal mechanisms can be viewed as properties of the physical world, like Newton’s laws of motion, which do not depend on what we feed into them. If the input changes, the causal mechanisms remain intact [5,11]. In the anticausal direction, however, the conditionals are affected by the input [16]. Hence, Causal RL needs only minimal change when environments shift.
In fact, a direct benefit of minimal change is data efficiency, because agents can transfer the invariant causal knowledge they learn in one environment to another, without needing to learn from scratch.
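Minimal change can be illustrated with a toy linear model, assumed purely for illustration: X causes Y through the hypothetical mechanism Y := 2X + noise, and only the distribution of X differs between a source and a target environment. Least-squares slopes serve as a simple summary of each conditional. The slope of Y on X (causal direction) should stay near 2 in both environments, so it could be transferred as-is, whereas the slope of X on Y (anticausal direction) should shift from about 0.25 to about 0.49 and would have to be re-estimated.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope from regressing ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def environment(mu_x, sd_x, n=20000, seed=0):
    """Sample a hypothetical SCM X -> Y.

    Only P(X) depends on (mu_x, sd_x); the mechanism Y := 2 X + N(0, 1)
    is shared by every environment.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(mu_x, sd_x) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


results = {}
for name, (mu, sd) in {"source": (0.0, 0.5), "target": (3.0, 3.0)}.items():
    xs, ys = environment(mu, sd)
    causal = ols_slope(xs, ys)      # summary of P(Y | X)
    anticausal = ols_slope(ys, xs)  # summary of P(X | Y)
    results[name] = (causal, anticausal)
    print(f"{name}: slope of Y on X = {causal:.2f}, "
          f"slope of X on Y = {anticausal:.2f}")
```

The theoretical anticausal slope is 2 Var(X) / (4 Var(X) + 1), which depends on the input distribution, while the causal slope is fixed by the mechanism; this is the asymmetry that lets causal knowledge transfer across environments.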
[1] David Capper, David TW Jones, Martin Sill, Volker Hovestadt, Daniel Schrimpf, Dominik Sturm, Christian Koelsche, Felix Sahm, Lukas Chavez, David E Reuss, et al. DNA methylation-based classification of central nervous system tumours. Nature, 2018.
[2] Samuel J Gershman. Reinforcement learning and causal models. The Oxford Handbook of Causal Reasoning, page 295, 2017.
[3] Varun Gulshan, Lily Peng, Marc Coram, Martin C Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, Kasumi Widner, Tom Madams, Jorge Cuadros, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22):2402–2410, 2016.
[4] Kevin D Hoover. The logic of causal inference: Econometrics and the conditional analysis of causation. Economics & Philosophy, 6(2):207–234, 1990.
[5] Dominik Janzing and Bernhard Schölkopf. Causal inference using the algorithmic Markov condition. IEEE Transactions on Information Theory, 56(10):5168–5194, 2010.
[6] Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. MIMIC-III, a freely accessible critical care database. Scientific Data, 3:160035, 2016.
[7] Daniel S Kermany, Michael Goldbaum, Wenjia Cai, Carolina CS Valentim, Huiying Liang, Sally L Baxter, Alex McKeown, Ge Yang, Xiaokang Wu, Fangbing Yan, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell, 172(5):1122–1131, 2018.
[8] Yun Liu, Krishna Gadepalli, Mohammad Norouzi, George E Dahl, Timo Kohlberger, Aleksey Boyko, Subhashini Venugopalan, Aleksei Timofeev, Philip Q Nelson, Greg S Corrado, et al. Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.02442, 2017.
[9] Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. Causal effect inference with deep latent-variable models. In Advances in Neural Information Processing Systems, pages 6449–6459, 2017.
[10] Judea Pearl. Causality. Cambridge University Press, 2009.
[11] Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(5):947–1012, 2016.
[12] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press, 2018.
[13] Aniruddh Raghu, Matthieu Komorowski, Imran Ahmed, Leo Celi, Peter Szolovits, and Marzyeh Ghassemi. Deep reinforcement learning for sepsis treatment. arXiv preprint arXiv:1711.09602, 2017.
[14] Hans Reichenbach. The direction of time, volume 65. Univ of California Press, 1991.
[15] Mateo Rojas-Carulla, Bernhard Schölkopf, Richard Turner, and Jonas Peters. Invariant models for causal transfer learning. The Journal of Machine Learning Research, 19(1):1309–1342, 2018.
[16] Bernhard Schölkopf, Dominik Janzing, Jonas Peters, Eleni Sgouritsa, Kun Zhang, and Joris Mooij. On causal and anticausal learning. arXiv preprint arXiv:1206.6471, 2012.
[17] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
[18] Wikipedia. Sepsis — Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Sepsis&oldid=833990644, 2018. [Online; accessed 06-April-2018].