Motivated by applications in computational advertising and systems biology, we consider the problem of identifying the best of several possible soft interventions at a source node V in an acyclic causal directed graph, so as to maximize the expected value of a target node Y (located downstream of V). Our setting imposes a fixed total budget for sampling under the various interventions, along with cost constraints on different types of interventions. We pose this as a best arm identification bandit problem with K arms, where each arm is a soft intervention at V, and leverage the information leakage among the arms to provide the first gap-dependent error and simple regret bounds for this problem. Our results are a significant improvement over the traditional best arm identification results in this setting. We show empirically that our algorithms outperform the state of the art on a flow cytometry dataset, and we also apply our algorithm to model interpretation of the Inception-v3 deep network for image classification.
This is based on joint work with Rajat Sen, Karthikeyan Shanmugam, and Alex Dimakis.