A variety of methods have been developed for constructing causal models. These include methods for estimating the structure and parameters of causal graphical models, as well as a large number of methods for estimating individual causal dependencies (e.g., propensity score methods). The primary evidence for the effectiveness of these methods comes from either theoretical proofs or performance on synthetic data. In this talk, I review the state of this evidence and argue that empirical evaluation is a virtual necessity if the field is to progress. I show how progress in non-causal modeling methods was transformed in the 1980s and 1990s by a focus on empirical evaluation. I describe a set of techniques for empirically evaluating methods for causal modeling, including some novel data sets and evaluation techniques developed in my research group. Finally, I briefly survey several practical issues that are likely to arise if empirical evaluation becomes the norm, and explain how addressing these issues could significantly advance the field of causal modeling.