# propensity-score-matching-in-stata

**Repository Path**: arlionn/propensity-score-matching-in-stata

## Basic Information

- **Project Name**: propensity-score-matching-in-stata
- **Description**: use datasets of Cattaneo (2010) to perform PSM in stata
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 13
- **Created**: 2017-10-26
- **Last Updated**: 2023-01-16

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# An introduction to propensity score matching in Stata

Thomas G. Stewart
Assistant Professor

This lecture is part 9 of the *Propensity Scores and Related Methods Series* presented and organized by Robert Greevy within Vanderbilt University's Center for Health Services Research.

## NOTE 1

I reserve the right for these notes to be wrong, mistaken, or incomplete.

## NOTE 2

These notes will continue to be updated as I

* expand the content
* generate more examples
* respond to helpful feedback regarding items in NOTE 1.

Please feel free to provide content and comments.

## SOFTWARE

* StataCorp. 2015. *Stata Statistical Software: Release 14.* College Station, TX: StataCorp LP.
* E. Leuven and B. Sianesi. 2003. *PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing*. [(link)](http://ideas.repec.org/c/boc/bocode/s432001.html). Version 4.0.11. To install in Stata, use the command:

  ```
  ssc install psmatch2
  ```

* Phil Clayton. 2013. *TABLE1: module to create "table 1" of baseline characteristics for a manuscript*. Version 1.1. To install in Stata, use the command:

  ```
  ssc install table1
  ```

## REFERENCES

* Elizabeth A. Stuart. 2010. *Matching Methods for Causal Inference: A Review and a Look Forward,* Statistical Science, Vol. 25, No. 1, 1–21.
## DATA FOR EXAMPLES AND DISCUSSION

To motivate propensity score matching, I'll use the `cattaneo2` dataset, a Stata example dataset. It can be loaded with the following command:

```stata
webuse cattaneo2
```

The data in `cattaneo2` are a subset of the data analysed in the following journal articles:

* Almond, D., Chay, K.Y., Lee, D.S., 2005. *The costs of low birth weight,* Quarterly Journal of Economics 120, 1031-1083.
* Cattaneo, M.D. 2010. *Efficient semiparametric estimation of multi-valued treatment effects under ignorability,* Journal of Econometrics, 155(2), 138-154.

The dataset includes information about infant, mother, and father characteristics from singleton births in Pennsylvania between 1989 and 1991. The original dataset included nearly 500,000 births. The Stata example dataset includes 4642 births.

>**RESEARCH QUESTION**: What is the effect of maternal smoking during pregnancy on the infant's birthweight?

The dataset includes the following set of variables:

| Infant | Mother | Father |
| :--- | :--- | :--- |
| birth weight (grams) | married/not married | |
| birth month | hispanic (yes/no) | hispanic (yes/no) |
| | race (white/not white) | race (white/not white) |
| | age | age |
| | education | education |
| | foreign born (yes/no) | |
| | birth number | |
| | months since last birth | |
| | infant of previous births died (yes/no) | |
| | number of prenatal care visits | |
| | trimester of first prenatal care visit | |
| | alcohol during pregnancy (yes/no) | |
| | smoking during pregnancy (0, 1-5, 6-10, 11+ daily) | |
| | smoking during pregnancy (yes/no) | |

You can access the complete codebook with the command `codebook` after loading the data.

## BIG PICTURE: BACKGROUND
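As a concrete backdrop for the comparison below, here is a minimal sketch of a first look at the example data. It assumes the variable names shipped with `cattaneo2` (`bweight` for birth weight, `mbsmoke` for the yes/no smoking indicator, plus `mage`, `medu`, `nprenatal`, `mmarried`, and `fbaby`). None of these names appear in the notes above, so confirm them with `describe` before relying on them. The unadjusted comparison is descriptive only; it should not be read as an estimate of the causal effect.

```stata
* Load the example data (requires an internet connection)
webuse cattaneo2, clear

* Confirm that the variable names assumed in this sketch exist
describe bweight mbsmoke mage medu nprenatal mmarried fbaby

* How many mothers smoked during pregnancy?
tabulate mbsmoke

* Unadjusted (naive) comparison of mean birth weight by smoking status
ttest bweight, by(mbsmoke)

* Compare a few covariates across the two groups; large gaps suggest
* that smokers and nonsmokers differ in ways other than smoking
tabstat mage medu nprenatal mmarried fbaby, by(mbsmoke) statistics(mean sd n)
```

If smoking had been assigned at random, the group means in the last table would be expected to be similar. In observational data such as these there is no such guarantee, which is the contrast the table below draws.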
| | Randomized Clinical Trial | Observational Study |
|---|---|---|
| Treatment Assignment: | Investigators generate a treatment schedule prior to patient enrollment. The schedule is constructed based on the design of the study, which includes randomization in some fashion. Physicians (who may be blind to treatment as well) assign treatments/exposures to study participants following the sequence in the schedule. | |
| CONSEQUENTLY | | |
| Probability of Treatment: | Known | Unknown, may be 0 or 1 |
| Covariate Balance: | Relationship between covariates and treatment assignment is known from study design. Usually the study is designed so that there is no relationship between treatment assignment and covariates. | Relationship between covariates and treatment assignment is unknown. There may be covariate imbalance. |