
An introduction to propensity score matching in STATA

Thomas G. Stewart, Assistant Professor

This lecture is part 9 of the Propensity Scores and Related Methods Series presented and organized by Robert Greevy within Vanderbilt University's Center for Health Services Research.

NOTE 1

I reserve the right for these notes to be wrong, mistaken, or incomplete.

NOTE 2

These notes will continue to be updated as I

  • expand the content
  • generate more examples
  • respond to helpful feedback regarding items in NOTE 1.

Please feel free to provide content and comments.

SOFTWARE

  • StataCorp. 2015. Stata Statistical Software: Release 14. College Station, TX: StataCorp LP.

  • E. Leuven and B. Sianesi. 2003. PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing. Version 4.0.11. To install in STATA, use command:

ssc install psmatch2

  • Phil Clayton. 2013. TABLE1: module to create "table 1" of baseline characteristics for a manuscript. Version 1.1. To install in STATA, use command:

ssc install table1

REFERENCES

  • Elizabeth A. Stuart. 2010. Matching Methods for Causal Inference: A Review and a Look Forward, Statistical Science, Vol. 25, No. 1, 1–21.

DATA FOR EXAMPLES AND DISCUSSION

To motivate propensity score matching, I'll use the cattaneo2 dataset, a STATA example dataset. It can be loaded with the following command:

webuse cattaneo2

The data in cattaneo2 is a subset of data that was analysed in the following journal articles:

  • Almond, D., Chay, K.Y., Lee, D.S., 2005. The costs of low birth weight, Quarterly Journal of Economics 120, 1031-1083.

  • Cattaneo, M.D. 2010. Efficient semiparametric estimation of multi-valued treatment effects under ignorability, Journal of Econometrics, 155(2), 138-154.

The dataset includes information about infant/mother/father characteristics from singleton births in Pennsylvania between 1989 and 1991. The original dataset included nearly 500,000 births. The STATA example dataset includes 4642 births.

RESEARCH QUESTION: What is the effect of maternal smoking during pregnancy on the infant's birthweight?

The dataset includes the following set of variables:

Infant               | Mother                                             | Father
---------------------|----------------------------------------------------|-----------------------
birth weight (grams) | married/not married                                |
birth month          | hispanic (yes/no)                                  | hispanic (yes/no)
                     | race (white/not white)                             | race (white/not white)
                     | age                                                | age
                     | education                                          | education
                     | foreign born (yes/no)                              |
                     | birth number                                       |
                     | months since last birth                            |
                     | infant of previous births died (yes/no)            |
                     | number of prenatal care visits                     |
                     | trimester of first prenatal care visit             |
                     | alcohol during pregnancy (yes/no)                  |
                     | smoking during pregnancy (0, 1-5, 6-10, 11+ daily) |
                     | smoking during pregnancy (yes/no)                  |

You can access the complete codebook with the command codebook after loading the data.
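
As a minimal sketch of that first look at the data (assuming an internet connection so that webuse can download the file), the commands below load cattaneo2 and inspect the outcome and exposure. The variable names bweight (birth weight) and mbsmoke (smoking during pregnancy, yes/no) are the names used in the shipped dataset.

```stata
* Load the example data (downloads from the Stata Press website)
webuse cattaneo2, clear

* Overview of variable names, storage types, and labels
describe

* Detailed codebook entries for the outcome and the exposure
codebook bweight mbsmoke

* How many mothers smoked during pregnancy?
tabulate mbsmoke
```

Running codebook with no variable list prints the entry for every variable in the dataset.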

BIG PICTURE: BACKGROUND

Treatment Assignment

  • Randomized Clinical Trial: Investigators generate a treatment schedule prior to patient enrollment. The schedule is constructed based on the design of the study, which includes randomization in some fashion. Physicians (who may be blind to treatment as well) assign treatments/exposures to study participants following the sequence in the schedule.
  • Observational Study:
    • Physicians assign treatment/exposure based on
      • characteristics of the patient
      • personal preference
      • regional preference
      • insurance restrictions
    • Study participants choose for themselves treatments or behaviors
    • Exposures are based on geographic location or cultural identity

CONSEQUENTLY

Probability of Treatment

  • Randomized Clinical Trial: Known
  • Observational Study: Unknown, may be 0 or 1

Covariate Balance

  • Randomized Clinical Trial: The relationship between covariates and treatment assignment is known from the study design. Usually the study is designed so that there is no relationship between treatment assignment and covariates.
  • Observational Study: The relationship between covariates and treatment assignment is unknown. There may be covariate imbalance.

BIG PICTURE: CHALLENGES

Differences in outcomes between the treated and untreated (or exposed and unexposed) may be the consequence of confounding variables and not the treatment (or exposure).
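
To make the confounding concern concrete, here is a rough sketch that compares birthweight and a few baseline covariates between smokers and nonsmokers in cattaneo2. The choice of covariates (mage, medu, mmarried, fbaby, nprenatal) is an illustrative assumption, not a recommended adjustment set.

```stata
webuse cattaneo2, clear

* Naive (unadjusted) comparison of mean birthweight by smoking status
ttest bweight, by(mbsmoke)

* Means and standard deviations of selected covariates in each group;
* large differences suggest the groups are not directly comparable
tabstat mage medu mmarried fbaby nprenatal, ///
    by(mbsmoke) statistics(mean sd) nototal
```

If the covariate means differ noticeably between the groups, the unadjusted difference from ttest mixes the effect of smoking with the effect of those covariates. The /// line continuation works in a do-file; in the interactive command window, type the command on one line.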

The dataset may include sub-groups for which a treatment effect should not be calculated because

  • the sub-group may not be eligible for one treatment (a treatment effect should only be calculated in populations eligible for both treatments)
  • though a sub-group may be eligible for both treatments, there may not be enough data for a comparison without extrapolation.
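
One way to look for such sub-groups is to estimate a propensity score and compare its distribution in the two exposure groups. The sketch below assumes a simple logistic model with an illustrative covariate list; pscore is a hypothetical variable name introduced here.

```stata
webuse cattaneo2, clear

* Logistic model for the probability of smoking during pregnancy
* (covariate list is an assumption made for illustration)
logit mbsmoke mmarried mage medu fbaby mrace

* Predicted probability of treatment = estimated propensity score
predict double pscore, pr

* Range of the propensity score within each exposure group
tabstat pscore, by(mbsmoke) statistics(min p25 p50 p75 max) nototal

* Overlaid density plots; regions where only one curve has mass
* indicate limited overlap between the groups
twoway (kdensity pscore if mbsmoke == 1) ///
       (kdensity pscore if mbsmoke == 0), ///
       legend(order(1 "Smokers" 2 "Nonsmokers")) ///
       xtitle("Estimated propensity score")
```

Estimated scores very close to 0 or 1, or ranges that barely overlap, would point to sub-groups where a comparison requires extrapolation.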

BIG PICTURE: SOLUTIONS

There are several methods for estimating a treatment effect with observational data. In this lecture series, you have been exposed (not randomly) to a family of methods which use the propensity score. The primary focus has been on propensity score matching.

STEPS (A SUMMARY OF STUART 2010)

  1. Defining “closeness”: the distance measure used to determine whether an individual is a good match for another.
    1. Variables to include
      • Researchers should be liberal in including variables that may be associated with treatment assignment and/or the outcomes.
      • One type of variable that should not be included in the matching process is any variable that may have been affected by the treatment of interest (Rosenbaum, 1984; Frangakis and Rubin, 2002; Greenland, 2003).
    2. Distance Measures
      • Exact
      • Mahalanobis
      • Propensity score / linear propensity score
        • With propensity score estimation, the concern is not with the parameter estimates of the model, but rather with the resulting balance of the covariates (Augurzky and Schmidt, 2001).
  2. Implementing a matching method, given that measure of closeness.
    • Methods:
      1. k:1 Nearest Neighbor
        • Estimates the Average Treatment Effect in the Treated (ATT or ATET)
        • May need a caliper
      2. Subclassification
      3. Full matching
      4. Weighting (IPTW)
    • Assess Common Support
      • Examining the common support may indicate that it is not possible to reliably estimate the ATE.
  3. Assessing the quality of the resulting matched samples, and perhaps iterating with steps 1 and 2 until well-matched samples result.
    • Perhaps the most important step in using matching methods is to diagnose the quality of the resulting matched samples.
    • Tools:
      1. Histograms / Density Plots / ECDF
      2. Standardized Bias / Standardized difference in means
  4. Analysis of the outcome and estimation of the treatment effect, given the matching done in step 3.
    1. After k:1 matching
      • May not need to account for matched pairs
      • Must use weights if matching was with replacement
      • Use regression adjustment
    2. After subclassification
      • Aggregation weights determine estimation of ATT or ATE
      • Use regression adjustment
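
The four steps can be sketched with psmatch2 on the cattaneo2 data. This is an illustration under stated assumptions rather than a recommended analysis: the covariate list and the caliper of 0.05 are arbitrary choices made for the example, and the standard errors in the final regression ignore the fact that the propensity score was estimated.

```stata
* Requires the user-written package: ssc install psmatch2
webuse cattaneo2, clear

* Steps 1-2: 1:1 nearest-neighbor matching (with replacement, the
* psmatch2 default) on a logit propensity score, with a caliper and
* the common-support restriction. The output reports the ATT.
psmatch2 mbsmoke mmarried mage medu fbaby mrace, ///
    outcome(bweight) logit neighbor(1) caliper(0.05) common

* Step 3: standardized bias before and after matching
pstest mmarried mage medu fbaby mrace, both

* Step 3: propensity score histogram by treatment status and support
psgraph

* Step 4: regression adjustment in the matched sample, weighted by
* _weight (created by psmatch2; missing for unmatched observations)
regress bweight mbsmoke mmarried mage medu fbaby mrace ///
    [pweight = _weight] if _weight < .

* Optional cross-check with Stata's built-in estimator (Stata 13+)
teffects psmatch (bweight) (mbsmoke mmarried mage medu fbaby mrace), atet
```

If pstest shows poor balance after matching, the usual response is to return to steps 1 and 2, for example by changing the propensity score model or the caliper, and check again. Because psmatch2 results can depend on the sort order when there are ties, some users sort the data in random order (after setting a seed) before matching.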

DEMONSTRATION

MIT License

Copyright (c) 2022 连享会

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
