# 倾向匹配得分 (Propensity Score Matching)

**Repository Path**: econometric/propensity-matching-score

## Basic Information

- **Project Name**: 倾向匹配得分
- **Description**: Propensity score matching
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2022-01-07
- **Last Updated**: 2022-05-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Propensity Score Matching (PSM): A Worked Example with a Stata Implementation

# Policy background: the National Supported Work (NSW) demonstration program

# Research question: does taking part in the program (training), compared with not taking part, affect wages?

# Basic idea

We want to compare the wages of the trained group (the treatment group) with the wages that same group would have earned without training. In practice only the first is observed: the treatment group did receive training, and what would have happened to it without training cannot be observed. This unobserved state is called the counterfactual. Matching is a method for dealing with this unobservable outcome.

In propensity score matching (PSM), the sample is split into two groups by the treatment indicator. The first is the treatment group, here those who received training after NSW was implemented. The second is the comparison group, here those who did not receive training after NSW was implemented.

The basic idea is that, once treated and comparison observations have been matched so that they are comparable in their other observed characteristics, the difference in wages between the trained group (treatment group) and the untrained group (comparison group) can be used to assess the causal effect of training on wages.

```
. desc

Contains data from C:\Users\Metrics\Desktop\计量经济学\高级\A15-psm\data\ldw_exper.dta
  obs:           445
 vars:            12                          30 Jan 2013 12:47
 size:        12,015
--------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------
t               byte    %8.0g                 participation in job training program
age             byte    %8.0g                 age
educ            byte    %8.0g                 years of education
black           byte    %8.0g                 indicator for African-American
hisp            byte    %8.0g                 indicator for Hispanic
married         byte    %8.0g                 indicator for married
nodegree        byte    %8.0g                 indicator for more than grade school but less than
                                                high-school education
re74            float   %9.0g                 real earnings in 1974 (in thousands of 1978 $)
re75            float   %9.0g                 real earnings in 1975 (in thousands of 1978 $)
re78            float   %9.0g                 real earnings in 1978 (in thousands of 1978 $)
u74             float   %9.0g                 indicator for unemployed in 1974
u75             float   %9.0g                 indicator for unemployed in 1975
--------------------------------------------------------------------------------------
Sorted by:

.
.
```

Summary statistics by treatment status

```
bysort t :sum age educ nodegree black hisp married u74 u75

--------------------------------------------------------------------------------------
-> t = 0

    Variable |        Obs        Mean   Std. Dev.        Min        Max
-------------+---------------------------------------------------------
         age |        260    25.05385    7.057745         17         55
        educ |        260    10.08846    1.614325          3         14
    nodegree |        260    .8346154    .3722439          0          1
       black |        260    .8269231    .3790434          0          1
        hisp |        260    .1076923    .3105893          0          1
-------------+---------------------------------------------------------
     married |        260    .1538462    .3614971          0          1
         u74 |        260         .75    .4338478          0          1
         u75 |        260    .6846154    .4655651          0          1

--------------------------------------------------------------------------------------
-> t = 1

    Variable |        Obs        Mean   Std. Dev.        Min        Max
-------------+---------------------------------------------------------
         age |        185    25.81622    7.155019         17         48
        educ |        185    10.34595     2.01065          4         16
    nodegree |        185    .7081081    .4558666          0          1
       black |        185    .8432432    .3645579          0          1
        hisp |        185    .0594595    .2371244          0          1
-------------+---------------------------------------------------------
     married |        185    .1891892    .3927217          0          1
         u74 |        185    .7081081    .4558666          0          1
         u75 |        185          .6    .4912274          0          1

.
.
.
```

Descriptive analysis

```
tabulate t, summarize(re78) means standard
```

Output:

```
. tabulate t, summarize(re78) means standard

participati |   Summary of real
  on in job |  earnings in 1978 (in
   training |  thousands of 1978 $)
    program |        Mean   Std. Dev.
------------+------------------------
          0 |   4.5548023   5.4838368
          1 |   6.3491454   7.8674047
------------+------------------------
      Total |   5.3007651   6.6314934
```

# Setting the random seed

```
set seed 20180105   // set the random-number seed so the results are reproducible
gen u=runiform()    // u is a pseudo-random draw from the uniform distribution on (0,1)
sort u              // sort the observations by u, i.e. put them in random order
```

These commands generate pseudo-random numbers from the uniform distribution on (0,1) and sort the data by them. The point is to randomize the order of the observations before matching: as the psmatch2 warning in the output further down says, when several observations share the same propensity score the sort order of the data can affect the results.

# Defining macros

```
local v1 "t"
local v2 "age educ black hisp married re74 re75 u74 u75"
global x "`v1' `v2' "
```

# Propensity score matching

```
psmatch2 $x, out(re78) neighbor(1) ate ties logit common   // 1:1 matching; $x expands the global macro x
```

```
psmatch2 $x, out(re78) neighbor(1) ate ties logit common   // 1:1 matching
// which is equivalent to:
psmatch2 t age educ black hisp married re74 re75 u74 u75, out(re78) neighbor(1) ate ties logit common
```

Output:

```
psmatch2 $x, out(re78) neighbor(1) ate ties logit common

Logistic regression                             Number of obs     =        445
                                                LR chi2(9)        =      11.70
                                                Prob > chi2       =     0.2308
Log likelihood = -296.25026                     Pseudo R2         =     0.0194

------------------------------------------------------------------------------
           t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0142619   .0142116     1.00   0.316    -.0135923    .0421162
        educ |   .0499776   .0564116     0.89   0.376     -.060587    .1605423
       black |   -.347664   .3606532    -0.96   0.335    -1.054531    .3592032
        hisp |   -.928485     .50661    -1.83   0.067    -1.921422    .0644523
     married |   .1760431   .2748817     0.64   0.522    -.3627151    .7148012
        re74 |  -.0339278   .0292559    -1.16   0.246    -.0912683    .0234127
        re75 |     .01221   .0471351     0.26   0.796    -.0801731    .1045932
         u74 |  -.1516037   .3716369    -0.41   0.683    -.8799987    .5767913
         u75 |  -.3719486    .317728    -1.17   0.242    -.9946841    .2507869
       _cons |  -.4736308   .8244205    -0.57   0.566    -2.089465    1.142204
------------------------------------------------------------------------------
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
            re78  Unmatched | 6.34914538   4.55480228   1.79434311   .632853552     2.84
                        ATT | 6.40495818   4.99436488    1.4105933   .839875971     1.68
                        ATU | 4.52683013   6.15618973    1.6293596            .        .
                        ATE |                           1.53668776            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

 psmatch2: |   psmatch2: Common
 Treatment |        support
assignment | Off suppo  On suppor |     Total
-----------+----------------------+----------
 Untreated |        11        249 |       260
   Treated |         2        183 |       185
-----------+----------------------+----------
     Total |        13        432 |       445

.
.
.
```

# Next, use pstest to check whether matching balanced the covariates well

```
. pstest age edu black hisp married re74 re75 u74 u75, both graph

------------------------------------------------------------------------------------------
                Unmatched |       Mean                  %reduct |     t-test    |  V(T)/
Variable          Matched | Treated  Control    %bias    |bias| |      t  p>|t| |   V(C)
--------------------------+-------------------------------------+---------------+---------
age                    U  |  25.816   25.054     10.7           |   1.12  0.265 |   1.03
                       M  |  25.781   25.383      5.6      47.7 |   0.52  0.604 |   0.91
                          |                                     |               |
educ                   U  |  10.346   10.088     14.1           |   1.50  0.135 |   1.55*
                       M  |  10.322   10.415     -5.1      63.9 |  -0.49  0.627 |   1.52*
                          |                                     |               |
black                  U  |  .84324   .82692      4.4           |   0.45  0.649 |      .
                       M  |  .85246   .86339     -2.9      33.0 |  -0.30  0.765 |      .
                          |                                     |               |
hisp                   U  |  .05946   .10769    -17.5           |  -1.78  0.076 |      .
                       M  |  .06011   .04372      5.9      66.0 |   0.71  0.481 |      .
                          |                                     |               |
married                U  |  .18919   .15385      9.4           |   0.98  0.327 |      .
                       M  |  .18579   .19126     -1.4      84.5 |  -0.13  0.894 |      .
                          |                                     |               |
re74                   U  |  2.0956    2.107     -0.2           |  -0.02  0.982 |   0.74*
                       M  |  2.0672   1.9222      2.7   -1166.6 |   0.27  0.784 |   0.88
                          |                                     |               |
re75                   U  |  1.5321   1.2669      8.4           |   0.87  0.382 |   1.08
                       M  |  1.5299   1.6446     -3.6      56.7 |  -0.32  0.748 |   0.82
                          |                                     |               |
u74                    U  |  .70811      .75     -9.4           |  -0.98  0.326 |      .
                       M  |  .71038   .75956    -11.1     -17.4 |  -1.06  0.288 |      .
                          |                                     |               |
u75                    U  |      .6   .68462    -17.7           |  -1.85  0.065 |      .
                       M  |  .60656   .63388     -5.7      67.7 |  -0.54  0.591 |      .
                          |                                     |               |
------------------------------------------------------------------------------------------
* if variance ratio outside [0.75; 1.34] for U and [0.75; 1.34] for M

-----------------------------------------------------------------------------------
 Sample    | Ps R2   LR chi2   p>chi2   MeanBias   MedBias      B      R     %Var
-----------+-----------------------------------------------------------------------
 Unmatched | 0.019     11.75    0.227     10.2       9.4      33.1*   0.82     50
 Matched   | 0.008      3.87    0.920      4.9       5.1      20.6    1.09     25
-----------------------------------------------------------------------------------
* if B>25%, R outside [0.5; 2]

.
```

# 1

```
psgraph
```
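The psmatch2 output above notes that its standard errors do not take into account that the propensity score is estimated. One possible cross-check is Stata's official `teffects psmatch` command, whose standard errors are designed to account for the estimated propensity score. The block below is a minimal sketch, not part of the original analysis. It assumes Stata 13 or later and that `ldw_exper.dta` sits in the current working directory (the original log uses a full local path). The ATT it reports may differ somewhat from the psmatch2 figure, since the two commands do not treat ties and common support in the same way.

```stata
* Sketch only. Assumptions: Stata 13+ and ldw_exper.dta in the working directory.
use ldw_exper.dta, clear

* ATT (called ATET in teffects) from 1:1 nearest-neighbor matching on a
* logit propensity score, with the same covariates as the psmatch2 call.
teffects psmatch (re78) ///
    (t age educ black hisp married re74 re75 u74 u75, logit), ///
    atet nneighbor(1)
```

Comparing the reported standard error with the 0.84 from psmatch2 gives a rough sense of how much the estimation of the propensity score matters for inference in this sample.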