morcc / mostly-harmless-replication

Table 3-3-2.py 2.59 KB
vikjam committed on 2015-06-22 16:36. Clean-up, add Python to the list.
#!/usr/bin/env python
"""
Tested on Python 3.4
"""
import urllib.request
import pandas as pd
import statsmodels.api as sm
import numpy as np
import patsy
from tabulate import tabulate
# Download data
urllib.request.urlretrieve('http://economics.mit.edu/files/3828', 'nswre74.dta')
urllib.request.urlretrieve('http://economics.mit.edu/files/3824', 'cps1re74.dta')
urllib.request.urlretrieve('http://economics.mit.edu/files/3825', 'cps3re74.dta')
# Read the Stata files into Python
nswre74 = pd.read_stata("nswre74.dta")
cps1re74 = pd.read_stata("cps1re74.dta")
cps3re74 = pd.read_stata("cps3re74.dta")
# Store list of variables for summary
summary_vars = ['age', 'ed', 'black', 'hisp', 'nodeg', 'married', 're74', 're75']
# Calculate propensity scores
# Create formula for probit
f = 'treat ~ ' + ' + '.join(['age', 'age2', 'ed', 'black', 'hisp',
                             'nodeg', 'married', 're74', 're75'])
# Run probit with CPS-1
y, X = patsy.dmatrices(f, cps1re74, return_type = 'dataframe')
model = sm.Probit(y, X).fit()
cps1re74['pscore'] = model.predict(X)
# Run probit with CPS-3
y, X = patsy.dmatrices(f, cps3re74, return_type = 'dataframe')
model = sm.Probit(y, X).fit()
cps3re74['pscore'] = model.predict(X)
# Create function to summarize data
def summarize(dataset, conditions):
    """Return the means of summary_vars over rows matching conditions, plus the row count."""
    stats = dataset[summary_vars][conditions].mean()
    stats['count'] = sum(conditions)
    return stats
# Summarize data
nswre74_treat_stats = summarize(nswre74, nswre74.treat == 1)
nswre74_control_stats = summarize(nswre74, nswre74.treat == 0)
cps1re74_control_stats = summarize(cps1re74, cps1re74.treat == 0)
cps3re74_control_stats = summarize(cps3re74, cps3re74.treat == 0)
# Controls trimmed to propensity scores strictly between 0.1 and 0.9
cps1re74_ptrim_stats = summarize(cps1re74, (cps1re74.treat == 0) &
                                           (cps1re74.pscore > 0.1) &
                                           (cps1re74.pscore < 0.9))
cps3re74_ptrim_stats = summarize(cps3re74, (cps3re74.treat == 0) &
                                           (cps3re74.pscore > 0.1) &
                                           (cps3re74.pscore < 0.9))
# Combine summary stats, add header and print to markdown
frames = [nswre74_treat_stats,
          nswre74_control_stats,
          cps1re74_control_stats,
          cps3re74_control_stats,
          cps1re74_ptrim_stats,
          cps3re74_ptrim_stats]
summary_stats = pd.concat(frames, axis = 1)
header = ["NSW Treat", "NSW Control",
          "Full CPS-1", "Full CPS-3",
          "P-score CPS-1", "P-score CPS-3"]
print(tabulate(summary_stats, header, tablefmt = "pipe"))
# End of script