#!/usr/bin/env python
"""
Tested on Python 3.4
"""
import urllib.request
import pandas as pd
import statsmodels.api as sm
import numpy as np
import patsy
from tabulate import tabulate
# Download data
urllib.request.urlretrieve('http://economics.mit.edu/files/3828', 'nswre74.dta')
urllib.request.urlretrieve('http://economics.mit.edu/files/3824', 'cps1re74.dta')
urllib.request.urlretrieve('http://economics.mit.edu/files/3825', 'cps3re74.dta')
# Read the Stata files into Python
nswre74 = pd.read_stata("nswre74.dta")
cps1re74 = pd.read_stata("cps1re74.dta")
cps3re74 = pd.read_stata("cps3re74.dta")
# Store list of variables for summary
summary_vars = ['age', 'ed', 'black', 'hisp', 'nodeg', 'married', 're74', 're75']
# Calculate propensity scores
# Create formula for probit
f = 'treat ~ ' + ' + '.join(['age', 'age2', 'ed', 'black', 'hisp',
                             'nodeg', 'married', 're74', 're75'])
# Run probit with CPS-1
y, X = patsy.dmatrices(f, cps1re74, return_type = 'dataframe')
model = sm.Probit(y, X).fit()
cps1re74['pscore'] = model.predict(X)
# Run probit with CPS-3
y, X = patsy.dmatrices(f, cps3re74, return_type = 'dataframe')
model = sm.Probit(y, X).fit()
cps3re74['pscore'] = model.predict(X)
# Create function to summarize data
def summarize(dataset, conditions):
    stats = dataset[summary_vars][conditions].mean()
    stats['count'] = sum(conditions)
    return stats
# Summarize data
nswre74_treat_stats = summarize(nswre74, nswre74.treat == 1)
nswre74_control_stats = summarize(nswre74, nswre74.treat == 0)
cps1re74_control_stats = summarize(cps1re74, cps1re74.treat == 0)
cps3re74_control_stats = summarize(cps3re74, cps3re74.treat == 0)
cps1re74_ptrim_stats = summarize(cps1re74, (cps1re74.treat == 0) &
                                           (cps1re74.pscore > 0.1) &
                                           (cps1re74.pscore < 0.9))
cps3re74_ptrim_stats = summarize(cps3re74, (cps3re74.treat == 0) &
                                           (cps3re74.pscore > 0.1) &
                                           (cps3re74.pscore < 0.9))
# Combine summary stats, add header and print to markdown
frames = [nswre74_treat_stats,
          nswre74_control_stats,
          cps1re74_control_stats,
          cps3re74_control_stats,
          cps1re74_ptrim_stats,
          cps3re74_ptrim_stats]
summary_stats = pd.concat(frames, axis = 1)
header = ["NSW Treat", "NSW Control",
          "Full CPS-1", "Full CPS-3",
          "P-score CPS-1", "P-score CPS-3"]
print(tabulate(summary_stats, header, tablefmt = "pipe"))
# End of script