Overview | Motivation | Installation | Usage | Benchmarks | To-Do | Acknowledgements | License
Faster binned scatterplots in Stata with a few new bells and whistles
version 0.22 29jul2019
binscatter2 is a program for producing binned scatterplots in Stata. It inherits the syntax and functionality of the excellent binscatter package, but runs substantially faster for big datasets (see benchmarks). In addition, binscatter2 offers a handful of new features: the ability to overlay additional information about the conditional probability distribution (e.g. quantile intervals), an alternative procedure to adjust for controls suggested by Cattaneo et al. (2019), additional options for fit lines and saving, and multi-way fixed effects.
Binned scatterplots are a convenient, non-parametric method to visualize conditional expectation functions. They are useful for examining the relationship between variables, possibly conditional on a set of covariates and/or fixed effects. Michael Stepner has provided a wonderful slide deck describing binned scatterplots on his website.
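For readers new to the idea, the basic construction can be sketched by hand in a few lines of Stata. This is only an illustration of the general technique (group x into equal-sized bins, then plot the mean of y against the mean of x within each bin), not the code binscatter2 actually runs. The dataset (Stata's bundled auto data), the choice of 10 bins, and the variable name bin are assumptions made for the example.

```stata
* Illustrative sketch only: a hand-rolled binned scatterplot, not binscatter2 internals
sysuse auto, clear

* Assign each observation to one of 10 equal-sized bins of the x variable (weight)
xtile bin = weight, nquantiles(10)

* Replace the data with the within-bin means of y (mpg) and x (weight)
collapse (mean) mpg weight, by(bin)

* Plot one point per bin: mean mpg against mean weight
twoway scatter mpg weight
```

Each plotted point summarizes roughly one tenth of the sample, which is why the plot stays readable even when the underlying data are far too large for an ordinary scatterplot.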
Anyone who has used binscatter on a large dataset knows that it can take a while to run. The original binscatter program is extremely well written and was very efficient for its time. However, the Stata package gtools now allows several of the operations underlying binscatter to be performed much more efficiently, as demonstrated in the benchmarks. On datasets with tens or hundreds of millions of observations, binscatter2 runs between three and eight times faster than binscatter, with the largest relative gains on the largest datasets.
In addition, binscatter2 contains a handful of new features intended to enhance the functionality of binscatter. For one, binscatter2 allows quantile intervals to be overlaid on the graph, which lets the user gauge variation in the conditional distribution of y given x.
In addition to substantial performance improvements for large datasets (see benchmarks), binscatter2 adds a few new features to binscatter. In particular:
There are two options for installing binscatter2. The only prerequisite is the gtools command, which can be installed from GitHub or the SSC repository.
ssc install gtools
net install binscatter2, from("https://raw.githubusercontent.com/mdroste/stata-binscatter2/master/")
This project will be submitted to the SSC repository very soon.
Complete internal documentation is provided with the installation and can be accessed by typing:
help binscatter2
The basic syntax and usage of binscatter2 are inherited from binscatter and should be familiar to existing users of that program.
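As a rough illustration of what that shared syntax looks like, the calls below use only options documented for the original binscatter (controls, absorb, nquantiles, by, linetype), plus the altcontrols option named elsewhere in this readme. They assume these options behave in binscatter2 as they do in binscatter. The dataset (Stata's bundled nlsw88 data) and the variable choices are arbitrary; see help binscatter2 for the authoritative option list.

```stata
* Example data shipped with Stata; the variables used here are illustrative choices
sysuse nlsw88, clear

* Basic binned scatterplot of wage against job tenure
binscatter2 wage tenure

* Residualize on a control, absorb a fixed effect, and use 30 bins
binscatter2 wage tenure, controls(age) absorb(industry) nquantiles(30)

* One series per group, with a quadratic fit line
binscatter2 wage tenure, by(union) linetype(qfit)

* Same control, using the alternative covariate adjustment (altcontrols)
binscatter2 wage tenure, controls(age) altcontrols
```

Replacing binscatter2 with binscatter in the first three calls should give the same graphs from the original program, which is a quick way to compare the two.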
This repository includes a do-file, check.do, that provides a number of checks to verify the functionality of each option within binscatter2 and demonstrates equivalence to binscatter for options shared by both programs. The file check_speed.do runs Monte Carlo simulations that were used in the benchmark section of this readme.
The following items will be addressed soon:
binscatter2 builds extensively on binscatter, developed by the illustrious Michael Stepner and Jessica Laird.
In addition, binscatter2 would certainly not have been possible without gtools by Mauricio Caceres Bravo, which in turn would not have happened without ftools, developed by Sergio Correia.
The alternative covariate adjustment procedure (enabled with the option altcontrols) was formalized by Cattaneo et al. (2019).
binscatter2 is MIT-licensed.