# stata-dta-in-python

**Repository Path**: arlionn/stata-dta-in-python

## Basic Information

- **Project Name**: stata-dta-in-python
- **Description**: Use Stata .dta files in Python
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 5
- **Created**: 2020-02-25
- **Last Updated**: 2022-11-22

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

###################
Stata dta in Python
###################

This is a package for using Stata .dta files in Python. The main functionality of the package is in its ``Dta`` class and subclasses, which encapsulate the information from a .dta file, and provide methods for adding, replacing, or deleting this information. 

You can create ``Dta`` objects from .dta files or from iterables of Python values. You can manipulate ``Dta`` objects in basic ways (add observations, replace data values, rename data variables etc.), and you can save ``Dta`` objects to .dta files. 

This package has been tested on Python 3.1, 3.2, and 3.3. Some parts of this package do not work in Python 2. Support for Python 2 might be added at a later date.

Currently, this package supports .dta file formats 114, 115, and 117.


Requirements
============

Python 3.1 - 3.4


Installation
============

Download the package, either with::

    git clone https://github.com/jrfiedler/stata-dta-in-python

or by downloading a zip archive (there's a button on the right side of this page) and unzipping. 

Then, in the main folder, use::

    python setup.py install

to install.

Changelog
=========

0.2.0
-----

- Added quick access to data variables, as in `dta.varname_`
- Added `stata_math` module provides functions that understand missing values and quick-access data variables
- New method `quiet()` silences warnings and other 'unexpected' output
- New method `get(row, col)` for getting a single data value

See examples "Quick access to data variables" and "Math with missing values" in EXAMPLES.rst.


Example usage
=============

::

    >>> from stata_dta import open_dta, display_diff
    
    >>> dta1 = open_dta("C:/Program Files (x86)/Stata12/auto.dta")
    (1978 Automobile Data)

    >>> dta2 = open_dta("C:/Program Files (x86)/Stata13/auto.dta")
    (1978 Automobile Data)

    >>> display_diff(dta1, dta2)
        class types differ:
            Dta115 vs Dta117
        formats differ:
            114 vs 117
        time stamps differ:
            13 Apr 2011 17:45 vs 13 Apr 2013 17:45

    >>> dta1.list("make rep weight disp", in_=range(6))
        +--------------------------------------------------+
        | make                   rep78    weight  displa~t |
        +--------------------------------------------------+
     0. | AMC Concord                3     2,930       121 |
     1. | AMC Pacer                  3     3,350       258 |
     2. | AMC Spirit                 .     2,640       121 |
     3. | Buick Century              3     3,250       196 |
     4. | Buick Electra              4     4,080       350 |
        +--------------------------------------------------+
     5. | Buick LeSabre              3     3,670       231 |
        +--------------------------------------------------+

    >>> dta1[:6, ::3].list()
        +--------------------------------------------------+
        | make                   rep78    weight  displa~t |
        +--------------------------------------------------+
     0. | AMC Concord                3     2,930       121 |
     1. | AMC Pacer                  3     3,350       258 |
     2. | AMC Spirit                 .     2,640       121 |
     3. | Buick Century              3     3,250       196 |
     4. | Buick Electra              4     4,080       350 |
        +--------------------------------------------------+
     5. | Buick LeSabre              3     3,670       231 |
        +--------------------------------------------------+

    >>> from stata_dta import Dta115, Dta117
    >>> v = [[0, 0.1, "0.2", 0.3], [1, 1.1, "1.2"], [2], [3, 3.1, 3.2, 3.3]]
    >>> for row in v:
    ...     print(row)
    ...
    [0, 0.1, '0.2', 0.3]
    [1, 1.1, '1.2']
    [2]
    [3, 3.1, 3.2, 3.3]
    
    >>> dta3 = Dta117(v)
    >>> dta2.describe()
    
      obs:             4
     vars:             4                          31 Dec 2013 17:11
     size:            80
    ----------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    ----------------------------------------------------------------------
    var0            byte    %8.0g
    var1            double  %10.0g
    var2            str3    %9s
    var3            double  %10.0g
    ----------------------------------------------------------------------
    Sorted by:
         Note:  dataset has changed since last saved

    >>> dta3.list()
        +---------------------------------------------+
        |     var0        var1       var2        var3 |
        +---------------------------------------------+
     0. |        0         0.1        0.2         0.3 |
     1. |        1         1.1        1.2           . |
     2. |        2           .                      . |
     3. |        3         3.1        3.2         3.3 |
        +---------------------------------------------+
    
    >>> dta3.summ()
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
            var0 |         4         1.5     1.29099          0          3
            var1 |         3     1.43333     1.52753        0.1        3.1
            var2 |         0
            var3 |         2         1.8     2.12132        0.3        3.3
    
    >>> dta3.save("example.dta")

For more examples, see EXAMPLES.md.


Contributors
============
- James Fiedler
- Matthew Koslovsky


Contact
=======
James Fiedler, jrfiedler@gmail.com


License
=======
Copyright (c) 2014, James Fiedler (MIT License)