# cheatsheet_Python_Stata_Mtalab

**Repository Path**: JerryR/cheatsheet_Python_Stata_Mtalab

## Basic Information

- **Project Name**: cheatsheet_Python_Stata_Mtalab
- **Description**: A collection of Python, Stata, and MATLAB cheatsheets
- **Default Branch**: master
- **Created**: 2021-01-05
- **Last Updated**: 2021-01-05

## README

#### Introduction

A collection of Python, Stata, and MATLAB cheatsheets.

# Python, Stata, and MATLAB Cheatsheet Collection

Source: https://cheatsheets.quantecon.org/

Compiled by 计量经济学服务中心 (Econometrics Service Center); please credit the source when reposting.

This document was edited in Markdown by the Econometrics Service Center. You are welcome to visit the econometrics repository to learn more.

[TOC]

# Statistics cheatsheet

- [Basics](https://cheatsheets.quantecon.org/stats-cheatsheet.html#basics)
- [Filtering data](https://cheatsheets.quantecon.org/stats-cheatsheet.html#filtering-data)
- [Summarizing data](https://cheatsheets.quantecon.org/stats-cheatsheet.html#summarizing-data)
- [Reshaping data](https://cheatsheets.quantecon.org/stats-cheatsheet.html#reshaping-data)
- [Merging data](https://cheatsheets.quantecon.org/stats-cheatsheet.html#merging-data)
- [Plotting](https://cheatsheets.quantecon.org/stats-cheatsheet.html#plotting)

The Python code assumes that `import pandas as pd` has been run.

## Basics

| | STATA | PANDAS | BASE R |
| :---: | :---: | :---: | --- |
| Create new dataset from values | `input a b`<br>`1 4`<br>`2 5`<br>`3 6`<br>`end` | `d = {'a': [1, 2, 3], 'b': [4, 5, 6]}`<br>`df = pd.DataFrame(d)` | `df <- data.frame(a=1:3, b=4:6)` |
| Create new dataset from csv file | `import delim mydata.csv, delimiters(",")` | `df = pd.read_csv('mydata.csv', sep=',')` | `df <- read.csv('mydata.csv', sep=',')` |
| Print observations | `list` | `df` | `df` |
| Print observations of variable x | `list x` | `df['x']` | `df$x` |
| Select only variable x | `keep x` | `df = df['x']` | `df <- df$x` |
| Select only variables x and y | `keep x y` | `df = df[['x', 'y']]` | `df <- df[c('x', 'y')]` |
| Drop variable x | `drop x` | `df = df.drop('x', axis=1)` | `df$x <- NULL` |
| Generate new variable | `gen z = x + y` | `df['z'] = df['x'] + df['y']` | `df$z <- df$x + df$y` |
| Rename variable | `rename x y` | `df = df.rename(columns={'x': 'y'})` | `names(df)[names(df) == 'x'] <- 'y'` |
| Sort by variable | `sort x` | `df = df.sort_values('x')` | `df <- df[order(df$x), ]` |

## Filtering data

| | STATA | PANDAS | BASE R |
| :---: | :---: | :---: | --- |
| Conditionally print observations | `list if x > 1` | `df[df['x'] > 1]` | `subset(df, x > 1)` |
| Conditionally print observations with 'or' operator | `list if x > 1 \| y < 0` | `df[(df['x'] > 1) \| (df['y'] < 0)]` | `subset(df, x > 1 \| y < 0)` |
| Conditionally print observations with 'and' operator | `list if x > 1 & y < 0` | `df[(df['x'] > 1) & (df['y'] < 0)]` | `subset(df, x > 1 & y < 0)` |
| Print subset of observations based on location | `list in 1/3` | `df[0:3]` | `df[1:3, ]` |
| Print observations with missing values in x | `list if missing(x)` | `df[df['x'].isnull()]` | `subset(df, is.na(x))` |

## Summarizing data

| | STATA | PANDAS | BASE R |
| :---: | :---: | :---: | --- |
| Print summary statistics | `summarize` | `df.describe()` | `summary(df)` |
| Print information about variables and data types | `describe` | `df.info()` | `str(df)` |
| Print aggregation of variable | `mean x` | `df['x'].mean()` | `mean(df$x)` |
| Group data by variable and summarize | `bysort x: summarize` | `df.groupby('x').describe()` | `aggregate(. ~ x, df, summary)` |
| Print frequency table | `tab x` | `df['x'].value_counts()` | `table(df$x)` |
| Print cross-tabulation | `tab x y` | `pd.crosstab(df['x'], df['y'])` | `table(df$x, df$y)` |
| Create bins based on values in x in new column 'bins' | `egen bins = cut(x), group(3)` | `df['bins'] = pd.cut(df['x'], 3)` | `df$bins <- cut(df$x, 3)` |

## Reshaping data

| | STATA | PANDAS | BASE R |
| :---: | :---: | :---: | --- |
| Reshape data from wide to long panel | `reshape long x, i(i) j(j)` | `pd.wide_to_long(df, ['x'], i='i', j='j')` | `reshape(df, direction='long', varying=grep('j', names(df), value=TRUE), sep='')` |
| Reshape data from long to wide panel | `reshape wide` | `df.unstack() # returns hierarchical columns` | `reshape(df, timevar='x', idvar='i', direction='wide')` |

## Merging data

| | STATA | PANDAS | BASE R |
| :---: | :---: | :---: | --- |
| Vertically concatenate datasets | `append using y` | `pd.concat([x, y])` | `rbind(x, y) # note that columns must be the same for each dataset` |
| Merge datasets on key | `merge 1:1 key using y` | `pd.merge(x, y, on='key', how='inner')` | `merge(x, y, by='key')` |

## Plotting

| | STATA | PANDAS | BASE R |
| :---: | :---: | :---: | --- |
| Scatter plot | `plot x y` | `df.plot.scatter('x', 'y')` | `plot(df$x, df$y)` |
| Line plot | `line x y` | `df.plot('x', 'y')` | `lines(df$x, df$y)` |
| Histogram | `hist x` | `df.hist('x')` | `hist(df$x)` |
| Boxplot | `graph box x` | `df.boxplot('x')` | `boxplot(df$x)` |

------

# Python cheatsheet

- [Operators](https://cheatsheets.quantecon.org/python-cheatsheet.html#operators)
- [Data Types](https://cheatsheets.quantecon.org/python-cheatsheet.html#data-types)
- [Built-In Functions](https://cheatsheets.quantecon.org/python-cheatsheet.html#built-in-functions)
- [Iterating](https://cheatsheets.quantecon.org/python-cheatsheet.html#iterating)
- [Comparisons and Logical Operators](https://cheatsheets.quantecon.org/python-cheatsheet.html#comparisons-and-logical-operators)
- [User-Defined Functions](https://cheatsheets.quantecon.org/python-cheatsheet.html#user-defined-functions)
- [Numpy](https://cheatsheets.quantecon.org/python-cheatsheet.html#numpy)
- [numpy.linalg](https://cheatsheets.quantecon.org/python-cheatsheet.html#numpy-linalg)
- [Pandas](https://cheatsheets.quantecon.org/python-cheatsheet.html#pandas)
- [Plotting](https://cheatsheets.quantecon.org/python-cheatsheet.html#plotting)

## Operators

| Command | Description |
| :--- | :--- |
| `*` | multiplication operation: `2*3` returns `6` |
| `**` | power operation: `2**3` returns `8` |
| `@` | matrix multiplication:<br>`import numpy as np`<br>`A = np.array([[1, 2, 3]])`<br>`B = np.array([[3], [2], [1]])`<br>`A @ B`<br>returns `array([[10]])` |

## Data Types

| Command | Description |
| :--- | :--- |
| `l = [a1, a2, …, an]` | Constructs a list containing the objects a1, a2, ..., an. You can append to the list using `l.append()`. The element at index i (counting from 0) of `l` can be accessed using `l[i]`. |
| `t = (a1, a2, …, an)` | Constructs a tuple containing the objects a1, a2, ..., an. The element at index i (counting from 0) of `t` can be accessed using `t[i]`. |

## Built-In Functions

| Command | Description |
| :--- | :--- |
| `len(iterable)` | `len` is a function that takes an iterable, such as a list, tuple, or numpy array, and returns the number of items in that object. For a numpy array, `len` returns the length of the outermost dimension: `len(np.zeros((5, 4)))` returns `5`. |
| `zip` | Makes an iterator that aggregates elements from each of the iterables.<br>`x = [1, 2, 3]`<br>`y = [4, 5, 6]`<br>`zipped = zip(x, y)`<br>`list(zipped)`<br>returns `[(1, 4), (2, 5), (3, 6)]` |

## Iterating

| Command | Description |
| :--- | :--- |
| `for a in iterable:` | For loop used to perform a sequence of commands (denoted by indentation) for each element in an iterable object such as a list, tuple, or numpy array. Example:<br>`l = []`<br>`for i in [1, 2, 3]:`<br>`    l.append(i**2)`<br>`print(l)`<br>prints `[1, 4, 9]` |

## Comparisons and Logical Operators

| Command | Description |
| :--- | :--- |
| `if condition:` | Runs the indented code if a condition is met. For example,<br>`if x == 5:`<br>`    x = x**2`<br>`else:`<br>`    x = x**3`<br>squares `x` if `x` is `5`, otherwise cubes it. |
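
The `zip`, `for`, and `if` entries above can be combined in one short loop. The following is only a minimal sketch; the variable name `results` and the rule "square when the first value is 2, otherwise cube" are assumptions made for illustration and do not come from the original cheatsheet.

```python
# Illustrative data, matching the lists used in the zip entry above
x = [1, 2, 3]
y = [4, 5, 6]

results = []  # hypothetical name, introduced for this sketch
for a, b in zip(x, y):          # pairs are (1, 4), (2, 5), (3, 6)
    if a == 2:
        results.append(b**2)    # square b when a equals 2
    else:
        results.append(b**3)    # otherwise cube b

print(len(results))  # 3
print(results)       # [64, 25, 216]
```

Here `zip` stops at the shorter of its inputs, so lists of unequal length would silently drop the extra elements.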

## User-Defined Functions

| Command | Description |
| :--- | :--- |
| `lambda` | Used to create anonymous one-line functions of the form:<br>`f = lambda x, y: 5*x+y`<br>The code after `lambda` but before the colon specifies the parameters. The code after the colon tells Python what object to return. |
| `def` | The `def` command is used to create functions of more than one line:<br>`from math import sin`<br>`def g(x, y):`<br>`    """`<br>`    Docstring`<br>`    """`<br>`    ret = sin(x)`<br>`    return ret + y`<br>The code immediately following `def` names the function, in this example `g`. The variables in the parentheses are the parameters of the function. The remaining lines of the function are denoted by indentation. The return statement specifies the object to be returned. |

## Numpy

| Command | Description |
| :--- | :--- |
| `np.array(object, dtype = None)` | `np.array` constructs a numpy array from an object, such as a list or a list of lists. `dtype` allows you to specify the type of object the array is holding. You will generally not need to specify the `dtype`. Examples:<br>`np.array([1, 2, 3]) # creates 1 dim array of ints`<br>`np.array([1, 2, 3.0]) # creates 1 dim array of floats`<br>`np.array([[1, 2], [3, 4]]) # creates a 2 dim array` |
| `A[i1, i2, …, in]` | Accesses the element of numpy array `A` with index i1 in dimension 1, i2 in dimension 2, etc. Can use `:` to access a range of indices, where `imin:imax` represents all i such that imin ≤ i < imax … |

| | MATLAB | PYTHON | JULIA |
| :---: | :---: | :---: | --- |
| … | … | … | `… x^2 # can be rebound` |
| Function | `function out = f(x)`<br>`    out = x^2`<br>`end` | `def f(x):`<br>`    return x**2` | `function f(x)`<br>`    return x^2`<br>`end`<br>`f(x) = x^2 # not anon!` |
| Tuples | `t = {1 2.0 "test"}`<br>`t{1}`<br>Can use cells but watch performance | `t = (1, 2.0, "test")`<br>`t[0]` | `t = (1, 2.0, "test")`<br>`t[1]` |
| Named Tuples / Anonymous Structures | `m.x = 1`<br>`m.y = 2`<br>`m.x` | `from collections import namedtuple`<br>`mdef = namedtuple('m', 'x y')`<br>`m = mdef(1, 2)`<br>`m.x` | `# vanilla`<br>`m = (x = 1, y = 2)`<br>`m.x`<br>`# constructor using Parameters`<br>`mdef = @with_kw (x=1, y=2)`<br>`m = mdef() # same as above`<br>`m = mdef(x = 3)` |
| Closures | `a = 2.0`<br>`f = @(x) a + x`<br>`f(1.0)` | `a = 2.0`<br>`def f(x):`<br>`    return a + x`<br>`f(1.0)` | `a = 2.0`<br>`f(x) = a + x`<br>`f(1.0)` |
| In-place Modification | `function f(out, x)`<br>`    out = x.^2`<br>`end`<br>`x = rand(10)`<br>`y = zeros(length(x), 1)`<br>`f(y, x)` | `def f(x):`<br>`    x **= 2`<br>`    return`<br>`x = np.random.rand(10)`<br>`f(x)` | `function f!(out, x)`<br>`    out .= x.^2`<br>`end`<br>`x = rand(10)`<br>`y = similar(x)`<br>`f!(y, x)` |
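
The Python entries for named tuples, closures, and in-place modification can be tried together in a short script. This is a rough sketch under stated assumptions: the names `Point`, `add_a`, `square_inplace`, and `square_rebind` are hypothetical, and a plain list stands in for the NumPy array so the example runs without third-party packages.

```python
from collections import namedtuple

# Named tuple: fields are accessed by name (Point is a hypothetical name)
Point = namedtuple("Point", "x y")
p = Point(1, 2)

# Closure-style lookup: add_a reads the global a each time it is called
a = 2.0
def add_a(x):
    return a + x

first = add_a(1.0)    # 3.0
a = 5.0               # rebinding a changes later results
second = add_a(1.0)   # 6.0

def square_inplace(values):
    # Writes into the caller's list, so the change is visible outside
    for i, v in enumerate(values):
        values[i] = v**2

def square_rebind(values):
    # Rebinds the local name only; the caller's list is left untouched
    values = [v**2 for v in values]
    return values

data = [1.0, 2.0, 3.0]
square_inplace(data)            # data is now [1.0, 4.0, 9.0]

other = [1.0, 2.0, 3.0]
squared = square_rebind(other)  # other is unchanged

print(p.x, first, second, data, other, squared)
```

The difference between the last two functions is the point of the in-place row: assigning to an element mutates the shared object, while assigning to the parameter name creates a new local binding.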