汪静怡 / joyful-pandas

AFL-3.0

Joyful-Pandas

[This tutorial is kept in sync with the latest official Pandas release. Current version: v1.0.3]

I. Motivation

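Before I started using Pandas, I handled almost every large spreadsheet-processing task with xlrd/xlwt and Python loops. That approach can meet nearly any requirement, but its drawbacks are obvious: first, it is slow; second, the code is almost impossible to reuse.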

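I had also tried to pick up Pandas piecemeal, but the package is so large that every use felt like the blind men feeling the elephant. Each function takes many parameters, and the learning curve is not gentle. If you have only just started with Pandas and are learning it in fragments, errors come up often and are hard to fix, because you do not understand what happens internally. Even after fixing one, you may not know how to fix it the next time, which is discouraging.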

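In the autumn of 2019 I came across a book devoted entirely to Pandas: Pandas Cookbook by Theodore Petrou. A PDF can be found online, but there is no Chinese edition yet. As soon as the winter break began I read through it quickly and found that many concepts I had never understood were explained well. I then worked through the User Guide line by line and finally built up a broad overall picture.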

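The most important step, I think, was reading most of the official User Guide. That was probably a crucial step up, since the official tutorial always tells you where the key points are. After some thought, I decided to combine Python for Data Analysis by Wes McKinney (the creator of Pandas), the Pandas Cookbook mentioned above, and the official User Guide, and to write a Pandas tutorial along my own line of thinking: one that lays out the main thread of Pandas in full, does not stop at the surface, and covers the core concepts and functions of every part. The state I ultimately hope to reach is "what you write is what you get, and what you get is what you meant". That will take more practice, and it is the goal I am working toward.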

As for the project's name: using Pandas used to be painful for me, so now it is time to turn it into "Joyful-Pandas"!

II. Organization

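This project has ten chapters, which fall roughly into four parts: Pandas basics, four kinds of operations, four kinds of data, and examples.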

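1. Once you have the data you must first read it, and once the analysis is done you must save it. After reading the data, the first important question is what kind of object you are dealing with (a Series or a DataFrame?). The common operations on Series and DataFrame objects, and the components of those objects, are therefore essential material.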
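The read, inspect, and save cycle described in point 1 can be sketched as follows. This is a minimal illustration rather than an excerpt from the tutorial: the CSV content and the column names `name` and `score` are made up, and an in-memory buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; an in-memory buffer stands in for a real file.
csv_text = "name,score\nAlice,90\nBob,85\n"

# Reading: read_csv returns a DataFrame (a two-dimensional table).
df = pd.read_csv(io.StringIO(csv_text))

# Selecting a single column yields a Series (a one-dimensional labelled array).
scores = df["score"]

# A few components of each object.
print(type(df).__name__, list(df.columns), df.shape)
print(type(scores).__name__, scores.name, scores.dtype)

# Writing: to_csv serialises the DataFrame back to CSV text.
out = io.StringIO()
df.to_csv(out, index=False)
round_trip = out.getvalue()
```

With a real file, a path would take the place of the `io.StringIO` objects in both `read_csv` and `to_csv`.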

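2. For a DataFrame, an operation that reduces the information held in its elements corresponds to indexing, the topic of Chapter 2. If an operation makes full use of the information in the data, there are two possibilities: either the data are grouped and key information is extracted within each group, or the structure or shape of the data changes so that further processing becomes easier. These correspond to Chapter 3 and Chapter 4 respectively. Finally, if an operation brings in information that did not originally belong to the DataFrame, it usually involves merging, the topic of Chapter 5. Viewed in terms of whether information is reduced, used, or added, this gives three categories and four chapters, which together tie up almost everything the official documentation says about DataFrame operations. I think this arrangement is appropriate.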
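The four kinds of operation in point 2 can be seen side by side on one small table. The sketch below is only an illustration under assumed data: the tables `df` and `teachers` and all column names are hypothetical, and each step uses just one of the many methods the corresponding chapter covers.

```python
import pandas as pd

# Hypothetical data used only for this illustration.
df = pd.DataFrame({
    "student": ["A", "B", "C", "D"],
    "class": ["x", "x", "y", "y"],
    "math": [90, 80, 70, 60],
    "art": [65, 75, 85, 95],
})

# Indexing (Chapter 2): select a subset, so information is reduced.
high_math = df.loc[df["math"] >= 80, ["student", "math"]]

# Grouping (Chapter 3): extract a summary within each group.
class_mean = df.groupby("class")["math"].mean()

# Reshaping (Chapter 4): same information, wide layout turned into long layout.
long_form = df.melt(
    id_vars=["student", "class"],
    value_vars=["math", "art"],
    var_name="subject",
    value_name="score",
)

# Merging (Chapter 5): bring in information from another table.
teachers = pd.DataFrame({"class": ["x", "y"], "teacher": ["T1", "T2"]})
merged = df.merge(teachers, on="class", how="left")

print(high_math, class_mean, long_form, merged, sep="\n\n")
```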

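3. Where the preceding chapters deal with the structure and operations of the two containers, Series and DataFrame, the following ones deal with the elements inside them. Four special kinds of data are covered: missing data, text data, categorical data, and time series data, corresponding to Chapters 6 to 9. The chapters on missing data and text data cover in detail the new Nullable and string data types of Pandas 1.0, which are also the biggest change in the upgrade from the previous version, 0.25.3.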
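As a rough illustration of the four kinds of data in point 3, the sketch below builds one small Series of each kind. The values are made up, and `"Int64"` and `"string"` are used here as examples of the Nullable and string dtypes introduced around Pandas 1.0; the chapters themselves go into much more detail.

```python
import pandas as pd

# Missing data: the nullable integer dtype "Int64" stores pd.NA
# instead of converting the column to float.
ints = pd.Series([1, None, 3], dtype="Int64")
total = ints.sum()  # missing values are skipped by default

# Text data: the dedicated "string" dtype, with the .str accessor.
names = pd.Series(["apple", None, "cherry"], dtype="string")
upper = names.str.upper()

# Categorical data: an ordered category defines its own sort order.
size_type = pd.CategoricalDtype(["S", "M", "L"], ordered=True)
sizes = pd.Series(["S", "L", "M"], dtype=size_type)
sorted_sizes = sizes.sort_values().tolist()

# Time series data: a DatetimeIndex allows label lookup by date string.
ts = pd.Series([1, 2, 3], index=pd.date_range("2020-01-01", periods=3, freq="D"))
jan2 = ts.loc["2020-01-02"]

print(ints.dtype, names.dtype, sizes.dtype, ts.index.dtype)
```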

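4. Learning Pandas tends to be task-driven: an operation or method that you never use is quickly forgotten (unless you are exceptionally gifted!). Each of the first nine chapters therefore includes a "Questions and exercises" part. The questions usually go deeper into, or add to, some detail of the tutorial, or test practical understanding of that chapter's functions and methods; I hope you will consult the relevant material to solve them. The exercises consist of two comprehensive problems based on two different cases, which apply what was learned earlier. They are not very complicated, but completing all of them takes some effort. Finally, Chapter 10 will add a number of comprehensive problems of varying difficulty (updated from time to time).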

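For completeness, I have written reference answers for all the exercises. They are not necessarily the best solutions, but they can serve as hints and strategies.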

Finally, I hope you get something out of it!

III. Contents

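| Chapter | Section | Content |
|---|---|---|
| Chapter 1: Pandas Basics | I. Reading and writing files | 1. Reading; 2. Writing |
| | II. Basic data structures | 1. Series; 2. DataFrame |
| | III. Common basic functions | 1. head and tail; 2. unique and nunique; 3. count and value_counts; 4. describe and info; 5. idxmax and nlargest; 6. clip and replace; 7. The apply function |
| | IV. Sorting | 1. Sorting by index; 2. Sorting by value |
| | V. Questions and exercises | 1. Questions; 2. Exercises |
| Chapter 2: Indexing | I. Single-level indexing | 1. The loc method, the iloc method, and the [] operator; 2. Boolean indexing; 3. Fast scalar indexing; 4. Interval indexing |
| | II. Multi-level indexing | 1. Creating a multi-level index; 2. Slicing a multi-level index; 3. slice objects in a multi-level index; 4. Swapping index levels |
| | III. Setting the index | 1. The index_col parameter; 2. reindex and reindex_like; 3. set_index and reset_index; 4. rename_axis and rename |
| | IV. Common index-based functions | 1. The where function; 2. The mask function; 3. The query function |
| | V. Handling duplicates | 1. The duplicated method; 2. The drop_duplicates method |
| | VI. Sampling functions | Sampling functions |
| | VII. Questions and exercises | 1. Questions; 2. Exercises |
| Chapter 3: Grouping | I. The SAC process | 1. Meaning; 2. The apply step |
| | II. The groupby function | 1. Basics of the grouping function; 2. Characteristics of groupby objects |
| | III. Aggregation, filtration, and transformation | 1. Aggregation; 2. Filtration; 3. Transformation |
| | IV. The apply function | 1. Flexibility of the apply function; 2. Computing several statistics at once with apply |
| | V. Questions and exercises | 1. Questions; 2. Exercises |
| Chapter 4: Reshaping | I. Pivot tables | 1. pivot; 2. pivot_table; 3. crosstab (cross tabulation) |
| | II. Other reshaping methods | 1. melt; 2. Compressing and expanding |
| | III. Dummy variables and factorization | 1. Dummy variables; 2. The factorize method |
| | IV. Questions and exercises | 1. Questions; 2. Exercises |
| Chapter 5: Merging | I. append and assign | 1. The append method; 2. The assign method |
| | II. combine and update | 1. The combine method; 2. The update method |
| | III. The concat method | The concat method |
| | IV. merge and join | 1. The merge function; 2. The join function |
| | V. Questions and exercises | 1. Questions; 2. Exercises |
| Chapter 6: Missing Data | I. Missing observations and their types | 1. Inspecting missing information; 2. Three missing-value symbols; 3. Nullable types and the NA symbol; 4. Properties of NA; 5. The convert_dtypes method |
| | II. Arithmetic and grouping with missing data | 1. Rules for addition and multiplication; 2. Missing values in the groupby method |
| | III. Filling and dropping | 1. The fillna method; 2. The dropna method |
| | IV. Interpolation | 1. Linear interpolation; 2. Advanced interpolation methods; 3. Limit parameters of interpolate |
| | V. Questions and exercises | 1. Questions; 2. Exercises |
| Chapter 7: Text Data | I. Properties of the string type | 1. Differences between string and object; 2. Converting to the string type |
| | II. Splitting and joining | 1. The str.split method; 2. The str.cat method |
| | III. Replacement | 1. Common uses of str.replace; 2. Subgroups and function-based replacement; 3. Caveats about str.replace |
| | IV. Substring matching and extraction | 1. The str.extract method; 2. The str.extractall method; 3. str.contains and str.match |
| | V. Common string methods | 1. Filtering methods; 2. The isnumeric method |
| | VI. Questions and exercises | 1. Questions; 2. Exercises |
| Chapter 8: Categorical Data | I. Creating category data and its properties | 1. Creating categorical variables; 2. Structure of categorical variables; 3. Modifying categories |
| | II. Ordering categorical variables | 1. Establishing an order; 2. Sorting |
| | III. Comparison operations on categorical variables | 1. Comparison with a scalar or an equal-length sequence; 2. Comparison with another categorical variable |
| | IV. Questions and exercises | 1. Questions; 2. Exercises |
| Chapter 9: Time Series Data | I. Creating time series | 1. Four kinds of time variables; 2. Creating time points; 3. DateOffset objects |
| | II. Indexing and attributes of time series | 1. Index slicing; 2. Subset indexing; 3. Attributes of time points |
| | III. Resampling | 1. Basic operations on resample objects; 2. Aggregation by resampling; 3. Iterating over resampling groups |
| | IV. Window functions | 1. Rolling; 2. Expanding |
| | V. Questions and exercises | 1. Questions; 2. Exercises |
| Chapter 10: Examples (updated from time to time) | I. Judges' scoring | Method 1; Method 2; Method 3 |
| | II. Enterprise revenue entropy index | Reference answer |
| | ... | ... |
| Reference answers | Chapter 1 | Exercise 1; Exercise 2 |
| | Chapter 2 | Exercise 1; Exercise 2 |
| | Chapter 3 | Exercise 1; Exercise 2 |
| | Chapter 4 | Exercise 1; Exercise 2 |
| | Chapter 5 | Exercise 1; Exercise 2 |
| | Chapter 6 | Exercise 1; Exercise 2 |
| | Chapter 7 | Exercise 1; Exercise 2 |
| | Chapter 8 | Exercise 1; Exercise 2 |
| | Chapter 9 | Exercise 1; Exercise 2 |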

IV. Versions Used

python: 3.7
numpy: 1.18.1
pandas: 1.0.3
matplotlib: 3.1.3
scipy: 1.4.1
xlrd: 1.2.0
openpyxl: 3.0.3
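To compare a local environment against the versions listed above, something like the following sketch could be used. It is not part of the original project: the helper `compare_versions` is a hypothetical name, and `importlib.metadata` is assumed to be available, which requires Python 3.8 or later (newer than the 3.7 listed above).

```python
from importlib import metadata  # standard library since Python 3.8

# Versions as listed in this README.
PINNED = {
    "numpy": "1.18.1",
    "pandas": "1.0.3",
    "matplotlib": "3.1.3",
    "scipy": "1.4.1",
    "xlrd": "1.2.0",
    "openpyxl": "3.0.3",
}


def compare_versions(pinned):
    """Return {package: (pinned version, installed version or None)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(PINNED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: listed {wanted}, installed {installed} ({status})")
```

A differing version does not necessarily break the tutorial code; the check only reports where the environment departs from the one the tutorial was written against.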

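V. Feedback

1. Any helpful suggestions or ideas are welcome; feel free to get in touch by email (1801214626@qq.com).

2. Mistakes are inevitable; please open an Issue if you find one.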

VI. References

1. Python for Data Analysis, by Wes McKinney

2. Pandas Cookbook, by Theodore Petrou

3. User Guide, by the Pandas development team

VII. Promotion

Academic Free License (“AFL”) v. 3.0 This Academic Free License (the "License") applies to any original work of authorship (the "Original Work") whose owner (the "Licensor") has placed the following licensing notice adjacent to the copyright notice for the Original Work: Licensed under the Academic Free License version 3.0 1) Grant of Copyright License. Licensor grants You a worldwide, royalty-free, non-exclusive, sublicensable license, for the duration of the copyright, to do the following: a) to reproduce the Original Work in copies, either alone or as part of a collective work; b) to translate, adapt, alter, transform, modify, or arrange the Original Work, thereby creating derivative works ("Derivative Works") based upon the Original Work; c) to distribute or communicate copies of the Original Work and Derivative Works to the public, under any license of your choice that does not contradict the terms and conditions, including Licensor’s reserved rights and remedies, in this Academic Free License; d) to perform the Original Work publicly; and e) to display the Original Work publicly. 2) Grant of Patent License. Licensor grants You a worldwide, royalty-free, non-exclusive, sublicensable license, under patent claims owned or controlled by the Licensor that are embodied in the Original Work as furnished by the Licensor, for the duration of the patents, to make, use, sell, offer for sale, have made, and import the Original Work and Derivative Works. 3) Grant of Source Code License. The term "Source Code" means the preferred form of the Original Work for making modifications to it and all available documentation describing how to modify the Original Work. Licensor agrees to provide a machine-readable copy of the Source Code of the Original Work along with each copy of the Original Work that Licensor distributes. 
Licensor reserves the right to satisfy this obligation by placing a machine-readable copy of the Source Code in an information repository reasonably calculated to permit inexpensive and convenient access by You for as long as Licensor continues to distribute the Original Work. 4) Exclusions From License Grant. Neither the names of Licensor, nor the names of any contributors to the Original Work, nor any of their trademarks or service marks, may be used to endorse or promote products derived from this Original Work without express prior permission of the Licensor. Except as expressly stated herein, nothing in this License grants any license to Licensor’s trademarks, copyrights, patents, trade secrets or any other intellectual property. No patent license is granted to make, use, sell, offer for sale, have made, or import embodiments of any patent claims other than the licensed claims defined in Section 2. No license is granted to the trademarks of Licensor even if such marks are included in the Original Work. Nothing in this License shall be interpreted to prohibit Licensor from licensing under terms different from this License any Original Work that Licensor otherwise would have a right to license. 5) External Deployment. The term "External Deployment" means the use, distribution, or communication of the Original Work or Derivative Works in any way such that the Original Work or Derivative Works may be used by anyone other than You, whether those works are distributed or communicated to those persons or made available as an application intended for use over a network. As an express condition for the grants of license hereunder, You must treat any External Deployment by You of the Original Work or a Derivative Work as a distribution under section 1(c). 6) Attribution Rights. 
You must retain, in the Source Code of any Derivative Works that You create, all copyright, patent, or trademark notices from the Source Code of the Original Work, as well as any notices of licensing and any descriptive text identified therein as an "Attribution Notice." You must cause the Source Code for any Derivative Works that You create to carry a prominent Attribution Notice reasonably calculated to inform recipients that You have modified the Original Work. 7) Warranty of Provenance and Disclaimer of Warranty. Licensor warrants that the copyright in and to the Original Work and the patent rights granted herein by Licensor are owned by the Licensor or are sublicensed to You under the terms of this License with the permission of the contributor(s) of those copyrights and patent rights. Except as expressly stated in the immediately preceding sentence, the Original Work is provided under this License on an "AS IS" BASIS and WITHOUT WARRANTY, either express or implied, including, without limitation, the warranties of non-infringement, merchantability or fitness for a particular purpose. THE ENTIRE RISK AS TO THE QUALITY OF THE ORIGINAL WORK IS WITH YOU. This DISCLAIMER OF WARRANTY constitutes an essential part of this License. No license to the Original Work is granted by this License except under this disclaimer. 8) Limitation of Liability. Under no circumstances and under no legal theory, whether in tort (including negligence), contract, or otherwise, shall the Licensor be liable to anyone for any indirect, special, incidental, or consequential damages of any character arising as a result of this License or the use of the Original Work including, without limitation, damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses. This limitation of liability shall not apply to the extent applicable law prohibits such limitation. 9) Acceptance and Termination. 
If, at any time, You expressly assented to this License, that assent indicates your clear and irrevocable acceptance of this License and all of its terms and conditions. If You distribute or communicate copies of the Original Work or a Derivative Work, You must make a reasonable effort under the circumstances to obtain the express assent of recipients to the terms of this License. This License conditions your rights to undertake the activities listed in Section 1, including your right to create Derivative Works based upon the Original Work, and doing so without honoring these terms and conditions is prohibited by copyright law and international treaty. Nothing in this License is intended to affect copyright exceptions and limitations (including “fair use” or “fair dealing”). This License shall terminate immediately and You may no longer exercise any of the rights granted to You by this License upon your failure to honor the conditions in Section 1(c). 10) Termination for Patent Action. This License shall terminate automatically and You may no longer exercise any of the rights granted to You by this License as of the date You commence an action, including a cross-claim or counterclaim, against Licensor or any licensee alleging that the Original Work infringes a patent. This termination provision shall not apply for an action alleging patent infringement by combinations of the Original Work with other software or hardware. 11) Jurisdiction, Venue and Governing Law. Any action or suit relating to this License may be brought only in the courts of a jurisdiction wherein the Licensor resides or in which Licensor conducts its primary business, and under the laws of that jurisdiction excluding its conflict-of-law provisions. The application of the United Nations Convention on Contracts for the International Sale of Goods is expressly excluded. 
Any use of the Original Work outside the scope of this License or after its termination shall be subject to the requirements and penalties of copyright or patent law in the appropriate jurisdiction. This section shall survive the termination of this License. 12) Attorneys’ Fees. In any action to enforce the terms of this License or seeking damages relating thereto, the prevailing party shall be entitled to recover its costs and expenses, including, without limitation, reasonable attorneys' fees and costs incurred in connection with such action, including any appeal of such action. This section shall survive the termination of this License. 13) Miscellaneous. If any provision of this License is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable. 14) Definition of "You" in This License. "You" throughout this License, whether in upper or lower case, means an individual or a legal entity exercising rights under, and complying with all of the terms of, this License. For legal entities, "You" includes any entity that controls, is controlled by, or is under common control with you. For purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. 15) Right to Use. You may use the Original Work in all ways not otherwise restricted or conditioned by this License or by law, and Licensor promises not to interfere with or be responsible for such uses by You. 16) Modification of This License. This License is Copyright © 2005 Lawrence Rosen. Permission is granted to copy, distribute, or communicate this License without modification. Nothing in this License permits You to modify this License as applied to the Original Work or to Derivative Works. 
However, You may modify the text of this License and copy, distribute or communicate your modified version (the "Modified License") and apply it to other original works of authorship subject to the following conditions: (i) You may not indicate in any way that your Modified License is the "Academic Free License" or "AFL" and you may not use those names in the name of your Modified License; (ii) You must replace the notice specified in the first paragraph above with the notice "Licensed under <insert your license name here>" or with a notice of your own that is not confusingly similar to the notice in this License; and (iii) You may not claim that your original works are open source software unless your Modified License has been approved by Open Source Initiative (OSI) and You comply with its license review and certification process.
