Whenever someone asks me “How to get started in data science?”, I usually recommend the book 📕 An Introduction to Statistical Learning by Daniela Witten, Trevor Hastie, Gareth M. James and Robert Tibshirani, to learn the basics of statistics and machine learning models.
Understandably, completing a technical book while practicing it with relevant data and code is a challenge for a lot of us.
So, I created a concise version of the book as a course on statistical machine learning in Python. In this repo, each chapter of the book has been translated into a Jupyter notebook with a summary of the key concepts, plus data and Python code to play with.
If you want to quickly understand the book, or learn statistical machine learning and/or Python for data science, just clone the repo and get started! :woman_technologist:
Expect to learn the following concepts and their implementation in Python:
Notebook: Chapter 2: Statistical Learning explains-
Notebook: Chapter 3: Linear Regression explains-
Notebook: Chapter 4: Classification explains-
Notebook: Chapter 5: Resampling Methods explains-
Notebook: Chapter 6: Linear Model Selection and Regularization explains-
Note: Chapters 7, 8, 9 and 10 will be added soon.
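To give a flavor of the kind of exercise a linear regression chapter leads to, here is a minimal sketch of a simple least-squares fit. It is not taken from the notebooks: the data is synthetic, and the names `tv_budget` and `sales`, along with the "true" coefficients, are assumptions made up for illustration.

```python
import numpy as np

# Synthetic stand-in for an advertising-style dataset (NOT the book's data).
# 'tv_budget' and 'sales' are hypothetical names chosen for this sketch.
rng = np.random.default_rng(seed=0)
tv_budget = rng.uniform(0, 300, size=200)
true_intercept, true_slope = 7.0, 0.05  # assumed values used to generate data
sales = true_intercept + true_slope * tv_budget + rng.normal(0, 1.0, size=200)

# Design matrix: a column of ones (intercept) next to the predictor.
X = np.column_stack([np.ones_like(tv_budget), tv_budget])

# Ordinary least squares: find coefficients minimizing squared residuals.
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
intercept, slope = coef

# R^2: the share of variance in 'sales' explained by the fitted line.
predictions = X @ coef
ss_res = np.sum((sales - predictions) ** 2)
ss_tot = np.sum((sales - sales.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"intercept={intercept:.2f}, slope={slope:.3f}, R^2={r_squared:.3f}")
```

Because the data is generated from a known line plus noise, the fitted intercept and slope should land close to the assumed values, which makes this a handy sanity check before moving on to real data.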
"This book is intended for anyone who is interested in using modern statistical methods for modeling and prediction from data. This group includes scientists, engineers, data analysts, or quants, but also less technical individuals with degrees in non-quantitative fields such as the social sciences or business. We expect that the reader will have had at least one elementary course in statistics."
I recommend ✅ this book because-
This book (and the derived notebooks in this repo) marries statistical machine learning concepts with real-life data science problem statements. Each chapter/concept begins with a real scenario, like "You are a consultant who needs to advise on the best advertising medium and budgets to increase the sales of a product, using the advertising data", and explains techniques and methods step by step as we work through it.
It gives a modest introduction to the statistics and mathematics behind the most used methods like:
A few important concepts it does not touch at all are-
This is an independent part of my blog series, Data science for analytical minds, which serves as a resource for people, especially from non-technical backgrounds like economics, statistics, mathematics, physics etc., to learn different components of data science through real-life problem statements.
Check out its 👉 introductory blog & data quality & cleaning blog. This is the 3rd part of the series, focusing on statistics & machine learning basics.
This is meant to give you a quick head start with the most used statistical concepts, with data and code to play with. For a deeper understanding of any concept, I recommend referring back to the book.
If you find any problems or have questions, feel free to open an issue.
If you have any generic feedback, ideas to collaborate or anything interesting to say, you can reach me at shilpaarora992[at]gmail[dot]com.