# shifu
**Repository Path**: mirrors/shifu
## Basic Information
- **Project Name**: shifu
- **Description**: Shifu 是一个基于 Hadoop 的快速和可伸缩的机器学习框架
- **Primary Language**: Java
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: https://www.oschina.net/p/shifu
- **GVP Project**: No
## Statistics
- **Stars**: 1
- **Forks**: 2
- **Created**: 2020-11-22
- **Last Updated**: 2026-01-03
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
[
](http://shifu.ml)
[](https://travis-ci.org/ShifuML/shifu?branch=develop)
[](https://maven-badges.herokuapp.com/maven-central/ml.shifu/shifu)
#
## Download
Please [download](https://github.com/ShifuML/shifu/wiki/shifu-0.12.0-hdp-yarn.tar.gz) latest shifu [here](https://github.com/ShifuML/shifu/wiki/shifu-0.12.0-hdp-yarn.tar.gz).
## Getting Started
After shifu downloading, build your first model with Shifu [tutorial](https://github.com/ShifuML/shifu/wiki/Tutorial---Build-Your-First-ML-Model). More details about shifu can be found in our [wiki pages](https://github.com/ShifuML/shifu/wiki).
## What is Shifu?
Shifu is an open-source, end-to-end machine learning and data mining framework built on top of Hadoop. Shifu is designed for data scientists, simplifying the life-cycle of building machine learning models. While originally built for fraud modeling, Shifu is generalized for many other modeling domains.
One of Shifu's pros is an end-to-end modeling pipeline in machine learning. With only configurations settings, a whole machine pipeline can be built and model can be much more easy to develop and push to production. The pipeline defined in Shifu is in below:

Shifu provides a simple command-line interface for each step of the model building process, including
* Statistic calculation & variable selection to determine the most predictive variables in your data
* [Variable normalization](https://github.com/ShifuML/shifu/wiki/Variable%20Transform%20in%20Shifu)
* [Distributed variable selection based on sensitivity analysis](https://github.com/ShifuML/shifu/wiki/Variable%20Selection%20in%20Shifu)
* [Distributed neural network model training](https://github.com/ShifuML/shifu/wiki/Distributed%20Neural%20Network%20Training%20in%20Shifu)
* [Distributed tree ensemble model training](https://github.com/ShifuML/shifu/wiki/Distributed%20Tree%20Ensemble%20Model%20Training%20in%20Shifu)
* Post training analysis & model evaluation
* [Distributed Tensorflow on Shifu](https://github.com/ShifuML/shifu/wiki/Distributed-Tensorflow-Support-On-Shifu)
Shifu’s fast Hadoop-based, distributed neural network / logistic regression / gradient boosted trees training can reduce model training time from days to hours on TB data sets. Shifu integrates with Pig workflows on Hadoop, and Shifu-trained models can be integrated into production code with a simple Java API. Shifu leverages Pig, Akka, Encog and other open source projects.
[Guagua](https://github.com/ShifuML/guagua), an in-memory iterative computing framework on Hadoop YARN is developed as sub-project of Shifu to accelerate training progress.
More details about shifu can be found in our [wiki pages](https://github.com/ShifuML/shifu/wiki)
## Conference
* [QCON Shanghai 2015](http://2015.qconshanghai.com/presentation/2827) [Slides](http://www.slideshare.net/pengshanzhang/large-scale-machine-learning-at-pay-pal-risk)
* [BDTC Beijing 2016](http://bdtc2016.hadooper.cn/dct/page/70107)
* [Strata Beijing 2017](https://strata.oreilly.com.cn/strata-cn/public/schedule/detail/59593?locale=en)
## Contributors
- Zhanghao Hu (zhanhu@paypal.com)
- Grahame Jastrebski (gjastrebski@paypal.com)
- Lavar Li (lulli@paypal.com)
- Mark Liu (yliu15@paypal.com)
- David Zhang (pengzhang@paypal.com)
- Xin Zhong (xinzhong@paypal.com)
- Simon Zhang (jzhang13@paypal.com)
- Sharma Nitin (nsharma1@paypal.com)
- Wayne Zhu (wzhu1@paypal.com)
- Devin Wu (haifwu@paypal.com)
- Fred Bai (webai@paypal.com)
## Google Group
Please join [Shifu group](https://groups.google.com/forum/#!forum/shifuml) if questions, bugs or anything else.
## Copyright and License
Copyright 2012-2019, PayPal Software Foundation under the [Apache License](LICENSE.txt).