This repository is a mirror, synced once a day, to speed up downloads in China. Original repository: https://github.com/TalkingData/Fregata
Committed by takun on 2017-08-02 16:16 (update docs).

LR for trillion dimensions

Introduction

Sometimes the data is so high-dimensional that training a model on it becomes intractable. To handle this case, Fregata provides LR (logistic regression) and Softmax models that support features with up to trillions of dimensions.
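To make the idea concrete, here is a minimal sketch in plain Scala (no Spark) of a logistic regression whose weights live in a hash map keyed by `Long` feature index, so memory grows with the number of distinct features actually seen rather than with the nominal dimension. This is an illustration under stated assumptions, not Fregata's implementation: it uses plain SGD with a fixed learning rate, and the names `SparseLrSketch`, `Instance`, `SparseLr` and `train` are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical sketch, NOT Fregata's implementation or optimizer: logistic
// regression with weights stored in a hash map keyed by Long feature index.
object SparseLrSketch {

  // One training instance: Long feature indices, their values, and a 0/1 label.
  final case class Instance(index: Array[Long], value: Array[Double], label: Double)

  final class SparseLr {
    // Only weights of features seen during training are stored; missing keys count as 0.0.
    val weights: mutable.HashMap[Long, Double] = mutable.HashMap.empty[Long, Double]
    var bias: Double = 0.0

    // Probability that the label is 1: sigmoid(w . x + b), summed over active features only.
    def predictProb(index: Array[Long], value: Array[Double]): Double = {
      var margin = bias
      var i = 0
      while (i < index.length) {
        margin += weights.getOrElse(index(i), 0.0) * value(i)
        i += 1
      }
      1.0 / (1.0 + math.exp(-margin))
    }

    // One SGD step on the log loss. The gradient for w_j is (p - y) * x_j,
    // so only the weights of this instance's active features are touched.
    def update(inst: Instance, learningRate: Double): Unit = {
      val g = predictProb(inst.index, inst.value) - inst.label
      var i = 0
      while (i < inst.index.length) {
        val j = inst.index(i)
        weights(j) = weights.getOrElse(j, 0.0) - learningRate * g * inst.value(i)
        i += 1
      }
      bias -= learningRate * g
    }
  }

  // Plain multi-epoch SGD over an in-memory sequence (an assumption for illustration).
  def train(data: Seq[Instance], epochs: Int, learningRate: Double): SparseLr = {
    val model = new SparseLr
    for (_ <- 0 until epochs; inst <- data) model.update(inst, learningRate)
    model
  }

  def main(args: Array[String]): Unit = {
    // Feature indices far beyond Int.MaxValue, as in a trillion-dimensional space.
    val a = 1000000000000L
    val b = 999999999999L
    val data = Seq(
      Instance(Array(a), Array(1.0), 1.0),
      Instance(Array(b), Array(1.0), 0.0)
    )
    val model = train(data, epochs = 100, learningRate = 0.5)
    val pa = model.predictProb(Array(a), Array(1.0))
    val pb = model.predictProb(Array(b), Array(1.0))
    println(f"P(y=1 | feature a) = $pa%.3f")
    println(f"P(y=1 | feature b) = $pb%.3f")
    println(s"stored weights: ${model.weights.size}")
  }
}
```

Storing weights sparsely is what keeps a nominal dimension of 10^12 tractable on a single machine in this toy; a distributed system such as Fregata additionally has to partition that weight map across workers.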

Example

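The following example demonstrates how to use the large-scale LR. The argument binSize = 32 controls the sparsity ratio of the resulting model. Because the large-scale model indexes features with Long values, the Int indices of each SparseVector read by LibSvmReader are converted to Long before training and prediction.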


  import fregata.SparseVector
  import fregata.spark.data.LibSvmReader
  import fregata.spark.metrics.classification.{Accuracy, AreaUnderRoc}
  import fregata.spark.model.largescale.LogisticRegression
  import org.apache.spark.{SparkConf, SparkContext}
  
  /**
   * Created by takun on 16/9/20.
   */
  object TestLargeScaleLogisticRegression {
  
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("logistic regression")
      val sc = new SparkContext(conf)
      // Read the a9a train/test sets in LIBSVM format; 123 is the feature dimension of a9a.
      val (_,trainData) = LibSvmReader.read(sc,"/Volumes/takun/data/libsvm/a9a",123)
      val (_,testData) = LibSvmReader.read(sc,"/Volumes/takun/data/libsvm/a9a.t",123)
      // The large-scale model takes (Long indices, values, label) tuples,
      // so convert each SparseVector's Int indices to Long.
      val model = LogisticRegression.run(trainData.map{
        case (x,label) =>
          val sx = x.asInstanceOf[SparseVector]
          (sx.index.map( _.toLong ) , sx.data,label)
      },binSize = 32)
      // Each prediction pairs the input tuple (x, v, l) with (p, c):
      // p is the predicted probability of label 1 and c is the predicted class.
      val pd = model.predict(testData.map{
        case (x,label) =>
          val sx = x.asInstanceOf[SparseVector]
          (sx.index.map( _.toLong ) , sx.data,label)
      })
      // Accuracy compares the predicted class with the true label.
      val acc = Accuracy.of( pd.map{
        case ((x,v,l),(p,c)) =>
          c -> l
      })
      // AUC ranks instances by the predicted probability p.
      val auc = AreaUnderRoc.of( pd.map{
        case ((x,v,l),(p,c)) =>
          p -> l
      })
      // Log loss is given the probability assigned to the true label.
      val loss = fregata.spark.loss.log(pd.map{
        case ((x,v,l),(p,c)) =>
          if( l == 1d ) {
            (l,c,p)
          }else{
            ( l , c , 1-p )
          }
      })
      println( s"AreaUnderRoc = $auc ")
      println( s"Accuracy = $acc ")
      println( s"LogLoss = $loss ")
      sc.stop()
    }
  }
Running the example on the a9a training and test sets prints:

  AreaUnderRoc = 0.8993762522225006
  Accuracy = 0.8485444048642673
  LogLoss = 0.328876130227733
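The three numbers above are standard binary-classification metrics. As a hedged illustration of what they measure, the following local sketch (plain Scala, no Spark) computes them from the same pairs the example builds: `(c, l)` for accuracy, `(p, l)` for AUC, and the probability of the true label for log loss. The object and method names are hypothetical, and the exact formulas inside Fregata's `Accuracy`, `AreaUnderRoc` and `fregata.spark.loss.log` may differ, for example in how tied scores or `log(0)` are handled.

```scala
// Hypothetical sketch using textbook definitions; NOT Fregata's implementation.
// We only assume that fregata.spark.loss.log averages -log(probability of the true label).
object MetricsSketch {

  // Fraction of instances whose predicted class equals the label.
  def accuracy(classAndLabel: Seq[(Double, Double)]): Double =
    classAndLabel.count { case (c, l) => c == l }.toDouble / classAndLabel.size

  // Area under the ROC curve via the rank-sum (Mann-Whitney U) formula: the probability
  // that a random positive scores higher than a random negative. Tied scores share their
  // average rank. Undefined (NaN) when only one class is present.
  def areaUnderRoc(scoreAndLabel: Seq[(Double, Double)]): Double = {
    val sorted = scoreAndLabel.sortBy(_._1).toIndexedSeq
    val ranks = new Array[Double](sorted.size)
    var i = 0
    while (i < sorted.size) {
      // Find the run [i, j) of equal scores and give each the average 1-based rank.
      var j = i
      while (j < sorted.size && sorted(j)._1 == sorted(i)._1) j += 1
      val avgRank = (i + 1 + j) / 2.0
      for (k <- i until j) ranks(k) = avgRank
      i = j
    }
    val nPos = sorted.count(_._2 == 1.0)
    val nNeg = sorted.size - nPos
    val posRankSum = sorted.indices.filter(k => sorted(k)._2 == 1.0).map(k => ranks(k)).sum
    (posRankSum - nPos.toDouble * (nPos + 1) / 2.0) / (nPos.toDouble * nNeg)
  }

  // Mean negative log-likelihood, given for each instance the predicted probability of its
  // true label (p for label 1, 1 - p for label 0), mirroring the example's (l, c, p) mapping.
  def logLoss(probOfTrueLabel: Seq[Double]): Double = {
    val eps = 1e-15 // clip to avoid log(0)
    probOfTrueLabel.map(q => -math.log(math.max(q, eps))).sum / probOfTrueLabel.size
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical predictions as (label l, predicted class c, probability p of label 1).
    val preds = Seq((1.0, 1.0, 0.9), (0.0, 0.0, 0.2), (1.0, 0.0, 0.4), (0.0, 1.0, 0.6))
    val acc = accuracy(preds.map { case (l, c, _) => (c, l) })
    val auc = areaUnderRoc(preds.map { case (l, _, p) => (p, l) })
    val loss = logLoss(preds.map { case (l, _, p) => if (l == 1.0) p else 1 - p })
    println(s"AreaUnderRoc = $auc")
    println(s"Accuracy = $acc")
    println(s"LogLoss = $loss")
  }
}
```

The rank-based AUC avoids building the ROC curve explicitly; on Spark the same idea needs a distributed sort, which is one reason libraries provide it as a ready-made metric.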