
轩少 / streaming-edu360


Big Data Hands-On Training Book Series -- Mastering Spark Through Real-World Cases -- StreamingETL Project


Nginx logs are collected into Kafka, and a Spark Streaming job consumes them and writes them to HDFS.
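The flow above runs as a sequence of micro-batches. The following is a minimal sketch that models one batch cycle in plain Scala, under these assumptions:

- a `Vector` per partition stands in for the Kafka topic;
- a mutable map stands in for HDFS;
- another map stands in for the offsets kept in ZooKeeper;
- offsets are committed only after the batch output has been written.

`MiniPipeline` and `runBatch` are hypothetical names, not the project's classes. The real job's commit strategy is not documented here.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-ins; not the project's real classes.
object MiniPipeline {
  // "Kafka": each partition is an append-only sequence of raw log lines
  val topic: Map[Int, Vector[String]] = Map(
    0 -> Vector("line-a", "", "line-b"),
    1 -> Vector("line-c")
  )

  // "ZooKeeper": next offset to read per partition (the committed offset)
  val committed: mutable.Map[Int, Long] = mutable.Map(0 -> 0L, 1 -> 0L)

  // "HDFS": output records keyed by batch id
  val hdfs: mutable.Map[Int, Seq[String]] = mutable.Map.empty

  // Process one micro-batch of at most maxPerPartition records per partition.
  def runBatch(batchId: Int, maxPerPartition: Int): Int = {
    // 1. Offset ranges [from, until) start at the committed offsets
    val ranges = topic.map { case (p, log) =>
      val from = committed(p)
      val until = math.min(from + maxPerPartition, log.size.toLong)
      p -> (from, until)
    }
    // 2. Read the ranges and apply a trivial "ETL" step: drop blank lines
    val records = ranges.toSeq.sortBy(_._1).flatMap { case (p, (from, until)) =>
      topic(p).slice(from.toInt, until.toInt)
    }.filter(_.trim.nonEmpty)
    // 3. Write the batch output first
    if (records.nonEmpty) hdfs(batchId) = records
    // 4. Then commit offsets; a crash between steps 3 and 4 replays the batch
    ranges.foreach { case (p, (_, until)) => committed(p) = until }
    records.size
  }
}
```

Calling `runBatch(0, 2)` does three things:

- it reads up to two records per partition;
- it stores the non-blank lines;
- it advances the committed offsets.

If the process died between the write and the commit, the next run would re-read the same range. This is the usual at-least-once trade-off.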

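  • 1. Collection with Logstash
  • 2. Buffering in Kafka
  • 3. Consumption with Spark Streaming; the ETL step parses each line with grok regular expressions
  • 4. Storage in HDFS
  • 5. A Hive external table points at the HDFS path, partitioned by day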
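Steps 3 to 5 can be sketched in plain Scala. The project uses grok patterns. This sketch swaps in a tiny, hypothetical grok-style expander that turns `%{PATTERN:field}` tokens into Java named groups. It then applies the expander to an assumed layout of Nginx's default `combined` log format. Finally, it derives a day partition directory such as `dt=2018-01-15` from the `$time_local` field. The following are all assumptions, not the project's configuration:

- the pattern dictionary (IPv4 only);
- the field names;
- the `dt=` directory layout.

```scala
import java.time.{LocalDate, ZonedDateTime}
import java.time.format.DateTimeFormatter
import java.util.Locale
import java.util.regex.{Matcher, Pattern}
import scala.util.matching.Regex

object NginxEtlSketch {
  // Hypothetical subset of a grok pattern dictionary (plain Java regex bodies)
  val patterns: Map[String, String] = Map(
    "IP"       -> """\d{1,3}(?:\.\d{1,3}){3}""",
    "NOTSPACE" -> """\S+""",
    "HTTPDATE" -> """[^\]]+""",
    "WORD"     -> """\w+""",
    "NUMBER"   -> """\d+""",
    "QS"       -> """[^"]*"""
  )

  private val token: Regex = """%\{(\w+):(\w+)\}""".r

  // Expand each %{PATTERN:field} into a named group (?<field>...)
  def expand(grok: String): String =
    token.replaceAllIn(grok, (m: Regex.Match) =>
      Matcher.quoteReplacement(s"(?<${m.group(2)}>${patterns(m.group(1))})"))

  // Assumed layout of Nginx's default "combined" log_format
  val combined: String =
    """%{IP:clientip} - %{NOTSPACE:user} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{NOTSPACE:request} %{NOTSPACE:protocol}" %{NUMBER:status} %{NUMBER:bytes} "%{QS:referrer}" "%{QS:agent}""""

  private val compiled: Pattern = Pattern.compile(expand(combined))
  private val fields: List[String] = token.findAllMatchIn(combined).map(_.group(2)).toList

  // Parse one line into field -> value; None when the line does not match
  def parse(line: String): Option[Map[String, String]] = {
    val m = compiled.matcher(line)
    if (m.matches()) Some(fields.map(f => f -> m.group(f)).toMap) else None
  }

  // Nginx $time_local format, e.g. "15/Jan/2018:10:22:01 +0800"
  private val httpDate = DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH)

  // Day partition directory in the log line's own UTC offset, e.g. "dt=2018-01-15"
  def partitionDir(timestamp: String): String = {
    val day: LocalDate = ZonedDateTime.parse(timestamp, httpDate).toLocalDate
    s"dt=$day"
  }
}
```

In this sketch, `parse` returns `None` for lines that do not fit the assumed format, such as IPv6 clients or malformed request lines. A real job would likely count or divert those lines rather than drop them silently. The partition directory follows the UTC offset written in each log line, so day boundaries track the Nginx server's local time.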

References

1. Ctrip hangout

2. Online grok regex tester


Related commands

1. Build the package: sbt clean assembly

2. Submit the job: spark-submit --queue root.bigdata.streaming --class "cn.edu360.streaming.NginxLogToHive" --name "edu360streaming3" --executor-cores 1 --num-executors 20 --master yarn-cluster streaming-edu360-assembly-1.0.jar NginxLogToHive

3. Kill the job: yarn application -kill applicationId

4. View the offsets stored in ZooKeeper: kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper zkhost --topic topicName --group groupId

5. Manually update an offset in ZooKeeper (from the ZooKeeper CLI): set /consumers/groupId/offsets/topicName/partition value
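Commands 4 and 5 work on the ZooKeeper layout of Kafka's old, ZooKeeper-based consumer. In that layout, each partition's committed offset is a plain number stored under `/consumers/<group>/offsets/<topic>/<partition>`. The minimal sketch below does three things:

- it builds those node paths;
- it builds the matching `set` command;
- it computes per-partition lag as log-end offset minus committed offset, the same quantity ConsumerOffsetChecker reports as Lag.

`OffsetLag` is a hypothetical helper, not part of Kafka's tooling. The group and topic names in the usage below are made up.

```scala
// Hypothetical helper illustrating the old consumer's ZooKeeper offset layout.
object OffsetLag {
  // ZooKeeper node holding the committed offset for one partition
  def offsetPath(group: String, topic: String, partition: Int): String =
    s"/consumers/$group/offsets/$topic/$partition"

  // zkCli "set" command that overwrites one partition's committed offset
  def setCommand(group: String, topic: String, partition: Int, offset: Long): String =
    s"set ${offsetPath(group, topic, partition)} $offset"

  // Lag per partition = log-end offset - committed offset.
  // Assumption: a partition with no committed offset is treated as offset 0.
  def lag(committed: Map[Int, Long], logEnd: Map[Int, Long]): Map[Int, Long] =
    logEnd.map { case (p, end) => p -> (end - committed.getOrElse(p, 0L)) }
}
```

For example, `OffsetLag.setCommand("g1", "nginx", 3, 1500L)` yields `set /consumers/g1/offsets/nginx/3 1500`. A running consumer keeps committing its own offsets, so it is generally safer to kill the job first (command 3) before editing these nodes by hand.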


About

Language: Scala
License: Apache-2.0
Repository: https://gitee.com/wangzhixuan/streaming-edu360.git (SSH: git@gitee.com:wangzhixuan/streaming-edu360.git)