# apache-spark-tutorial

**Repository Path**: waylau/apache-spark-tutorial

## Basic Information

- **Project Name**: apache-spark-tutorial
- **Description**: Apache Spark Tutorial. 《跟老卫学Apache Spark》 (*Learn Apache Spark with Lao Wei*)
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 3
- **Created**: 2021-07-12
- **Last Updated**: 2025-03-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Apache Spark Tutorial: Source Code for *Learn Apache Spark Development with Lao Wei* (《跟老卫学Apache Spark开发》) and *Step-by-Step Spark Big Data Application Development* (《循序渐进Spark大数据应用开发》)

![](images/spark-logo-trademark.png)

*Apache Spark Tutorial* is a book about developing Apache Spark applications.

*Learn Apache Spark Development with Lao Wei* (《跟老卫学Apache Spark开发》) is an open-source tutorial on Apache Spark application development that shows how to build Apache Spark applications from scratch. It covers the new features introduced in Apache Spark 3.x. Richly illustrated and packed with examples, it takes you into the world of Apache Spark.

This book was written in the author's spare time. Given limited expertise and time, errors and omissions are inevitable, and corrections are welcome.

## Summary

* [Downloading and Installing Spark](https://developer.huawei.com/consumer/cn/forum/topic/0202568822299090741?fid=23)
* [A First Look at Spark Applications](https://developer.huawei.com/consumer/cn/forum/topic/0201568823403320732?fid=23)
* [Using the Spark Accumulator LongAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622461925310080?fid=23)
* [Using the Spark Accumulator DoubleAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622590853530085?fid=23)
* [Using the Spark Accumulator CollectionAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622591182960086?fid=23)
* [Ways to Launch Spark Applications](https://developer.huawei.com/consumer/cn/forum/topic/0202623507783170122?fid=23)
* [Spark Broadcast Variables](https://developer.huawei.com/consumer/cn/forum/topic/0202624224916630149?fid=23)
* [Getting Started with Spark RDDs](https://developer.huawei.com/consumer/cn/forum/topic/0201624386890690172?fid=23)
* [Basic Spark RDD Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201627152644060234?fid=23)
* [Spark RDD Shuffle Operations](https://developer.huawei.com/consumer/cn/forum/topic/0202627152820110215?fid=23)
* [Understanding Spark RDD Internals in Depth](https://developer.huawei.com/consumer/cn/forum/topic/0202628556358740265?fid=23)
* [Spark Scheduling: Resource Allocation](https://developer.huawei.com/consumer/cn/forum/topic/0202629577348060308?fid=23)
* [Spark Scheduling: Job Scheduling](https://developer.huawei.com/consumer/cn/forum/topic/0201629622395410333?fid=23)
* [Spark SQL Overview](https://developer.huawei.com/consumer/cn/forum/topic/0202630480491580330?fid=23)
* [Spark SQL: Dataset and DataFrame](https://developer.huawei.com/consumer/cn/forum/topic/0202630480727520331?fid=23)
* [Spark SQL: Getting Started with DataFrame Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201633012983700432?fid=23)
* [Spark SQL: Getting Started with Dataset Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201633040938970437?fid=23)
* [Spark SQL: Creating Temporary Views from a DataFrame](https://developer.huawei.com/consumer/cn/forum/topic/0202633194774890394?fid=23)
* [Spark SQL: Converting an RDD to a Dataset](https://developer.huawei.com/consumer/cn/forum/topic/0201633208926640450?fid=23)
* [Introduction to the Apache Parquet Columnar Storage Format](https://waylau.com/about-apache-parquet/)
* [Spark SQL: Reading and Writing Apache Parquet Data Sources](https://developer.huawei.com/consumer/cn/forum/topic/0202634018676920418?fid=23)
* [Introduction to the Apache Hive Data Warehouse](https://developer.huawei.com/consumer/cn/forum/topic/0201634752549850505?fid=23)
* [Spark SQL: Using Apache Hive](https://developer.huawei.com/consumer/cn/forum/topic/0202635471716910045?fid=23)
* [Spark SQL: Working with Databases via JDBC](https://developer.huawei.com/consumer/cn/forum/topic/0202635607847820058?fid=23)
* [Spark SQL: Reading Binary Files](https://developer.huawei.com/consumer/cn/forum/topic/0202635626764400066?fid=23)
* [Exporting Data from Spark to CSV Files](https://developer.huawei.com/consumer/cn/forum/topic/0202620883150950010?fid=23)
* [Spark SQL: Time Zone Handling](https://developer.huawei.com/consumer/cn/forum/topic/0202665874275260083?fid=23)
* [Spark Streaming Overview](https://developer.huawei.com/consumer/cn/forum/topic/0202636427881730132?fid=23)
* [Spark Streaming: Counting Word Frequencies from a Socket Data Stream](https://developer.huawei.com/consumer/cn/forum/topic/0201639135765210068?fid=23)
* [Spark Streaming Window Operations](https://developer.huawei.com/consumer/cn/forum/topic/0202639686793340267?fid=23)
* [Spark Structured Streaming Overview](https://developer.huawei.com/consumer/cn/forum/topic/0202639990757790283?fid=23)
* [Spark Structured Streaming: Counting Word Frequencies from a Socket Data Stream](https://developer.huawei.com/consumer/cn/forum/topic/0201640617749310121?fid=23)
* [Spark Structured Streaming Window Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201647684921030332?fid=23)
* [Customizing the Log4j Configuration in Spark](https://developer.huawei.com/consumer/cn/forum/topic/0201647777007740340?fid=23)
* [Overview of the Spark MLlib Machine Learning Library](https://developer.huawei.com/consumer/cn/forum/topic/0201648414415760370?fid=23)
* [Spark MLlib: ML Pipelines in Detail](https://developer.huawei.com/consumer/cn/forum/topic/0202652669139340720?fid=23)
* [Spark MLlib: Estimator, Transformer, and Param Examples](https://developer.huawei.com/consumer/cn/forum/topic/0201648630447880382?fid=23)
* [Spark MLlib: ML Pipeline Example](https://developer.huawei.com/consumer/cn/forum/topic/0202648630694530630?fid=23)
* [Overview of Graph Processing with Spark GraphX](https://developer.huawei.com/consumer/cn/forum/topic/0202652669536950721?fid=23)
* [Spark GraphX Graph Computation Example](https://developer.huawei.com/consumer/cn/forum/topic/0201652741940200499?fid=23)
* [Fixing the spark-shell Startup Error "WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped"](https://developer.huawei.com/consumer/cn/forum/topic/0204726396055740595?fid=23)
* [Spark Cluster Deployment: Cluster Overview](https://developer.huawei.com/consumer/cn/forum/topic/0203729942975270557?fid=23)
* [Spark Clusters: Submitting Applications to a Cluster](https://developer.huawei.com/consumer/cn/forum/topic/0203729943247780558?fid=23)
* [Spark Clusters: Deploying a Cluster in Standalone Mode](https://developer.huawei.com/consumer/cn/forum/topic/0204730620151950827?fid=23)
* [Spark Clusters: High Availability for Standalone-Mode Clusters](https://developer.huawei.com/consumer/cn/forum/topic/0204730620408550828?fid=23)
* [Spark Series 044: Spark Clusters: Deploying a Cluster in YARN Mode](https://developer.huawei.com/consumer/cn/forum/topic/0203732228615380806?fid=23)
* [Spark Series 045: Resolving the "java.lang.NoClassDefFoundError" Problem](https://developer.huawei.com/consumer/cn/forum/topic/0201775600270330248?fid=23)
* To be continued...

## Samples

* [Using the Spark Accumulator LongAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/LongAccumulatorSample.java)
* [Using the Spark Accumulator DoubleAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/DoubleAccumulatorSample.java)
* [Using the Spark Accumulator CollectionAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/CollectionAccumulatorSample.java)
* [SparkLauncher Example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/launcher/SparkLauncherSample.java)
* [InProcessLauncherSample Example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/launcher/InProcessLauncher.java)
* [Broadcast Example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/broadcast/BroadcastSample.java)
* [Basic RDD Operations Example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddBasicSample.java)
* [Basic RDD Transformation and Action Operations Example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddBasicOperationSample.java)
* [Basic DataFrame Operations Example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataFrameBasicExample.java)
* [Basic Dataset Operations Example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DatasetBasicExample.java)
* [Creating Temporary Views from a DataFrame](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataFrameTempViewExample.java)
* [Converting an RDD to a Dataset](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DatasetSchemaExample.java)
* [Reading and Writing Apache Parquet Data Sources](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceParquetExample.java)
* [Using Apache Hive](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceHiveExample.java)
* [Working with Databases via JDBC](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceJDBCExample.java)
* [Reading Binary Files](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceBinaryFile.java)
* [Exporting Data from Spark to CSV Files](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/WriteCVSExample.java)
* [Spark SQL Time Zone Handling](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/TimeZoneExample.java)
* [Spark Streaming: Counting Word Frequencies from a Socket Data Stream](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/streaming/SparkStreamingSocketSample.java)
* [Spark Streaming Window Operations](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/streaming/SparkStreamingWimdowSample.java)
* [Structured Streaming: Counting Word Frequencies from a Socket Data Stream](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/streaming/StructuredStreamingSocketSample.java)
* [Structured Streaming Window Operations](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/streaming/StructuredStreamingWindowSample.java)
* [Estimator, Transformer, and Param Example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/ml/EstimatorTransformerParamExample.java)
* [ML Pipeline Example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/ml/PipelineExample.java)
* [GraphX Graph Computation Example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddGraphXSample.java)
* To be continued...
## Getting Started

Choose one of the entries below:

*
*

## Code

All sample source code in the book is in the `samples` directory. The code follows the [Java Coding Standards]() (《Java 编码规范》).

## Companion Book

If you like this open-source book, please consider supporting its official print edition, which is available in bookstores and major online stores.

* [*Step-by-Step Spark Big Data Application Development* (《循序渐进Spark大数据应用开发》)](https://waylau.com/about-harmonyos-mobile-application-development-book) (Tsinghua University Press)
* [JD.com](https://search.jd.com/Search?keyword=%E5%BE%AA%E5%BA%8F%E6%B8%90%E8%BF%9BSpark%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8%E5%BC%80%E5%8F%91&enc=utf-8&wq=%E5%BE%AA%E5%BA%8F%E6%B8%90%E8%BF%9BSpark%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8%E5%BC%80%E5%8F%91&pvid=32d2112ca641476d9fc5323cf6113f60)
* [Dangdang](https://search.jd.com/Search?keyword=%E5%BE%AA%E5%BA%8F%E6%B8%90%E8%BF%9BSpark%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8%E5%BC%80%E5%8F%91&enc=utf-8&wq=%E5%BE%AA%E5%BA%8F%E6%B8%90%E8%BF%9BSpark%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8%E5%BC%80%E5%8F%91&pvid=90f7a002994847d08196d4d3e77761a1)

## Issues

Errata, comments, and suggestions are all welcome.

## Contact

* Blog: [waylau.com](http://waylau.com)
* Gmail: [waylau521(at)gmail.com](mailto:waylau521@gmail.com)
* Weibo: [waylau521](http://weibo.com/waylau521)
* Twitter: [waylau521](https://twitter.com/waylau521)
* GitHub: [waylau](https://github.com/waylau)

## Support Me

Buy Lao Wei a drink:

![Open-source donation](https://waylau.com/images/showmethemoney-sm.jpg)