Ai
1 Star 0 Fork 0

诸葛子房/spark-demo

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
文件
该仓库未声明开源许可证文件(LICENSE),使用请关注具体项目描述及其代码上游依赖。
克隆/下载
WordCountSql.java 1.49 KB
一键复制 编辑 原始数据 按行查看 历史
诸葛子房 提交于 2022-05-22 12:10 +08:00 . spark sql 处理
import model.Info;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;
import java.util.Arrays;
public class WordCountSql {
public static void main(String[] args) {
SparkSession spark = SparkSession.builder()
.appName("RDDToDataset")
.master("local[*]")
.getOrCreate();
JavaRDD<String> lines = spark.read().textFile("src/main/resources/data").javaRDD();
JavaRDD<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
JavaRDD<Info> stuRDD = words.map(new Function<String, Info>() {
public Info call(String line) throws Exception {
System.out.println(line);
Info stu = new Info();
stu.setWord(line);
stu.setCnt(1);
return stu;
}
});
Dataset<Row> stuDf = spark.createDataFrame(stuRDD, Info.class);
stuDf.printSchema();
stuDf.createOrReplaceTempView("info");
Dataset<Row> nameDf = spark.sql("select word,count(cnt) as cnt from info group by word");
nameDf.show();
nameDf.coalesce(2).write().mode(SaveMode.Overwrite).format("csv").csv("src/main/resources/result");
spark.stop();
}
}
Loading...
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
1
https://gitee.com/ZhuGeZiFang/spark-demo.git
git@gitee.com:ZhuGeZiFang/spark-demo.git
ZhuGeZiFang
spark-demo
spark-demo
master

搜索帮助