大数据之路/data-warehouse-learning

Open source work is not easy. Please support the project with a ★star★; thank you very much!

[ GitHub: https://github.com/Mrkuhuo/data-warehouse-learning ]

[ Gitee: https://gitee.com/wzylzjtn/data-warehouse-learning ]

[ Blog: https://bigdatacircle.top/ ]

[Figure: blog screenshot]

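1 Introduction

This repository contains the code for "Real-Time/Offline Data Warehouse in Practice". It is built on an e-commerce system and driven by that system's metric-reporting requirements, and it develops an offline data warehouse and a real-time data warehouse (data lake) on Doris, Paimon, Hudi, and Iceberg. The offline and real-time scenarios share the same data processing logic and differ only in the implementation technology; the aim is to offer a reference for different approaches to building a data warehouse. The code comes in two versions, warehouseV1 and warehouseV2: V1 has relatively simple business logic, while V2 covers the business logic more completely. V1 generates its mock data with Python scripts; V2 generates it with Java code.

2 Code Structure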

├─src
│  └─main
│      └─java
│          └─org
│              └─bigdatatechcir
│                  ├─images
│                  ├─warehousev1
│                  │  ├─datageneration
│                  │  ├─doris
│                  │  │  ├─dml
│                  │  │  │  ├─ads
│                  │  │  │  ├─dwd
│                  │  │  │  ├─dws
│                  │  │  │  └─ods
│                  │  │  └─logical
│                  │  │      ├─ads
│                  │  │      ├─dwd
│                  │  │      └─dws
│                  │  ├─flink
│                  │  │  ├─ads
│                  │  │  │  └─test
│                  │  │  ├─dwd
│                  │  │  │  └─test
│                  │  │  ├─dws
│                  │  │  │  └─test
│                  │  │  ├─ods
│                  │  │  │  └─test
│                  │  │  └─sql
│                  │  │      ├─ads
│                  │  │      ├─dwd
│                  │  │      ├─dws
│                  │  │      └─ods
│                  │  ├─mysql
│                  │  └─seatunnel
│                  └─warehousev2
│                      ├─datageneration
│                      │  ├─business
│                      │  └─userlog
│                      ├─doris
│                      │  ├─dml
│                      │  │  ├─ads
│                      │  │  ├─dim
│                      │  │  │  └─data
│                      │  │  ├─dwd
│                      │  │  ├─dws
│                      │  │  └─ods
│                      │  └─logical
│                      │      ├─ads
│                      │      ├─dim
│                      │      ├─dwd
│                      │      └─dws
│                      ├─flink
│                      │  ├─doris
│                      │  │  ├─catalog
│                      │  │  └─ods
│                      │  ├─hudi
│                      │  │  ├─ads
│                      │  │  ├─dim
│                      │  │  ├─dwd
│                      │  │  ├─dws
│                      │  │  └─ods
│                      │  ├─iceberg
│                      │  │  ├─ads
│                      │  │  ├─dim
│                      │  │  ├─dwd
│                      │  │  ├─dws
│                      │  │  └─ods
│                      │  ├─paimon
│                      │  │  ├─ads
│                      │  │  ├─dim
│                      │  │  ├─dwd
│                      │  │  ├─dws
│                      │  │  └─ods
│                      │  └─udf
│                      └─seatunnel
│                          └─ods

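3 Technical Architecture

[Figure: technical architecture]

The technical architecture of this e-commerce data warehouse (real-time/offline) project consists of four parts:

  1. Data source module: Java code generates e-commerce business data and writes it to MySQL, and generates user log data and writes it to Kafka. For both, the date of the data to generate can be set in the configuration file.
  2. Data collection module: FlinkSQL code developed in Dinky consumes the Kafka user log data and writes it to the Doris / Paimon / Hudi / Iceberg ODS layer. SeaTunnel tasks configured in DolphinScheduler synchronize the MySQL business data to the Doris ODS layer. FlinkSQL and Flink CDC collect the data in Kafka and MySQL, respectively, into the Paimon / Hudi / Iceberg ODS layer.
  3. Data warehouse module: uses the industry-standard four-level layering ODS -> DWD/DIM -> DWS -> ADS. Data flows through these layers in Doris / Paimon / Hudi / Iceberg by batch scheduling and by real-time processing, respectively.
  4. Data visualization: data in the ADS and DWS layers can be turned into reports and dashboards with SuperSet / DataRT.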
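
To make the ODS -> DWD/DIM -> DWS -> ADS layering concrete, here is a minimal in-memory sketch in Python. It is only an illustration of what each layer typically does (clean, enrich, aggregate, report); the record fields, the `dim_sku` table, and the function names are hypothetical and are not taken from the project's SQL.

```python
from collections import defaultdict

# Hypothetical ODS records: raw order events as they might land from MySQL.
ods_orders = [
    {"order_id": 1, "user_id": 10, "sku_id": 100, "amount": "25.50", "dt": "2024-01-01"},
    {"order_id": 2, "user_id": 11, "sku_id": 101, "amount": "10.00", "dt": "2024-01-01"},
    {"order_id": 3, "user_id": 10, "sku_id": 100, "amount": None, "dt": "2024-01-01"},  # dirty row
    {"order_id": 4, "user_id": 12, "sku_id": 101, "amount": "4.50", "dt": "2024-01-02"},
]

# Hypothetical DIM table: SKU attributes keyed by sku_id.
dim_sku = {100: {"category": "books"}, 101: {"category": "toys"}}


def build_dwd(ods_rows, dim):
    """ODS -> DWD: drop dirty rows, cast types, attach DIM attributes."""
    dwd_rows = []
    for row in ods_rows:
        if row["amount"] is None:
            continue  # skip rows that fail basic cleaning
        dwd_rows.append({
            "order_id": row["order_id"],
            "category": dim[row["sku_id"]]["category"],
            "amount_cents": round(float(row["amount"]) * 100),
            "dt": row["dt"],
        })
    return dwd_rows


def build_dws(dwd_rows):
    """DWD -> DWS: aggregate to one summary row per (dt, category)."""
    totals = defaultdict(lambda: {"order_count": 0, "amount_cents": 0})
    for row in dwd_rows:
        key = (row["dt"], row["category"])
        totals[key]["order_count"] += 1
        totals[key]["amount_cents"] += row["amount_cents"]
    return dict(totals)


def build_ads(dws_rows):
    """DWS -> ADS: report-ready metric, here total sales per day."""
    daily = defaultdict(int)
    for (dt, _category), agg in dws_rows.items():
        daily[dt] += agg["amount_cents"]
    return dict(daily)


dwd = build_dwd(ods_orders, dim_sku)
dws = build_dws(dwd)
ads = build_ads(dws)
print(ads)
```

In the project itself these steps are written as DorisSQL or FlinkSQL per layer; the sketch only mirrors the direction in which data moves.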

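Common Parts

1) Component installation

Installation guide

2) Mock data generation

Mock data generation

The generated business database data is shown in the figure below:

[Figure: business database data]

The generated user log data is shown in the figure below:

[Figure: user log data]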
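
As a rough illustration of generating user log data for a configured date, the sketch below produces JSON log lines whose timestamps all fall on one target day. V1 of the project uses Python scripts for mock data, but this is not that script: the function name, the event fields, and the page names are assumptions, and the real generator writes to Kafka rather than building a list.

```python
import json
import random
from datetime import datetime, timedelta


def generate_user_logs(target_date, count, seed=0):
    """Return `count` hypothetical page-view events dated on `target_date`."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    day_start = datetime.strptime(target_date, "%Y-%m-%d")
    pages = ["home", "search", "detail", "cart", "payment"]  # assumed page names
    events = []
    for event_id in range(count):
        # Pick a random second within the configured day.
        ts = day_start + timedelta(seconds=rng.randrange(86400))
        events.append({
            "event_id": event_id,
            "user_id": rng.randint(1, 1000),
            "page": rng.choice(pages),
            "ts": ts.strftime("%Y-%m-%d %H:%M:%S"),
        })
    return events


logs = generate_user_logs("2024-01-01", 5)
# One JSON document per line, a common shape for messages sent to a log topic.
lines = [json.dumps(event) for event in logs]
for line in lines:
    print(line)
```

Seeding the random generator keeps the output reproducible, which helps when comparing results between the offline and real-time pipelines.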

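Offline Data Warehouse (Doris)

Components involved: Kafka + Flink + Doris + SeaTunnel + DolphinScheduler

1) Data collection

Kafka data is ingested into Doris through Flink

[Figure: Kafka data ingestion]

MySQL data is ingested into Doris through SeaTunnel

[Figure: MySQL data ingestion]

2) Doris ODS layer

Data is collected into the Doris ODS layer; the result is shown in the figure below:

[Figure: Doris ODS layer]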

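3) Doris DIM layer

Develop DorisSQL to process the DIM layer data

[Figure: Doris DIM layer processing logic]

The resulting DIM layer data is shown in the figure below:

[Figure: Doris DIM layer database]

4) Doris DWD layer

Develop DorisSQL to process the DWD layer data

[Figure: Doris DWD layer processing logic]

The resulting DWD layer data is shown in the figure below:

[Figure: Doris DWD layer database]

5) Doris DWS layer

Develop DorisSQL to process the DWS layer data

[Figure: Doris DWS layer processing logic]

The resulting DWS layer data is shown in the figure below:

[Figure: Doris DWS layer database]

6) Doris ADS layer

Develop DorisSQL to process the ADS layer data

[Figure: Doris ADS layer processing logic]

The resulting ADS layer data is shown in the figure below:

[Figure: Doris ADS layer database]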

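7) Task orchestration

The final task overview is shown in the figure below:

[Figure: Doris task overview]

The task orchestration result is shown in the figure below:

[Figure: Doris task overview]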
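
The ordering that task orchestration has to respect can be sketched as a dependency graph over the layers. The example below uses Python's standard `graphlib` module (Python 3.9+) to derive batches of layers that can run together. The dependency map is an assumption read off the ODS -> DWD/DIM -> DWS -> ADS layering; in the project the actual scheduling is done by DolphinScheduler, not by code like this.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each layer maps to the layers it must wait for.
layer_dependencies = {
    "ods": set(),
    "dim": {"ods"},
    "dwd": {"ods"},
    "dws": {"dwd", "dim"},
    "ads": {"dws"},
}


def execution_batches(dependencies):
    """Group layers into batches; layers in one batch have no mutual dependency."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock successors
    return batches


batches = execution_batches(layer_dependencies)
print(batches)
```

With this dependency map, DIM and DWD land in the same batch because both depend only on ODS, while DWS waits for both of them.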

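8) Data presentation

[Figure: dashboard]

Real-Time Data Warehouse (Data Lake) (Paimon/Hudi/Iceberg)

Components involved: Kafka + Flink (CDC/SQL/UDF) + Paimon/Hudi/Iceberg + Hive + Dinky

1) Paimon ODS layer

Kafka data is ingested into Paimon/Hudi/Iceberg through FlinkSQL; the actual data lands in Hive

MySQL data is ingested into Paimon/Hudi/Iceberg through FlinkCDC; the actual data lands in Hive

[Figure: Kafka/MySQL data ingestion]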
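
For readers new to change data capture, the following sketch shows the general idea behind syncing a MySQL table through a changelog: each change event is applied to a table keyed by primary key, so the target converges to the source's current state. This is a simplified illustration, not how Flink CDC or Paimon are implemented; the event shape (`op`, `key`, `row`) and the function name are hypothetical.

```python
def apply_changelog(table, events):
    """Apply insert/update/delete events to a dict keyed by primary key."""
    for event in events:
        key = event["key"]
        if event["op"] in ("insert", "update"):
            table[key] = event["row"]  # upsert: the latest row wins
        elif event["op"] == "delete":
            table.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown op: {event['op']}")
    return table


# Hypothetical change events for an order table.
events = [
    {"op": "insert", "key": 1, "row": {"order_id": 1, "status": "created"}},
    {"op": "insert", "key": 2, "row": {"order_id": 2, "status": "created"}},
    {"op": "update", "key": 1, "row": {"order_id": 1, "status": "paid"}},
    {"op": "delete", "key": 2, "row": None},
]

ods_order_table = apply_changelog({}, events)
print(ods_order_table)
```

Because updates and deletes are resolved by key, replaying the same events in the same order always yields the same table, which is what makes a changelog a workable source for an ODS layer.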

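Data is collected into the Paimon ODS layer; the result is shown in the figure below:

[Figure: Paimon ODS layer]

2) Paimon DWD layer

Develop FlinkSQL to process the DWD layer data

[Figure: Paimon DWD layer]

The resulting DWD layer data is shown in the figure below:

[Figure: Paimon DWD layer]

3) Paimon DIM layer

Develop FlinkSQL to process the DIM layer data

[Figure: Paimon DIM layer]

The resulting DIM layer data is shown in the figure below:

[Figure: Paimon DIM layer]

4) Paimon DWS layer

Develop FlinkSQL to process the DWS layer data

[Figure: Paimon DWS layer]

The resulting DWS layer data is shown in the figure below:

[Figure: Paimon DWS layer]

5) Paimon ADS layer

Develop FlinkSQL to process the ADS layer data

[Figure: Paimon ADS layer]

The resulting ADS layer data is shown in the figure below:

[Figure: Paimon ADS layer]

6) Data presentation with Doris Catalog connected to Paimon, plus DataRT

[Figure: dashboard]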

The Artistic License 2.0 Copyright (c) 2000-2006, The Perl Foundation. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble This license establishes the terms under which a given free software Package may be copied, modified, distributed, and/or redistributed. The intent is that the Copyright Holder maintains some artistic control over the development of that Package while still keeping the Package available as open source and free software. You are always permitted to make arrangements wholly outside of this license directly with the Copyright Holder of a given Package. If the terms of this license do not permit the full use that you propose to make of the Package, you should contact the Copyright Holder and seek a different licensing arrangement. Definitions "Copyright Holder" means the individual(s) or organization(s) named in the copyright notice for the entire Package. "Contributor" means any party that has contributed code or other material to the Package, in accordance with the Copyright Holder's procedures. "You" and "your" means any person who would like to copy, distribute, or modify the Package. "Package" means the collection of files distributed by the Copyright Holder, and derivatives of that collection and/or of those files. A given Package may consist of either the Standard Version, or a Modified Version. "Distribute" means providing a copy of the Package or making it accessible to anyone else, or in the case of a company or organization, to others outside of your company or organization. "Distributor Fee" means any fee that you charge for Distributing this Package or providing support for this Package to another party. It does not mean licensing fees. "Standard Version" refers to the Package if it has not been modified, or has been modified only in ways explicitly requested by the Copyright Holder. 
"Modified Version" means the Package, if it has been changed, and such changes were not explicitly requested by the Copyright Holder. "Original License" means this Artistic License as Distributed with the Standard Version of the Package, in its current version or as it may be modified by The Perl Foundation in the future. "Source" form means the source code, documentation source, and configuration files for the Package. "Compiled" form means the compiled bytecode, object code, binary, or any other form resulting from mechanical transformation or translation of the Source form. Permission for Use and Modification Without Distribution (1) You are permitted to use the Standard Version and create and use Modified Versions for any purpose without restriction, provided that you do not Distribute the Modified Version. Permissions for Redistribution of the Standard Version (2) You may Distribute verbatim copies of the Source form of the Standard Version of this Package in any medium without restriction, either gratis or for a Distributor Fee, provided that you duplicate all of the original copyright notices and associated disclaimers. At your discretion, such verbatim copies may or may not include a Compiled form of the Package. (3) You may apply any bug fixes, portability changes, and other modifications made available from the Copyright Holder. The resulting Package will still be considered the Standard Version, and as such will be subject to the Original License. 
Distribution of Modified Versions of the Package as Source (4) You may Distribute your Modified Version as Source (either gratis or for a Distributor Fee, and with or without a Compiled form of the Modified Version) provided that you clearly document how it differs from the Standard Version, including, but not limited to, documenting any non-standard features, executables, or modules, and provided that you do at least ONE of the following: (a) make the Modified Version available to the Copyright Holder of the Standard Version, under the Original License, so that the Copyright Holder may include your modifications in the Standard Version. (b) ensure that installation of your Modified Version does not prevent the user installing or running the Standard Version. In addition, the Modified Version must bear a name that is different from the name of the Standard Version. (c) allow anyone who receives a copy of the Modified Version to make the Source form of the Modified Version available to others under (i) the Original License or (ii) a license that permits the licensee to freely copy, modify and redistribute the Modified Version using the same licensing terms that apply to the copy that the licensee received, and requires that the Source form of the Modified Version, and of any works derived from it, be made freely available in that license fees are prohibited but Distributor Fees are allowed. Distribution of Compiled Forms of the Standard Version or Modified Versions without the Source (5) You may Distribute Compiled forms of the Standard Version without the Source, provided that you include complete instructions on how to get the Source of the Standard Version. Such instructions must be valid at the time of your distribution. If these instructions, at any time while you are carrying out such distribution, become invalid, you must provide new instructions on demand or cease further distribution. 
If you provide valid instructions or cease distribution within thirty days after you become aware that the instructions are invalid, then you do not forfeit any of your rights under this license. (6) You may Distribute a Modified Version in Compiled form without the Source, provided that you comply with Section 4 with respect to the Source of the Modified Version. Aggregating or Linking the Package (7) You may aggregate the Package (either the Standard Version or Modified Version) with other packages and Distribute the resulting aggregation provided that you do not charge a licensing fee for the Package. Distributor Fees are permitted, and licensing fees for other components in the aggregation are permitted. The terms of this license apply to the use and Distribution of the Standard or Modified Versions as included in the aggregation. (8) You are permitted to link Modified and Standard Versions with other works, to embed the Package in a larger work of your own, or to build stand-alone binary or bytecode versions of applications that include the Package, and Distribute the result without restriction, provided the result does not expose a direct interface to the Package. Items That are Not Considered Part of a Modified Version (9) Works (including, but not limited to, modules and scripts) that merely extend or make use of the Package, do not, by themselves, cause the Package to be a Modified Version. In addition, such works are not considered parts of the Package itself, and are not subject to the terms of this license. General Provisions (10) Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license. 
(11) If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license. (12) This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder. (13) This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed. (14) Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

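Overview

[Latest 2024 edition] Big data, data analysis, e-commerce system, real-time data warehouse, offline data warehouse, and data lake: design and hands-on code, covering the components flink, paimon, doris, seatunnel, dolphinscheduler, datart, dinky, hudi, iceberg.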