246 Star 2.3K Fork 874

儒猿中间件 / 自研分布式小文件存储系统

Create your Gitee Account
Explore and code with more than 12 million developers,Free private repositories !:)
Sign up
Clone or Download
contribute
Sync branch
Cancel
Notice: Creating folder will generate an empty file .keep, because not support in Git
Loading...
README.md
Apache-2.0

介绍

本项目是使用Java开发的一个分布式海量小文件存储系统,功能包括文件上传、文件下载、文件存储等,解决了海量小文件在存储和访问过程中遇到的各种性能问题。

加微信获取更多免费资料

项目特性&设计

  • 网络升级
  • 核心逻辑优化
  • BackupNode+NameNode主备模式高可用架构
  • NameNode联邦架构

生产级技术架构

网络升级

小文件系统的网络部分原本实现比较杂乱,这是由于讲课的时候需要讲解各种技术的使用方式和演示最底层的开发是使用那些API。所以在课程中的网络分别使用了原生NIO和gRpc 但是我们做技术选型的时候选择统一网络请求方式,统一采用Netty作为网络通讯框架,改造前后对比如图:

文件传输协议

在集群中会有几种场景需要进行文件传输,比如上传、下载文件是客户端和DataNode之间进行文件传输,BackupNode和NameNode之间也要进行FsImage的文件传输。所以设计了一套文件传输的协议。文件传输的网络包包括包类型、文件元数据、文件内容二进制数据,如图:

分块传输设计

如图所示,当发送一个请求的时候,假如服务端写回的响应较大(超过最大消息长度),此时可以根据请求是否支持分块传输来决定是否需要拆包传输,可以将1个包拆分为n个包,然后传输给NetClient, NetClient收到包的时候,会根据一定的机制判断整个包是否传输完整,当收到了所有的响应包之后,再将所有的响应包合并成一个包,然后返回给用户。

NameNode联邦架构

为了解决大规模海量小文件带来的内存增长压力,开发了NameNode的联邦架构,简单来说,就是通过多个NameNode节点组成集群,每个NameNode节点保存整个内存目录树的一部分数据。

线上性能参数回放

生产环境下的问题盘点

  • OOMKiller杀死spring boot发压程序
  • 带宽打满导致请求响应超时问题
  • DataNode流量不均匀问题
  • 线程数过多导致的 CPU 100% 问题
  • NameNode上传文件请求在吞吐量和一致性之间的抉择
  • 刷磁盘导致吞吐量大幅下降如何优化

项目配套学习视频

https://space.bilibili.com/478364560/channel/seriesdetail?sid=453116

版权声明

本仓库存放的是儒猿【自研分布式小文件系统】,版权归儒猿技术窝所有,侵权将追究法律责任

官网

www.ruyuan2020.com

编译&运行

由于源码使用了protobuf作为序列化框架,所以下载代码之后需要执行以下命令,生成protobuf序列化文件

cd ruyuan-dfs/ruyuan-dfs-common
mvn protobuf:compile && mvn install

温馨提示:如果你的电脑是Apple M1芯片的,Protobuf编译可能会报错,这个问题可以通过配置指定使用x86架构解决,具体方式如下:

  • 方式一:在ruyuan-dfs-common的pom.xml 中添加如下代码
<properties>
  <os.detected.classifier>osx-x86_64</os.detected.classifier>
</properties>
  • 方式二:全局配置Maven,不用修改代码,在你的Maven的settings.xml(通常在~/.m2/settings.xml)文件下添加如下代码
<profile>
  <id>apple-silicon</id>
  <properties>
    <os.detected.classifier>osx-x86_64</os.detected.classifier>
  </properties>
</profile>

<activeProfiles>
  <activeProfile>default</activeProfile>
  <activeProfile>apple-silicon</activeProfile>
  ...你其他的profile
 </activeProfiles>

启动NameNode

打开配置NameNode的配置文件,在项目根目录下conf目录存在一个namenode.properties文件,打开此文件,修改以下内容:

base.dir=/srv/ruyuan-dfs/namenode  # 修改为你本机的一个路径

启动类为ruyuan-dfs-namenode模块下的类:com.ruyuan.dfs.namenode.NameNode。我们可以运行他的main方法, 但是通常第一次运行是不成功的,会提示异常。

我们需要对启动程序进行一些配置,点击IDEA右上角运行按钮左边的下拉框。 选择 Edit Configurations...,在弹出框中,我们需要配置几个参数:

主要看下面两个红框,需要配置一个JVM参数-Dlogback.configurationFile=conf/logback-namenode.xml 用于指定Logback的配置文件, 接着添加一个Program arguments为 conf/namenode.properties 用于指定NameNode的配置文件,接着就可以运行起来了。

启动BackupNode

BackupNode机器已经和NameNode集成在同一个module中了,启动类为com.ruyuan.dfs.backup.BackupNode 同样的,BackupNode也需要修改配置文件和启动参数:

同样需要修改base.dir属性为你本机的一个路径,其他属性不变即可。启动参数配置如下:

启动DataNode

修改conf/datanode.properties文件中的base.dir参数值为你本机电脑的一个路径

另外需要注意的是,如果你要启动多个DataNode节点,需要改为配置文件的值,其中datanode.id需要改成不同的数值,每个节点不一样, base.dir需要改为不同的文件夹, 避免文件存储冲突,datanode.http.server和datanode.transpot.server的端口都需要改成不同的,避免端口冲突, 主机名也需要换成不同的,不然会造成DataNode注册混乱。因为NameNode是通过hostname来标识一个DataNode节点的。可以通过配置hosts文件

127.0.0.1 datanode01
127.0.0.1 datanode02
127.0.0.1 datanode03

配置启动参数:

运行客户端单元测试

如果上面几个节点都启动了,则可以开始进行单元测试看看效果了,但是在进行单元测试之前,需要先创建一个用户。

运行以下命令创建用户:

curl -H "Content-Type: application/json" -X POST -d '{"username": "admin","secret": "admin"}' "http://localhost:8081/api/user"

运行单元测试

接着就可以运行单元测试,打开ruyuan-dfs-client模块的test文件夹,查看测试类: com.ruyuan.dfs.client .FileSystemTest,直接执行:

通过这个按钮则会将所有流程都测试一遍,包括上传文件、下载文件、创建文件夹等场景。

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

About

小文件存储: 几KB~几百MB之间的文件存储 expand collapse
Java and 5 more languages
Apache-2.0
Cancel

Releases

No release

Contributors

All

Activities

Load More
can not load any more
1
https://gitee.com/suzhou-mopdila-information/ruyuan-dfs.git
git@gitee.com:suzhou-mopdila-information/ruyuan-dfs.git
suzhou-mopdila-information
ruyuan-dfs
自研分布式小文件存储系统
master

Search