# hadoop-wordcount-sample
**Repository Path**: ryanpenn/hadoop-wordcount-sample
## Basic Information
- **Project Name**: hadoop-wordcount-sample
- **Description**: Hadoop word count sample program, based on Hadoop 2.7.3
- **Primary Language**: Java
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 2
- **Created**: 2017-02-10
- **Last Updated**: 2020-12-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Windows/Mac + Maven + Hadoop 2.7.3 Development Environment Setup (with wordcount sample)
> Prerequisite: install and configure JDK 1.8 (not covered here)
## 1. Download Hadoop 2.7.3
- Download: [hadoop-2.7.3](http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz)
- Download: [winutils-2.7.3](https://git.oschina.net/ryanpenn/hadoop-wordcount-sample/raw/master/hadoop-samples/winutils/winutils-2.7.3.rar) (Windows only)
## 2. Install Hadoop
1. Extract the archive to
   - D:\hadoop\hadoop-2.7.3 (Windows)
   - ~/Dev/hadoop/hadoop-2.7.3 (Mac)
2. Extract winutils (Windows only)
   - Copy hadoop.dll and winutils.exe into the D:\hadoop\hadoop-2.7.3\bin directory
## 3. Set Environment Variables
(Windows)
Create a new system environment variable HADOOP_HOME=D:\hadoop\hadoop-2.7.3
Edit Path and append %HADOOP_HOME%\bin
(Mac)
vim ~/.bash_profile
export HADOOP_HOME=~/Dev/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin
## 4. Hadoop Configuration
- Create the local working directories
```bash
mkdir %HADOOP_HOME%\home
mkdir %HADOOP_HOME%\home\tmp
mkdir %HADOOP_HOME%\home\name
mkdir %HADOOP_HOME%\home\data
```
- Edit the following configuration files in the %HADOOP_HOME%\etc\hadoop directory:

core-site.xml:
```xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/D:/_Dev/server/hadoop/hadoop-2.7.3/home/tmp</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/D:/_Dev/server/hadoop/hadoop-2.7.3/home/name</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

hdfs-site.xml:
```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/D:/_Dev/server/hadoop/hadoop-2.7.3/home/data</value>
  </property>
</configuration>
```

mapred-site.xml:
```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:9001</value>
  </property>
</configuration>
```

yarn-site.xml:
```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```
- Edit the hadoop-env.cmd file in the %HADOOP_HOME%\etc\hadoop directory (on Mac, edit hadoop-env.sh instead)
```bash
rem Windows: point JAVA_HOME at the JDK install directory; the path must not contain spaces
rem (use PROGRA~1 in place of "Program Files")
set JAVA_HOME=D:\PROGRA~1\Java\jdk1.8.0
```
```bash
# Mac configuration
export HADOOP_HEAPSIZE=2000
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_HOME/lib/native"
```
## 5. Run and Test
1. Open a cmd window, switch to the Hadoop directory with cd %HADOOP_HOME%, and run hdfs namenode -format
   - Verify the native libraries are picked up correctly with hadoop checknative -a
2. cd sbin and run start-all.cmd; this starts four processes: namenode, datanode, nodemanager, resourcemanager
3. Run hadoop fs -mkdir hdfs://localhost:9000/user/
4. Run hadoop fs -ls hdfs://localhost:9000/ to list the user directory just created
5. To stop, cd sbin and run stop-all.cmd
## 6. Hadoop's Built-in Web Consoles
1. ResourceManager (resource management) UI: http://localhost:8088/
2. NameNode (HDFS) UI: http://localhost:50070/
## 7. Writing a Hadoop Program in IDEA
1. Create a Maven project and add the required dependencies to the pom file (see this sample's pom for reference)
2. Copy the relevant configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml) from %HADOOP_HOME%\etc\hadoop into the resources folder
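For reference, the classic WordCount job that the run steps below expect can be sketched as follows. This is a minimal sketch rather than the exact code in this repository: the class names (WordCount, TokenizerMapper, IntSumReducer) are illustrative, and it assumes the Hadoop 2.7.3 client dependency from step 1 is on the classpath.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every whitespace-separated token in each input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // combiner reuses the reducer for local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input file or directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not already exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, it takes the input path and the output path as its two arguments, matching the hadoop jar invocation in the next section.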
## 8. Running the Program (after coding)
1. Start the cluster
   - %HADOOP_HOME%\sbin\start-all.cmd
   - $HADOOP_HOME/sbin/start-all.sh (Mac)
   - ResourceManager UI: http://localhost:8088/
   - NameNode UI: http://localhost:50070/
2. Package the jar
   - mvn clean package
3. Create the HDFS directories
   - hadoop fs -mkdir /user
   - hadoop fs -mkdir /user/wordcount
   - hadoop fs -mkdir /user/wordcount/input
   - (do not create /user/wordcount/output in advance; the job fails if its output directory already exists)
4. Upload the input file
   - hadoop fs -rm /user/wordcount/input/words.txt (optional; removes a previously uploaded file)
   - hadoop fs -put docs/words.txt /user/wordcount/input/words.txt
5. Run the job
   - hadoop jar target/wordCountJob-with-dependencies.jar /user/wordcount/input/words.txt /user/wordcount/output
6. Inspect the result
   - hadoop fs -cat /user/wordcount/output/part-r-00000
7. Stop the cluster
   - %HADOOP_HOME%\sbin\stop-all.cmd
   - $HADOOP_HOME/sbin/stop-all.sh (Mac)