# hadoop-wordcount-sample

**Repository Path**: ryanpenn/hadoop-wordcount-sample

## Basic Information

- **Project Name**: hadoop-wordcount-sample
- **Description**: Hadoop word count sample program, based on Hadoop 2.7.3
- **Primary Language**: Java
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 2
- **Created**: 2017-02-10
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Windows/Mac + Maven + Hadoop 2.7.3 Development Environment Setup (with a wordcount example)

> Prerequisite: install and configure JDK 1.8 (not covered here).

## 1. Download Hadoop 2.7.3

- Download: [hadoop-2.7.3](http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz)
- Download: [winutils-2.7.3](https://git.oschina.net/ryanpenn/hadoop-wordcount-sample/raw/master/hadoop-samples/winutils/winutils-2.7.3.rar) (Windows only)

## 2. Install Hadoop

1. Extract the archive to:
   - D:\hadoop\hadoop-2.7.3 (Windows)
   - ~/Dev/hadoop/hadoop-2.7.3 (Mac)
2. Extract winutils (Windows only):
   - Copy hadoop.dll and winutils.exe into the D:\hadoop\hadoop-2.7.3\bin directory.

## 3. Environment Variables

(Windows) Create a new system environment variable HADOOP_HOME=D:\hadoop\hadoop-2.7.3, then add %HADOOP_HOME%\bin to Path.

(Mac) Edit ~/.bash_profile (`vim ~/.bash_profile`) and add:

```bash
export HADOOP_HOME=~/Dev/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin
```

## 4. Hadoop Configuration

- Create the local working directories:

```bash
mkdir %HADOOP_HOME%\home
mkdir %HADOOP_HOME%\home\tmp
mkdir %HADOOP_HOME%\home\name
mkdir %HADOOP_HOME%\home\data
```

- Edit the configuration files under %HADOOP_HOME%\etc\hadoop.

  core-site.xml:

```xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/D:/_Dev/server/hadoop/hadoop-2.7.3/home/tmp</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/D:/_Dev/server/hadoop/hadoop-2.7.3/home/name</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

  hdfs-site.xml:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/D:/_Dev/server/hadoop/hadoop-2.7.3/home/data</value>
  </property>
</configuration>
```

  mapred-site.xml:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:9001</value>
  </property>
</configuration>
```

  yarn-site.xml:

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```

- Edit hadoop-env.cmd under %HADOOP_HOME%\etc\hadoop (on Mac, edit hadoop-env.sh).
  Windows: set JAVA_HOME to the JDK install directory. The path must not contain spaces (use PROGRA~1 in place of "Program Files"):

```bash
set JAVA_HOME=D:\PROGRA~1\Java\jdk1.8.0
```

  Mac:

```bash
export HADOOP_HEAPSIZE=2000
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_HOME/lib/native"
```

## 5. Run and Test

1. Open a cmd window, `cd %HADOOP_HOME%`, and run `hdfs namenode -format`.
   - To check that the configuration is correct: `hadoop checknative -a`
2. `cd sbin` and run `start-all.cmd`. This starts four processes: namenode, datanode, nodemanager, and resourcemanager.
3. Run `hadoop fs -mkdir hdfs://localhost:9000/user/`.
4. Run `hadoop fs -ls hdfs://localhost:9000/` to see the user directory just created.
5. To stop, `cd sbin` and run `stop-all.cmd`.

## 6. Hadoop's Built-in Web Consoles

1. Resource manager GUI: http://localhost:8088/
2. NameNode GUI: http://localhost:50070/

## 7. Writing a Hadoop Program in IDEA

1. Create a Maven project and add the required dependencies to the pom file (refer to this sample project).
2. Copy the relevant configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml) from %HADOOP_HOME%\etc\hadoop into the resources folder.

## 8. Running the Program (after coding)

1. Start:
   - %HADOOP_HOME%\sbin\start-all.cmd
   - $HADOOP_HOME/sbin/start-all.sh (Mac)
   - Resource manager GUI: http://localhost:8088/
   - NameNode GUI: http://localhost:50070/
2. Package:
   - mvn clean package
3. Create the input directories (do not pre-create the output directory; the job creates it itself and fails if it already exists):
   - hadoop fs -mkdir /user
   - hadoop fs -mkdir /user/wordcount
   - hadoop fs -mkdir /user/wordcount/input
4. Upload:
   - hadoop fs -rm /user/wordcount/input/words.txt (optional: remove the previous file)
   - hadoop fs -put docs/words.txt /user/wordcount/input/words.txt
5. Run:
   - hadoop jar target/wordCountJob-with-dependencies.jar /user/wordcount/input/words.txt /user/wordcount/output
6. Inspect:
   - hadoop fs -cat /user/wordcount/output/part-r-00000
7. Stop:
   - %HADOOP_HOME%\sbin\stop-all.cmd
   - $HADOOP_HOME/sbin/stop-all.sh (Mac)
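The map and reduce phases the job performs can be illustrated locally, without a running cluster. The sketch below is a plain-Java stand-in for the repository's actual Mapper/Reducer classes (which are not shown in this README): the "map" step emits a count of 1 per word, and the "reduce" step sums the counts per key, producing the same tab-separated lines you would see in part-r-00000.

```java
import java.util.Map;
import java.util.TreeMap;

// Local simulation of the wordcount MapReduce flow (no Hadoop dependency).
public class LocalWordCount {

    // Returns word -> total count, sorted by key like the job's part-r-00000 output.
    public static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {                       // "map" phase: one call per input record
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);  // "reduce" phase: sum the emitted 1s per key
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = { "hello hadoop", "hello world" };
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            // Same tab-separated format as part-r-00000
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

On the sample input above this prints `hadoop 1`, `hello 2`, `world 1` (tab-separated). The real job distributes the map calls across input splits and the reduce calls across partitions, but the per-key logic is the same.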