# BigData_Hadoop

**Repository Path**: sbeam/bigdata_hadoop

## Basic Information

- **Project Name**: BigData_Hadoop
- **Description**: Big Data Course
- **Primary Language**: Java
- **License**: AGPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-06-15
- **Last Updated**: 2021-10-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Big Data Labs

## Websites

- Documentation: https://hadoop.apache.org/docs/r2.7.2/
- Hadoop 2.7.2 download: http://archive.apache.org/dist/hadoop/core/hadoop-2.7.2/

## Web Interfaces

- NameNode: http://hadoop1:50070
- ResourceManager: http://hadoop2:8088
- SecondaryNameNode: http://hadoop3:50090

## Structure

- NameNode
- DataNode
- SecondaryNameNode
- ResourceManager
- NodeManager

|Host|NameNode|SecondaryNameNode|DataNode|ResourceManager|NodeManager|
|:--:|:--:|:--:|:--:|:--:|:--:|
|hadoop1|x| |x| |x|
|hadoop2| | |x|x|x|
|hadoop3| |x|x| |x|

## Installation

### 0x01: Turn off the firewall (run with root privileges)

- Check status: `systemctl status ufw`
- Stop: `systemctl stop ufw`
- Disable: `systemctl disable ufw`

### 0x02: Set up Environment Variables

#### Java Environment

- Path: `vim /etc/profile`

```sh
# set java environment
JAVA_HOME=/home/hadoop/jdk1.8.0_144
JRE_HOME=$JAVA_HOME/jre
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib/rt.jar
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export JAVA_HOME JRE_HOME CLASSPATH PATH
```

#### Hadoop Environment

- Path: `vim /etc/profile.d/hadoop.sh`

```sh
export HADOOP_HOME=/home/student/hadoop/hadoop-2.7.2
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
```

- Reload the environment variables: `source /etc/profile`

### 0x03: Change the Hostname and hosts

- Temporary: edit `/etc/hostname`
- Permanent: `hostnamectl set-hostname <hostname>`
- Look up the IP address: `ifconfig`
- Edit the hosts file: `/etc/hosts`
- Don't forget this one: `/etc/profile.d/hosts.sh`
- Ping test: `ping [-c <count>] <host>`

### 0x04: Set up SSH

- Install SSH: `apt install ssh`
- Edit the SSH config: `sudo vim /etc/ssh/sshd_config` -> `PermitRootLogin yes`
- Generate a key pair: `ssh-keygen -t rsa`
- Copy the public key to another node: `ssh-copy-id root@hadoop2`
- Forget saved key(s): `ssh-keygen -R hadoop1`

#### scp Usage

- Copy a file: `scp <file> root@hadoop2:/tmp`
- Copy a folder: `scp -r ...`

### 0x05: Set up the Hadoop Config

#### core-site.xml

- Core config: `~/hadoop/hadoop-2.7.2/etc/hadoop/core-site.xml`

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/student/hadoop/hadoop-2.7.2/data/tmp</value>
  </property>
</configuration>
```

---

#### hdfs-site.xml

- HDFS config: `~/hadoop/hadoop-2.7.2/etc/hadoop/hdfs-site.xml`

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop3:50090</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50070</value>
  </property>
</configuration>
```

---

#### yarn-site.xml

- YARN config: `~/hadoop/hadoop-2.7.2/etc/hadoop/yarn-site.xml`

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>0.0.0.0:8088</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
```

---

#### mapred-site.xml

- MapReduce config: `~/hadoop/hadoop-2.7.2/etc/hadoop/mapred-site.xml`

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
  </property>
</configuration>
```

#### Config Env Scripts

- Edit `hadoop-env.sh` -> `export JAVA_HOME=/home/student/hadoop/jdk1.8.0_144`
- Edit `yarn-env.sh` -> `export JAVA_HOME=/home/student/hadoop/jdk1.8.0_144`
- Edit `mapred-env.sh` -> `export JAVA_HOME=/home/student/hadoop/jdk1.8.0_144`

#### Config Nodes

- Edit the slaves file: `~/hadoop/hadoop-2.7.2/etc/hadoop/slaves`

```
hadoop1
hadoop2
hadoop3
```

### 0x06: Format HDFS (on hadoop1)

- Run: `hdfs namenode -format`

This command should be run only once.
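After editing the XML files in 0x05, a quick sanity check is to read a property back out and confirm it matches what you intended. The helper below is a sketch (not part of this repo) that extracts one property value from a Hadoop `*-site.xml` file; it assumes the conventional layout where each `<name>` and `<value>` sits on its own line, as in the configs above:

```shell
#!/usr/bin/env bash
# Sketch: read one property value out of a Hadoop *-site.xml file.
# Usage: get_prop <file> <property-name>
# Assumes each <name>/<value> tag is on its own line (the usual layout).
get_prop() {
  awk -v want="$2" '
    /<name>/  { gsub(/.*<name>|<\/name>.*/, "");   name = $0 }  # remember the property name
    /<value>/ { gsub(/.*<value>|<\/value>.*/, ""); if (name == want) print }  # print matching value
  ' "$1"
}
```

For example, `get_prop "$HADOOP_CONF_DIR/core-site.xml" fs.defaultFS` should print `hdfs://hadoop1:9000` if the config from 0x05 is in place; running it on every node after copying the configs with `scp -r` catches a node that was missed.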
### 0x07: Start Hadoop

- Start HDFS (run on the NameNode): `sbin/start-dfs.sh`
- Start YARN (run on the ResourceManager): `sbin/start-yarn.sh`
- Download a file from HDFS: `hdfs dfs -get <src> <localdst>`

### 0x08: Execute a Jar on Hadoop

- Make a directory: `hdfs dfs -mkdir -p /user/root/input`
- Upload files: `hdfs dfs -put <localsrc> /user/root/input`
- Execute the jar: `hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/root/input /user/root/output`

**Notice:** the output folder must **NOT** exist before launching the jar; the job creates it itself and fails if it is already there.

For more on HDFS usage, see the [HDFS Document](docs/hdfs.md).
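Before running wordcount on the cluster, it can help to know what output to expect. The sketch below mimics the example jar's map/shuffle/reduce stages with plain shell pipes and produces the same `word<TAB>count` lines, so you can sanity-check a small input locally (the function name and file argument here are hypothetical, not part of Hadoop):

```shell
#!/usr/bin/env bash
# Sketch: a local stand-in for the hadoop-mapreduce-examples wordcount job.
# "map" = split into words, "shuffle" = sort, "reduce" = count groups.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" |  # map: emit one word per line
    sort |                         # shuffle: group identical words together
    uniq -c |                      # reduce: count each group
    awk '{ print $2 "\t" $1 }'     # same "word<TAB>count" layout as the example jar
}
```

Running it over the files you are about to `hdfs dfs -put` gives a reference result to compare against the part files in `/user/root/output` after the job finishes.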