# hadoop_exporter

**Repository Path**: jsqf_admin/hadoop_exporter

## Basic Information

- **Project Name**: hadoop_exporter
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-10-15
- **Last Updated**: 2024-10-15

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Hadoop Exporter for Prometheus

Exports Hadoop metrics via HTTP for Prometheus consumption.

How to run:

```
python hadoop_exporter.py
```

Help on the flags of hadoop_exporter:

```
$ python hadoop_exporter.py -h
usage: hadoop_exporter.py [-h] [-c cluster_name] [-hdfs namenode_jmx_url]
                          [-rm resourcemanager_jmx_url] [-dn datanode_jmx_url]
                          [-jn journalnode_jmx_url] [-mr mapreduce2_jmx_url]
                          [-hbase hbase_jmx_url] [-hive hive_jmx_url]
                          [-p metrics_path] [-host ip_or_hostname] [-P port]

hadoop node exporter args, including url, metrics_path, address, port and
cluster.

optional arguments:
  -h, --help            show this help message and exit
  -c cluster_name, --cluster cluster_name
                        Hadoop cluster labels. (default "cluster_indata")
  -hdfs namenode_jmx_url, --namenode-url namenode_jmx_url
                        Hadoop hdfs metrics URL. (default
                        "http://indata-10-110-13-165.indata.com:50070/jmx")
  -rm resourcemanager_jmx_url, --resourcemanager-url resourcemanager_jmx_url
                        Hadoop resourcemanager metrics URL. (default
                        "http://indata-10-110-13-164.indata.com:8088/jmx")
  -dn datanode_jmx_url, --datanode-url datanode_jmx_url
                        Hadoop datanode metrics URL. (default
                        "http://indata-10-110-13-163.indata.com:1022/jmx")
  -jn journalnode_jmx_url, --journalnode-url journalnode_jmx_url
                        Hadoop journalnode metrics URL. (default
                        "http://indata-10-110-13-163.indata.com:8480/jmx")
  -mr mapreduce2_jmx_url, --mapreduce2-url mapreduce2_jmx_url
                        Hadoop mapreduce2 metrics URL. (default
                        "http://indata-10-110-13-165.indata.com:19888/jmx")
  -hbase hbase_jmx_url, --hbase-url hbase_jmx_url
                        Hadoop hbase metrics URL. (default
                        "http://indata-10-110-13-164.indata.com:16010/jmx")
  -hive hive_jmx_url, --hive-url hive_jmx_url
                        Hadoop hive metrics URL. (default "http://ip:port/jmx")
  -p metrics_path, --path metrics_path
                        Path under which to expose metrics. (default "/metrics")
  -host ip_or_hostname, -ip ip_or_hostname, --address ip_or_hostname, --addr ip_or_hostname
                        Polling server on this address. (default "127.0.0.1")
  -P port, --port port  Listen to this port. (default "9130")
```

Tested on Apache Hadoop 2.7.3.

# Usage

You can run each `Collector` under the `cmd/` directory, like so:

```
cd hadoop_exporter/cmd
python hdfs_namenode.py -h
# enter the parameters the script asks for.
```
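Every collector follows the same JMX-to-Prometheus pattern: poll a service's `/jmx` servlet, pick attributes out of the returned beans, and expose them on `/metrics`. The following is a minimal sketch of that pattern only, not this project's code; the metric name, the `FSNamesystemState` bean attribute choice, and the flag defaults are illustrative assumptions.

```python
# Illustrative sketch of a JMX-to-Prometheus collector; NOT the project's code.
import argparse
import time

import requests
from prometheus_client import Gauge, start_http_server

# One example metric; the real collectors export many more.
CAPACITY_USED = Gauge(
    'hadoop_namenode_capacity_used_bytes',
    'HDFS capacity used, read from the NameNode /jmx servlet.',
    ['cluster'],
)

def scrape(jmx_url, cluster):
    # A Hadoop /jmx servlet returns {"beans": [{...}, ...]}, where each
    # bean maps attribute names to values.
    beans = requests.get(jmx_url, timeout=10).json().get('beans', [])
    for bean in beans:
        if bean.get('name') == 'Hadoop:service=NameNode,name=FSNamesystemState':
            CAPACITY_USED.labels(cluster=cluster).set(bean['CapacityUsed'])

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-hdfs', default='http://ip:port/jmx')  # placeholder URL
    parser.add_argument('-c', default='cluster_indata')
    parser.add_argument('-P', type=int, default=9130)
    args = parser.parse_args()
    start_http_server(args.P)  # serves /metrics on the given port
    while True:
        scrape(args.hdfs, args.c)
        time.sleep(30)
```

Prometheus then scrapes the port given by `-P`, the same way it scrapes the real exporter.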
Or, if you want to run the entire project, you need a webhook/API at `http://ip:port/alert/getservicesbyhost` that provides the JMX URLs. The content returned by the webhook or API should look like this:

```
{
  "cluster_name1": [
    {
      "node1.fqdn.com": {
        "DATANODE": {"jmx": "http://node1.fqdn.com:1022/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node1.fqdn.com:60030/jmx"},
        "HISTORYSERVER": {"jmx": "http://node1.fqdn.com:19888/jmx"},
        "JOURNALNODE": {"jmx": "http://node1.fqdn.com:8480/jmx"},
        "NAMENODE": {"jmx": "http://node1.fqdn.com:50070/jmx"},
        "NODEMANAGER": {"jmx": "http://node1.fqdn.com:8042/jmx"}
      }
    },
    {
      "node2.fqdn.com": {
        "DATANODE": {"jmx": "http://node2.fqdn.com:1022/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node2.fqdn.com:60030/jmx"},
        "HIVE_LLAP": {"jmx": "http://node2.fqdn.com:15002/jmx"},
        "HIVE_SERVER_INTERACTIVE": {"jmx": "http://node2.fqdn.com:10502/jmx"},
        "JOURNALNODE": {"jmx": "http://node2.fqdn.com:8480/jmx"},
        "NODEMANAGER": {"jmx": "http://node2.fqdn.com:8042/jmx"}
      }
    },
    {
      "node3.fqdn.com": {
        "DATANODE": {"jmx": "http://node3.fqdn.com:1022/jmx"},
        "HBASE_MASTER": {"jmx": "http://node3.fqdn.com:16010/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node3.fqdn.com:60030/jmx"},
        "JOURNALNODE": {"jmx": "http://node3.fqdn.com:8480/jmx"},
        "NODEMANAGER": {"jmx": "http://node3.fqdn.com:8042/jmx"},
        "RESOURCEMANAGER": {"jmx": "http://node3.fqdn.com:8088/jmx"}
      }
    }
  ],
  "cluster_name2": [
    {
      "node4.fqdn.com": {
        "DATANODE": {"jmx": "http://node4.fqdn.com:1022/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node4.fqdn.com:60030/jmx"},
        "HISTORYSERVER": {"jmx": "http://node4.fqdn.com:19888/jmx"},
        "JOURNALNODE": {"jmx": "http://node4.fqdn.com:8480/jmx"},
        "NAMENODE": {"jmx": "http://node4.fqdn.com:50070/jmx"},
        "NODEMANAGER": {"jmx": "http://node4.fqdn.com:8042/jmx"}
      }
    },
    {
      "node5.fqdn.com": {
        "DATANODE": {"jmx": "http://node5.fqdn.com:1022/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node5.fqdn.com:60030/jmx"},
        "HIVE_LLAP": {"jmx": "http://node5.fqdn.com:15002/jmx"},
        "HIVE_SERVER_INTERACTIVE": {"jmx": "http://node5.fqdn.com:10502/jmx"},
        "JOURNALNODE": {"jmx": "http://node5.fqdn.com:8480/jmx"},
        "NODEMANAGER": {"jmx": "http://node5.fqdn.com:8042/jmx"}
      }
    },
    {
      "node6.fqdn.com": {
        "DATANODE": {"jmx": "http://node6.fqdn.com:1022/jmx"},
        "HBASE_MASTER": {"jmx": "http://node6.fqdn.com:16010/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node6.fqdn.com:60030/jmx"},
        "JOURNALNODE": {"jmx": "http://node6.fqdn.com:8480/jmx"},
        "NODEMANAGER": {"jmx": "http://node6.fqdn.com:8042/jmx"},
        "RESOURCEMANAGER": {"jmx": "http://node6.fqdn.com:8088/jmx"}
      }
    }
  ]
}
```

Then you can run:

```
# -s is the REST API / webhook URL mentioned above; pass it as ip:port only,
#    with no scheme and no path (I know it's ugly).
# -P (upper case) is the port on which hadoop_exporter exports metrics;
#    you can then get metrics from that port under /metrics.
python hadoop_exporter.py -s "ip:port" -P 9131
```

**One more thing**: you should run all of these steps on **all Hadoop nodes**. MAYBE I'll improve this project for general use.
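For illustration, here is a hedged sketch of how a process on one node might consume the payload above to discover its own JMX endpoints. The function name, return shape, and `ip:port` placeholder are assumptions for this sketch, not the project's actual code.

```python
# Hedged sketch: discover this host's services from the webhook payload above.
import json
import socket

import requests

def discover_local_services(api_host):
    """Query http://<api_host>/alert/getservicesbyhost and return the
    cluster name plus a {SERVICE: jmx_url} map for the machine we run on."""
    url = 'http://%s/alert/getservicesbyhost' % api_host
    clusters = requests.get(url, timeout=10).json()
    fqdn = socket.getfqdn()
    for cluster_name, nodes in clusters.items():
        for node in nodes:              # each element is {node_fqdn: {...}}
            services = node.get(fqdn)
            if services is not None:
                return cluster_name, {
                    svc: info['jmx'] for svc, info in services.items()
                }
    return None, {}

if __name__ == '__main__':
    # 'ip:port' is the same scheme-less value you would pass to -s.
    cluster, jmx_urls = discover_local_services('ip:port')
    print(cluster, json.dumps(jmx_urls, indent=2))
```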