# hadoop_exporter

**Repository Path**: jsqf_admin/hadoop_exporter

## Basic Information

- **Project Name**: hadoop_exporter
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-10-15
- **Last Updated**: 2024-10-15

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Hadoop Exporter for Prometheus

Exports Hadoop metrics via HTTP for Prometheus consumption.

How to run:

```
python hadoop_exporter.py
```

Help on the flags of hadoop_exporter:

```
$ python hadoop_exporter.py -h
usage: hadoop_exporter.py [-h] [-c cluster_name] [-hdfs namenode_jmx_url]
                          [-rm resourcemanager_jmx_url] [-dn datanode_jmx_url]
                          [-jn journalnode_jmx_url] [-mr mapreduce2_jmx_url]
                          [-hbase hbase_jmx_url] [-hive hive_jmx_url]
                          [-p metrics_path] [-host ip_or_hostname] [-P port]

hadoop node exporter args, including url, metrics_path, address, port and
cluster.

optional arguments:
  -h, --help            show this help message and exit
  -c cluster_name, --cluster cluster_name
                        Hadoop cluster labels. (default "cluster_indata")
  -hdfs namenode_jmx_url, --namenode-url namenode_jmx_url
                        Hadoop hdfs metrics URL. (default
                        "http://indata-10-110-13-165.indata.com:50070/jmx")
  -rm resourcemanager_jmx_url, --resourcemanager-url resourcemanager_jmx_url
                        Hadoop resourcemanager metrics URL. (default
                        "http://indata-10-110-13-164.indata.com:8088/jmx")
  -dn datanode_jmx_url, --datanode-url datanode_jmx_url
                        Hadoop datanode metrics URL. (default
                        "http://indata-10-110-13-163.indata.com:1022/jmx")
  -jn journalnode_jmx_url, --journalnode-url journalnode_jmx_url
                        Hadoop journalnode metrics URL. (default
                        "http://indata-10-110-13-163.indata.com:8480/jmx")
  -mr mapreduce2_jmx_url, --mapreduce2-url mapreduce2_jmx_url
                        Hadoop mapreduce2 metrics URL. (default
                        "http://indata-10-110-13-165.indata.com:19888/jmx")
  -hbase hbase_jmx_url, --hbase-url hbase_jmx_url
                        Hadoop hbase metrics URL. (default
                        "http://indata-10-110-13-164.indata.com:16010/jmx")
  -hive hive_jmx_url, --hive-url hive_jmx_url
                        Hadoop hive metrics URL. (default "http://ip:port/jmx")
  -p metrics_path, --path metrics_path
                        Path under which to expose metrics. (default "/metrics")
  -host ip_or_hostname, -ip ip_or_hostname, --address ip_or_hostname, --addr ip_or_hostname
                        Polling server on this address. (default "127.0.0.1")
  -P port, --port port  Listen to this port. (default "9130")
```

Tested on Apache Hadoop 2.7.3.

# Usage

You can run each `Collector` under the `cmd/` directory, like so:

```
cd hadoop_exporter/cmd
python hdfs_namenode.py -h
# enter the parameters the script asks for.
```
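Every collector follows the same JMX-to-Prometheus pattern: poll a service's `/jmx` servlet, pick attributes out of the returned beans, and expose them on `/metrics`. The following is a minimal sketch of that pattern only, not this project's code; the metric name, the `FSNamesystemState` bean attribute choice, and the flag defaults are illustrative assumptions.

```python
# Illustrative sketch of a JMX-to-Prometheus collector; NOT the project's code.
import argparse
import time

import requests
from prometheus_client import Gauge, start_http_server

# One example metric; the real collectors export many more.
CAPACITY_USED = Gauge(
    'hadoop_namenode_capacity_used_bytes',
    'HDFS capacity used, read from the NameNode /jmx servlet.',
    ['cluster'],
)

def scrape(jmx_url, cluster):
    # A Hadoop /jmx servlet returns {"beans": [{...}, ...]}, where each
    # bean maps attribute names to values.
    beans = requests.get(jmx_url, timeout=10).json().get('beans', [])
    for bean in beans:
        if bean.get('name') == 'Hadoop:service=NameNode,name=FSNamesystemState':
            CAPACITY_USED.labels(cluster=cluster).set(bean['CapacityUsed'])

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-hdfs', default='http://ip:port/jmx')  # placeholder URL
    parser.add_argument('-c', default='cluster_indata')
    parser.add_argument('-P', type=int, default=9130)
    args = parser.parse_args()
    start_http_server(args.P)  # serves /metrics on the given port
    while True:
        scrape(args.hdfs, args.c)
        time.sleep(30)
```

Prometheus then scrapes the port given by `-P`, the same way it scrapes the real exporter.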
Or, if you want to run the entire project, you need a webhook/API at `http://ip:port/alert/getservicesbyhost` that provides the JMX URLs. The content returned by the webhook or API should look like this:

```
{
  "cluster_name1": [
    {
      "node1.fqdn.com": {
        "DATANODE": {"jmx": "http://node1.fqdn.com:1022/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node1.fqdn.com:60030/jmx"},
        "HISTORYSERVER": {"jmx": "http://node1.fqdn.com:19888/jmx"},
        "JOURNALNODE": {"jmx": "http://node1.fqdn.com:8480/jmx"},
        "NAMENODE": {"jmx": "http://node1.fqdn.com:50070/jmx"},
        "NODEMANAGER": {"jmx": "http://node1.fqdn.com:8042/jmx"}
      }
    },
    {
      "node2.fqdn.com": {
        "DATANODE": {"jmx": "http://node2.fqdn.com:1022/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node2.fqdn.com:60030/jmx"},
        "HIVE_LLAP": {"jmx": "http://node2.fqdn.com:15002/jmx"},
        "HIVE_SERVER_INTERACTIVE": {"jmx": "http://node2.fqdn.com:10502/jmx"},
        "JOURNALNODE": {"jmx": "http://node2.fqdn.com:8480/jmx"},
        "NODEMANAGER": {"jmx": "http://node2.fqdn.com:8042/jmx"}
      }
    },
    {
      "node3.fqdn.com": {
        "DATANODE": {"jmx": "http://node3.fqdn.com:1022/jmx"},
        "HBASE_MASTER": {"jmx": "http://node3.fqdn.com:16010/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node3.fqdn.com:60030/jmx"},
        "JOURNALNODE": {"jmx": "http://node3.fqdn.com:8480/jmx"},
        "NODEMANAGER": {"jmx": "http://node3.fqdn.com:8042/jmx"},
        "RESOURCEMANAGER": {"jmx": "http://node3.fqdn.com:8088/jmx"}
      }
    }
  ],
  "cluster_name2": [
    {
      "node4.fqdn.com": {
        "DATANODE": {"jmx": "http://node4.fqdn.com:1022/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node4.fqdn.com:60030/jmx"},
        "HISTORYSERVER": {"jmx": "http://node4.fqdn.com:19888/jmx"},
        "JOURNALNODE": {"jmx": "http://node4.fqdn.com:8480/jmx"},
        "NAMENODE": {"jmx": "http://node4.fqdn.com:50070/jmx"},
        "NODEMANAGER": {"jmx": "http://node4.fqdn.com:8042/jmx"}
      }
    },
    {
      "node5.fqdn.com": {
        "DATANODE": {"jmx": "http://node5.fqdn.com:1022/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node5.fqdn.com:60030/jmx"},
        "HIVE_LLAP": {"jmx": "http://node5.fqdn.com:15002/jmx"},
        "HIVE_SERVER_INTERACTIVE": {"jmx": "http://node5.fqdn.com:10502/jmx"},
        "JOURNALNODE": {"jmx": "http://node5.fqdn.com:8480/jmx"},
        "NODEMANAGER": {"jmx": "http://node5.fqdn.com:8042/jmx"}
      }
    },
    {
      "node6.fqdn.com": {
        "DATANODE": {"jmx": "http://node6.fqdn.com:1022/jmx"},
        "HBASE_MASTER": {"jmx": "http://node6.fqdn.com:16010/jmx"},
        "HBASE_REGIONSERVER": {"jmx": "http://node6.fqdn.com:60030/jmx"},
        "JOURNALNODE": {"jmx": "http://node6.fqdn.com:8480/jmx"},
        "NODEMANAGER": {"jmx": "http://node6.fqdn.com:8042/jmx"},
        "RESOURCEMANAGER": {"jmx": "http://node6.fqdn.com:8088/jmx"}
      }
    }
  ]
}
```

Then you can run:

```
# -s is the REST API / webhook URL mentioned above; pass it as ip:port only,
#    with no scheme and no path (I know it's ugly).
# -P (upper case) is the port on which hadoop_exporter exports metrics;
#    you can then get metrics from that port under /metrics.
python hadoop_exporter.py -s "ip:port" -P 9131
```

**One more thing**: you should run all of these steps on **all Hadoop nodes**. MAYBE I'll improve this project for general use.
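For illustration, here is a hedged sketch of how a process on one node might consume the payload above to discover its own JMX endpoints. The function name, return shape, and `ip:port` placeholder are assumptions for this sketch, not the project's actual code.

```python
# Hedged sketch: discover this host's services from the webhook payload above.
import json
import socket

import requests

def discover_local_services(api_host):
    """Query http://<api_host>/alert/getservicesbyhost and return the
    cluster name plus a {SERVICE: jmx_url} map for the machine we run on."""
    url = 'http://%s/alert/getservicesbyhost' % api_host
    clusters = requests.get(url, timeout=10).json()
    fqdn = socket.getfqdn()
    for cluster_name, nodes in clusters.items():
        for node in nodes:              # each element is {node_fqdn: {...}}
            services = node.get(fqdn)
            if services is not None:
                return cluster_name, {
                    svc: info['jmx'] for svc, info in services.items()
                }
    return None, {}

if __name__ == '__main__':
    # 'ip:port' is the same scheme-less value you would pass to -s.
    cluster, jmx_urls = discover_local_services('ip:port')
    print(cluster, json.dumps(jmx_urls, indent=2))
```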