# logstash-filter-pattern-enricher

**Repository Path**: judddd/logstash-filter-pattern-enricher

## Basic Information

- **Project Name**: logstash-filter-pattern-enricher
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-06-10
- **Last Updated**: 2025-06-11

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Logstash Pattern Enricher Filter Plugin

This plugin enriches log events with data fetched from Elasticsearch, based on regular expression pattern matching. It reads pattern definitions from a specified Elasticsearch index and adds the data fields of the matching pattern to the event.

## Basic Usage

### 1. Minimal Configuration

```ruby
filter {
  pattern_enricher {
    hosts => ["localhost:9200"]
    index => "pattern-metadata"
    source_field => "message"
  }
}
```

### 2. Full Configuration Example

```ruby
filter {
  pattern_enricher {
    # Elasticsearch connection settings
    hosts => ["localhost:9200"]
    user => "elastic"
    password => "password"
    index => "pattern-metadata"

    # Matching settings
    source_field => "message"
    target_prefix => "enriched_"
    pattern_field => "regex_pattern"

    # Cache settings
    refresh_interval => 300
    cache_size => 5000

    # Field selection
    enrichment_fields => ["category", "severity", "description"]

    # Error handling
    tag_on_failure => ["_enrichment_failed"]
    tag_on_no_match => ["_no_pattern_match"]
    tag_on_query_empty => ["_no_patterns_available"]
  }
}
```

## Configuration Reference

### Elasticsearch Connection Settings

| Parameter  | Type     | Default              | Description                            |
| ---------- | -------- | -------------------- | -------------------------------------- |
| `hosts`    | array    | `["localhost:9200"]` | Elasticsearch cluster addresses        |
| `user`     | string   | `nil`                | Username (optional)                    |
| `password` | password | `nil`                | Password (optional)                    |
| `index`    | string   | `"pattern-metadata"` | Name of the index that stores patterns |

### SSL Settings

```ruby
ssl => {
  enabled => true
  verification_mode => "certificate"
  ca_file => "/path/to/ca.crt"
}
```

### Query Settings

#### Using the Default Query (Recommended)

```ruby
cache_size => 1000
```

#### Using a Custom Query

```ruby
query_dsl => '{
  "query": {
    "bool": {
      "must": [
        {"term": {"status": "active"}},
        {"range": {"updated_at": {"gte": "now-7d"}}}
      ]
    }
  },
  "sort": [{"priority": {"order": "desc"}}],
  "size": 2000
}'
```

### Field Settings

| Parameter           | Type    | Default     | Description                                                  |
| ------------------- | ------- | ----------- | ------------------------------------------------------------ |
| `source_field`      | string  | `"message"` | Event field to match against                                 |
| `pattern_field`     | string  | `"pattern"` | Elasticsearch field that stores the regular expression       |
| `target_prefix`     | string  | `""`        | Prefix added to the names of target fields                   |
| `enrichment_fields` | array   | `[]`        | Specific fields to copy (if empty, all fields are copied)    |
| `exclude_fields`    | array   | `[]`        | Fields to exclude                                            |
| `copy_all_fields`   | boolean | `true`      | Whether to copy all available fields                         |

### Metadata Settings

```ruby
add_metadata => true
metadata_field => "pattern_info"
```

### Cache Settings

| Parameter          | Type   | Default | Description                       |
| ------------------ | ------ | ------- | --------------------------------- |
| `refresh_interval` | number | `60`    | Cache refresh interval (seconds)  |
| `cache_size`       | number | `1000`  | Maximum number of cached entries  |

### Error-Handling Tags

```ruby
tag_on_failure => ["_enrichment_error"]
tag_on_no_match => ["_no_match_found"]
tag_on_query_empty => ["_patterns_unavailable"]
```

## Elasticsearch Index Structure

Documents in your Elasticsearch index should have the following structure.

### Basic Structure

```json
{
  "pattern": "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}",
  "category": "timestamp",
  "severity": "info",
  "description": "Standard timestamp format",
  "priority": 10
}
```

### Regular Expression Formats

The following regular expression formats are supported:

```json
// Form 1: plain string
{"pattern": "ERROR.*"}

// Form 2: wrapped in slashes
{"pattern": "/ERROR.*/"}

// Form 3: complex pattern
{"pattern": "(?i)\\b(error|exception|fail)\\b"}
```

## Usage Examples

### Example 1: Log Level Detection

**Elasticsearch document:**

```json
{
  "pattern": "\\bINFO\\b",
  "log_level": "INFO",
  "severity_score": 2,
  "alert_required": false
}
```

**Logstash configuration:**

```ruby
filter {
  pattern_enricher {
    hosts => ["localhost:9200"]
    index => "log-patterns"
    source_field => "message"
    target_prefix => "log_"
  }
}
```

**Input event:**

```json
{
  "message": "2024-01-15 INFO Application started successfully"
}
```

**Output event:**

```json
{
  "message": "2024-01-15 INFO Application started successfully",
  "log_log_level": "INFO",
  "log_severity_score": 2,
  "log_alert_required": false,
  "pattern_metadata": {
    "matched_pattern_id": "pattern_001",
    "matched_pattern": "\\bINFO\\b",
    "matched_value": "2024-01-15 INFO Application started successfully",
    "enrichment_status": "success"
  }
}
```

### Example 2: IP Address Classification

**Elasticsearch document:**

```json
{
  "pattern": "192\\.168\\.\\d+\\.\\d+",
  "network_type": "private",
  "location": "internal",
  "risk_level": "low"
}
```

**Logstash configuration:**

```ruby
filter {
  pattern_enricher {
    hosts => ["localhost:9200"]
    index => "ip-patterns"
    source_field => "client_ip"
    enrichment_fields => ["network_type", "location", "risk_level"]
    tag_on_no_match => ["_unknown_ip_range"]
  }
}
```

### Example 3: Custom Query

```ruby
filter {
  pattern_enricher {
    hosts => ["localhost:9200"]
    index => "patterns"
    query_dsl => '{
      "query": {
        "bool": {
          "must": [
            {"term": {"environment": "production"}},
            {"term": {"enabled": true}}
          ],
          "filter": {
            "range": {
              "created_at": {
                "gte": "now-30d"
              }
            }
          }
        }
      },
      "sort": [
        {"priority": {"order": "desc"}},
        {"_score": {"order": "desc"}}
      ]
    }'
  }
}
```

## Advanced Configuration

### Multi-Environment Configuration

```ruby
filter {
  if [environment] == "production" {
    pattern_enricher {
      hosts => ["prod-es-01:9200", "prod-es-02:9200"]
      index => "prod-patterns"
      user => "${PROD_ES_USER}"
      password => "${PROD_ES_PASSWORD}"
      ssl => {
        enabled => true
        verification_mode => "certificate"
        ca_file => "/etc/ssl/certs/prod-ca.crt"
      }
    }
  } else {
    pattern_enricher {
      hosts => ["dev-es:9200"]
      index => "dev-patterns"
    }
  }
}
```

### Performance Tuning Configuration

```ruby
filter {
  pattern_enricher {
    hosts => ["localhost:9200"]
    index => "patterns"

    # Cache tuning
    refresh_interval => 600  # refresh every 10 minutes
    cache_size => 10000      # larger cache

    # Query tuning
    use_default_query => false
    query_body => '{
      "query": {"term": {"active": true}},
      "_source": ["pattern", "category", "priority"],
      "size": 5000
    }'

    # Field tuning
    copy_all_fields => false
    enrichment_fields => ["category", "priority"]
    exclude_fields => ["internal_notes", "debug_info"]
  }
}
```

## Error Handling and Monitoring

### Monitoring with Tags

Tags let you monitor how the plugin is behaving:

```ruby
# Events whose enrichment failed
if "_enrichment_error" in [tags] {
  # Send an alert or log the error
}

# Events that matched no pattern
if "_no_match_found" in [tags] {
  # Record statistics
}

# Events processed while no patterns were available
if "_patterns_unavailable" in [tags] {
  # Check the Elasticsearch connection and index status
}
```

### Metadata Field

The plugin adds detailed metadata to each event:

```json
{
  "pattern_metadata": {
    "matched_pattern_id": "pattern_123",
    "matched_pattern": "\\berror\\b",
    "matched_value": "Application error occurred",
    "enrichment_status": "success",
    "query_empty": false
  }
}
```

## Common Problems and Solutions

### 1. Regular Expression Does Not Match

Make sure the regular expression is well formed; an online regex tester can help. An unescaped `.` matches any character, so the first pattern below also matches strings such as `192a168b1c1`.

```json
// Wrong: unescaped special characters
{"pattern": "192.168.1.1"}

// Right: escaped special characters
{"pattern": "192\\.168\\.1\\.1"}
```

### 2. Performance Tuning

- Increase `refresh_interval` to query Elasticsearch less often
- Use `enrichment_fields` to limit which fields are copied
- Optimize custom queries by placing conditions in `filter` context rather than `query` context; filter clauses skip relevance scoring and can be cached by Elasticsearch

### 3. Connection Problems

Check the Elasticsearch connection settings and network reachability:

```ruby
# Retry requests that fail with transient errors
filter {
  pattern_enricher {
    # ... other settings
    retry_on_failure => 5
    retry_on_status => [429, 502, 503, 504, 522, 524]
  }
}
```

## Best Practices

1. **Pattern design**: Use specific regular expressions and avoid overly broad matches
2. **Cache management**: Tune `refresh_interval` to how often your pattern data changes
3. **Field selection**: Use `enrichment_fields` to copy only the fields you need
4. **Error monitoring**: Watch the tags so problems are noticed promptly
5. **Performance testing**: Test performance thoroughly before deploying to production
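
## Appendix: Matching Logic Sketch

The matching and field-copying behaviour described in this README can be illustrated in plain Ruby. The sketch below is not the plugin's source code: the module name `PatternEnricherSketch`, its two methods, and the "first matching document wins" rule are assumptions made for illustration. It only mirrors the documented options `source_field`, `pattern_field`, `target_prefix`, `enrichment_fields`, and `tag_on_no_match`, and it works on plain hashes instead of Logstash events.

```ruby
# Illustrative sketch only; names and matching order are assumptions,
# not the plugin's actual implementation.
module PatternEnricherSketch
  # Turns a stored pattern string into a Regexp.
  # Accepts both the plain form ("ERROR.*") and the slash-wrapped form ("/ERROR.*/").
  def self.compile(raw)
    source = raw.strip
    if source.length >= 2 && source.start_with?("/") && source.end_with?("/")
      source = source[1..-2] # drop the surrounding slashes
    end
    Regexp.new(source)
  end

  # Returns a copy of +event+ enriched with the fields of the first document
  # in +docs+ whose pattern matches event[source_field].
  # If nothing matches, the no-match tags are appended instead.
  def self.enrich(event, docs,
                  source_field: "message",
                  pattern_field: "pattern",
                  target_prefix: "",
                  enrichment_fields: [],
                  tag_on_no_match: ["_no_match_found"])
    value = event[source_field]

    if value.is_a?(String)
      docs.each do |doc|
        next unless compile(doc[pattern_field]).match?(value)

        # Never copy the pattern itself; optionally keep only selected fields.
        fields = doc.reject { |key, _| key == pattern_field }
        unless enrichment_fields.empty?
          fields = fields.select { |key, _| enrichment_fields.include?(key) }
        end

        enriched = event.dup
        fields.each { |key, val| enriched["#{target_prefix}#{key}"] = val }
        return enriched
      end
    end

    event.merge("tags" => (event["tags"] || []) + tag_on_no_match)
  end
end

if __FILE__ == $PROGRAM_NAME
  docs = [
    { "pattern" => "192\\.168\\.\\d+\\.\\d+", "network_type" => "private", "risk_level" => "low" }
  ]
  event = { "client_ip" => "192.168.1.20" }
  p PatternEnricherSketch.enrich(event, docs, source_field: "client_ip")
end
```

Compiling each pattern on every event, as done here for brevity, would be wasteful in a real filter; caching the compiled `Regexp` objects until the next refresh is the natural counterpart to the `refresh_interval` and `cache_size` settings.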