Logstash grok Configuration Debugging

Grok is a tool that combines multiple predefined regular expressions to match and split text and map the extracted pieces to named keys. It is typically used to preprocess log data. The grok plugin in Logstash's filter stage is one implementation of it.
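
As a quick illustration of the `%{SYNTAX:SEMANTIC}` idea (the sample line and field names here are purely illustrative, using only built-in patterns), the expression

%{IP:client} %{WORD:method} %{URIPATHPARAM:request}

applied to the line `55.3.244.1 GET /index.php` yields client=55.3.244.1, method=GET, request=/index.php.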

Logstash's built-in grok patterns are listed at https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns. Grok also supports custom patterns, which makes it flexible enough to handle formats the built-in set does not cover.
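
Custom patterns can be loaded from a patterns file via the grok filter's patterns_dir option (shown later in this article), or declared inline with its pattern_definitions option. A minimal sketch, reusing the TIME_STAMP_A pattern defined below:

filter {
    grok {
        pattern_definitions => { "TIME_STAMP_A" => "\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3}" }
        match => { "message" => "%{TIME_STAMP_A:logtime}" }
    }
}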

Log Output Method

input {
    ...
}

filter {
    ...
}

output {
    if "_grokparsefailure" in [tags] {
        file { path => "/data/logs/logstash/grok_failures.txt" }  # events that failed grok parsing
    } else {
        elasticsearch {
            hosts => ["192.168.165.239:9200"]
            index => "%{type}"
        }
        stdout {
           codec => rubydebug  # print events to the console
        }
    }
}
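
With this routing in place, parse failures can be watched directly in the failure file (path taken from the output block above):

# tail -f /data/logs/logstash/grok_failures.txt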

At this point you can start Logstash in the foreground and watch the console output:

# bin/logstash -f config_file/log.conf
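
Before starting, the pipeline syntax can also be validated without running it; the --config.test_and_exit flag checks the config file and then exits:

# bin/logstash -f config_file/log.conf --config.test_and_exit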

Kibana Dev Tools

Console

Query the data collected into ES to check whether its format matches what the grok split is expected to produce:

GET /appblog/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [{ "@timestamp": { "order": "desc" } }]
}
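
A similar query can pull up only the failed events, in case _grokparsefailure-tagged events are still being indexed rather than routed to a file as above:

GET /appblog/_search
{
  "query": {
    "match": { "tags": "_grokparsefailure" }
  }
}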

Grok Debugger

Sample Data: enter a sample log line, e.g.:

2019-05-25 15:23:32.009 [cn-appblog-provider-channel-gateway-alipay][ INFO ] [65117] [nio-8851-exec-8] [47a999cec484e6b5] [0ea76f03cdf92c57] [true] --- [cn.appblog.provider.channel.gateway.alipay.helper.XStreamHelper] [parseAlipayCreateReturn] [39] : This is log content

Grok Pattern: enter the matching pattern, e.g.:

%{TIME_STAMP_A:logtime}\s+\[%{APP_NAME:appname}\]\[\s+%{LOG_LVL:loglvl}\s+\]\s+\[%{PROCESS_ID:pid}\]\s+\[%{PROCESS_NAME:pname}\]\s+\[%{TRACE_ID:traceid}\]\s+\[%{SPAN_ID:spanid}\]\s+\[%{SPAN_EXPORTABLE}\]\s+---\s+\[%{CLASS_PATH:classpath}\]\s+\[%{METHOD_NAME:methodname}\]\s+\[%{CODE_LINE:codeline}\]\s+:\s+%{CONTENT:content}

Custom Patterns: enter the user-defined patterns, e.g.:

TIME_STAMP_A \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3}
TIME_STAMP_T \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}Z
TIME_STAMP_P \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}
TIME_STAMP_S \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}
HOST_NAME_PATTERN [a-zA-Z0-9._-]+
APP_NAME [a-zA-Z0-9._-]+
LOG_LVL [a-zA-Z0-9._-]+
CORRELATION_ID [0-9a-f-]{36}
CIP ((?:(?:25[0-5]|2[0-4]\d|((1\d{2})|([1-9]?\d)))\.){3}(?:25[0-5]|2[0-4]\d|((1\d{2})|([1-9]?\d))))
ID_PATTERN [0-9a-f\-]{36}
RPC_ID_PATTERN [0-9\.]+
APP_OR_METHOD [/a-zA-Z0-9._-]+
TRACE_ID [0-9a-f]*
SPAN_ID [0-9a-f]*
PROCESS_ID \d{0,5}
PROCESS_NAME [a-zA-Z0-9._-]+
SPAN_EXPORTABLE [a-z]{0,5}
CLASS_PATH [a-zA-Z0-9._]+
METHOD_NAME [a-zA-Z0-9_]+
CODE_LINE \d{1,5}
CONTENT [\s\S]*$

Click Simulate to get the Structured Data:

{
  "traceid": "47a999cec484e6b5",
  "classpath": "cn.appblog.provider.channel.gateway.alipay.helper.XStreamHelper",
  "loglvl": "INFO",
  "pname": "nio-8851-exec-8",
  "pid": "65117",
  "content": "This is log content",
  "codeline": "39",
  "spanid": "0ea76f03cdf92c57",
  "appname": "cn-appblog-provider-channel-gateway-alipay",
  "logtime": "2019-05-25 15:23:32.009",
  "methodname": "parseAlipayCreateReturn"
}
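
Note that every captured value is a string by default ("pid": "65117" and "codeline": "39" above). If numeric fields are wanted in Elasticsearch, grok can coerce a capture at match time by appending a type (int or float) to the semantic name, for example:

%{CODE_LINE:codeline:int}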

Configuration Example

Sample log line:

2019-05-23 11:50:36.022 [cn-appblog-provider-channel-core][ INFO ] [21992] [nio-8888-exec-1] [143da285c068e5e1] [cb964a4c7b09ee0e] [true] --- [cn.appblog.provider.channel.core.helper.ChannelInfoHelper] [checkChannelInfo] [35] : ChannelPayRequest.checkChannelInfo [MerchantId: 142019050800009001, TransSerialNo: 122019052300016001, ChnlCode: alipay_offline_payment]

Logstash pipeline configuration:

input {
    kafka {
        bootstrap_servers => "192.168.1.10:9092"
        topics => "logstash"
        group_id => "logstash"
        consumer_threads => 5
        decorate_events => true
        codec => json
        type => "thaipay"
        #auto_offset_reset => "smallest"
        #reset_beginning => true
   }
}

filter {
    if [type] == "thaipay" {
        if [message] =~ "^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3}\s+\[[a-zA-Z0-9._-]+\]\s*\[\s*[a-zA-Z0-9._-]+\s*\][\s\S]*$" {
            grok {
                patterns_dir => "/data/server/logstash/config_file/patterns"
                #add_field => {"logmatch" => "100001"}
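                # The match lines commented out below record the pattern being built
                # up incrementally while debugging; only the final match is active.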
                #match => { "message" => "%{TIME_STAMP_A:logtime}" }
                #match => { "message" => "%{TIME_STAMP_A:logtime}\s+\[%{APP_NAME:appname}\]\s+\[%{LOG_LVL:loglvl}\]" }
                #match => { "message" => "%{TIME_STAMP_A:logtime}\s+\[%{APP_NAME:appname}\]\[\s+%{LOG_LVL:loglvl}\s+\]\s+\[%{PROCESS_ID:pid}\]\s+\[%{PROCESS_NAME:pname}\]\s+\[%{TRACE_ID:traceid}\]\s+\[%{SPAN_ID:spanid}\]\s+\[%{SPAN_EXPORTABLE}\]\s+---\s+\[%{CLASS_PATH:classpath}\]\s+\[%{METHOD_NAME:methodname}\]\s+\[%{CODE_LINE:codeline}\]" }
                match => { "message" => "%{TIME_STAMP_A:logtime}\s+\[\s*%{APP_NAME:appname}\s*\]\[\s*%{LOG_LVL:loglvl}\s*\]\s+\[\s*%{PROCESS_ID:pid}\s*\]\s+\[\s*%{PROCESS_NAME:pname}\s*\]\s+\[\s*%{TRACE_ID:traceid}\s*\]\s+\[\s*%{SPAN_ID:spanid}\s*\]\s+\[\s*%{SPAN_EXPORTABLE}\s*\]\s+---\s+\[\s*%{CLASS_PATH:classpath}\s*\]\s+\[\s*%{METHOD_NAME:methodname}\s*\]\s+\[\s*%{CODE_LINE:codeline}\s*\]\s+:\s+%{CONTENT:content}" }
            }
            #date {
            #    match => ["logtime", "yyyy-MM-dd HH:mm:ss.SSS"]
            #    target => "messagetime"
            #    #locale => "en"
            #    #timezone => "+00:00"
            #    #remove_field => ["logtime"]
            #}
        }
    }
}

output {
    if "_grokparsefailure" in [tags] {
        file { path => "/data/logs/logstash/grok_failures.txt" }
    } else {
        elasticsearch {
            hosts => ["192.168.1.10:9200"]
            index => "%{type}"
        }
        stdout {
           codec => rubydebug
        }
    }
}
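
Enabling the commented-out date filter above makes Logstash parse logtime with the given format and write the result to messagetime; targeting @timestamp instead makes the event time in Elasticsearch reflect the log's own timestamp rather than ingestion time. A minimal sketch (the timezone value is an assumption; adjust it to where the logs are produced):

filter {
    date {
        match => ["logtime", "yyyy-MM-dd HH:mm:ss.SSS"]
        target => "@timestamp"
        timezone => "Asia/Shanghai"
    }
}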

The custom patterns file referenced by patterns_dir above (under /data/server/logstash/config_file/patterns):

TIME_STAMP_A \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3}
TIME_STAMP_T \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}Z
TIME_STAMP_P \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}
TIME_STAMP_S \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}
HOST_NAME_PATTERN [a-zA-Z0-9._-]+
APP_NAME [a-zA-Z0-9._-]+
LOG_LVL [a-zA-Z0-9._-]+
CORRELATION_ID [0-9a-f-]{36}
CIP ((?:(?:25[0-5]|2[0-4]\d|((1\d{2})|([1-9]?\d)))\.){3}(?:25[0-5]|2[0-4]\d|((1\d{2})|([1-9]?\d))))
ID_PATTERN [0-9a-f\-]{36}
RPC_ID_PATTERN [0-9\.]+
APP_OR_METHOD [/a-zA-Z0-9._-]+
TRACE_ID [0-9a-f]*
SPAN_ID [0-9a-f]*
PROCESS_ID \d{3,5}
PROCESS_NAME [a-zA-Z0-9._-]+
SPAN_EXPORTABLE [a-z]{0,5}
CLASS_PATH [a-zA-Z0-9._]+
METHOD_NAME [a-zA-Z0-9_]+
CODE_LINE \d{1,5}
CONTENT [\s\S]*$
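
patterns_dir points at a directory, and by default Logstash loads every file in it, so the patterns file name is arbitrary (the name below is illustrative):

# mkdir -p /data/server/logstash/config_file/patterns
# vim /data/server/logstash/config_file/patterns/custom_patterns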
