Using Consul for Automatic Service Discovery in Prometheus
Introduction to Consul
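Consul is an open-source tool written in Go that provides service registration, service discovery, and configuration management for distributed, service-oriented systems. Its features include service registration/discovery, health checks, Key/Value storage, multi-datacenter support, and distributed consistency guarantees. In the earlier Prometheus monitoring setup, adding a new Target meant changing a configuration file on the server. Even with file_sd_configs you still have to log in to the server and edit the corresponding JSON file, which is tedious. Prometheus, however, officially supports several types of automatic service discovery, and Consul is one of them.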
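Installing and Configuring Consul
Consul is easy to install. The official website provides binary packages for each supported platform, so installation amounts to downloading and extracting one. Consul can also be installed quickly with Docker.
Binary Installation
Taking Linux as an example, download the binary package (version 1.8.3 here), extract it, and start a single node in development mode.
Recommended for development environments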
$ wget https://releases.hashicorp.com/consul/1.8.3/consul_1.8.3_linux_amd64.zip
$ unzip -q consul_1.8.3_linux_amd64.zip
$ ./consul agent -dev
Recommended for production environments
nohup ./consul agent -ui -server -bootstrap -data-dir=/usr/service/consul_cluster/data -pid-file=/run/service/consul_cluster/consul.pid -client=0.0.0.0 -advertise=10.10.21.15 -join=10.10.21.15 -node=consul-c1 &
nohup ./consul agent -ui -server -bootstrap -data-dir=/usr/local/consul/data -pid-file=/usr/local/consul/consul.pid -bind=10.10.21.15 -client=0.0.0.0 -node=consul &
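Configuration reference: https://www.consul.io/docs/agent/options.html
-ui: Starts the web UI service (default port 8500)
-server: Runs the agent in server mode (in development you can use -dev instead, which keeps data in memory and writes nothing to disk)
-data-dir: Path where data is stored
-config-dir: Directory of Service configuration files, typically ./consul.d
-config-file: Path to a Service configuration file; can be specified multiple times
-pid-file: Path to the PID file
-client: Address that the client interfaces (HTTP, DNS, and so on) bind to
-advertise: Address advertised to the other nodes in the cluster
-join: Joins the node to a cluster
-node: Name of the node within the cluster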
Usage:
-advertise value
Sets the advertise address to use.
-advertise-wan value
Sets address to advertise on WAN instead of -advertise address.
-allow-write-http-from value
Only allow write endpoint calls from given network. CIDR format, can be specified multiple times.
-alt-domain value
Alternate domain to use for DNS interface.
-bind value
Sets the bind address for cluster communication.
-bootstrap
Sets server to bootstrap mode.
-bootstrap-expect value
Sets server to expect bootstrap mode.
-check_output_max_size value
Sets the maximum output size for checks on this agent
-client value
Sets the address to bind for client access. This includes RPC, DNS, HTTP, HTTPS and gRPC (if configured).
-config-dir value
Path to a directory to read configuration files from. This will read every file ending in '.json' as configuration in this directory in alphabetical order. Can be specified multiple times.
-config-file value
Path to a file in JSON or HCL format with a matching file extension. Can be specified multiple times.
-config-format string
Config files are in this format irrespective of their extension. Must be 'hcl' or 'json'
-data-dir value
Path to a data directory to store agent state.
-datacenter value
Datacenter of the agent.
-default-query-time value
the amount of time a blocking query will wait before Consul will force a response. This value can be overridden by the 'wait' query parameter.
-dev
Starts the agent in development mode.
-disable-host-node-id
Setting this to true will prevent Consul from using information from the host to generate a node ID, and will cause Consul to generate a random node ID instead.
-disable-keyring-file
Disables the backing up of the keyring to a file.
-dns-port value
DNS port to use.
-domain value
Domain to use for DNS interface.
-enable-local-script-checks
Enables health check scripts from configuration file.
-enable-script-checks
Enables health check scripts.
-encrypt value
Provides the gossip encryption key.
-grpc-port value
Sets the gRPC API port to listen on (currently needed for Envoy xDS only).
-hcl value
hcl config fragment. Can be specified multiple times.
-http-port value
Sets the HTTP API port to listen on.
-https-port value
Sets the HTTPS API port to listen on.
-join value
Address of an agent to join at start time. Can be specified multiple times.
-join-wan value
Address of an agent to join -wan at start time. Can be specified multiple times.
-log-file value
Path to the file the logs get written to
-log-json
Output logs in JSON format.
-log-level value
Log level of the agent.
-log-rotate-bytes value
Maximum number of bytes that should be written to a log file
-log-rotate-duration value
Time after which log rotation needs to be performed
-log-rotate-max-files value
Maximum number of log file archives to keep
-max-query-time value
the maximum amount of time a blocking query can wait before Consul will force a response. Consul applies jitter to the wait time. The jittered time will be capped to MaxQueryTime.
-node value
Name of this node. Must be unique in the cluster.
-node-id value
A unique ID for this node across space and time. Defaults to a randomly-generated ID that persists in the data-dir.
-node-meta key:value
An arbitrary metadata key/value pair for this node, of the format key:value. Can be specified multiple times.
-non-voting-server
(Enterprise-only) This flag is used to make the server not participate in the Raft quorum, and have it only receive the data replication stream. This can be used to add read scalability to a cluster in cases where a high volume of reads to servers are needed.
-pid-file value
Path to file to store agent PID.
-primary-gateway value
Address of a mesh gateway in the primary datacenter to use to bootstrap WAN federation at start time with retries enabled. Can be specified multiple times.
-protocol value
Sets the protocol version. Defaults to latest.
-raft-protocol value
Sets the Raft protocol version. Defaults to latest.
-recursor value
Address of an upstream DNS server. Can be specified multiple times.
-rejoin
Ignores a previous leave and attempts to rejoin the cluster.
-retry-interval value
Time to wait between join attempts.
-retry-interval-wan value
Time to wait between join -wan attempts.
-retry-join value
Address of an agent to join at start time with retries enabled. Can be specified multiple times.
-retry-join-wan value
Address of an agent to join -wan at start time with retries enabled. Can be specified multiple times.
-retry-max value
Maximum number of join attempts. Defaults to 0, which will retry indefinitely.
-retry-max-wan value
Maximum number of join -wan attempts. Defaults to 0, which will retry indefinitely.
-segment value
(Enterprise-only) Sets the network segment to join.
-serf-lan-allowed-cidrs value
Networks (eg: 192.168.1.0/24) allowed for Serf LAN. Can be specified multiple times.
-serf-lan-bind value
Address to bind Serf LAN listeners to.
-serf-lan-port value
Sets the Serf LAN port to listen on.
-serf-wan-allowed-cidrs value
Networks (eg: 192.168.1.0/24) allowed for Serf WAN (other datacenters). Can be specified multiple times.
-serf-wan-bind value
Address to bind Serf WAN listeners to.
-serf-wan-port value
Sets the Serf WAN port to listen on.
-server
Switches agent to server mode.
-server-port value
Sets the server port to listen on.
-syslog
Enables logging to syslog.
-ui
Enables the built-in static web UI server.
-ui-content-path value
Sets the external UI path to a string. Defaults to: /ui/
-ui-dir value
Path to directory containing the web UI resources.
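Consul Web UI
Once the agent has started, open http://127.0.0.1:8500 in a browser to reach the Consul Web UI. By default it lists a single Service, consul. Every Service registered to Consul later also appears on this page, which gives a clear overview.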
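The same check can be made without a browser. The following is a minimal sketch, assuming the agent is reachable at the development address used above and that the catalog endpoint /v1/catalog/services returns a JSON object keyed by service name. The helper names are illustrative, not part of any Consul client library.

```python
import json
import urllib.request

# Assumption: the development agent from this article, listening locally.
CONSUL_URL = "http://127.0.0.1:8500"


def parse_service_names(body: str) -> list[str]:
    """Return the sorted service names from a /v1/catalog/services response body."""
    # The response is a JSON object that maps each service name to its list of tags.
    return sorted(json.loads(body))


def fetch_service_names(base_url: str = CONSUL_URL) -> list[str]:
    """Ask a running Consul agent which services are registered."""
    url = f"{base_url}/v1/catalog/services"
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_service_names(response.read().decode("utf-8"))
```

On a freshly started agent, calling fetch_service_names() should list only consul, matching what the Web UI shows.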
Docker Installation
To start a single-node Consul service with Docker, pull the official image consul:latest directly with the following command:
$ docker run --name consul -d -p 8500:8500 consul
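Once it has started, verify it the same way as above. For ease of demonstration, Consul is started with Docker here, and its address is http://172.30.12.167:8500.
Registering Services in Consul via the API
Next, we register a service in Consul through its standard HTTP API. We start with a test service that describes the local node-exporter. Its address and port are the ones on which node-exporter exposes metrics by default. Run the following command: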
$ curl -X PUT -d '{"id": "node-exporter", "name": "node-exporter-172.30.12.167", "address": "172.30.12.167", "port": 9100, "tags": ["test"], "checks": [{"http": "http://172.30.12.167:9100/metrics", "interval": "5s"}]}' http://172.30.12.167:8500/v1/agent/service/register
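After it completes, refresh the Consul Web UI and you can see that the service has been registered in Consul.
Incidentally, to deregister a service, use the following API call. For example, to deregister the node-exporter service added above: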
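For registration from a script rather than from curl, the request can be built with the Python standard library. This is a minimal sketch that mirrors the curl payload; build_register_request is a hypothetical helper name, and nothing is sent until the request is passed to urlopen.

```python
import json
import urllib.request


def build_register_request(consul_url: str, service: dict) -> urllib.request.Request:
    """Build the PUT request for /v1/agent/service/register without sending it."""
    return urllib.request.Request(
        url=f"{consul_url}/v1/agent/service/register",
        data=json.dumps(service).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Same payload as the curl command in the article.
NODE_EXPORTER = {
    "id": "node-exporter",
    "name": "node-exporter-172.30.12.167",
    "address": "172.30.12.167",
    "port": 9100,
    "tags": ["test"],
    "checks": [{"http": "http://172.30.12.167:9100/metrics", "interval": "5s"}],
}

request = build_register_request("http://172.30.12.167:8500", NODE_EXPORTER)
# To actually register the service, send the request to a reachable agent:
# urllib.request.urlopen(request, timeout=5)
```

Keeping the request construction separate from the network call makes the payload easy to inspect before anything is changed in Consul.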
$ curl -X PUT http://172.30.12.167:8500/v1/agent/service/deregister/node-exporter
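Note that the path segment after deregister/ is the service ID (node-exporter), not the service name. A small sketch of the same call in Python, with build_deregister_request as a hypothetical helper name:

```python
import urllib.parse
import urllib.request


def build_deregister_request(consul_url: str, service_id: str) -> urllib.request.Request:
    """Build the PUT request that removes a service by its ID (not its name)."""
    # Percent-encode the ID so that unusual characters cannot alter the URL path.
    path = "/v1/agent/service/deregister/" + urllib.parse.quote(service_id, safe="")
    return urllib.request.Request(url=consul_url + path, method="PUT")


request = build_deregister_request("http://172.30.12.167:8500", "node-exporter")
# Send it with urllib.request.urlopen(request, timeout=5) against a reachable agent.
```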
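Configuring Prometheus for Automatic Service Discovery
Consul is now running and one service has been registered. Next, we configure Prometheus to use Consul service discovery so that the service added above is discovered automatically and shows up in the Prometheus Targets. Add the following to prometheus.yml: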
- job_name: 'consul-prometheus'
  consul_sd_configs:
  - server: '172.30.12.167:8500'
    services: []
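A note on this: consul_sd_configs selects the Consul service discovery type, and server is the address of the Consul service. After updating the configuration, restart Prometheus and check the Targets page of the Prometheus UI to see whether it worked.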
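As you can see, the Services registered in Consul are discovered automatically and listed under Targets. To add a new Target later, you only need to register the service in Consul through the API and Prometheus discovers it automatically, which is very convenient.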
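The Targets page can also be checked programmatically through the Prometheus HTTP API endpoint /api/v1/targets. The sketch below only parses a response body; the sample body is hypothetical and heavily trimmed, and summarize_targets is an illustrative helper name.

```python
import json


def summarize_targets(body: str, job: str) -> list[tuple[str, str]]:
    """Return (instance, health) pairs for one job from an /api/v1/targets response."""
    active = json.loads(body)["data"]["activeTargets"]
    return [
        (target["labels"]["instance"], target["health"])
        for target in active
        if target["labels"].get("job") == job
    ]


# Hypothetical, trimmed response body used only for illustration.
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"instance": "172.30.12.167:9100", "job": "consul-prometheus"},
                "health": "up",
            },
            {
                "labels": {"instance": "172.30.12.167:9090", "job": "prometheus"},
                "health": "up",
            },
        ]
    },
})

print(summarize_targets(SAMPLE_BODY, "consul-prometheus"))
```

Against a live server, the body would come from an HTTP GET on the Prometheus address followed by /api/v1/targets.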
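However, a few problems show up:
(1) Prometheus also loads the default consul service, which is not needed.
(2) Only the job and instance labels are shown by default, and all other labels sit under "before relabeling". How can other essential service information be shown as labels too?
(3) How can custom labels such as team, group, and project be added, so that Alertmanager can later match alerting rules on these grouping keys?
(4) Every Service registered in Consul is loaded into the consul-prometheus job configured in Prometheus. With several types of exporter, how can each type be assigned to its own job in Prometheus so that they are easy to tell apart?
All of these can be solved with the relabel_configs parameter in the Prometheus configuration.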
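Configuring relabel_configs for Custom Labels and Classification
What relabel_configs Does
First, some background on relabel_configs. In a scrape job, Prometheus lets you add Relabeling steps through relabel_configs to rewrite labels according to rules you specify. After Prometheus loads its Targets, each Target automatically carries some default labels. Labels prefixed with __ are used internally and are not written to the sample data. You may have noticed that every new Target automatically gets an instance label whose value equals the Target's __address__. This is because Prometheus performs a label rewrite internally: __address__ is set to the <host>:<port> address of the Target, and after relabeling its value is copied to the instance label by default (unless relabeling has already set instance), which is why that label is visible on the page.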
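The default instance behaviour described above can be sketched in a few lines. This is a simplified model rather than Prometheus's actual code, and finalize_labels is a hypothetical name.

```python
def finalize_labels(labels: dict[str, str]) -> dict[str, str]:
    """Model what happens to a target's labels once relabeling has finished."""
    result = dict(labels)
    # If relabeling did not set `instance`, it defaults to the value of __address__.
    result.setdefault("instance", result.get("__address__", ""))
    # Labels whose names start with "__" are internal and are removed.
    return {name: value for name, value in result.items() if not name.startswith("__")}


discovered = {
    "__address__": "172.30.12.167:9100",
    "__meta_consul_tags": ",test,",
    "job": "consul-prometheus",
}
print(finalize_labels(discovered))
```

Only job and instance survive, which matches the two labels shown on the Targets page before any custom relabeling is configured.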
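For the full relabel_configs options, see the relabel_config section of the official documentation. Here is a brief summary of what each relabel action does, to make the demonstrations below easier to follow.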
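replace: Matches regex against the value of source_labels (note: the values of multiple source_labels are joined with separator) and writes the result to target_label. If there are several capture groups, ${1}, ${2} select what is written. If nothing matches, target_label is not rewritten. replace is the default action.
keep: Drops Target instances whose source_labels value does not match regex.
drop: Drops Target instances whose source_labels value matches regex.
hashmod: Sets target_label to the modulus of a hash of the joined source_labels values.
labelmap: Matches regex against the names (note: names, not values) of all labels of the Target, uses the captured content as the new label name, and copies the value of the matched label to the new label.
labeldrop: Filters the Target's labels, removing every label whose name matches regex.
labelkeep: Filters the Target's labels, removing every label whose name does not match regex.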
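To make the first three actions concrete, here is a simplified Python model of keep, drop, and replace. It is a sketch under stated assumptions: Prometheus uses RE2 syntax and writes capture groups as ${1}, while this model uses Python's re module and \1. The function names are illustrative, and edge cases such as empty replacement values are not modeled.

```python
import re


def joined_value(labels, source_labels, separator=";"):
    """Concatenate the source label values; a missing label counts as empty."""
    return separator.join(labels.get(name, "") for name in source_labels)


def keep(labels, source_labels, regex):
    """Return True if the target survives a `keep` rule."""
    # Relabel regexes are anchored at both ends, hence fullmatch.
    return re.fullmatch(regex, joined_value(labels, source_labels)) is not None


def drop(labels, source_labels, regex):
    """Return True if the target survives a `drop` rule."""
    return not keep(labels, source_labels, regex)


def replace(labels, source_labels, regex, target_label, replacement=r"\1"):
    """Apply a `replace` rule and return a new label dict."""
    match = re.fullmatch(regex, joined_value(labels, source_labels))
    if match is None:
        return dict(labels)  # no match: target_label is left untouched
    return {**labels, target_label: match.expand(replacement)}


labels = {"__address__": "172.30.12.167:9100", "__meta_consul_tags": ",test,"}
with_host = replace(labels, ["__address__"], r"(.*):\d+", "host")
```

Here with_host gains a host label holding the address without the port, while a rule whose regex does not match returns the labels unchanged.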
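Next, we work through the problems above one by one.
Filtering Services by Label
For problem (1), relabel_configs can filter by label so that only services matching a rule are loaded. In the example above, we can keep only the services whose __meta_consul_tags label contains test: when Prometheus discovers services from Consul, only those whose label matches the regex expression are kept as Targets. Change prometheus.yml as follows: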
- job_name: 'consul-prometheus'
  consul_sd_configs:
  - server: '172.30.12.167:8500'
    services: []
  relabel_configs:
  - source_labels: [__meta_consul_tags]
    regex: .*test.*
    action: keep
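To explain: this relabel_configs entry drops every service whose __meta_consul_tags source label does not contain test. __meta_consul_tags corresponds to "tags": ["test"] in the Consul service definition, and the default consul service does not carry that tag, so it is filtered out. After restarting Prometheus, only the node-exporter-172.30.12.167 service remains.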
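One detail is worth illustrating: the regex is matched against a single string built from all tags. The sketch below assumes, based on my understanding of the Consul discovery in Prometheus, that the tags are joined with the tag separator (a comma by default) and that the result is also wrapped in that separator. Under that assumption, .*test.* also matches unrelated tags that merely contain "test", while .*,test,.* matches the exact tag only.

```python
import re


def consul_tags_label(tags, separator=","):
    """Build the assumed __meta_consul_tags value: tags joined and wrapped by the separator."""
    return separator + separator.join(tags) + separator


def matches(regex, value):
    """Relabel regexes are anchored at both ends, so use fullmatch."""
    return re.fullmatch(regex, value) is not None


node_exporter = consul_tags_label(["test"])   # ",test,"
consul_itself = consul_tags_label([])         # ",,"
other_service = consul_tags_label(["latest"]) # ",latest,"

print(matches(r".*test.*", node_exporter), matches(r".*test.*", consul_itself))
print(matches(r".*test.*", other_service), matches(r".*,test,.*", other_service))
```

For the two services in this article both patterns behave the same. The stricter pattern only matters once other tags that contain the same substring are in use.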
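Custom Visible Labels
Problems (2) and (3) belong together: both are about turning built-in labels or user-defined labels into visible labels, which makes them easier to read and lets Alertmanager group and match alerting rules on them later. To attach custom labels to a service, one more change is required: add the custom label information to the service's Meta data at registration time. See the Service - Agent HTTP API page of the official Consul documentation for details. The following demonstrates how.
Create consul-0.json as follows: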
$ vim consul-0.json
{
  "ID": "node-exporter",
  "Name": "node-exporter-172.30.12.167",
  "Tags": [
    "test"
  ],
  "Address": "172.30.12.167",
  "Port": 9100,
  "Meta": {
    "app": "spring-boot",
    "team": "appgroup",
    "project": "bigdata"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://172.30.12.167:9100/metrics",
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}
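A note on this: the JSON file describes the service to register and adds three labels to its Meta information, app=spring-boot, team=appgroup, and project=bigdata, so that they can be used for alert grouping. Register it with the following command: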
$ curl --request PUT --data @consul-0.json 'http://172.30.12.167:8500/v1/agent/service/register?replace-existing-checks=1'
After registration, the Consul Web UI shows that the service was registered and includes the Meta information.
Then change prometheus.yml as follows:
- job_name: 'consul-prometheus'
  consul_sd_configs:
  - server: '172.30.12.167:8500'
    services: []
  relabel_configs:
  - source_labels: [__meta_consul_tags]
    regex: .*test.*
    action: keep
  - regex: __meta_consul_service_metadata_(.+)
    action: labelmap
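To explain: the added rule matches labels whose names start with __meta_consul_service_metadata_, uses the captured part as the new label name, and uses the value of the matched label as the new label's value. For the three custom labels just added, service discovery automatically produces __meta_consul_service_metadata_app=spring-boot, __meta_consul_service_metadata_team=appgroup, and __meta_consul_service_metadata_project=bigdata. After relabeling, Prometheus adds the three labels app=spring-boot, team=appgroup, and project=bigdata. Restart Prometheus and the three custom labels appear.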
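The labelmap step can be modeled in a few lines of Python. This is an illustrative sketch, not Prometheus's implementation: it uses Python's re module and \1 where Prometheus uses RE2 and ${1}, and labelmap here is simply a local function name.

```python
import re


def labelmap(labels, regex, replacement=r"\1"):
    """Copy every label whose name fully matches regex to a new, derived name."""
    result = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            # The captured part of the old name becomes the new label name.
            result[match.expand(replacement)] = value
    return result


discovered = {
    "__address__": "172.30.12.167:9100",
    "__meta_consul_service_metadata_app": "spring-boot",
    "__meta_consul_service_metadata_team": "appgroup",
    "__meta_consul_service_metadata_project": "bigdata",
}
mapped = labelmap(discovered, r"__meta_consul_service_metadata_(.+)")
```

The original __meta_ labels are still present in mapped. They disappear later, when labels prefixed with __ are removed at the end of relabeling.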
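Classifying Services
Problem (4), classifying the automatically discovered services, is essentially handled the same way as above. There are two options: first, add custom labels and distinguish services by those labels; second, match on the service Tag to create a separate group for each type of exporter. This section uses the second option: tag each service differently, then match on the Tag in relabel_configs. We update the tag of the existing node-exporter-172.30.12.167 service and register a service for another type of exporter, as follows: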
$ vim consul-1.json
{
  "ID": "node-exporter",
  "Name": "node-exporter-172.30.12.167",
  "Tags": [
    "node-exporter"
  ],
  "Address": "172.30.12.167",
  "Port": 9100,
  "Meta": {
    "app": "spring-boot",
    "team": "appgroup",
    "project": "bigdata"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://172.30.12.167:9100/metrics",
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}
# Update the registered service
$ curl --request PUT --data @consul-1.json 'http://172.30.12.167:8500/v1/agent/service/register?replace-existing-checks=1'
$ vim consul-2.json
{
  "ID": "cadvisor-exporter",
  "Name": "cadvisor-exporter-172.30.12.167",
  "Tags": [
    "cadvisor-exporter"
  ],
  "Address": "172.30.12.167",
  "Port": 8080,
  "Meta": {
    "app": "docker",
    "team": "cloudgroup",
    "project": "docker-service"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://172.30.12.167:8080/metrics",
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}
# Register the service
$ curl --request PUT --data @consul-2.json 'http://172.30.12.167:8500/v1/agent/service/register?replace-existing-checks=1'
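To explain: we changed the tag of the existing node-exporter-172.30.12.167 service to node-exporter and registered a new cadvisor-exporter-172.30.12.167 service tagged cadvisor-exporter to tell them apart. After registration, the Consul Web UI shows that both services were registered.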
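Since consul-1.json and consul-2.json differ only in a few fields, the definitions could also be generated. The sketch below is one possible way to do that, assuming the naming pattern used in this article (the tag equals the service ID, and the name is the ID followed by the address). service_definition is a hypothetical helper.

```python
import json


def service_definition(service_id, address, port, meta):
    """Build a Consul service definition shaped like consul-1.json and consul-2.json."""
    return {
        "ID": service_id,
        "Name": f"{service_id}-{address}",
        "Tags": [service_id],  # assumption: the tag equals the service ID, as in this article
        "Address": address,
        "Port": port,
        "Meta": meta,
        "EnableTagOverride": False,
        "Check": {"HTTP": f"http://{address}:{port}/metrics", "Interval": "10s"},
        "Weights": {"Passing": 10, "Warning": 1},
    }


node = service_definition(
    "node-exporter", "172.30.12.167", 9100,
    {"app": "spring-boot", "team": "appgroup", "project": "bigdata"},
)
cadvisor = service_definition(
    "cadvisor-exporter", "172.30.12.167", 8080,
    {"app": "docker", "team": "cloudgroup", "project": "docker-service"},
)

# json.dumps(node, indent=2) yields a body that can be PUT to the register endpoint.
print(json.dumps(cadvisor, indent=2))
```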
Finally, change prometheus.yml as follows:
- job_name: 'consul-node-exporter'
  consul_sd_configs:
  - server: '172.30.12.167:8500'
    services: []
  relabel_configs:
  - source_labels: [__meta_consul_tags]
    regex: .*node-exporter.*
    action: keep
  - regex: __meta_consul_service_metadata_(.+)
    action: labelmap
- job_name: 'consul-cadvisor-exporter'
  consul_sd_configs:
  - server: '172.30.12.167:8500'
    services: []
  relabel_configs:
  - source_labels: [__meta_consul_tags]
    regex: .*cadvisor-exporter.*
    action: keep
  - regex: __meta_consul_service_metadata_(.+)
    action: labelmap
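Here we add one job per exporter type, and each job's relabel_configs matches on the Tag to tell the types apart. Restart Prometheus and the services are now grouped by type, which makes them easy to browse.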
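Because each job follows the same template, the per-type jobs can be rendered from a list of tags instead of being copied by hand. This is a minimal sketch using plain string formatting. scrape_job is a hypothetical helper, and it assumes the tags contain no regex metacharacters that would need escaping.

```python
def scrape_job(tag, consul_server="172.30.12.167:8500"):
    """Render one scrape job that keeps only the services carrying `tag`."""
    return "\n".join([
        f"- job_name: 'consul-{tag}'",
        "  consul_sd_configs:",
        f"  - server: '{consul_server}'",
        "    services: []",
        "  relabel_configs:",
        "  - source_labels: [__meta_consul_tags]",
        f"    regex: .*{tag}.*",
        "    action: keep",
        "  - regex: __meta_consul_service_metadata_(.+)",
        "    action: labelmap",
    ])


jobs = "\n".join(scrape_job(tag) for tag in ["node-exporter", "cadvisor-exporter"])
print(jobs)
```

The printed text is meant to be pasted under scrape_configs in prometheus.yml. Adding a new exporter type then only requires one more tag in the list.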
Copyright notice:
Author: Joe.Ye
Link: https://www.appblog.cn/index.php/2023/03/25/prometheus-uses-consul-to-achieve-automatic-service-discovery/
Source: APP全栈技术分享
Copyright belongs to the author. Do not reproduce without permission.