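Automatic Service Discovery in Prometheus with Consul

Introduction to Consul

Consul is an open-source tool written in Go that provides service registration, service discovery, and configuration management for distributed, service-oriented systems. Its features include service registration and discovery, health checks, a key/value store, multi-datacenter support, and distributed consistency guarantees. When we monitored targets with Prometheus earlier, adding a new target meant changing a configuration file on the server. Even with file_sd_configs, you still have to log in to the server and edit the corresponding JSON file, which is tedious. Prometheus natively supports several kinds of automatic service discovery, and Consul is one of them.

Installing and Configuring Consul

Consul is easy to install. The official website provides prebuilt binary packages for each platform that only need to be unpacked, and Consul can also be started quickly with Docker.

Binary installation

Taking Linux as an example, download the binary release package, unpack it, and start a single node in development mode.

Recommended for a development environment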

$ wget https://releases.hashicorp.com/consul/1.8.3/consul_1.8.3_linux_amd64.zip
$ unzip -q consul_1.8.3_linux_amd64.zip
$ ./consul agent -dev    

Recommended for a production environment

nohup ./consul agent -ui -server -bootstrap -data-dir=/usr/service/consul_cluster/data -pid-file=/run/service/consul_cluster/consul.pid -client=0.0.0.0 -advertise=10.10.21.15 -join=10.10.21.15 -node=consul-c1 &
nohup ./consul agent -ui -server -bootstrap -data-dir=/usr/local/consul/data -pid-file=/usr/local/consul/consul.pid -bind=10.10.21.15 -client=0.0.0.0 -node=consul &

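Configuration reference: https://www.consul.io/docs/agent/options.html

  • -ui: enables the built-in web UI, served on the HTTP port (8500 by default)
  • -server: runs the agent in server mode (in a development environment you can use -dev instead, which keeps state in memory and writes nothing to disk)
  • -data-dir: directory where the agent stores its state
  • -config-dir: directory of configuration files, including service definitions; commonly ./consul.d
  • -config-file: path to a configuration file; can be specified multiple times
  • -pid-file: path of the file that stores the agent PID
  • -client: address that the client interfaces (HTTP, HTTPS, DNS, gRPC) bind to; 0.0.0.0 listens on all interfaces
  • -advertise: address advertised to the other nodes in the cluster
  • -join: address of an agent to join at startup
  • -node: name of this node, which must be unique in the cluster

The full option list, as printed by the agent's help output: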
Usage:
  -advertise value
        Sets the advertise address to use.
  -advertise-wan value
        Sets address to advertise on WAN instead of -advertise address.
  -allow-write-http-from value
        Only allow write endpoint calls from given network. CIDR format, can be specified multiple times.
  -alt-domain value
        Alternate domain to use for DNS interface.
  -bind value
        Sets the bind address for cluster communication.
  -bootstrap
        Sets server to bootstrap mode.
  -bootstrap-expect value
        Sets server to expect bootstrap mode.
  -check_output_max_size value
        Sets the maximum output size for checks on this agent
  -client value
        Sets the address to bind for client access. This includes RPC, DNS, HTTP, HTTPS and gRPC (if configured).
  -config-dir value
        Path to a directory to read configuration files from. This will read every file ending in '.json' as configuration in this directory in alphabetical order. Can be specified multiple times.
  -config-file value
        Path to a file in JSON or HCL format with a matching file extension. Can be specified multiple times.
  -config-format string
        Config files are in this format irrespective of their extension. Must be 'hcl' or 'json'
  -data-dir value
        Path to a data directory to store agent state.
  -datacenter value
        Datacenter of the agent.
  -default-query-time value
        the amount of time a blocking query will wait before Consul will force a response. This value can be overridden by the 'wait' query parameter.
  -dev
        Starts the agent in development mode.
  -disable-host-node-id
        Setting this to true will prevent Consul from using information from the host to generate a node ID, and will cause Consul to generate a random node ID instead.
  -disable-keyring-file
        Disables the backing up of the keyring to a file.
  -dns-port value
        DNS port to use.
  -domain value
        Domain to use for DNS interface.
  -enable-local-script-checks
        Enables health check scripts from configuration file.
  -enable-script-checks
        Enables health check scripts.
  -encrypt value
        Provides the gossip encryption key.
  -grpc-port value
        Sets the gRPC API port to listen on (currently needed for Envoy xDS only).
  -hcl value
        hcl config fragment. Can be specified multiple times.
  -http-port value
        Sets the HTTP API port to listen on.
  -https-port value
        Sets the HTTPS API port to listen on.
  -join value
        Address of an agent to join at start time. Can be specified multiple times.
  -join-wan value
        Address of an agent to join -wan at start time. Can be specified multiple times.
  -log-file value
        Path to the file the logs get written to
  -log-json
        Output logs in JSON format.
  -log-level value
        Log level of the agent.
  -log-rotate-bytes value
        Maximum number of bytes that should be written to a log file
  -log-rotate-duration value
        Time after which log rotation needs to be performed
  -log-rotate-max-files value
        Maximum number of log file archives to keep
  -max-query-time value
        the maximum amount of time a blocking query can wait before Consul will force a response. Consul applies jitter to the wait time. The jittered time will be capped to MaxQueryTime.
  -node value
        Name of this node. Must be unique in the cluster.
  -node-id value
        A unique ID for this node across space and time. Defaults to a randomly-generated ID that persists in the data-dir.
  -node-meta key:value
        An arbitrary metadata key/value pair for this node, of the format key:value. Can be specified multiple times.
  -non-voting-server
        (Enterprise-only) This flag is used to make the server not participate in the Raft quorum, and have it only receive the data replication stream. This can be used to add read scalability to a cluster in cases where a high volume of reads to servers are needed.
  -pid-file value
        Path to file to store agent PID.
  -primary-gateway value
        Address of a mesh gateway in the primary datacenter to use to bootstrap WAN federation at start time with retries enabled. Can be specified multiple times.
  -protocol value
        Sets the protocol version. Defaults to latest.
  -raft-protocol value
        Sets the Raft protocol version. Defaults to latest.
  -recursor value
        Address of an upstream DNS server. Can be specified multiple times.
  -rejoin
        Ignores a previous leave and attempts to rejoin the cluster.
  -retry-interval value
        Time to wait between join attempts.
  -retry-interval-wan value
        Time to wait between join -wan attempts.
  -retry-join value
        Address of an agent to join at start time with retries enabled. Can be specified multiple times.
  -retry-join-wan value
        Address of an agent to join -wan at start time with retries enabled. Can be specified multiple times.
  -retry-max value
        Maximum number of join attempts. Defaults to 0, which will retry indefinitely.
  -retry-max-wan value
        Maximum number of join -wan attempts. Defaults to 0, which will retry indefinitely.
  -segment value
        (Enterprise-only) Sets the network segment to join.
  -serf-lan-allowed-cidrs value
        Networks (eg: 192.168.1.0/24) allowed for Serf LAN. Can be specified multiple times.
  -serf-lan-bind value
        Address to bind Serf LAN listeners to.
  -serf-lan-port value
        Sets the Serf LAN port to listen on.
  -serf-wan-allowed-cidrs value
        Networks (eg: 192.168.1.0/24) allowed for Serf WAN (other datacenters). Can be specified multiple times.
  -serf-wan-bind value
        Address to bind Serf WAN listeners to.
  -serf-wan-port value
        Sets the Serf WAN port to listen on.
  -server
        Switches agent to server mode.
  -server-port value
        Sets the server port to listen on.
  -syslog
        Enables logging to syslog.
  -ui
        Enables the built-in static web UI server.
  -ui-content-path value
        Sets the external UI path to a string. Defaults to: /ui/ 
  -ui-dir value
        Path to directory containing the web UI resources.

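Consul Web UI

Once the agent is running, open http://127.0.0.1:8500 in a browser to reach the Consul web UI. By default it shows a single service, consul. Every service registered with Consul later appears on this page, which makes the current state easy to inspect.

Figure: Consul Web UI

Installing with Docker

To start a single-node Consul service with Docker, pull the latest official image, consul:latest, and run it with the following command: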

$ docker run --name consul -d -p 8500:8500 consul

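After it starts, verify it the same way as above. For ease of demonstration, the rest of this article uses the Docker-based Consul, which is reachable at http://172.30.12.167:8500

Registering a Service with Consul via the API

Next we register a service with Consul through its standard HTTP API. We start with a test service that describes the node-exporter running on this host; the address and port are the ones on which node-exporter serves metrics by default. Run the following command: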

$ curl -X PUT -d '{"id": "node-exporter", "name": "node-exporter-172.30.12.167", "address": "172.30.12.167", "port": 9100, "tags": ["test"], "checks": [{"http": "http://172.30.12.167:9100/metrics", "interval": "5s"}]}'  http://172.30.12.167:8500/v1/agent/service/register

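After it completes, refresh the Consul web console; the service is now registered in Consul.

Figure: Consul service registration

Figure: Consul service details

As a side note, a service can be deregistered with the following API call, for example to remove the node-exporter service added above: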

$ curl -X PUT http://172.30.12.167:8500/v1/agent/service/deregister/node-exporter
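
The same two API calls can be scripted when many services need to be registered. The following is only a minimal sketch using the Python standard library, assuming the Consul address used in this article; the helper names register_request, deregister_request, and send are hypothetical and not part of any Consul client library. Building the request is kept separate from sending it so the payload can be inspected first.

```python
import json
import urllib.request

CONSUL = "http://172.30.12.167:8500"  # Consul address used in this article


def register_request(service_id, name, address, port, tags):
    """Build the PUT request for /v1/agent/service/register."""
    payload = {
        "id": service_id,
        "name": name,
        "address": address,
        "port": port,
        "tags": tags,
        # HTTP health check against the exporter's metrics endpoint
        "checks": [{"http": f"http://{address}:{port}/metrics", "interval": "5s"}],
    }
    return urllib.request.Request(
        f"{CONSUL}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def deregister_request(service_id):
    """Build the PUT request for /v1/agent/service/deregister/<id>."""
    return urllib.request.Request(
        f"{CONSUL}/v1/agent/service/deregister/{service_id}", method="PUT"
    )


def send(request):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Prepare (but do not send) the same registration as the curl command above.
req = register_request("node-exporter", "node-exporter-172.30.12.167",
                       "172.30.12.167", 9100, ["test"])
print(req.get_method(), req.full_url)
```

Calling send(req) against a running Consul agent performs the registration, and send(deregister_request("node-exporter")) removes the service again.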

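Configuring Prometheus for Automatic Service Discovery

Consul is now running and one service has been registered. Next we configure Prometheus to use Consul service discovery, so that the service added above shows up automatically in the Prometheus Targets. Add the following to prometheus.yml: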

- job_name: 'consul-prometheus'
  consul_sd_configs:
  - server: '172.30.12.167:8500'
    services: []

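A note on this configuration: consul_sd_configs selects Consul as the service discovery type, and server is the address of the Consul service. After changing the configuration, restart Prometheus and check the Targets page of the Prometheus UI to confirm that it worked.

Figure: Prometheus Targets configuration

As you can see, the services in Consul are discovered automatically and listed under Targets. To add a new target later, you only need to register the service with Consul through the API and Prometheus will discover it, which is very convenient.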
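
To make the Targets page easier to interpret, the sketch below approximates in Python the labels that Prometheus attaches to one discovered Consul service instance before relabeling. It assumes that __address__ is built from the service address and port, falling back to the node address when the service registers no address of its own. The function name discovered_labels and the node address 172.17.0.2 are hypothetical, the built-in consul service is assumed to register port 8300 without a service address, and only a few of the __meta_consul_* labels are shown.

```python
def discovered_labels(job, node_address, service, service_address, service_port):
    """Approximate the pre-relabeling labels of one Consul service instance."""
    host = service_address or node_address  # fall back to the node address
    return {
        "job": job,
        "__address__": f"{host}:{service_port}",
        "__meta_consul_address": node_address,
        "__meta_consul_service": service,
        "__meta_consul_service_address": service_address,
        "__meta_consul_service_port": str(service_port),
    }


# The service registered above carries its own address.
node_exporter = discovered_labels(
    "consul-prometheus", "172.17.0.2",
    "node-exporter-172.30.12.167", "172.30.12.167", 9100)

# The built-in consul service (assumed: no service address, port 8300).
consul = discovered_labels("consul-prometheus", "172.17.0.2", "consul", "", 8300)

print(node_exporter["__address__"])  # 172.30.12.167:9100
print(consul["__address__"])         # 172.17.0.2:8300
```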

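However, a few problems show up:

(1) Prometheus also loads the built-in consul service, which we do not need.
(2) Only the job and instance labels are shown by default; everything else sits under "before relabeling". How can other useful service information be exposed as labels?
(3) How can we attach custom labels such as team, group, and project, so that Alertmanager can later match alert rules on them?
(4) Every service registered in Consul ends up in the single consul-prometheus job configured in Prometheus. With several kinds of exporters, how can each kind be assigned to its own job so they are easy to tell apart?

All of these can be solved with the relabel_configs setting in the Prometheus configuration.

Custom Labels and Classification with relabel_configs

What relabel_configs does

First, a quick overview of relabel_configs. In a scrape job, Prometheus lets you add relabeling steps through relabel_configs, which rewrite labels according to rules you specify. After Prometheus loads its targets, each target carries some default labels. Labels prefixed with __ are used internally and are not written to the sample data. You may have noticed that every new target automatically gets an instance label whose value matches the target's __address__. That is because Prometheus performs a label rewrite internally: __address__ defaults to <host>:<port>, and after relabeling its value is copied into the instance label if that label has not been set, which is why the label is visible on the page.

Figure: Prometheus Targets labels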
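
The last step described above, where internal labels disappear and instance is filled in, can be sketched in a few lines of Python. This is only an illustration of the behaviour, not Prometheus code, and the function name finalize_labels is hypothetical.

```python
def finalize_labels(labels):
    """Mimic what happens to a target's labels once relabeling has finished."""
    result = dict(labels)
    # instance defaults to __address__ unless relabeling already set it
    result.setdefault("instance", result["__address__"])
    # labels starting with __ are internal and are not stored with samples
    return {name: value for name, value in result.items()
            if not name.startswith("__")}


target = {
    "job": "consul-prometheus",
    "__address__": "172.30.12.167:9100",
    "__meta_consul_service": "node-exporter-172.30.12.167",
}
print(finalize_labels(target))
# {'job': 'consul-prometheus', 'instance': '172.30.12.167:9100'}
```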

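For the full relabel_configs options, see the relabel_config section of the official documentation. Here is a brief summary of what each relabel action does, to make the demonstrations below easier to follow.

  • replace: matches regex against the value of source_labels (the values of multiple source labels are joined with separator) and writes the result to target_label. With several capture groups, ${1}, ${2} select what is written. If nothing matches, target_label is not rewritten. This is the default action.
  • keep: drops targets whose source_labels value does not match regex
  • drop: drops targets whose source_labels value matches regex
  • hashmod: sets target_label to the modulus of a hash of the concatenated source_labels
  • labelmap: matches regex against the names (not the values) of all labels on the target, uses the captured text as the new label name, and copies the value of the matching label to the new label
  • labeldrop: filters the target's labels, removing every label that matches
  • labelkeep: filters the target's labels, removing every label that does not match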
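
The keep, drop, and replace actions from the list above can be approximated with a small Python function. This is a simplified sketch rather than Prometheus's implementation: it assumes the regex must match the whole joined value, that the defaults are separator ";", regex "(.*)", and replacement "$1", and it ignores the remaining actions. The name apply_rule and the exporter target label are hypothetical.

```python
import re


def apply_rule(labels, rule):
    """Apply one keep/drop/replace rule.

    Returns the resulting labels, or None when the target is dropped.
    """
    action = rule.get("action", "replace")
    separator = rule.get("separator", ";")
    value = separator.join(labels.get(name, "")
                           for name in rule.get("source_labels", []))
    match = re.fullmatch(rule.get("regex", "(.*)"), value)
    if action == "keep":
        return labels if match else None
    if action == "drop":
        return None if match else labels
    if action == "replace":
        if not match:
            return labels  # no match: target_label is left untouched
        # translate $1 / ${1} into Python's \g<1> template syntax
        template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>",
                          rule.get("replacement", "$1"))
        return {**labels, rule["target_label"]: match.expand(template)}
    raise ValueError(f"unsupported action: {action}")


node = {"__meta_consul_service": "node-exporter-172.30.12.167"}
consul = {"__meta_consul_service": "consul"}

drop_consul = {"source_labels": ["__meta_consul_service"],
               "regex": "consul", "action": "drop"}
name_rule = {"source_labels": ["__meta_consul_service"],
             "regex": r"(.+)-172\.30\.12\.167",
             "target_label": "exporter", "replacement": "${1}"}

print(apply_rule(consul, drop_consul))          # None
print(apply_rule(node, name_rule)["exporter"])  # node-exporter
```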

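Now let's work through the problems above one by one.

Filtering services by tag

For problem one, relabel_configs can filter on labels so that only services matching a rule are loaded. Continuing the example, we keep only the services whose __meta_consul_tags label contains test: when Prometheus discovers services from Consul, the relabel rule keeps only those whose label matches the regex. Change prometheus.yml as follows: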

- job_name: 'consul-prometheus'
  consul_sd_configs:
    - server: '172.30.12.167:8500'
      services: []  
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: .*test.*
      action: keep

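To explain: this relabel_configs entry drops every service whose __meta_consul_tags label does not contain test. __meta_consul_tags corresponds to "tags": ["test"] in the Consul service definition, and the built-in consul service does not carry that tag, so it is filtered out. After restarting Prometheus, only the node-exporter-172.30.12.167 service is discovered.

Figure: Prometheus Targets after tag filtering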
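
One detail is worth checking before relying on .*test.* in a larger setup. The sketch below assumes, to the best of my understanding of Prometheus's Consul discovery, that __meta_consul_tags holds the tags joined by the tag separator (a comma by default) with a separator also placed at both ends. Under that assumption a stricter pattern such as .*,test,.* matches only the exact tag. The helper names are hypothetical.

```python
import re


def consul_tags_label(tags, separator=","):
    """Build the __meta_consul_tags value (assumed format: ,tag1,tag2,)."""
    return separator + separator.join(tags) + separator


def keeps(regex, tags):
    """True when a keep rule with this regex would retain the service."""
    return re.fullmatch(regex, consul_tags_label(tags)) is not None


print(consul_tags_label(["test"]))       # ,test,
print(keeps(".*test.*", ["test"]))       # True
print(keeps(".*test.*", []))             # False: the built-in consul service has no tags
print(keeps(".*test.*", ["latest"]))     # True: substring match, probably unintended
print(keeps(".*,test,.*", ["latest"]))   # False
print(keeps(".*,test,.*", ["prod", "test"]))  # True
```

For the single test tag used in this article both patterns behave the same; the stricter form only matters once tags can contain one another as substrings.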

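Custom visible labels

Problems two and three are the same kind of problem: turning built-in or user-defined labels into visible labels that are easy to read and that Alertmanager can later use to match and group alerts. To attach custom labels to a service, one more change is needed: put the custom label information into the service's Meta data at registration time. See the Consul Service - Agent HTTP API documentation for details. The following demonstrates how.

Create consul-0.json as follows: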

$ vim consul-0.json
{
  "ID": "node-exporter",
  "Name": "node-exporter-172.30.12.167",
  "Tags": [
    "test"
  ],
  "Address": "172.30.12.167",
  "Port": 9100,
  "Meta": {
    "app": "spring-boot",
    "team": "appgroup",
    "project": "bigdata"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://172.30.12.167:9100/metrics",
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}

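A note: this JSON file describes the service to register, and its Meta section adds three labels, app=spring-boot, team=appgroup, and project=bigdata, to make alert grouping easier. Register it with the following command:

$ curl --request PUT --data @consul-0.json 'http://172.30.12.167:8500/v1/agent/service/register?replace-existing-checks=1'

After registration, the Consul web UI shows the service together with its Meta information.

Figure: Consul service registration (Meta Data)

Then change prometheus.yml as follows: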

- job_name: 'consul-prometheus'
  consul_sd_configs:
    - server: '172.30.12.167:8500'
      services: []  
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: .*test.*
      action: keep
    - regex: __meta_consul_service_metadata_(.+)
      action: labelmap

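To explain: the added rule matches labels whose names start with __meta_consul_service_metadata_, uses the captured text as the new label name, and copies the matching label's value to it. For the three custom entries we just added, Prometheus automatically creates the labels __meta_consul_service_metadata_app=spring-boot, __meta_consul_service_metadata_team=appgroup, and __meta_consul_service_metadata_project=bigdata. After relabeling, Prometheus gains three new labels: app=spring-boot, team=appgroup, and project=bigdata. Restart Prometheus and the three custom labels appear.

Figure: Prometheus Targets custom labels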
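
The path from the Meta section of the service definition to the final labels can be traced with a short Python sketch. It is an approximation under two assumptions: Meta keys are turned into label names by replacing every character outside [a-zA-Z0-9_] with an underscore, and labelmap requires the regex to match the whole label name. The function names are hypothetical.

```python
import re

META_PREFIX = "__meta_consul_service_metadata_"


def meta_to_labels(meta):
    """Expose Consul Meta entries as discovery labels, sanitising the keys."""
    return {META_PREFIX + re.sub(r"[^a-zA-Z0-9_]", "_", key): value
            for key, value in meta.items()}


def labelmap(labels, regex, replacement=r"\1"):
    """Copy each label whose name matches regex to the name built from replacement."""
    mapped = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            mapped[match.expand(replacement)] = value
    return mapped


meta = {"app": "spring-boot", "team": "appgroup", "project": "bigdata"}
discovered = meta_to_labels(meta)
relabeled = labelmap(discovered, r"__meta_consul_service_metadata_(.+)")

# Internal __ labels are removed afterwards, leaving only the mapped ones.
visible = {name: value for name, value in relabeled.items()
           if not name.startswith("__")}
print(visible)
# {'app': 'spring-boot', 'team': 'appgroup', 'project': 'bigdata'}
```

If the sanitising assumption holds, a Meta key such as data-center would surface as the label data_center, so it is simplest to keep Meta keys to letters, digits, and underscores.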

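Classifying services

Problem four, classifying the discovered services, is essentially handled the same way as above. One option is to add custom labels and tell the services apart by label; another is to match on service tags and create a separate job for each type of exporter. I use the second option here: give each service a distinct tag, then match on it in relabel_configs. Let's update the tag of the existing node-exporter-172.30.12.167 service and register a service for another type of exporter: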

$ vim consul-1.json
{
  "ID": "node-exporter",
  "Name": "node-exporter-172.30.12.167",
  "Tags": [
    "node-exporter"
  ],
  "Address": "172.30.12.167",
  "Port": 9100,
  "Meta": {
    "app": "spring-boot",
    "team": "appgroup",
    "project": "bigdata"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://172.30.12.167:9100/metrics",
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}

# Update the registered service
$ curl --request PUT --data @consul-1.json 'http://172.30.12.167:8500/v1/agent/service/register?replace-existing-checks=1'

$ vim consul-2.json
{
  "ID": "cadvisor-exporter",
  "Name": "cadvisor-exporter-172.30.12.167",
  "Tags": [
    "cadvisor-exporter"
  ],
  "Address": "172.30.12.167",
  "Port": 8080,
  "Meta": {
    "app": "docker",
    "team": "cloudgroup",
    "project": "docker-service"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://172.30.12.167:8080/metrics",
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}

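# Register the service
$ curl --request PUT --data @consul-2.json 'http://172.30.12.167:8500/v1/agent/service/register?replace-existing-checks=1'

To explain: we changed the tag of the existing node-exporter-172.30.12.167 service to node-exporter and registered a new type of service, cadvisor-exporter-172.30.12.167, tagged cadvisor-exporter to distinguish it. After registration, the Consul web console shows both services.

Figure: Consul service update

Finally, change prometheus.yml as follows: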

- job_name: 'consul-node-exporter'
  consul_sd_configs:
    - server: '172.30.12.167:8500'
      services: []  
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: .*node-exporter.*
      action: keep
    - regex: __meta_consul_service_metadata_(.+)
      action: labelmap

- job_name: 'consul-cadvisor-exporter'
  consul_sd_configs:
    - server: '172.30.12.167:8500'
      services: []
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: .*cadvisor-exporter.*
      action: keep
    - regex: __meta_consul_service_metadata_(.+)
      action: labelmap

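Each type of exporter needs its own job, and the relabel_configs of each job match on the tag to tell the types apart. Restart Prometheus and the services are grouped by type, which makes them easy to review.

Figure: Consul service classification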
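
Since the per-exporter jobs differ only in the tag, they can be generated instead of copied by hand. The sketch below is one possible way to do that in Python; the function name consul_job is hypothetical, and the output is JSON, which YAML parsers generally accept as flow-style YAML, so treat it as a starting point to review rather than a file to deploy unchecked.

```python
import json


def consul_job(exporter_tag, server="172.30.12.167:8500"):
    """Build one scrape job that keeps only services carrying exporter_tag."""
    return {
        "job_name": f"consul-{exporter_tag}",
        "consul_sd_configs": [{"server": server, "services": []}],
        "relabel_configs": [
            # keep only services whose tag list contains the exporter tag
            {"source_labels": ["__meta_consul_tags"],
             "regex": f".*{exporter_tag}.*",
             "action": "keep"},
            # expose Consul Meta entries as plain labels
            {"regex": "__meta_consul_service_metadata_(.+)",
             "action": "labelmap"},
        ],
    }


scrape_configs = [consul_job(tag)
                  for tag in ("node-exporter", "cadvisor-exporter")]
print(json.dumps(scrape_configs, indent=2))
```

Adding a third exporter type then only requires a new tag in the tuple and a service registered in Consul with that tag.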

Copyright notice:
Author: Joe.Ye
Link: https://www.appblog.cn/index.php/2023/03/25/prometheus-uses-consul-to-achieve-automatic-service-discovery/
Source: APP全栈技术分享
Copyright belongs to the author. Do not reproduce without permission.
