[Practical] Troubleshooting a Kibana report generation failure

Note

The problem and fix described here also apply to Tencent Cloud Elasticsearch Service (ES).

Background

Kibana dashboards present data visually, and in day-to-day work they are often used for status reporting. Kibana can also generate the reports we need directly.

Problem

Generating a report failed with the error:

Can't reach the server. Please try again.

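Opening the browser developer tools (F12) showed that the request returned an internal server error.

This was odd and needed a closer look.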

Root cause

1. Analyzing the Kibana error log

Going through the Kibana log turned up the abnormal entry:

"message":"[illegal_argument_exception] Rejecting mapping update to [.reporting-2021.10.24] as the final mapping would have more than 1 type: [esqueue, doc]"}
{"type":"response","@timestamp":"2021-10-27T05:53:12Z","tags":["api"],"pid":14595,"method":"post","statusCode":500,"req":{"url":"/api/reporting/generate/csv?jobParams=(conflictedTypesFields%3A!(kfext%2Ckfuin%2CrequestId)%2Cfields%3A!('%40timestamp'%2Ctext)%2CindexPatternId%3A'21fe4820-8916-11ea-8b39-a19e11c4dfcb'%2CmetaFields%3A!(_source%2C_id%2C_type%2C_index%2C_score)%2CsearchRequest%3A(body%3A(_source%3A(excludes%3A!()%2Cincludes%3A!('%40timestamp'%2Ctext))%2Cdocvalue_fields%3A!()%2Cquery%3A(bool%3A(filter%3A!()%2Cmust%3A!((query_string%3A(analyze_wildcard%3A!t%2Cdefault_field%3A'*'%2Cquery%3A'%22high%20risky%20with%20req%22'))%2C(range%3A('%40timestamp'%3A(format%3Aepoch_millis%2Cgte%3A1635310298396%2Clte%3A1635313898396))))%2Cmust_not%3A!()%2Cshould%3A!()))%2Cscript_fields%3A()%2Csort%3A!(('%40timestamp'%3A(order%3Adesc%2Cunmapped_type%3Aboolean)))%2Cstored_fields%3A!('%40timestamp'%2Ctext)%2Cversion%3A!t)%2Cindex%3A'account-admin-ol-*')%2Ctitle%3A'high%20risky%20with%20req'%2Ctype%3Asearch)","method":"post","headers":{"host":"kibana","connection":"close","content-length":"0","x-stgw-time":"1635313992.732","x-client-proto":"https","x-forwarded-proto":"https","x-client-proto-ver":"HTTP/2.0","x-real-ip":"116.233.19.162","x-forwarded-for":"116.233.19.162","sec-ch-ua":"\"Chromium\";v=\"92\", \" Not A;Brand\";v=\"99\", \"Google Chrome\";v=\"92\"","sec-ch-ua-mobile":"?0","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36","kbn-version":"6.8.2","content-type":"application/json","accept":"*/*","origin":"https://es-3ktojklt.kibana.tencentelasticsearch.com:5601","sec-fetch-site":"same-origin","sec-fetch-mode":"cors","sec-fetch-dest":"empty","referer":"https://es-3ktojklt.kibana.tencentelasticsearch.com:5601/app/kibana","accept-encoding":"gzip, deflate, 
br","accept-language":"zh-CN,zh;q=0.9,en;q=0.8"},"remoteAddress":"10.0.130.254","userAgent":"10.0.130.254","referer":"https://es-3ktojklt.kibana.tencentelasticsearch.com:5601/app/kibana"},"res":{"statusCode":500,"responseTime":4695,"contentLength":9},"message":"POST /api/reporting/generate/csv?jobParams=(conflictedTypesFields%3A!(kfext%2Ckfuin%2CrequestId)%2Cfields%3A!('%40timestamp'%2Ctext)%2CindexPatternId%3A'21fe4820-8916-11ea-8b39-a19e11c4dfcb'%2CmetaFields%3A!(_source%2C_id%2C_type%2C_index%2C_score)%2CsearchRequest%3A(body%3A(_source%3A(excludes%3A!()%2Cincludes%3A!('%40timestamp'%2Ctext))%2Cdocvalue_fields%3A!()%2Cquery%3A(bool%3A(filter%3A!()%2Cmust%3A!((query_string%3A(analyze_wildcard%3A!t%2Cdefault_field%3A'*'%2Cquery%3A'%22high%20risky%20with%20req%22'))%2C(range%3A('%40timestamp'%3A(format%3Aepoch_millis%2Cgte%3A1635310298396%2Clte%3A1635313898396))))%2Cmust_not%3A!()%2Cshould%3A!()))%2Cscript_fields%3A()%2Csort%3A!(('%40timestamp'%3A(order%3Adesc%2Cunmapped_type%3Aboolean)))%2Cstored_fields%3A!('%40timestamp'%2Ctext)%2Cversion%3A!t)%2Cindex%3A'account-admin-ol-*')%2Ctitle%3A'high%20risky%20with%20req'%2Ctype%3Asearch) 500 4695ms - 9.0B"}

The key error is:

[.reporting-2021.10.24] as the final mapping would have more than 1 type: [esqueue, doc]
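Elasticsearch 6.x allows only one mapping type per index, so this message means two different types (esqueue and doc) were headed for the same index. When scanning many log lines for this kind of conflict, a small parser can pull out the index name and the conflicting types. The following is a minimal sketch: the function name parse_mapping_conflict is hypothetical, and the regular expression assumes the message format shown above.

```python
import re

# Assumed message format, copied from the Kibana log line above.
_CONFLICT_RE = re.compile(
    r"Rejecting mapping update to \[(?P<index>[^\]]+)\] "
    r"as the final mapping would have more than 1 type: \[(?P<types>[^\]]+)\]"
)


def parse_mapping_conflict(message):
    """Return (index_name, [types]) or None if the message does not match."""
    match = _CONFLICT_RE.search(message)
    if match is None:
        return None
    types = [t.strip() for t in match.group("types").split(",")]
    return match.group("index"), types


if __name__ == "__main__":
    line = (
        "[illegal_argument_exception] Rejecting mapping update to "
        "[.reporting-2021.10.24] as the final mapping would have "
        "more than 1 type: [esqueue, doc]"
    )
    print(parse_mapping_conflict(line))
```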

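A version problem?

Why would this happen? A failure like this on a system index is often caused by a version mismatch between Kibana and Elasticsearch, so I checked: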

[root@VM_130_254_centos /usr/local/service/kibana]# more version.md 
6.8.2.2019121001
[root@VM_130_254_centos /usr/local/service/kibana]# curl localhost:9200
{
  "name" : "1620648141000429932",
  "cluster_name" : "es-3ktojklt",
  "cluster_uuid" : "zH1tb_eUS5uHJf5edamMAg",
  "version" : {
    "number" : "6.8.2",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "f1ae577",
    "build_date" : "2019-11-25T13:31:48.079152Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Both are 6.8.2, so the versions match and a version mismatch can be ruled out.

2. Analyzing the Elasticsearch log
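The same comparison can be scripted. This is a minimal sketch that assumes the fourth component of the Kibana version string (2019121001 above) is a build suffix that should be ignored; the function names are hypothetical.

```python
def core_version(version):
    """Keep the first three dot-separated components as integers.

    For example, '6.8.2.2019121001' becomes (6, 8, 2).
    """
    parts = version.strip().split(".")[:3]
    return tuple(int(p) for p in parts)


def versions_match(kibana_version, es_version):
    """Compare Kibana and Elasticsearch versions on major.minor.patch only."""
    return core_version(kibana_version) == core_version(es_version)


if __name__ == "__main__":
    # Values taken from version.md and from the "number" field above.
    print(versions_match("6.8.2.2019121001", "6.8.2"))
```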

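At this point the mapping was the most likely culprit. But nobody normally edits the mapping of a system index, so I suspected that the mapping from a custom template was interfering with it.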

I searched the Elasticsearch log for the system index and, sure enough, found something abnormal:

[.reporting-2021.10.24] creating index, cause [auto(bulk api)], templates [qidian_default, default@template, qd-template, outerBoss-template, hand-nginx-template, hand-template, *, beeflow-java-template, zhiku-template, beeflow-template, test-template, $zhiku-template]

Creating a single system index matched this many custom templates, which is clearly wrong.
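To see at a glance which templates were applied when an index was created, the log line can be split into its parts. The sketch below assumes the "creating index" line format shown above and that template names contain no "]" character; parse_index_creation is a hypothetical helper name.

```python
import re

# Assumed format of the Elasticsearch "creating index" log message.
_CREATE_RE = re.compile(
    r"\[(?P<index>[^\]]+)\] creating index, cause \[(?P<cause>[^\]]*)\], "
    r"templates \[(?P<templates>[^\]]*)\]"
)


def parse_index_creation(line):
    """Return the index name, cause, and applied template names, or None."""
    match = _CREATE_RE.search(line)
    if match is None:
        return None
    templates = [
        name.strip()
        for name in match.group("templates").split(",")
        if name.strip()
    ]
    return {
        "index": match.group("index"),
        "cause": match.group("cause"),
        "templates": templates,
    }


if __name__ == "__main__":
    line = (
        "[.reporting-2021.10.24] creating index, cause [auto(bulk api)], "
        "templates [qidian_default, default@template, qd-template, "
        "outerBoss-template, hand-nginx-template, hand-template, *, "
        "beeflow-java-template, zhiku-template, beeflow-template, "
        "test-template, $zhiku-template]"
    )
    info = parse_index_creation(line)
    print(info["index"], len(info["templates"]))
```

A dot-prefixed system index that lists any templates here is worth a second look.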

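Solution

I temporarily neutralized the custom templates that were affecting the system index by changing their index patterns from: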

  "index_patterns": [
    "*"
  ]

to:

  "index_patterns": [
    "xxx*"
  ]

I then deleted the system reporting index and generated the report again. This time it completed normally.

Problem solved.

Summary
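The template change described above can be prepared in bulk. This is a minimal sketch under stated assumptions: the input is a dict shaped like the response of GET _template (template name mapped to its body, with an index_patterns list), narrow_catch_all is a hypothetical helper, and "xxx*" stands in for a real business prefix as in the example above. The sketch only builds the new bodies; each one would still need to be written back with PUT _template/<name>, and the already-created reporting index deleted, before retrying the report.

```python
import copy


def narrow_catch_all(templates, replacement):
    """Return {name: new_body} for templates whose patterns include "*".

    Only the bare "*" pattern is replaced; other patterns are kept as-is.
    The input dict is not modified.
    """
    updated = {}
    for name, body in templates.items():
        patterns = body.get("index_patterns", [])
        if "*" not in patterns:
            continue
        new_body = copy.deepcopy(body)
        new_body["index_patterns"] = [
            replacement if pattern == "*" else pattern for pattern in patterns
        ]
        updated[name] = new_body
    return updated


if __name__ == "__main__":
    # Hypothetical template bodies for illustration only.
    templates = {
        "test-template": {"order": 0, "index_patterns": ["*"]},
        "biz-template": {"order": 0, "index_patterns": ["account-admin-ol-*"]},
    }
    for name, body in narrow_catch_all(templates, "xxx*").items():
        print(name, body["index_patterns"])
```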

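In normal use it is perfectly fine to define custom templates that match your actual business indices. But never match everything with * just for convenience. That is dangerous, because such a template also applies to system indices and leaves a hidden risk.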
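One way to catch this before it causes trouble is to check which templates would apply to a system index name. The sketch below is an approximation under stated assumptions: it treats * as the only wildcard in an index pattern, takes a dict shaped like the GET _template response, and uses hypothetical helper names.

```python
import re


def wildcard_match(pattern, name):
    """Match name against pattern, where '*' matches any run of characters."""
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, name) is not None


def templates_matching(templates, index_name):
    """Return the sorted names of templates with a pattern matching index_name."""
    return sorted(
        name
        for name, body in templates.items()
        if any(wildcard_match(p, index_name) for p in body.get("index_patterns", []))
    )


if __name__ == "__main__":
    # Hypothetical template bodies for illustration only.
    templates = {
        "catch-all": {"index_patterns": ["*"]},
        "biz-template": {"index_patterns": ["account-admin-ol-*"]},
    }
    print(templates_matching(templates, ".reporting-2021.10.24"))
```

Any custom template returned for a name such as .reporting-2021.10.24 is a candidate for a narrower pattern.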
