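Notes
The problems and solutions described in this article also apply to Tencent Cloud Elasticsearch Service (ES).
Also used: Tencent Cloud Cloud Virtual Machine (CVM).
This article continues the previous one, "Pitfalls in Deploying the Elasticsearch Benchmarking Tool esrally (Part 1)".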
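Environment configuration
Note: the configuration and versions below are the ones verified for this article. To avoid pitfalls, match them as closely as possible.
Esrally client environment
Versions
Linux: CentOS 7.9
Python: 3.6.7
Pip: 10.0.1 from pip (python 3.6)
Java: openjdk version 1.8.0_302 (build 1.8.0_302-b08)
Git: 2.7.5
Esrally: 2.3.0
Configuration
Memory: 32 GB
Disk: SSD cloud disk, 100 GB
Number of CPUs: 1
CPU cores: 16
Elasticsearch server environment
Versions
Linux: CentOS 7.2
Java: openjdk version 11.0.9.1-ga (build 11.0.9.1-ga+1, mixed mode)
Elasticsearch version: 7.10.1 (Tencent Cloud Elasticsearch Service, Platinum edition)
Configuration
Number of nodes: 3
Memory: 16 GB
Disk: SSD cloud disk, 1 TB
Number of CPUs: 1
CPU cores: 4
CPU model: AMD EPYC 7K62 48-Core Processor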
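Background
In today's big-data era, business volume keeps growing, and a service can easily generate hundreds of GB or even several TB of data per day, so a high-performance Elasticsearch cluster matters a great deal. Before a service goes live, we need to simulate a large volume of reads and writes of network logs and user-behavior logs, measure the performance metrics, and locate the cluster's bottlenecks. Only then can we determine what hardware configuration and business-side optimizations are needed to handle the current business volume.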
Benchmark command
esrally --distribution-version=6.0.0
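As written, `--distribution-version` asks Rally to download and start that Elasticsearch version itself, rather than targeting the Elasticsearch 7.10.1 cluster listed in the environment configuration. The Rally 2.x documentation also writes the command with an explicit `race` subcommand. The sketch below assembles a command line for benchmarking an already running cluster with the `benchmark-only` pipeline. The host, user name, and password are placeholders that do not come from this article, and the `geonames` track is an assumption based on the task names in the report; treat this as an illustration, not the exact command that was run.

```python
import shlex


def build_rally_command(target_hosts, track="geonames", user=None, password=None):
    """Assemble an `esrally race` argument list for an existing cluster.

    target_hosts: list of "host:port" strings (placeholders in this sketch).
    track: Rally track name; "geonames" is assumed, not confirmed by the article.
    """
    args = [
        "esrally",
        "race",
        "--track=" + track,
        # benchmark-only: Rally does not provision Elasticsearch itself.
        "--pipeline=benchmark-only",
        "--target-hosts=" + ",".join(target_hosts),
    ]
    if user is not None and password is not None:
        # Basic-auth credentials are passed through Rally's client options.
        args.append(
            "--client-options=basic_auth_user:'{}',basic_auth_password:'{}'".format(
                user, password
            )
        )
    return args


def to_shell(args):
    """Render the argument list as a single shell-safe command string."""
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    cmd = build_rally_command(["10.0.0.1:9200"], user="elastic", password="changeme")
    print(to_shell(cmd))
```

Building the arguments as a list keeps quoting explicit, which helps because the `--client-options` value itself contains single quotes.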
Benchmark report
------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
| Metric | Task | Value | Unit |
|---------------------------------------------------------------:|-----------------------:|----------:|--------:|
| Cumulative indexing time of primary shards | | 31.9571 | min |
| Min cumulative indexing time across primary shards | | 6.318 | min |
| Median cumulative indexing time across primary shards | | 6.3739 | min |
| Max cumulative indexing time across primary shards | | 6.54167 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 8.4722 | min |
| Cumulative merge count of primary shards | | 254 | |
| Min cumulative merge time across primary shards | | 1.43585 | min |
| Median cumulative merge time across primary shards | | 1.70052 | min |
| Max cumulative merge time across primary shards | | 1.90262 | min |
| Cumulative merge throttle time of primary shards | | 1.41052 | min |
| Min cumulative merge throttle time across primary shards | | 0.19385 | min |
| Median cumulative merge throttle time across primary shards | | 0.28705 | min |
| Max cumulative merge throttle time across primary shards | | 0.33705 | min |
| Cumulative refresh time of primary shards | | 1.22168 | min |
| Cumulative refresh count of primary shards | | 402 | |
| Min cumulative refresh time across primary shards | | 0.19125 | min |
| Median cumulative refresh time across primary shards | | 0.227717 | min |
| Max cumulative refresh time across primary shards | | 0.307067 | min |
| Cumulative flush time of primary shards | | 0.865717 | min |
| Cumulative flush count of primary shards | | 10 | |
| Min cumulative flush time across primary shards | | 0.170633 | min |
| Median cumulative flush time across primary shards | | 0.1722 | min |
| Max cumulative flush time across primary shards | | 0.176117 | min |
| Total Young Gen GC | | 18.509 | s |
| Total Old Gen GC | | 4.205 | s |
| Store size | | 3.26041 | GB |
| Translog size | | 2.68671 | GB |
| Heap used for segments | | 19.758 | MB |
| Heap used for doc values | | 0.0497627 | MB |
| Heap used for terms | | 18.5803 | MB |
| Heap used for norms | | 0.0792847 | MB |
| Heap used for points | | 0.278255 | MB |
| Heap used for stored fields | | 0.770416 | MB |
| Segment count | | 103 | |
| Min Throughput | index-append | 79792.9 | docs/s |
| Median Throughput | index-append | 80220.1 | docs/s |
| Max Throughput | index-append | 80462.6 | docs/s |
| 50th percentile latency | index-append | 431.806 | ms |
| 90th percentile latency | index-append | 617.963 | ms |
| 99th percentile latency | index-append | 1409.56 | ms |
| 100th percentile latency | index-append | 1668.42 | ms |
| 50th percentile service time | index-append | 431.806 | ms |
| 90th percentile service time | index-append | 617.963 | ms |
| 99th percentile service time | index-append | 1409.56 | ms |
| 100th percentile service time | index-append | 1668.42 | ms |
| error rate | index-append | 0 | % |
| Min Throughput | index-stats | 90.05 | ops/s |
| Median Throughput | index-stats | 90.08 | ops/s |
| Max Throughput | index-stats | 90.15 | ops/s |
| 50th percentile latency | index-stats | 1.15388 | ms |
| 90th percentile latency | index-stats | 1.44819 | ms |
| 99th percentile latency | index-stats | 1.68309 | ms |
| 99.9th percentile latency | index-stats | 6.77088 | ms |
| 100th percentile latency | index-stats | 10.6119 | ms |
| 50th percentile service time | index-stats | 1.08615 | ms |
| 90th percentile service time | index-stats | 1.37542 | ms |
| 99th percentile service time | index-stats | 1.59919 | ms |
| 99.9th percentile service time | index-stats | 6.70006 | ms |
| 100th percentile service time | index-stats | 10.5416 | ms |
| error rate | index-stats | 0 | % |
| Min Throughput | node-stats | 90.07 | ops/s |
| Median Throughput | node-stats | 90.13 | ops/s |
| Max Throughput | node-stats | 90.44 | ops/s |
| 50th percentile latency | node-stats | 1.2836 | ms |
| 90th percentile latency | node-stats | 1.66455 | ms |
| 99th percentile latency | node-stats | 2.76618 | ms |
| 99.9th percentile latency | node-stats | 6.3469 | ms |
| 100th percentile latency | node-stats | 8.49078 | ms |
| 50th percentile service time | node-stats | 1.20947 | ms |
| 90th percentile service time | node-stats | 1.59321 | ms |
| 99th percentile service time | node-stats | 2.7005 | ms |
| 99.9th percentile service time | node-stats | 6.27536 | ms |
| 100th percentile service time | node-stats | 8.42212 | ms |
| error rate | node-stats | 0 | % |
| Min Throughput | default | 50.02 | ops/s |
| Median Throughput | default | 50.03 | ops/s |
| Max Throughput | default | 50.06 | ops/s |
| 50th percentile latency | default | 7.33918 | ms |
| 90th percentile latency | default | 7.80708 | ms |
| 99th percentile latency | default | 10.8431 | ms |
| 99.9th percentile latency | default | 11.3341 | ms |
| 100th percentile latency | default | 11.3709 | ms |
| 50th percentile service time | default | 7.26797 | ms |
| 90th percentile service time | default | 7.73792 | ms |
| 99th percentile service time | default | 10.7737 | ms |
| 99.9th percentile service time | default | 11.2504 | ms |
| 100th percentile service time | default | 11.3026 | ms |
| error rate | default | 0 | % |
| Min Throughput | term | 100.06 | ops/s |
| Median Throughput | term | 100.09 | ops/s |
| Max Throughput | term | 100.18 | ops/s |
| 50th percentile latency | term | 1.23537 | ms |
| 90th percentile latency | term | 1.49826 | ms |
| 99th percentile latency | term | 1.64551 | ms |
| 99.9th percentile latency | term | 4.5177 | ms |
| 100th percentile latency | term | 4.76169 | ms |
| 50th percentile service time | term | 1.1639 | ms |
| 90th percentile service time | term | 1.42085 | ms |
| 99th percentile service time | term | 1.56379 | ms |
| 99.9th percentile service time | term | 4.44861 | ms |
| 100th percentile service time | term | 4.67859 | ms |
| error rate | term | 0 | % |
| Min Throughput | phrase | 110.07 | ops/s |
| Median Throughput | phrase | 110.1 | ops/s |
| Max Throughput | phrase | 110.18 | ops/s |
| 50th percentile latency | phrase | 1.0436 | ms |
| 90th percentile latency | phrase | 1.33577 | ms |
| 99th percentile latency | phrase | 1.50581 | ms |
| 99.9th percentile latency | phrase | 4.61771 | ms |
| 100th percentile latency | phrase | 4.68649 | ms |
| 50th percentile service time | phrase | 0.970911 | ms |
| 90th percentile service time | phrase | 1.25612 | ms |
| 99th percentile service time | phrase | 1.43724 | ms |
| 99.9th percentile service time | phrase | 4.53463 | ms |
| 100th percentile service time | phrase | 4.61535 | ms |
| error rate | phrase | 0 | % |
| Min Throughput | country_agg_uncached | 3.61 | ops/s |
| Median Throughput | country_agg_uncached | 3.61 | ops/s |
| Max Throughput | country_agg_uncached | 3.61 | ops/s |
| 50th percentile latency | country_agg_uncached | 123.416 | ms |
| 90th percentile latency | country_agg_uncached | 132.835 | ms |
| 99th percentile latency | country_agg_uncached | 137.358 | ms |
| 100th percentile latency | country_agg_uncached | 162.226 | ms |
| 50th percentile service time | country_agg_uncached | 123.258 | ms |
| 90th percentile service time | country_agg_uncached | 132.674 | ms |
| 99th percentile service time | country_agg_uncached | 137.211 | ms |
| 100th percentile service time | country_agg_uncached | 162.04 | ms |
| error rate | country_agg_uncached | 0 | % |
| Min Throughput | country_agg_cached | 100.05 | ops/s |
| Median Throughput | country_agg_cached | 100.06 | ops/s |
| Max Throughput | country_agg_cached | 100.09 | ops/s |
| 50th percentile latency | country_agg_cached | 0.773497 | ms |
| 90th percentile latency | country_agg_cached | 0.972381 | ms |
| 99th percentile latency | country_agg_cached | 1.19931 | ms |
| 99.9th percentile latency | country_agg_cached | 1.63762 | ms |
| 100th percentile latency | country_agg_cached | 3.01838 | ms |
| 50th percentile service time | country_agg_cached | 0.69294 | ms |
| 90th percentile service time | country_agg_cached | 0.888465 | ms |
| 99th percentile service time | country_agg_cached | 1.11699 | ms |
| 99.9th percentile service time | country_agg_cached | 1.5571 | ms |
| 100th percentile service time | country_agg_cached | 2.93895 | ms |
| error rate | country_agg_cached | 0 | % |
| Min Throughput | scroll | 20.05 | pages/s |
| Median Throughput | scroll | 20.06 | pages/s |
| Max Throughput | scroll | 20.08 | pages/s |
| 50th percentile latency | scroll | 295.173 | ms |
| 90th percentile latency | scroll | 301.86 | ms |
| 99th percentile latency | scroll | 304.547 | ms |
| 100th percentile latency | scroll | 305.208 | ms |
| 50th percentile service time | scroll | 294.264 | ms |
| 90th percentile service time | scroll | 300.968 | ms |
| 99th percentile service time | scroll | 303.718 | ms |
| 100th percentile service time | scroll | 304.351 | ms |
| error rate | scroll | 0 | % |
| Min Throughput | expression | 2 | ops/s |
| Median Throughput | expression | 2 | ops/s |
| Max Throughput | expression | 2 | ops/s |
| 50th percentile latency | expression | 258.146 | ms |
| 90th percentile latency | expression | 270.362 | ms |
| 99th percentile latency | expression | 285.048 | ms |
| 100th percentile latency | expression | 305.866 | ms |
| 50th percentile service time | expression | 257.889 | ms |
| 90th percentile service time | expression | 270.114 | ms |
| 99th percentile service time | expression | 284.788 | ms |
| 100th percentile service time | expression | 305.616 | ms |
| error rate | expression | 0 | % |
| Min Throughput | painless_static | 1.4 | ops/s |
| Median Throughput | painless_static | 1.4 | ops/s |
| Max Throughput | painless_static | 1.4 | ops/s |
| 50th percentile latency | painless_static | 891.686 | ms |
| 90th percentile latency | painless_static | 989.863 | ms |
| 99th percentile latency | painless_static | 1075.74 | ms |
| 100th percentile latency | painless_static | 1091.99 | ms |
| 50th percentile service time | painless_static | 705.613 | ms |
| 90th percentile service time | painless_static | 775.687 | ms |
| 99th percentile service time | painless_static | 827.929 | ms |
| 100th percentile service time | painless_static | 835.33 | ms |
| error rate | painless_static | 0 | % |
| Min Throughput | painless_dynamic | 1.4 | ops/s |
| Median Throughput | painless_dynamic | 1.4 | ops/s |
| Max Throughput | painless_dynamic | 1.4 | ops/s |
| 50th percentile latency | painless_dynamic | 708.353 | ms |
| 90th percentile latency | painless_dynamic | 783.711 | ms |
| 99th percentile latency | painless_dynamic | 852.798 | ms |
| 100th percentile latency | painless_dynamic | 855.139 | ms |
| 50th percentile service time | painless_dynamic | 679.176 | ms |
| 90th percentile service time | painless_dynamic | 752.427 | ms |
| 99th percentile service time | painless_dynamic | 822.407 | ms |
| 100th percentile service time | painless_dynamic | 828.349 | ms |
| error rate | painless_dynamic | 0 | % |
| Min Throughput | large_terms | 1.1 | ops/s |
| Median Throughput | large_terms | 1.1 | ops/s |
| Max Throughput | large_terms | 1.1 | ops/s |
| 50th percentile latency | large_terms | 384.18 | ms |
| 90th percentile latency | large_terms | 392.977 | ms |
| 99th percentile latency | large_terms | 404.511 | ms |
| 100th percentile latency | large_terms | 414.145 | ms |
| 50th percentile service time | large_terms | 383.635 | ms |
| 90th percentile service time | large_terms | 392.434 | ms |
| 99th percentile service time | large_terms | 403.958 | ms |
| 100th percentile service time | large_terms | 413.6 | ms |
| error rate | large_terms | 0 | % |
| Min Throughput | large_filtered_terms | 1.1 | ops/s |
| Median Throughput | large_filtered_terms | 1.1 | ops/s |
| Max Throughput | large_filtered_terms | 1.1 | ops/s |
| 50th percentile latency | large_filtered_terms | 384.729 | ms |
| 90th percentile latency | large_filtered_terms | 391.508 | ms |
| 99th percentile latency | large_filtered_terms | 396.777 | ms |
| 100th percentile latency | large_filtered_terms | 402.892 | ms |
| 50th percentile service time | large_filtered_terms | 384.253 | ms |
| 90th percentile service time | large_filtered_terms | 390.981 | ms |
| 99th percentile service time | large_filtered_terms | 396.297 | ms |
| 100th percentile service time | large_filtered_terms | 402.423 | ms |
| error rate | large_filtered_terms | 0 | % |
| Min Throughput | large_prohibited_terms | 1.1 | ops/s |
| Median Throughput | large_prohibited_terms | 1.1 | ops/s |
| Max Throughput | large_prohibited_terms | 1.1 | ops/s |
| 50th percentile latency | large_prohibited_terms | 383.206 | ms |
| 90th percentile latency | large_prohibited_terms | 390.933 | ms |
| 99th percentile latency | large_prohibited_terms | 394.907 | ms |
| 100th percentile latency | large_prohibited_terms | 399.321 | ms |
| 50th percentile service time | large_prohibited_terms | 382.704 | ms |
| 90th percentile service time | large_prohibited_terms | 390.658 | ms |
| 99th percentile service time | large_prohibited_terms | 394.357 | ms |
| 100th percentile service time | large_prohibited_terms | 398.779 | ms |
| error rate | large_prohibited_terms | 0 | % |
----------------------------------
[INFO] SUCCESS (took 2583 seconds)
----------------------------------