V2CE – Elasticsearch 7.10.1 Cluster Benchmark Report (3 × 16-core 64 GB nodes, SSD cloud disk, AMD)

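Notes

The issues and solutions described in this article also apply to Tencent Cloud Elasticsearch Service (ES).

The test also uses Tencent Cloud Cloud Virtual Machine (CVM).

This article follows on from the previous one, "Deployment Guide for esrally, the Elasticsearch Benchmarking Tool".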

Environment

esrally client environment

  • Versions

Linux: CentOS 7.9

Python: 3.8.7

pip: pip 20.2.3 from pip (python 3.8)

Java: openjdk version 1.8.0_302 (build 1.8.0_302-b08)

Git: 2.7.5

esrally: 2.3.0

  • Hardware

Memory: 32 GB

Disk: SSD cloud disk, 100 GB

CPUs: 1

CPU cores: 16

Elasticsearch server environment

  • Versions

Linux: CentOS 7.2

Java: openjdk version 11.0.9.1-ga (build 11.0.9.1-ga+1, mixed mode)

Elasticsearch: 7.10.1 (Tencent Cloud Elasticsearch Service, Platinum edition)

  • Hardware

Nodes: 3

Memory: 64 GB

Disk: SSD cloud disk, 3.5 TB

CPUs: 1

CPU cores: 16

CPU model: AMD EPYC 7K62 48-Core Processor

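Background

In today's big-data era, business volume keeps growing, and workloads routinely produce hundreds of gigabytes or even terabytes of data per day, so a high-performance Elasticsearch cluster matters a great deal. Before a service goes live, we need to simulate heavy read and write traffic for network logs and user-behavior logs, measure the key performance metrics, and locate the cluster's bottlenecks. Only then can we determine what hardware configuration and application-level tuning the current workload requires.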

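Benchmarking

esrally terminology and parameters

"Rally" refers to car rally racing, so the tool's terminology is borrowed from that sport.

  • track: the race track; here it means the sample data and the benchmarking strategy used for the test. List the available tracks with esrally list tracks. The tracks that ship with Rally can be browsed at https://github.com/elastic/rally-tracks; each track's directory contains a README.md that describes the data set and the parameters in detail. If no track is specified, the geonames track is used by default;
  • target-hosts: the IP address and port of the remote Elasticsearch cluster, given as ip:port;
  • pipeline: a benchmarking workflow; list the available ones with esrally list pipelines. The benchmark-only pipeline leaves the management of Elasticsearch to the user and uses Rally only to drive the load; use it to benchmark an existing cluster;
  • track-params: overrides the track's default parameters;
  • user-tag: a tag that labels this benchmark run;
  • client-options: client connection options, such as the username and password.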
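The track-params and client-options values are comma-separated key:value pairs. The following is a minimal sketch of how such a string could be read into a dictionary and layered over a track's defaults. It is not Rally's own parser; parse_kv is a hypothetical helper, and the default values shown are placeholders chosen for illustration rather than taken from any particular track.

```python
def parse_kv(spec: str) -> dict:
    """Parse a 'key:value, key:value' string into a dict (illustrative only)."""
    result = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first colon only, so values may themselves contain colons.
        key, _, value = pair.partition(":")
        value = value.strip().strip("'\"")
        # Keep plain integers as integers; everything else stays a string.
        result[key.strip()] = int(value) if value.isdigit() else value
    return result


# Hypothetical defaults, for illustration only.
defaults = {"number_of_shards": 5, "number_of_replicas": 0}
overrides = parse_kv("number_of_shards:3, number_of_replicas:1")

# Later entries win, which is what "overriding the defaults" means here.
effective = {**defaults, **overrides}
print(effective)
```

Any key that is not overridden keeps its default value, so only the parameters under test need to be listed.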

Benchmark command

esrally race \
  --track=geonames \
  --target-hosts=10.0.10.4:9200 \
  --pipeline=benchmark-only \
  --track-params="number_of_shards:3, number_of_replicas:1" \
  --user-tag="version:AMD_16C64G_1T*3" \
  --client-options="basic_auth_user:'elastic', basic_auth_password:'your_password'"
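When the same benchmark is repeated against several clusters, it can help to assemble the command programmatically. The following is a minimal sketch that rebuilds the command above as an argument list; kv and build_race_command are hypothetical helper names, and only the flags already used in this article appear.

```python
import shlex


def kv(options: dict) -> str:
    """Render a dict in the 'key:value,key:value' form used on the command line."""
    return ",".join(f"{key}:{value}" for key, value in options.items())


def build_race_command(track, target_hosts, track_params, user_tag, client_options):
    """Return the argv list for an esrally race run against an existing cluster."""
    return [
        "esrally", "race",
        f"--track={track}",
        f"--target-hosts={','.join(target_hosts)}",
        "--pipeline=benchmark-only",
        f"--track-params={kv(track_params)}",
        f"--user-tag={user_tag}",
        f"--client-options={kv(client_options)}",
    ]


argv = build_race_command(
    track="geonames",
    target_hosts=["10.0.10.4:9200"],
    track_params={"number_of_shards": 3, "number_of_replicas": 1},
    user_tag="version:AMD_16C64G_1T*3",
    client_options={
        "basic_auth_user": "'elastic'",
        "basic_auth_password": "'your_password'",
    },
)

# Print a shell-quoted form for inspection (shlex.join needs Python 3.8+).
print(shlex.join(argv))
```

Passing the list to subprocess.run(argv, check=True) would start the run without going through a shell, which sidesteps quoting problems with characters such as * and the embedded single quotes.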

Benchmark report

| Metric | Task | Value | Unit |
|---|---|---|---|
| Cumulative indexing time of primary shards | | 12.13692 | min |
| Min cumulative indexing time across primary shards | | 0 | min |
| Median cumulative indexing time across primary shards | | 0.003733 | min |
| Max cumulative indexing time across primary shards | | 4.069 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 3.274033 | min |
| Cumulative merge count of primary shards | | 151 | |
| Min cumulative merge time across primary shards | | 0 | min |
| Median cumulative merge time across primary shards | | 0.002983 | min |
| Max cumulative merge time across primary shards | | 1.136567 | min |
| Cumulative merge throttle time of primary shards | | 1.212417 | min |
| Min cumulative merge throttle time across primary shards | | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0 | min |
| Max cumulative merge throttle time across primary shards | | 0.483167 | min |
| Cumulative refresh time of primary shards | | 0.815683 | min |
| Cumulative refresh count of primary shards | | 1254 | |
| Min cumulative refresh time across primary shards | | 0 | min |
| Median cumulative refresh time across primary shards | | 0.01955 | min |
| Max cumulative refresh time across primary shards | | 0.253017 | min |
| Cumulative flush time of primary shards | | 0.311517 | min |
| Cumulative flush count of primary shards | | 15 | |
| Min cumulative flush time across primary shards | | 0 | min |
| Median cumulative flush time across primary shards | | 0 | min |
| Max cumulative flush time across primary shards | | 0.1125 | min |
| Total Young Gen GC time | | 7.285 | s |
| Total Young Gen GC count | | 633 | |
| Total Old Gen GC time | | 0 | s |
| Total Old Gen GC count | | 0 | |
| Store size | | 6.108621 | GB |
| Translog size | | 0.027325 | GB |
| Heap used for segments | | 1.162872 | MB |
| Heap used for doc values | | 0.234993 | MB |
| Heap used for terms | | 0.814423 | MB |
| Heap used for norms | | 0.048035 | MB |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0.065422 | MB |
| Segment count | | 100 | |
| error rate | index-append | 0 | % |
| Min Throughput | index-stats | 90.01 | ops/s |
| Mean Throughput | index-stats | 90.02 | ops/s |
| Median Throughput | index-stats | 90.02 | ops/s |
| Max Throughput | index-stats | 90.04 | ops/s |
| 50th percentile latency | index-stats | 3.172824 | ms |
| 90th percentile latency | index-stats | 3.65115 | ms |
| 99th percentile latency | index-stats | 4.334665 | ms |
| 99.9th percentile latency | index-stats | 13.4022 | ms |
| 100th percentile latency | index-stats | 17.7899 | ms |
| 50th percentile service time | index-stats | 2.447953 | ms |
| 90th percentile service time | index-stats | 2.755487 | ms |
| 99th percentile service time | index-stats | 3.491778 | ms |
| 99.9th percentile service time | index-stats | 12.41878 | ms |
| 100th percentile service time | index-stats | 17.38639 | ms |
| error rate | index-stats | 0 | % |
| Min Throughput | node-stats | 89.95 | ops/s |
| Mean Throughput | node-stats | 89.98 | ops/s |
| Median Throughput | node-stats | 89.99 | ops/s |
| Max Throughput | node-stats | 90 | ops/s |
| 50th percentile latency | node-stats | 3.296191 | ms |
| 90th percentile latency | node-stats | 3.799131 | ms |
| 99th percentile latency | node-stats | 4.786813 | ms |
| 99.9th percentile latency | node-stats | 5.793665 | ms |
| 100th percentile latency | node-stats | 6.562565 | ms |
| 50th percentile service time | node-stats | 2.566179 | ms |
| 90th percentile service time | node-stats | 2.923253 | ms |
| 99th percentile service time | node-stats | 4.226492 | ms |
| 99.9th percentile service time | node-stats | 4.729633 | ms |
| 100th percentile service time | node-stats | 5.728788 | ms |
| error rate | node-stats | 0 | % |
| Min Throughput | default | 50.02 | ops/s |
| Mean Throughput | default | 50.03 | ops/s |
| Median Throughput | default | 50.02 | ops/s |
| Max Throughput | default | 50.04 | ops/s |
| 50th percentile latency | default | 3.689626 | ms |
| 90th percentile latency | default | 4.217221 | ms |
| 99th percentile latency | default | 5.385959 | ms |
| 99.9th percentile latency | default | 8.354142 | ms |
| 100th percentile latency | default | 9.191337 | ms |
| 50th percentile service time | default | 3.050713 | ms |
| 90th percentile service time | default | 3.320561 | ms |
| 99th percentile service time | default | 3.761687 | ms |
| 99.9th percentile service time | default | 7.875555 | ms |
| 100th percentile service time | default | 8.580872 | ms |
| error rate | default | 0 | % |
| Min Throughput | term | 99.94 | ops/s |
| Mean Throughput | term | 99.96 | ops/s |
| Median Throughput | term | 99.96 | ops/s |
| Max Throughput | term | 99.97 | ops/s |
| 50th percentile latency | term | 3.069355 | ms |
| 90th percentile latency | term | 3.509827 | ms |
| 99th percentile latency | term | 3.933137 | ms |
| 99.9th percentile latency | term | 7.350553 | ms |
| 100th percentile latency | term | 14.43949 | ms |
| 50th percentile service time | term | 2.355179 | ms |
| 90th percentile service time | term | 2.628199 | ms |
| 99th percentile service time | term | 2.916559 | ms |
| 99.9th percentile service time | term | 6.22785 | ms |
| 100th percentile service time | term | 13.98078 | ms |
| error rate | term | 0 | % |
| Min Throughput | phrase | 109.8 | ops/s |
| Mean Throughput | phrase | 109.88 | ops/s |
| Median Throughput | phrase | 109.89 | ops/s |
| Max Throughput | phrase | 109.92 | ops/s |
| 50th percentile latency | phrase | 3.124941 | ms |
| 90th percentile latency | phrase | 3.576734 | ms |
| 99th percentile latency | phrase | 5.013414 | ms |
| 99.9th percentile latency | phrase | 14.82175 | ms |
| 100th percentile latency | phrase | 20.64022 | ms |
| 50th percentile service time | phrase | 2.426618 | ms |
| 90th percentile service time | phrase | 2.685807 | ms |
| 99th percentile service time | phrase | 3.2294 | ms |
| 99.9th percentile service time | phrase | 14.31686 | ms |
| 100th percentile service time | phrase | 14.74916 | ms |
| error rate | phrase | 0 | % |
| Min Throughput | country_agg_uncached | 3 | ops/s |
| Mean Throughput | country_agg_uncached | 3 | ops/s |
| Median Throughput | country_agg_uncached | 3 | ops/s |
| Max Throughput | country_agg_uncached | 3 | ops/s |
| 50th percentile latency | country_agg_uncached | 214.9744 | ms |
| 90th percentile latency | country_agg_uncached | 219.1983 | ms |
| 99th percentile latency | country_agg_uncached | 230.8336 | ms |
| 100th percentile latency | country_agg_uncached | 236.5429 | ms |
| 50th percentile service time | country_agg_uncached | 214.0287 | ms |
| 90th percentile service time | country_agg_uncached | 218.2245 | ms |
| 99th percentile service time | country_agg_uncached | 230.3793 | ms |
| 100th percentile service time | country_agg_uncached | 235.6672 | ms |
| error rate | country_agg_uncached | 0 | % |
| Min Throughput | country_agg_cached | 97.99 | ops/s |
| Mean Throughput | country_agg_cached | 98.52 | ops/s |
| Median Throughput | country_agg_cached | 98.57 | ops/s |
| Max Throughput | country_agg_cached | 98.89 | ops/s |
| 50th percentile latency | country_agg_cached | 2.276356 | ms |
| 90th percentile latency | country_agg_cached | 3.572794 | ms |
| 99th percentile latency | country_agg_cached | 3.872698 | ms |
| 99.9th percentile latency | country_agg_cached | 5.008591 | ms |
| 100th percentile latency | country_agg_cached | 6.143569 | ms |
| 50th percentile service time | country_agg_cached | 1.541434 | ms |
| 90th percentile service time | country_agg_cached | 1.754622 | ms |
| 99th percentile service time | country_agg_cached | 2.226828 | ms |
| 99.9th percentile service time | country_agg_cached | 4.050017 | ms |
| 100th percentile service time | country_agg_cached | 5.457047 | ms |
| error rate | country_agg_cached | 0 | % |
| Min Throughput | scroll | 20.03 | pages/s |
| Mean Throughput | scroll | 20.04 | pages/s |
| Median Throughput | scroll | 20.04 | pages/s |
| Max Throughput | scroll | 20.04 | pages/s |
| 50th percentile latency | scroll | 609.1409 | ms |
| 90th percentile latency | scroll | 618.4449 | ms |
| 99th percentile latency | scroll | 635.2878 | ms |
| 100th percentile latency | scroll | 661.9673 | ms |
| 50th percentile service time | scroll | 607.3914 | ms |
| 90th percentile service time | scroll | 616.4147 | ms |
| 99th percentile service time | scroll | 634.1222 | ms |
| 100th percentile service time | scroll | 659.807 | ms |
| error rate | scroll | 0 | % |
| Min Throughput | expression | 1.5 | ops/s |
| Mean Throughput | expression | 1.5 | ops/s |
| Median Throughput | expression | 1.5 | ops/s |
| Max Throughput | expression | 1.5 | ops/s |
| 50th percentile latency | expression | 397.77 | ms |
| 90th percentile latency | expression | 399.8934 | ms |
| 99th percentile latency | expression | 405.0504 | ms |
| 100th percentile latency | expression | 408.5626 | ms |
| 50th percentile service time | expression | 396.573 | ms |
| 90th percentile service time | expression | 398.4105 | ms |
| 99th percentile service time | expression | 403.7755 | ms |
| 100th percentile service time | expression | 407.1302 | ms |
| error rate | expression | 0 | % |
| Min Throughput | painless_static | 1.4 | ops/s |
| Mean Throughput | painless_static | 1.4 | ops/s |
| Median Throughput | painless_static | 1.4 | ops/s |
| Max Throughput | painless_static | 1.4 | ops/s |
| 50th percentile latency | painless_static | 509.4646 | ms |
| 90th percentile latency | painless_static | 518.0589 | ms |
| 99th percentile latency | painless_static | 523.7899 | ms |
| 100th percentile latency | painless_static | 524.1136 | ms |
| 50th percentile service time | painless_static | 508.8671 | ms |
| 90th percentile service time | painless_static | 517.009 | ms |
| 99th percentile service time | painless_static | 522.7241 | ms |
| 100th percentile service time | painless_static | 523.6501 | ms |
| error rate | painless_static | 0 | % |
| Min Throughput | painless_dynamic | 1.4 | ops/s |
| Mean Throughput | painless_dynamic | 1.4 | ops/s |
| Median Throughput | painless_dynamic | 1.4 | ops/s |
| Max Throughput | painless_dynamic | 1.4 | ops/s |
| 50th percentile latency | painless_dynamic | 494.2687 | ms |
| 90th percentile latency | painless_dynamic | 503.2998 | ms |
| 99th percentile latency | painless_dynamic | 509.9914 | ms |
| 100th percentile latency | painless_dynamic | 510.4079 | ms |
| 50th percentile service time | painless_dynamic | 493.1329 | ms |
| 90th percentile service time | painless_dynamic | 501.9278 | ms |
| 99th percentile service time | painless_dynamic | 508.3082 | ms |
| 100th percentile service time | painless_dynamic | 509.7091 | ms |
| error rate | painless_dynamic | 0 | % |
| Min Throughput | decay_geo_gauss_function_score | 1 | ops/s |
| Mean Throughput | decay_geo_gauss_function_score | 1 | ops/s |
| Median Throughput | decay_geo_gauss_function_score | 1 | ops/s |
| Max Throughput | decay_geo_gauss_function_score | 1 | ops/s |
| 50th percentile latency | decay_geo_gauss_function_score | 506.0316 | ms |
| 90th percentile latency | decay_geo_gauss_function_score | 508.5883 | ms |
| 99th percentile latency | decay_geo_gauss_function_score | 512.9382 | ms |
| 100th percentile latency | decay_geo_gauss_function_score | 515.84 | ms |
| 50th percentile service time | decay_geo_gauss_function_score | 504.7985 | ms |
| 90th percentile service time | decay_geo_gauss_function_score | 507.2829 | ms |
| 99th percentile service time | decay_geo_gauss_function_score | 511.6621 | ms |
| 100th percentile service time | decay_geo_gauss_function_score | 514.8887 | ms |
| error rate | decay_geo_gauss_function_score | 0 | % |
| Min Throughput | decay_geo_gauss_script_score | 1 | ops/s |
| Mean Throughput | decay_geo_gauss_script_score | 1 | ops/s |
| Median Throughput | decay_geo_gauss_script_score | 1 | ops/s |
| Max Throughput | decay_geo_gauss_script_score | 1 | ops/s |
| 50th percentile latency | decay_geo_gauss_script_score | 523.8021 | ms |
| 90th percentile latency | decay_geo_gauss_script_score | 529.3986 | ms |
| 99th percentile latency | decay_geo_gauss_script_score | 555.6341 | ms |
| 100th percentile latency | decay_geo_gauss_script_score | 563.2136 | ms |
| 50th percentile service time | decay_geo_gauss_script_score | 522.5704 | ms |
| 90th percentile service time | decay_geo_gauss_script_score | 527.1288 | ms |
| 99th percentile service time | decay_geo_gauss_script_score | 554.0548 | ms |
| 100th percentile service time | decay_geo_gauss_script_score | 560.7784 | ms |
| error rate | decay_geo_gauss_script_score | 0 | % |
| Min Throughput | field_value_function_score | 1.5 | ops/s |
| Mean Throughput | field_value_function_score | 1.5 | ops/s |
| Median Throughput | field_value_function_score | 1.5 | ops/s |
| Max Throughput | field_value_function_score | 1.51 | ops/s |
| 50th percentile latency | field_value_function_score | 185.5918 | ms |
| 90th percentile latency | field_value_function_score | 192.1556 | ms |
| 99th percentile latency | field_value_function_score | 194.0047 | ms |
| 100th percentile latency | field_value_function_score | 264.4487 | ms |
| 50th percentile service time | field_value_function_score | 183.7263 | ms |
| 90th percentile service time | field_value_function_score | 191.0908 | ms |
| 99th percentile service time | field_value_function_score | 192.4972 | ms |
| 100th percentile service time | field_value_function_score | 263.5534 | ms |
| error rate | field_value_function_score | 0 | % |
| Min Throughput | field_value_script_score | 1.5 | ops/s |
| Mean Throughput | field_value_script_score | 1.5 | ops/s |
| Median Throughput | field_value_script_score | 1.5 | ops/s |
| Max Throughput | field_value_script_score | 1.5 | ops/s |
| 50th percentile latency | field_value_script_score | 252.0535 | ms |
| 90th percentile latency | field_value_script_score | 256.0499 | ms |
| 99th percentile latency | field_value_script_score | 283.4663 | ms |
| 100th percentile latency | field_value_script_score | 286.5056 | ms |
| 50th percentile service time | field_value_script_score | 250.1281 | ms |
| 90th percentile service time | field_value_script_score | 253.7611 | ms |
| 99th percentile service time | field_value_script_score | 281.6927 | ms |
| 100th percentile service time | field_value_script_score | 285.171 | ms |
| error rate | field_value_script_score | 0 | % |
| Min Throughput | large_terms | 1.1 | ops/s |
| Mean Throughput | large_terms | 1.1 | ops/s |
| Median Throughput | large_terms | 1.1 | ops/s |
| Max Throughput | large_terms | 1.1 | ops/s |
| 50th percentile latency | large_terms | 823.0126 | ms |
| 90th percentile latency | large_terms | 830.6155 | ms |
| 99th percentile latency | large_terms | 835.3904 | ms |
| 100th percentile latency | large_terms | 836.8937 | ms |
| 50th percentile service time | large_terms | 815.8893 | ms |
| 90th percentile service time | large_terms | 822.6666 | ms |
| 99th percentile service time | large_terms | 828.4472 | ms |
| 100th percentile service time | large_terms | 829.8767 | ms |
| error rate | large_terms | 0 | % |
| Min Throughput | large_filtered_terms | 1.1 | ops/s |
| Mean Throughput | large_filtered_terms | 1.1 | ops/s |
| Median Throughput | large_filtered_terms | 1.1 | ops/s |
| Max Throughput | large_filtered_terms | 1.1 | ops/s |
| 50th percentile latency | large_filtered_terms | 827.2812 | ms |
| 90th percentile latency | large_filtered_terms | 832.8975 | ms |
| 99th percentile latency | large_filtered_terms | 839.1567 | ms |
| 100th percentile latency | large_filtered_terms | 841.8363 | ms |
| 50th percentile service time | large_filtered_terms | 820.1701 | ms |
| 90th percentile service time | large_filtered_terms | 825.8126 | ms |
| 99th percentile service time | large_filtered_terms | 832.2652 | ms |
| 100th percentile service time | large_filtered_terms | 834.1797 | ms |
| error rate | large_filtered_terms | 0 | % |
| Min Throughput | large_prohibited_terms | 1.1 | ops/s |
| Mean Throughput | large_prohibited_terms | 1.1 | ops/s |
| Median Throughput | large_prohibited_terms | 1.1 | ops/s |
| Max Throughput | large_prohibited_terms | 1.1 | ops/s |
| 50th percentile latency | large_prohibited_terms | 815.1735 | ms |
| 90th percentile latency | large_prohibited_terms | 819.983 | ms |
| 99th percentile latency | large_prohibited_terms | 825.432 | ms |
| 100th percentile latency | large_prohibited_terms | 827.4997 | ms |
| 50th percentile service time | large_prohibited_terms | 808.3108 | ms |
| 90th percentile service time | large_prohibited_terms | 812.943 | ms |
| 99th percentile service time | large_prohibited_terms | 818.3828 | ms |
| 100th percentile service time | large_prohibited_terms | 820.7233 | ms |
| error rate | large_prohibited_terms | 0 | % |
| Min Throughput | desc_sort_population | 1.5 | ops/s |
| Mean Throughput | desc_sort_population | 1.51 | ops/s |
| Median Throughput | desc_sort_population | 1.5 | ops/s |
| Max Throughput | desc_sort_population | 1.51 | ops/s |
| 50th percentile latency | desc_sort_population | 83.85828 | ms |
| 90th percentile latency | desc_sort_population | 84.97299 | ms |
| 99th percentile latency | desc_sort_population | 88.00757 | ms |
| 100th percentile latency | desc_sort_population | 88.1269 | ms |
| 50th percentile service time | desc_sort_population | 82.54314 | ms |
| 90th percentile service time | desc_sort_population | 83.48184 | ms |
| 99th percentile service time | desc_sort_population | 86.93492 | ms |
| 100th percentile service time | desc_sort_population | 87.31277 | ms |
| error rate | desc_sort_population | 0 | % |
| Min Throughput | asc_sort_population | 1.5 | ops/s |
| Mean Throughput | asc_sort_population | 1.51 | ops/s |
| Median Throughput | asc_sort_population | 1.51 | ops/s |
| Max Throughput | asc_sort_population | 1.51 | ops/s |
| 50th percentile latency | asc_sort_population | 85.66127 | ms |
| 90th percentile latency | asc_sort_population | 86.65042 | ms |
| 99th percentile latency | asc_sort_population | 90.16076 | ms |
| 100th percentile latency | asc_sort_population | 95.53816 | ms |
| 50th percentile service time | asc_sort_population | 84.45677 | ms |
| 90th percentile service time | asc_sort_population | 85.10276 | ms |
| 99th percentile service time | asc_sort_population | 88.91906 | ms |
| 100th percentile service time | asc_sort_population | 93.78454 | ms |
| error rate | asc_sort_population | 0 | % |
| Min Throughput | asc_sort_with_after_population | 1.5 | ops/s |
| Mean Throughput | asc_sort_with_after_population | 1.5 | ops/s |
| Median Throughput | asc_sort_with_after_population | 1.5 | ops/s |
| Max Throughput | asc_sort_with_after_population | 1.5 | ops/s |
| 50th percentile latency | asc_sort_with_after_population | 133.8266 | ms |
| 90th percentile latency | asc_sort_with_after_population | 135.2227 | ms |
| 99th percentile latency | asc_sort_with_after_population | 136.1994 | ms |
| 100th percentile latency | asc_sort_with_after_population | 136.5529 | ms |
| 50th percentile service time | asc_sort_with_after_population | 132.9712 | ms |
| 90th percentile service time | asc_sort_with_after_population | 134.1224 | ms |
| 99th percentile service time | asc_sort_with_after_population | 135.5426 | ms |
| 100th percentile service time | asc_sort_with_after_population | 135.6862 | ms |
| error rate | asc_sort_with_after_population | 0 | % |
| Min Throughput | desc_sort_geonameid | 6 | ops/s |
| Mean Throughput | desc_sort_geonameid | 6.01 | ops/s |
| Median Throughput | desc_sort_geonameid | 6.01 | ops/s |
| Max Throughput | desc_sort_geonameid | 6.01 | ops/s |
| 50th percentile latency | desc_sort_geonameid | 5.409995 | ms |
| 90th percentile latency | desc_sort_geonameid | 5.93607 | ms |
| 99th percentile latency | desc_sort_geonameid | 6.293108 | ms |
| 100th percentile latency | desc_sort_geonameid | 6.577119 | ms |
| 50th percentile service time | desc_sort_geonameid | 4.529794 | ms |
| 90th percentile service time | desc_sort_geonameid | 4.89106 | ms |
| 99th percentile service time | desc_sort_geonameid | 5.122794 | ms |
| 100th percentile service time | desc_sort_geonameid | 5.300584 | ms |
| error rate | desc_sort_geonameid | 0 | % |
| Min Throughput | desc_sort_with_after_geonameid | 6.01 | ops/s |
| Mean Throughput | desc_sort_with_after_geonameid | 6.01 | ops/s |
| Median Throughput | desc_sort_with_after_geonameid | 6.01 | ops/s |
| Max Throughput | desc_sort_with_after_geonameid | 6.01 | ops/s |
| 50th percentile latency | desc_sort_with_after_geonameid | 122.7259 | ms |
| 90th percentile latency | desc_sort_with_after_geonameid | 126.947 | ms |
| 99th percentile latency | desc_sort_with_after_geonameid | 130.2797 | ms |
| 100th percentile latency | desc_sort_with_after_geonameid | 133.6721 | ms |
| 50th percentile service time | desc_sort_with_after_geonameid | 121.94 | ms |
| 90th percentile service time | desc_sort_with_after_geonameid | 126.1451 | ms |
| 99th percentile service time | desc_sort_with_after_geonameid | 130.0349 | ms |
| 100th percentile service time | desc_sort_with_after_geonameid | 133.0953 | ms |
| error rate | desc_sort_with_after_geonameid | 0 | % |
| Min Throughput | asc_sort_geonameid | 6.02 | ops/s |
| Mean Throughput | asc_sort_geonameid | 6.02 | ops/s |
| Median Throughput | asc_sort_geonameid | 6.02 | ops/s |
| Max Throughput | asc_sort_geonameid | 6.03 | ops/s |
| 50th percentile latency | asc_sort_geonameid | 5.472744 | ms |
| 90th percentile latency | asc_sort_geonameid | 6.21314 | ms |
| 99th percentile latency | asc_sort_geonameid | 6.887052 | ms |
| 100th percentile latency | asc_sort_geonameid | 7.244275 | ms |
| 50th percentile service time | asc_sort_geonameid | 4.652755 | ms |
| 90th percentile service time | asc_sort_geonameid | 5.154251 | ms |
| 99th percentile service time | asc_sort_geonameid | 5.818483 | ms |
| 100th percentile service time | asc_sort_geonameid | 5.886902 | ms |
| error rate | asc_sort_geonameid | 0 | % |
| Min Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
| Mean Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
| Median Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
| Max Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
| 50th percentile latency | asc_sort_with_after_geonameid | 108.2379 | ms |
| 90th percentile latency | asc_sort_with_after_geonameid | 110.4991 | ms |
| 99th percentile latency | asc_sort_with_after_geonameid | 113.719 | ms |
| 100th percentile latency | asc_sort_with_after_geonameid | 117.4331 | ms |
| 50th percentile service time | asc_sort_with_after_geonameid | 107.0072 | ms |
| 90th percentile service time | asc_sort_with_after_geonameid | 109.196 | ms |
| 99th percentile service time | asc_sort_with_after_geonameid | 111.7552 | ms |
| 100th percentile service time | asc_sort_with_after_geonameid | 115.6274 | ms |
| error rate | asc_sort_with_after_geonameid | 0 | % |
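In Rally's terminology, service time covers the span from sending a request to receiving its response, while latency additionally includes any time the request waited before it was sent. Latency minus service time therefore approximates the wait. The following is a minimal sketch of that calculation, assuming the report has been exported as CSV with Metric, Task, Value and Unit columns (for example with --report-format=csv and --report-file). The sample rows are copied from the report above, and wait_time_p50 is a hypothetical helper name.

```python
import csv
import io

# A few rows copied from the report above, written in the assumed CSV layout.
ROWS = """Metric,Task,Value,Unit
50th percentile latency,index-stats,3.172824,ms
50th percentile service time,index-stats,2.447953,ms
50th percentile latency,term,3.069355,ms
50th percentile service time,term,2.355179,ms
50th percentile latency,large_terms,823.0126,ms
50th percentile service time,large_terms,815.8893,ms
"""


def wait_time_p50(csv_text: str) -> dict:
    """Return {task: median latency minus median service time}, in ms."""
    latency, service = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Metric"] == "50th percentile latency":
            latency[row["Task"]] = float(row["Value"])
        elif row["Metric"] == "50th percentile service time":
            service[row["Task"]] = float(row["Value"])
    # Only report tasks for which both metrics are present.
    return {
        task: round(latency[task] - service[task], 6)
        for task in latency
        if task in service
    }


for task, wait in wait_time_p50(ROWS).items():
    print(f"{task}: {wait} ms")
```

For the three sampled tasks the difference is well under 10 ms. A growing gap between latency and service time would suggest that requests are queuing on the client side because the cluster cannot keep up with the target throughput.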
