Couchbase usage and perfs
Criteo - Couchbase Live 2016 - Paris
About me
Pierre Mavro - Lead DevOps - NoSQL Team
Working at Criteo as Site Reliability Engineer
@deimosfr
Criteo
31 Offices
2000+ employees
Criteo technical insights
● 700 engineers
● 17K servers
● 27K displays per second
● 2.4M requests per second
Criteo SRE: biggest challenges
● Scaling
● Low latency
● High throughput
● Resiliency
● Automation
Couchbase figures at Criteo (Worldwide)
● 1300+ physical servers
● 100+ clusters (up to 50 servers each)
● 90 TB of data in memory
● 25M QPS
● < 8 ms constant latency
Couchbase usage at Criteo
● Storing UUIDs (< 30 B)
● Storing blobs (e.g. binary images)
● Key size sometimes larger than value size
● Serving between 100 Kqps and 2.5 Mqps per cluster
● Low latency: 99th percentile < 2 ms
● Data size per cluster between 500 GB and ~12 TB (with replicas)
● All data fits in memory
● Inter-datacenter replication (custom client driver)
What we wanted to solve
Legacy infrastructure
● Couchbase v1.8 legacy (80%) and v3.0.1 community (20%)
● Slow rebalance (up to 48h for one server)
● Rebalance failures on highly loaded clusters
● Max connections reached on v1.8 (9k)
Legacy infrastructure
● Persisted and non-persisted buckets shared the same clusters
● No dedicated latency monitoring tool
● No automatic restart/upgrade orchestrator
● Server benchmarks needed updating
● Lack of Couchbase best practices
How we achieved the change
1. Benchmarks
Benchmarks
● Couchbase Enterprise 3.1.3
● 3x HP GEN9 DL360 (256 GB RAM, 6x 400 GB SSD RAID10, 1 Gb network interface): 2 injectors + 1 server
● Key size: UUID string (36 bytes) + Couchbase metadata (56 bytes)
● Value size: uniform range between 750 B and 1250 B (avg 1 kB)
● Number of items: 50M/node (with replica) or 100M/node (without replica)
● Resident active items (= items fully in RAM): ~50%
● Value-only ejection mode (only data values can be evicted from RAM; metadata + keys stay in RAM)
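From the key, metadata and value sizes above, the per-node dataset size can be estimated with quick arithmetic. This is a back-of-the-envelope sketch, not a figure from the slides:

```shell
# Back-of-the-envelope per-node dataset size for the benchmark setup above.
ITEMS=50000000   # 50M items per node (with replica)
KEY=36           # UUID string key, bytes
META=56          # Couchbase metadata per item, bytes
VALUE=1000       # average value size, bytes (uniform 750-1250 B)
PER_ITEM=$((KEY + META + VALUE))
TOTAL_GB=$(awk -v i="$ITEMS" -v p="$PER_ITEM" 'BEGIN { printf "%.1f", i * p / 1024^3 }')
echo "~${PER_ITEM} B/item, ~${TOTAL_GB} GB per node"
```

With ~50% of active items resident, roughly half of the value bytes stay in RAM while all keys and metadata do (value-only ejection).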
Benchmarks
Heavy Writes / Light Reads (10 Kqps), without replica

| Write rate per node | Status | Disk write queue | Latency p50 | Latency p95 | Latency p99 | Latency p99.9 |
|---|---|---|---|---|---|---|
| 40 Kset/s | OK | 10M items | 0.4 ms | 0.7 ms | 2 ms | 8 ms |
| 60 Kset/s | OK | 30M items | 0.4 ms | 0.7 ms | 2 ms | 20 ms |
| 80 Kset/s | OK | 50M items | 0.4 ms | 2 ms | 7 ms | 30 ms |
| 100 Kset/s | OK | 70M items | 1.5 ms | 5 ms | 10 ms | 40 ms |
Benchmarks
Heavy Writes / Light Reads (10 Kqps), with one replica

| Write rate per node | Status | Disk write queue | Latency p50 | Latency p95 | Latency p99 | Latency p99.9 |
|---|---|---|---|---|---|---|
| 20 Kset/s | OK | 12M items | 0.4 ms | 1 ms | 2 ms | 10 ms |
| 30 Kset/s | OK | 33M items | 0.5 ms | 2 ms | 4 ms | 20 ms |
| 40 Kset/s | OK | 60M items | 0.6 ms | 2 ms | 5 ms | 25 ms |
| 50 Kset/s | NOK (OOM) | >70M items | 0.7 ms | 5 ms | 50 ms | 75 ms |
Benchmarks
Heavy Reads / Light Writes (10 Kqps), with one replica

| Read rate per node | Status | Disk write queue | Latency p50 | Latency p95 | Latency p99 | Latency p99.9 |
|---|---|---|---|---|---|---|
| 25 Kset/s | OK | 130k items | 0.4 ms | 0.7 ms | 4 ms | 8 ms |
| 50 Kset/s | OK | 130k items | 0.4 ms | 1 ms | 5 ms | 10 ms |
| 75 Kset/s | OK | 130k items | 0.4 ms | 5 ms | 15 ms | 25 ms |
| 100 Kset/s | NOK | 50k to 500k items | 16 ms | 25 ms | 45 ms | 100 ms |
Benchmarks
Conclusion for a single node:
● The 1 Gb network interface is the bottleneck
● Replicas introduce latency
● Reads are fast
● Max write rate with replica per node: 40 Kqps
● Max read rate with replica per node: 90 Kqps
● Max read/write rate without replica per node: 90 Kqps
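Given the per-node limits above, cluster sizing becomes simple arithmetic. A sketch with assumed inputs (the 500 Kqps cluster-wide target and the 20% headroom are illustrative, not figures from the slides):

```shell
# Illustrative capacity sizing from the measured per-node write limit.
TARGET_WRITE_QPS=500000   # assumed cluster-wide write target (hypothetical)
MAX_WRITE_PER_NODE=40000  # measured max write rate with replica (from above)
HEADROOM=80               # only plan for 80% of the measured per-node max
NODES=$(awk -v t="$TARGET_WRITE_QPS" -v m="$MAX_WRITE_PER_NODE" -v h="$HEADROOM" \
  'BEGIN { n = t / (m * h / 100); if (n > int(n)) n = int(n) + 1; print n }')
echo "Nodes needed: ${NODES}"
```

The same division applies to reads against the 90 Kqps per-node ceiling; the tighter of the two constraints sizes the cluster.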
2. SLI, SLO & SLA
Metrics
Metrics are great!

● Total QPS (read + write)
● Total RAM usage
● Availability
● Number of items
● …

But they are not enough to know the global service status!
SLI: add the major missing metric
We added latency monitoring as an SLI, to be part of our Couchbase SLO and SLA.
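A latency SLI boils down to a percentile over observed request latencies. A minimal nearest-rank sketch with standard tools (the sample values are made up for illustration):

```shell
# Minimal nearest-rank percentile sketch for a latency SLI (illustrative).
percentile() {  # usage: <latencies on stdin, one per line> | percentile <p>
  sort -n | awk -v p="$1" '
    { a[NR] = $1 }
    END {
      idx = int(NR * p / 100)                 # nearest-rank: ceil(NR * p / 100)
      if (idx < NR * p / 100) idx++
      if (idx < 1) idx = 1
      print a[idx]
    }'
}
P99=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 100 | percentile 99)
echo "p99 = ${P99} ms"
```

In production the samples would come from client-side instrumentation rather than a flat file; the percentile math is the same.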
3. Couchbase support
Support contract
● Get the latest Couchbase bug fixes
● Suggest Couchbase enhancements
● Speed up incident resolution with the help of support
● Get better Couchbase tuning recommendations for performance
4. Refactoring infrastructure
Split usages
● High-load (QPS) buckets are on dedicated clusters
● Low-load (QPS) buckets are shared on separate “shared” clusters
● Persisted and non-persisted buckets no longer share the same servers
5. Administration Automation
Automation: why?
● Need to upgrade from the community to the enterprise version
● Need to apply new configuration options that require a restart of all the nodes in a cluster
● Need to apply fixes that require a reboot of all the nodes in a cluster
● Need to reinstall servers from scratch
Automation: how?
● Criteo uses Chef to bootstrap servers and deploy applications and configuration
● We did not want to add another new tool in the loop
● Nothing with the required features already existed
● We developed a FOSS Chef cookbook for this and other use-cases: Choregraphie

https://github.com/criteo-cookbooks/choregraphie
Automation: Choregraphie
With Choregraphie we can perform:
● Rolling restart with rebalance
● Rolling upgrade with rebalance
● Use of an optional, additional server to speed up rebalance
● Rolling reboot with rebalance
● Rolling reinstall with rebalance
Choregraphie is open source! Feel free to contribute
6. Couchbase and system tuning
Couchbase best practices / system tuning
● Minimize swap usage:
○ vm.swappiness = 0 (set to 1 for kernels > 3.5)
● Disable transparent hugepages:
○ chkconfig disable-thp on
● Set the SSD I/O scheduler to deadline:
○ echo "deadline" > /sys/block/sdX/queue/scheduler
● Change the CPUFreq governor:
○ modprobe cpufreq_performance
● Raise the maximum number of connections:
○ max_conns_on_port_XXXX: 30000
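The settings above can be verified on a running node. A quick sketch, assuming a typical Linux layout (sdX is a placeholder for the actual SSD device, as in the slide):

```shell
# Verify the tunings above are in effect (illustrative; adjust device names).
cat /proc/sys/vm/swappiness                       # expect 0 (or 1 on kernels > 3.5)
cat /sys/kernel/mm/transparent_hugepage/enabled   # expect [never] selected
cat /sys/block/sdX/queue/scheduler                # expect [deadline] selected
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor  # expect performance
```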
Couchbase tuning
● Raise the max_num_nonio parameter to 8 to avoid rebalance failures on highly loaded clusters:
○ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "max_num_nonio=<N>"}]).' http://<NodeIP>:8091/diag/eval
● Disable the access log if you don't need it, to reduce disk usage (native in Couchbase 4.5):
○ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "access_scanner_enabled=false"}]).' http://<NodeIP>:8091/diag/eval
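Both calls above share the same shape, so they can be wrapped once. A hedged sketch (CB_USER, CB_PASS, CB_NODE and the function name are placeholders of ours, not values from the slides; it needs a live cluster to run):

```shell
# Hypothetical convenience wrapper around the /diag/eval calls above.
# CB_USER, CB_PASS and CB_NODE must be set by the caller.
cb_set_bucket_prop() {  # usage: cb_set_bucket_prop <bucket> <key=value>
  curl -i -u "${CB_USER}:${CB_PASS}" \
    --data "ns_bucket:update_bucket_props(\"$1\", [{extra_config_string, \"$2\"}])." \
    "http://${CB_NODE}:8091/diag/eval"
}

# Example invocations mirroring the two slides above:
# cb_set_bucket_prop mybucket max_num_nonio=8
# cb_set_bucket_prop mybucket access_scanner_enabled=false
```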
Tuning...what’s next?
● Network teaming 802.3ad (bonding) with 2x 1 Gb cards
● 10 Gb network cards
● Upgrade to Couchbase 4.5
● Upgrade the kernel to a newer vanilla LTS to enable SSD multi-queue support
● Switch to Mesos to reduce administration time
Questions?
Criteo - Couchbase Live 2016 - Paris
Pierre Mavro / @deimosfr