Couchbase usage and perfs
Criteo - Couchbase Live 2016 - Paris
About me
Pierre Mavro - Lead DevOps - NoSQL Team
Working at Criteo as Site Reliability Engineer
@deimosfr
Criteo
31 Offices
2000+ employees
Criteo technical insights
● 700 engineers
● 17K servers
● 27K displays per second
● 2.4M requests per second
Criteo SRE: biggest challenges
● Scaling
● Low latency
● High throughput
● Resiliency
● Automation
Couchbase figures at Criteo (Worldwide)
● 1300+ physical servers
● 100+ clusters (up to 50 servers each)
● 90 TB of data in memory
● 25M QPS
● < 8 ms constant latency
Couchbase usage at Criteo
● Storing UUIDs (< 30 B)
● Storing blobs (e.g. binary images)
● Key size sometimes larger than value size
● Serving between 100 Kqps and 2.5 Mqps per cluster
● Low latency: 99th percentile < 2 ms
● Data size per cluster between 500 GB and ~12 TB (with replicas)
● All data fits in memory
● Inter-datacenter replication (custom client driver)
What we wanted to solve
Legacy infrastructure
● Couchbase v1.8 legacy (80%) and v3.0.1 community (20%)
● Slow rebalance (up to 48h for one server)
● Rebalance failures on highly loaded clusters
● Max connections reached on v1.8 (9k)
Legacy infrastructure
● Persisted and non-persisted buckets shared the same clusters
● No dedicated latency monitoring tool
● No automatic restart/upgrade orchestrator
● Server benchmarks needed updating
● Lack of Couchbase best practices
How we achieved the change
1. Benchmarks
Benchmarks
● Couchbase Enterprise 3.1.3
● 3x HP GEN9 DL360 (256 GB RAM, 6x 400 GB SSD RAID10, 1 Gb network interface): 2 injectors + 1 server
● Key size: UUID string (36 bytes) + Couchbase metadata (56 bytes)
● Value size: uniform range between 750 B and 1250 B (avg 1 kB)
● Number of items: 50M/node (with replica) or 100M/node (without replica)
● Resident active items (= items fully in RAM): ~50%
● Value-only ejection mode (only data values can be evicted from RAM; metadata + keys stay in RAM)
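From the key, metadata and value sizes above, the per-node dataset size can be estimated with quick arithmetic. This is a back-of-the-envelope sketch, not a figure from the slides:

```shell
# Back-of-the-envelope per-node dataset size for the benchmark setup above.
ITEMS=50000000   # 50M items per node (with replica)
KEY=36           # UUID string key, bytes
META=56          # Couchbase metadata per item, bytes
VALUE=1000       # average value size, bytes (uniform 750-1250 B)
PER_ITEM=$((KEY + META + VALUE))
TOTAL_GB=$(awk -v i="$ITEMS" -v p="$PER_ITEM" 'BEGIN { printf "%.1f", i * p / 1024^3 }')
echo "~${PER_ITEM} B/item, ~${TOTAL_GB} GB per node"
```

With ~50% of active items resident, roughly half of the value bytes stay in RAM while all keys and metadata do (value-only ejection).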
Benchmarks
Heavy Writes / Light Reads (10 Kqps), without replica

| Write rate per node | Status | Disk write queue | Latency p50 | Latency p95 | Latency p99 | Latency p99.9 |
|---|---|---|---|---|---|---|
| 40 Kset/s | OK | 10M items | 0.4 ms | 0.7 ms | 2 ms | 8 ms |
| 60 Kset/s | OK | 30M items | 0.4 ms | 0.7 ms | 2 ms | 20 ms |
| 80 Kset/s | OK | 50M items | 0.4 ms | 2 ms | 7 ms | 30 ms |
| 100 Kset/s | OK | 70M items | 1.5 ms | 5 ms | 10 ms | 40 ms |
Benchmarks
Heavy Writes / Light Reads (10 Kqps), with one replica

| Write rate per node | Status | Disk write queue | Latency p50 | Latency p95 | Latency p99 | Latency p99.9 |
|---|---|---|---|---|---|---|
| 20 Kset/s | OK | 12M items | 0.4 ms | 1 ms | 2 ms | 10 ms |
| 30 Kset/s | OK | 33M items | 0.5 ms | 2 ms | 4 ms | 20 ms |
| 40 Kset/s | OK | 60M items | 0.6 ms | 2 ms | 5 ms | 25 ms |
| 50 Kset/s | NOK (OOM) | >70M items | 0.7 ms | 5 ms | 50 ms | 75 ms |
Benchmarks
Heavy Reads / Light Writes (10 Kqps), with one replica

| Read rate per node | Status | Disk write queue | Latency p50 | Latency p95 | Latency p99 | Latency p99.9 |
|---|---|---|---|---|---|---|
| 25 Kset/s | OK | 130k items | 0.4 ms | 0.7 ms | 4 ms | 8 ms |
| 50 Kset/s | OK | 130k items | 0.4 ms | 1 ms | 5 ms | 10 ms |
| 75 Kset/s | OK | 130k items | 0.4 ms | 5 ms | 15 ms | 25 ms |
| 100 Kset/s | NOK | 50k to 500k items | 16 ms | 25 ms | 45 ms | 100 ms |
Benchmarks
Conclusion for a single node:
● The 1 Gb network interface is the bottleneck
● Replicas introduce latency
● Reads are fast
● Max write rate with replica per node: 40 Kqps
● Max read rate with replica per node: 90 Kqps
● Max read/write rate without replica per node: 90 Kqps
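Given the per-node limits above, cluster sizing becomes simple arithmetic. A sketch with assumed inputs (the 500 Kqps cluster-wide target and the 20% headroom are illustrative, not figures from the slides):

```shell
# Illustrative capacity sizing from the measured per-node write limit.
TARGET_WRITE_QPS=500000   # assumed cluster-wide write target (hypothetical)
MAX_WRITE_PER_NODE=40000  # measured max write rate with replica (from above)
HEADROOM=80               # only plan for 80% of the measured per-node max
NODES=$(awk -v t="$TARGET_WRITE_QPS" -v m="$MAX_WRITE_PER_NODE" -v h="$HEADROOM" \
  'BEGIN { n = t / (m * h / 100); if (n > int(n)) n = int(n) + 1; print n }')
echo "Nodes needed: ${NODES}"
```

The same division applies to reads against the 90 Kqps per-node ceiling; the tighter of the two constraints sizes the cluster.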
2. SLI, SLO & SLA
Metrics
Metrics are great!

● Total QPS (read + write)
● Total RAM usage
● Availability
● Number of items
● …

But they are not enough to know the global service status!
SLI: add the major missing metric
We added latency monitoring as an SLI, to be part of our Couchbase SLO and SLA.
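A latency SLI boils down to a percentile over observed request latencies. A minimal nearest-rank sketch with standard tools (the sample values are made up for illustration):

```shell
# Minimal nearest-rank percentile sketch for a latency SLI (illustrative).
percentile() {  # usage: <latencies on stdin, one per line> | percentile <p>
  sort -n | awk -v p="$1" '
    { a[NR] = $1 }
    END {
      idx = int(NR * p / 100)                 # nearest-rank: ceil(NR * p / 100)
      if (idx < NR * p / 100) idx++
      if (idx < 1) idx = 1
      print a[idx]
    }'
}
P99=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 100 | percentile 99)
echo "p99 = ${P99} ms"
```

In production the samples would come from client-side instrumentation rather than a flat file; the percentile math is the same.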
3. Couchbase support
Support contract
● Get the latest Couchbase bug fixes
● Suggest Couchbase enhancements
● Speed up incident resolution with the help of support
● Get better Couchbase tuning recommendations for performance
4. Refactoring infrastructure
Split usages
● High-load (QPS) buckets are on dedicated clusters
● Low-load (QPS) buckets are shared on separate “shared” clusters
● Persisted and non-persisted buckets no longer share the same servers
5. Administration Automation
Automation: why?
● Need to upgrade from the community to the enterprise version
● Need to apply new configuration options that require a restart of all the nodes in a cluster
● Need to apply fixes that require a reboot of all the nodes in a cluster
● Need to reinstall servers from scratch
Automation: how?
● Criteo uses Chef to bootstrap servers and deploy applications and configuration
● We did not want to add another new tool in the loop
● Nothing with the required features already existed
● We developed a FOSS Chef cookbook for this and other use-cases: Choregraphie

https://github.com/criteo-cookbooks/choregraphie
Automation: Choregraphie
With Choregraphie we can perform:
● Rolling restart with rebalance
● Rolling upgrade with rebalance
● Use of an optional, additional server to speed up rebalance
● Rolling reboot with rebalance
● Rolling reinstall with rebalance
Choregraphie is open source! Feel free to contribute
6. Couchbase and system tuning
Couchbase best practices / system tuning
● Minimize swap usage:
○ vm.swappiness = 0 (set to 1 for kernels > 3.5)
● Disable transparent hugepages:
○ chkconfig disable-thp on
● Set the SSD I/O scheduler to deadline:
○ echo "deadline" > /sys/block/sdX/queue/scheduler
● Change the CPUFreq governor:
○ modprobe cpufreq_performance
● Raise the maximum number of connections:
○ max_conns_on_port_XXXX: 30000
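The settings above can be verified on a running node. A quick sketch, assuming a typical Linux layout (sdX is a placeholder for the actual SSD device, as in the slide):

```shell
# Verify the tunings above are in effect (illustrative; adjust device names).
cat /proc/sys/vm/swappiness                       # expect 0 (or 1 on kernels > 3.5)
cat /sys/kernel/mm/transparent_hugepage/enabled   # expect [never] selected
cat /sys/block/sdX/queue/scheduler                # expect [deadline] selected
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor  # expect performance
```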
Couchbase tuning
● Raise the max_num_nonio parameter to 8 to avoid rebalance failures on highly loaded clusters:
○ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "max_num_nonio=<N>"}]).' http://<NodeIP>:8091/diag/eval
● Disable the access log if you don't need it, to reduce disk usage (native in Couchbase 4.5):
○ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "access_scanner_enabled=false"}]).' http://<NodeIP>:8091/diag/eval
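Both calls above share the same shape, so they can be wrapped once. A hedged sketch (CB_USER, CB_PASS, CB_NODE and the function name are placeholders of ours, not values from the slides; it needs a live cluster to run):

```shell
# Hypothetical convenience wrapper around the /diag/eval calls above.
# CB_USER, CB_PASS and CB_NODE must be set by the caller.
cb_set_bucket_prop() {  # usage: cb_set_bucket_prop <bucket> <key=value>
  curl -i -u "${CB_USER}:${CB_PASS}" \
    --data "ns_bucket:update_bucket_props(\"$1\", [{extra_config_string, \"$2\"}])." \
    "http://${CB_NODE}:8091/diag/eval"
}

# Example invocations mirroring the two slides above:
# cb_set_bucket_prop mybucket max_num_nonio=8
# cb_set_bucket_prop mybucket access_scanner_enabled=false
```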
Tuning...what’s next?
● Network teaming 802.3ad (bonding) with 2x 1 Gb cards
● 10 Gb network cards
● Upgrade to Couchbase 4.5
● Upgrade the kernel to a newer vanilla LTS to enable SSD multi-queue support
● Switch to Mesos to reduce administration time
Questions?
Criteo - Couchbase Live 2016 - Paris
Pierre Mavro / @deimosfr