Improving Efficiency Of Goku Time Series Database at Pinterest (Part 2) | by Pinterest Engineering | Pinterest Engineering Blog

Pinterest Engineering
Pinterest Engineering Blog


Mar 12, 2024

Monil Mukesh Sanghavi | Software Engineer, Real Time Analytics Team; Xiao Li | Software Engineer, Real Time Analytics Team; Ming-May Hu | Software Engineer, Real Time Analytics Team; Zhenxiao Luo | Software Engineer, Real Time Analytics Team; Kapil Bajaj | Manager, Real Time Analytics Team


At Pinterest, one of the pillars of the observability stack provides internal engineering teams (our users) the ability to monitor their services using metrics data and set up alerting on it. Goku is our in-house time series database providing cost efficient and low latency storage for metrics data. Under the hood, Goku is not a single cluster but a collection of sub-service components including:

  • Goku Short Term (in-memory storage for the last 24 hours of data, referred to as GokuS)
  • Goku Long Term (SSD- and HDD-based storage for older data, referred to as GokuL)
  • Goku Compactor (time series data aggregation and conversion engine)
  • Goku Root (smart query routing)

You can read more about these components in the blog posts on GokuS Storage, GokuL (long term) storage, and Cost Savings on Goku, but a lot has changed in Goku since those were written. We have implemented multiple features that increased the efficiency of Goku and improved the user experience. In this 3 part blog post series, we will cover the efficiency improvements in 3 major aspects:

  1. Improving recovery time of both GokuS and GokuL (this is the total time a single host or cluster in Goku takes to come up and start serving time series queries)
  2. Improving the query experience in Goku by reducing latencies of expensive and high cardinality queries
  3. Reducing the overall cost of Goku at Pinterest

We’ll also share some learnings and takeaways from using Goku to store metrics at Pinterest.

This 2nd blog post focuses on how Goku time series queries were improved. We will provide a brief overview of Goku’s time series data model, query model, and architecture. We will follow up with the improvement features we added, including rollup, pre-aggregation, and pagination.

The data model of a time series in Goku is very similar to the data model of OpenTSDB (which Goku replaced). You can find more details here. Here’s a quick overview of the Goku TimeSeries data model.

A time series metadata or key consists of the following:

Metric Name: proc.stat.cpu; Tag Value Combination 1: host=abc; Tag Value Combination 2: cluster=goku; Tag Value Combination 3: az=us-east-1a;  Tag Value Combination n: os=ubuntu-1

The data part of a time series, which we refer to as the time series stream, consists of data points that are time value pairs, where time is in Unix time and value is a numerical value.

Data point 1 — Timestamp: 16:00, Value: 3.0; Data point 2 — Timestamp: 16:01, Value: 4.2; Data point 3 — Timestamp: 16:02, Value: 5.2; Data point n — Timestamp: 16:59, Value: 4.0

Multiple hosts can emit time series for a unique metric name. For example: cpu, memory, disk usage, or some application metric. The host-specific information is part of one of the tags mentioned above. For example: tag key == host and value == host name.

[Table: TimeSeries number, Metric Name, Tag Value 1, Tag Value 2, Tag Value 3, Tag Value n]

The cardinality of a metric (i.e. metric name) is defined as the total number of unique time series for that metric name. A unique time series has a unique combination of tag keys and values. You can learn more about cardinality here.

For example, the cardinality of the metric name “proc.stat.cpu” in the above table is 5, because the combinations of tag value pairs together with the metric name of each of these 5 time series do not repeat. Similarly, the cardinality of the metric name “proc.stat.mem” is 3. Note how we represent a particular string (be it metric name or tag value) as a unique color. This is to show that a certain tag value pair can be present in multiple time series, but the combination of such strings is what makes a time series unique.
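The definition above can be sketched in a few lines of code. This is an illustrative example (the series data below is made up to mirror the table): cardinality is simply the count of distinct tag combinations seen for a metric name.

```python
# Sketch: cardinality of a metric = number of unique (metric name + tag set)
# combinations. The sample series below are hypothetical.

def cardinality(series, metric_name):
    """Count unique time series (tag combinations) for a metric name."""
    unique = {
        frozenset(tags.items())
        for name, tags in series
        if name == metric_name
    }
    return len(unique)

# Five distinct tag combinations for proc.stat.cpu, three for proc.stat.mem.
series = [
    ("proc.stat.cpu", {"host": "abc", "cluster": "goku", "az": "us-east-1a"}),
    ("proc.stat.cpu", {"host": "def", "cluster": "goku", "az": "us-east-1a"}),
    ("proc.stat.cpu", {"host": "ghi", "cluster": "goku", "az": "us-east-1b"}),
    ("proc.stat.cpu", {"host": "jkl", "cluster": "root", "az": "us-east-1a"}),
    ("proc.stat.cpu", {"host": "mno", "cluster": "root", "az": "us-east-1b"}),
    ("proc.stat.mem", {"host": "abc", "cluster": "goku", "az": "us-east-1a"}),
    ("proc.stat.mem", {"host": "def", "cluster": "goku", "az": "us-east-1a"}),
    ("proc.stat.mem", {"host": "ghi", "cluster": "goku", "az": "us-east-1b"}),
]

print(cardinality(series, "proc.stat.cpu"))  # → 5
print(cardinality(series, "proc.stat.mem"))  # → 3
```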

Goku uses Apache Thrift for its query RPC. The query model of Goku is very similar to OpenTSDB’s query model specified here. To summarize, a query to Goku Root is similar to the request specified below:
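The actual Thrift definitions are internal to Pinterest, so the following is only a hedged sketch of what the request might look like, with field names taken from the options described in the text:

```python
# Illustrative sketch only: the real Thrift structs are internal. Field names
# follow the options discussed below; types are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Filter:
    type: str        # e.g. "wildcard", "regex", "include", "exclude"
    tag_key: str
    value: str

@dataclass
class QueryRequest:
    metricName: str                          # metric name without tag combinations
    filters: List[Filter] = field(default_factory=list)
    aggregator: str = "sum"                  # sum / max / min / p99 / count / mean / median
    downsample: Optional[str] = None         # e.g. "1m-avg"
    rollupAggregator: Optional[str] = None   # downsampling at the time series level
    rollupInterval: Optional[int] = None
    startTime: int = 0                       # unix time range of the query
    endTime: int = 0
```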

Let’s go over the important options in the request structure above:

  • metricName — metric name without the tag combinations
  • list&lt;Filter&gt; — filters on tag values like pattern match, wildcard, include/exclude tag value (can be multiple), etc.
  • Aggregator — sum/max/min/p99/count/mean/median etc. on the group of time series
  • Downsample — user specified granularity in time returned in results
  • Rollup aggregation/interval — downsampling at a time series level. This option becomes important in long range queries (you will see the reason below in Rollup).
  • startTime, endTime — range of the query

The query response looks as follows:
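Again, the real response type is internal; a minimal sketch under the same assumptions as above might be:

```python
# Illustrative sketch only (the real Thrift response type is internal): each
# grouped series in the response carries its tags and its data points.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class TimeSeriesData:
    tags: Dict[str, str]                     # tag key/value pairs of the grouped series
    points: List[Tuple[int, float]] = field(default_factory=list)  # (timestamp, value)

@dataclass
class QueryResponse:
    results: List[TimeSeriesData] = field(default_factory=list)
```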

The monitoring and alerting framework at Pinterest (internally called statsboard) has a query client that sends a QueryRequest to Goku Root, which forwards it to the leaf clusters (GokuS and/or GokuL) based on the query time range and the shards they host. The leaf clusters do the necessary grouping (filtering), interpolation, aggregation, and downsampling as needed and respond to the Goku Root with a QueryResponse. The Root again does aggregation if necessary and responds to the statsboard query client with a QueryResponse.

Let’s now look at how we improved the query experience.

Goku supports a base time granularity of 1 second in the time series stream. However, such fine granularity can hurt query performance for the following reasons:

  • Too much data (too many data points) over the network for a non-downsampled raw query
  • Expensive computation, and hence CPU cost, while aggregating because of too many data points
  • Time consuming data fetch, especially for GokuL (which uses SSD and HDD for data storage)

For older metric data residing in GokuL, we decided to also store rolled up data to improve query latency. Rolling up means reducing the granularity of the time series data points by storing aggregated values for the chosen interval. For example: a raw time series stream

when aggregated using a rollup interval of 5 and rollup aggregators of sum, min, max, count, and average will have 5 shorter time series streams as follows:
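The transformation described above can be sketched as follows. This is a hedged illustration (the bucketing scheme and sample values are assumptions, not Goku’s actual implementation): each bucket of 5 raw points collapses into one point per aggregator, yielding 5 shorter streams.

```python
# Sketch of rollup: aggregate a raw stream into buckets of `interval` points,
# producing one shorter stream per rollup aggregator (sum/min/max/count/avg).

def rollup(points, interval):
    """points: list of (timestamp, value); returns dict of aggregator -> stream."""
    out = {"sum": [], "min": [], "max": [], "count": [], "avg": []}
    for i in range(0, len(points), interval):
        bucket = points[i:i + interval]
        ts = bucket[0][0]                      # bucket start timestamp
        vals = [v for _, v in bucket]
        out["sum"].append((ts, sum(vals)))
        out["min"].append((ts, min(vals)))
        out["max"].append((ts, max(vals)))
        out["count"].append((ts, len(vals)))
        out["avg"].append((ts, sum(vals) / len(vals)))
    return out

raw = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0), (4, 5.0),
       (5, 2.0), (6, 2.0), (7, 2.0), (8, 2.0), (9, 2.0)]
rolled = rollup(raw, 5)
print(rolled["sum"])    # → [(0, 15.0), (5, 10.0)]
print(rolled["count"])  # → [(0, 5), (5, 5)]
```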

The following table explains the tiering and rollup strategy:

Rollup benefitted the GokuL service in 3 ways:

  • Reduced the storage cost of voluminous raw data
  • Decreased the data fetch cost from SSD, reduced the CPU aggregation cost, and thus reduced the query latency
  • Some queries that would time out on the HBase clusters backing OpenTSDB would return successful query results from GokuL.

The rollup aggregation is done in the Goku Compactor (explained here) before it creates the SST files containing the time series data to be stored in the RocksDB based GokuL instances.

In production, we observe that the p99 latency of queries using rolled up data is almost 1000x lower than that of queries using raw data.

P99 latency for GokuL queries using raw data is almost a few seconds.
GokuL queries using rollup data have a p99 in milliseconds.

At query time, Goku responds with an exception stating “cardinality limit exceeded” if the number of time series the query would select/read post filtering exceeds the pre-configured limit. This protects Goku’s system resources from noisy, expensive queries. We observed queries for high cardinality metrics hitting timeouts, chewing up the system resources, and affecting the otherwise low latency queries. Often, after analyzing the high cardinality or timing out queries, we found that the tag(s) that contributed to the high cardinality of the metric were not even needed by the user in the final query result.

The pre-aggregation feature was introduced with the goal of removing these unwanted tags in the pre-aggregated metrics, thus reducing the original cardinality, reducing the query latency, and successfully serving the query results to the user without timing out or consuming a lot of system resources. The feature creates and stores aggregated time series by removing the unnecessary tags that the user names. The aggregated time series keeps only the tags that the user has specifically asked to preserve. For example:

If the user asks to enable pre-aggregation for the metric “app.some_stat” and wants to preserve only the cluster and az information, the pre-aggregated time series will look like this:

Note how the cardinality of the pre-aggregated metric is reduced from 5 to 3.
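The tag-dropping step can be sketched as follows. This is an illustrative sketch, not Goku’s implementation: the tag values and the choice of sum as the merge function are assumptions. Dropping the host tag and keeping only the grouping tags makes several raw series collapse into one.

```python
# Sketch of pre-aggregation: drop the tags the user does not need (here
# `host`), keep the grouping tags (`cluster`, `az`), and merge the series
# that become identical. Sample data and the sum merge are hypothetical.
from collections import defaultdict

def pre_aggregate(series, grouping_tags):
    """series: list of (tags_dict, value); sums series that collapse together."""
    grouped = defaultdict(float)
    for tags, value in series:
        key = tuple(sorted((k, v) for k, v in tags.items() if k in grouping_tags))
        grouped[key] += value
    return dict(grouped)

raw = [
    ({"host": "h1", "cluster": "goku", "az": "us-east-1a"}, 1.0),
    ({"host": "h2", "cluster": "goku", "az": "us-east-1a"}, 2.0),
    ({"host": "h3", "cluster": "goku", "az": "us-east-1b"}, 3.0),
    ({"host": "h4", "cluster": "root", "az": "us-east-1a"}, 4.0),
    ({"host": "h5", "cluster": "goku", "az": "us-east-1a"}, 5.0),
]
agg = pre_aggregate(raw, {"cluster", "az"})
print(len(agg))  # → 3 (cardinality reduced from 5 to 3)
```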

The pre-aggregated metrics are new time series created within Goku that do not replace the original raw time series. Also, for the sake of simplicity, we decided not to introduce these metrics back into the regular ingestion pipeline that we emit to Kafka.

Here is the flow of how enabling pre-aggregation works:

  1. Users experiencing high latency queries or queries hitting the cardinality limit exceeded error decide to enable pre-aggregation for the metric.
  2. The Goku team provides the tag combination distribution of the metric to the user. For example:

3. Users decide on the tags they want to preserve in the pre-aggregated time series. The “to be preserved” tags are called grouping tags. There is also an optional provision to select a particular tag key == tag value combination to be preserved and discard all other tag value combinations for that tag key. These provisions are referred to as conditional tags.

4. The user is notified of the reduced cardinality and, once the user finalizes, pre-aggregation is enabled for the metric.

Write path change:

After consuming a data point for a metric from Kafka, the Goku Short Term host checks if the time series qualifies to be pre-aggregated. If the time series qualifies, the value of the data point is entered into an in-memory data structure, which records the sum, max, min, count, and mean of the data seen so far. Every minute, the data structure also emits 5 aggregated data points (the aggregations mentioned above) for the time series under an internally modified Goku metric name.
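The in-memory structure described above can be sketched as below. All names (the class, the `.agg.*` metric name suffixes) are hypothetical; only the five tracked aggregations come from the text.

```python
# Minimal sketch of the write-path structure: track sum/max/min/count/mean per
# series and emit five aggregated points per minute under a modified metric
# name. Class and metric-name suffixes are illustrative assumptions.

class RunningAggregate:
    def __init__(self):
        self.sum = 0.0
        self.count = 0
        self.min = float("inf")
        self.max = float("-inf")

    def add(self, value):
        """Fold one consumed data point into the running aggregates."""
        self.sum += value
        self.count += 1
        self.min = min(self.min, value)
        self.max = max(self.max, value)

    def emit(self, metric_name, timestamp):
        """Return the five aggregated data points emitted each minute."""
        mean = self.sum / self.count if self.count else 0.0
        return [
            (f"{metric_name}.agg.sum", timestamp, self.sum),
            (f"{metric_name}.agg.min", timestamp, self.min),
            (f"{metric_name}.agg.max", timestamp, self.max),
            (f"{metric_name}.agg.count", timestamp, self.count),
            (f"{metric_name}.agg.mean", timestamp, mean),
        ]

agg = RunningAggregate()
for v in [1.0, 4.0, 2.5, 0.5]:
    agg.add(v)
points = agg.emit("app.some_stat", 1700000000)
print(points[0])  # → ('app.some_stat.agg.sum', 1700000000, 8.0)
```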

Read path change:

In the query request to Goku Root, the observability statsboard client sends a boolean that determines whether the pre-aggregated version of the metric needs to be queried. Goku Root does the corresponding metric name change to query the correct time series.

Success story: One production metric (in the example provided above) stored in Goku, on which alerts were set, was seeing high cardinality exceptions (cardinality ~32M during peak hours).

We reached out to the user to understand the use case and suggested enabling pre-aggregation for their metric. Once we enabled pre-aggregation, the queries successfully completed with latencies below 100ms.

We have onboarded more than 50 use cases for pre-aggregation.

During the release to production, a query timeout feature had to be implemented in Goku Long Term to prevent an expensive query from consuming server resources for a long time. This, however, resulted in users of expensive queries seeing timeouts, and server resources were still wasted even if only for a short period of time (i.e. the configured query timeout). To address this problem, the pagination feature was introduced, which promises a non-timed-out result to the end user of an expensive query, even though it may take longer than usual. It also breaks up and plans the query in such a way that resource utilization on the server is controlled.

The workflow of the pagination feature is:

  1. The query client sends a PagedQueryRequest to Goku Root if the metric is in the list of pagination supported metrics.
  2. Goku Root plans the query based on time slicing.
  3. Goku Root and the query client have a series of request-response exchanges with the root server. This provides the query client with a hint of what the next start and end time range of the query should be, along with the root server’s own IP address so that the traffic-managing Envoy can route the query to the correct server.
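The client side of this exchange can be sketched as a simple loop. This is a hedged illustration: the request/hint shapes, field names, and the fake server are all assumptions, not the actual PagedQueryRequest protocol.

```python
# Hypothetical client-side sketch of the pagination workflow: each response
# carries a hint with the next time slice until the full range is covered.
# All names and shapes here are illustrative assumptions.

def paged_query(send, metric, start, end, slice_seconds):
    """send(request) -> (points, next_hint); loops until the range is done."""
    results = []
    cursor = start
    while cursor < end:
        request = {
            "metric": metric,
            "startTime": cursor,
            "endTime": min(cursor + slice_seconds, end),
        }
        points, next_hint = send(request)
        results.extend(points)
        if next_hint is None:          # server signals the range is exhausted
            break
        cursor = next_hint["nextStartTime"]
    return results

# Fake server for demonstration: one point per slice, hints at the next slice.
def fake_send(req):
    points = [(req["startTime"], 1.0)]
    nxt = {"nextStartTime": req["endTime"]} if req["endTime"] < 300 else None
    return points, nxt

print(len(paged_query(fake_send, "app.some_stat", 0, 300, 100)))  # → 3
```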

We have onboarded ~10 use cases in production.

The following are ideas we have to further improve the query experience in Goku:

Tag-based aggregation in Goku

During compaction, generate pre-aggregated time series by aggregating away the high cardinality contributing tags like host, etc. Work with the client team to identify such tags. This will generate additional time series and increase the storage cost, but not by much. At query time, if the high cardinality tags are not present in the query, the leaf server will automatically serve it using the pre-aggregated time series.

Currently, the client observability team already has a feature in place to remove the high cardinality contributing host tag from a set of long term metrics. In the future, this feature can make use of the tag-based aggregation support in Goku, or Goku can provide guidance to the observability team, based on the query analysis above, to include more long term metrics in their list.

Post-query processing support in Goku

Many users of statsboard use tscript post-query processing to further process their results. Pushing this processing layer into Goku can provide the following benefits:

  1. Leverages the additional compute resources available at the Goku Root and Goku Leaf (GokuS and GokuL) clusters
  2. Less data over the network, leading to potentially lower query latencies

Some examples of post-query processing support include finding the top N time series, summing the time series, etc.

Backfilling support in pre-aggregation

We currently do not support pre-aggregated queries for a metric over a time range that falls before the time the metric was configured for pre-aggregation. For example: if a metric was enabled for pre-aggregation on 1st Jan 2022 00:00:00, users won’t be able to query pre-aggregated data for time before 31st Dec 2021 23:59:59. By supporting pre-aggregation during compaction, we can remove this limit, and slowly but steadily (as larger tier buckets start forming), users will start seeing pre-aggregated data for older time ranges.

SQL support

Currently, Goku is queryable only through a Thrift RPC interface. SQL is widely used as a querying framework for data, and having SQL support in Goku would significantly help analytical use cases. We are starting to see increasing demand for this and are exploring options.

Read from S3

The ability to store data in and read it from S3 would help Goku extend the TTL of raw data, and even extend the TTL of queryable metrics data. It could also prove cost beneficial for storing metrics that are infrequently used.

Special thanks to Rui Zhang, Hao Jiang, and Miao Wang for their efforts in supporting the above features. A huge thanks to the Observability team for their help and support of these features on the user facing side.

In the next blog post, we will focus on how we brought down the cost of the Goku service(s).

To learn more about engineering at Pinterest, check out the rest of our Engineering Blog and visit our Pinterest Labs site. To explore and apply to open roles, visit our Careers page.