Expand WITH expressions

Tutorial for WITH expressions in MetricsQL

Let's look at the following real query from Node Exporter Full dashboard:

(
  (
    node_memory_MemTotal_bytes{instance=~"$node:$port", job=~"$job"}
      -
    node_memory_MemFree_bytes{instance=~"$node:$port", job=~"$job"}
  )
    /
  node_memory_MemTotal_bytes{instance=~"$node:$port", job=~"$job"}
)
  *
100

It is clear the query calculates the percentage of used memory for the given $node, $port and $job. Isn't it? :)

What's wrong with this query? Copy-pasted label filters for distinct timeseries which makes it easy to mistype these filters during modification. Let's simplify the query with WITH expressions:

WITH (
    commonFilters = {instance=~"$node:$port",job=~"$job"}
)
(
  node_memory_MemTotal_bytes{commonFilters}
    -
  node_memory_MemFree_bytes{commonFilters}
)
  /
node_memory_MemTotal_bytes{commonFilters} * 100

Now label filters are located in a single place instead of three distinct places. The query mentions node_memory_MemTotal_bytes metric twice and {commonFilters} three times. WITH expressions may improve this:

WITH (
    my_resource_utilization(free, limit, filters) = (limit{filters} - free{filters}) / limit{filters} * 100
)
my_resource_utilization(
  node_memory_MemFree_bytes,
  node_memory_MemTotal_bytes,
  {instance=~"$node:$port",job=~"$job"},
)

Now the template function my_resource_utilization() may be used for monitoring arbitrary resources - memory, CPU, network, storage, you name it.

Let's take another nice query from Node Exporter Full dashboard:

(
  (
    (
      count(
        count(node_cpu_seconds_total{instance=~"$node:$port",job=~"$job"}) by (cpu)
      )
    )
      -
    avg(
      sum by (mode) (rate(node_cpu_seconds_total{mode='idle',instance=~"$node:$port",job=~"$job"}[5m]))
    )
  )
    *
  100
)
  /
count(
  count(node_cpu_seconds_total{instance=~"$node:$port",job=~"$job"}) by (cpu)
)

Do you understand what does this mess do? Is it manageable? :) WITH expressions are happy to help in a few iterations.

1. Extract common filters used in multiple places into a commonFilters variable:

WITH (
    commonFilters = {instance=~"$node:$port",job=~"$job"}
)
(
  (
    (
      count(
        count(node_cpu_seconds_total{commonFilters}) by (cpu)
      )
    )
      -
    avg(
      sum by (mode) (rate(node_cpu_seconds_total{mode='idle',commonFilters}[5m]))
    )
  )
    *
  100
)
  /
count(
  count(node_cpu_seconds_total{commonFilters}) by (cpu)
)

2. Extract "count(count(...) by (cpu))" into cpuCount variable:

WITH (
    commonFilters = {instance=~"$node:$port",job=~"$job"},
    cpuCount = count(count(node_cpu_seconds_total{commonFilters}) by (cpu))
)
(
  (
    cpuCount
      -
    avg(
      sum by (mode) (rate(node_cpu_seconds_total{mode='idle',commonFilters}[5m]))
    )
  )
    *
  100
) / cpuCount

3. Extract rate(...) part into cpuIdle variable, since it is clear now that this part calculates the number of idle CPUs:

WITH (
    commonFilters = {instance=~"$node:$port",job=~"$job"},
    cpuCount = count(count(node_cpu_seconds_total{commonFilters}) by (cpu)),
    cpuIdle = sum(rate(node_cpu_seconds_total{mode='idle',commonFilters}[5m]))
)
((cpuCount - cpuIdle) * 100) / cpuCount

4. Put node_cpu_seconds_total{commonFilters} into its own variable with the name cpuSeconds:

WITH (
    cpuSeconds = node_cpu_seconds_total{instance=~"$node:$port",job=~"$job"},
    cpuCount = count(count(cpuSeconds) by (cpu)),
    cpuIdle = sum(rate(cpuSeconds{mode='idle'}[5m]))
)
((cpuCount - cpuIdle) * 100) / cpuCount

Now the query became more clear comparing to the initial query.

WITH expressions may be nested and may be put anywhere. Try expanding the following query:

WITH (
    f(a, b) = WITH (
        f1(x) = b-x,
        f2(x) = x+x
    ) f1(a)*f2(b)
) f(foo, with(x=bar) x)