Trying to create an SLI in GCP that uses logs-based metrics, and failing

tl;dr - In GCP, is it possible to create an SLO which uses a request-based SLI built from custom log metrics? I have two custom log metrics: one that counts the number of successful requests and another that counts the total requests. Put the successes as the numerator and the total as the denominator and we have an availability ratio. I haven't been able to make this work in GCP.

Here are the two logging metrics:

$ gcloud logging metrics describe api.requests.success
createTime: '2024-03-05T23:53:41.655463779Z'
description: Count of successful api requests
filter: |-
  jsonPayload.res.statusCode<"500"
  resource.type="k8s_container"
  resource.labels.container_name="web"
  -jsonPayload.req.url="/api/healthcheck"
metricDescriptor:
  description: Count of successful api requests
  metricKind: DELTA
  name: projects/PROJECT_ID/metricDescriptors/logging.googleapis.com/user/api.requests.success
  type: logging.googleapis.com/user/api.requests.success
  unit: '1'
  valueType: INT64
name: api.requests.success
updateTime: '2024-03-06T00:54:11.010808453Z'

$ gcloud logging metrics describe api.requests.total
createTime: '2024-03-05T23:54:55.495377712Z'
description: Total requests to the api
filter: |-
  resource.type="k8s_container"
  resource.labels.container_name="web"
  -jsonPayload.req.url="/api/healthcheck"
metricDescriptor:
  description: Total requests to the api
  metricKind: DELTA
  name: projects/PROJECT_ID/metricDescriptors/logging.googleapis.com/user/api.requests.total
  type: logging.googleapis.com/user/api.requests.total
  unit: '1'
  valueType: INT64
name: api.requests.total
updateTime: '2024-03-06T00:54:26.469904996Z'

In metrics explorer I can create a ratio of these two logging metrics, and I can save a chart with this data and put it on a dashboard.

metrics-expl-avail.png
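
For clarity, the ratio in that chart is just successful requests over total requests per window. A minimal sketch of the arithmetic, with made-up counts (the function name and the numbers are illustrative, not values from the project):

```python
def availability(good: int, total: int) -> float:
    """Return good/total for one window of the two log-based metrics."""
    if good < 0 or total < 0 or good > total:
        raise ValueError("expected 0 <= good <= total")
    if total == 0:
        # Assumption: a window with no requests counts as fully available.
        return 1.0
    return good / total


# Hypothetical counts for a single alignment window.
ratio = availability(good=985, total=1000)
print(f"availability = {ratio:.3f}")
```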

Then I try to create an SLO using these two metrics. In the SLO Overview console page, I define a service. Instead of choosing from the Service Candidates tab, I hit the Custom Service tab. From there I name it 'API Availability'. Then I hit the 'create SLO' button.

Next, I try to set the SLI. That page says "Default availability and latency metrics are not available for Custom services. You can configure your own custom availability or latency SLI using the 'other metric' option." Going with 'Other', and choosing 'Request-based', I hit continue.

On the Define SLI details page, I'm hoping to be able to create a ratio of success/total similar to how I did in the Metrics Explorer page. Here I choose a Performance Metric. In the search field I choose my api.requests.success metric. And this is what I see. I can't also choose the api.requests.total metric to create the ratio. How do I do this? Is it possible?

sli-no-ratio.png

I searched around the internet for answers, but they were difficult to find. I found a post by Yuri Grinshteyn that looks like it may hold the answer. https://medium.com/google-cloud/slos-with-stackdriver-service-monitoring-62f193147b3f

Using the API here, https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.metricDescriptors/create, I created a custom metric. I fired it off via curl, and got back a 200:

{
  "name": "projects/PROJECT_ID/metricDescriptors/custom.googleapis.com/user/metrics/api.ratio.metric",
  "metricKind": "GAUGE",
  "valueType": "DOUBLE",
  "unit": "1",
  "description": "The proportion of successful API requests.",
  "displayName": "API Availability",
  "type": "custom.googleapis.com/user/metrics/api.ratio.metric"
}
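
For reference, here is a sketch of how the same request could be assembled in Python instead of curl. The URL follows the projects.metricDescriptors.create reference linked above; the helper name and the PROJECT_ID placeholder are illustrative, and authentication is left out. Note that the goodTotalRatio block in the SLO request that follows references only the two log-based metrics, so this GAUGE metric may not be needed for the SLI at all.

```python
import json

API_ROOT = "https://monitoring.googleapis.com/v3"


def metric_descriptor_request(project_id: str) -> tuple[str, str]:
    """Build the POST URL and JSON body for projects.metricDescriptors.create."""
    url = f"{API_ROOT}/projects/{project_id}/metricDescriptors"
    body = {
        "type": "custom.googleapis.com/user/metrics/api.ratio.metric",
        "metricKind": "GAUGE",
        "valueType": "DOUBLE",
        "unit": "1",
        "description": "The proportion of successful API requests.",
        "displayName": "API Availability",
    }
    return url, json.dumps(body, indent=2)


# PROJECT_ID is a placeholder; sending the request needs an OAuth bearer token.
url, body = metric_descriptor_request("PROJECT_ID")
print("POST", url)
print(body)
```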

Continuing to follow Yuri's example, I try to create the SLO via the API. This uses my custom metric and the two log based metrics.

{
  "name": "projects/PROJECT_ID/metricDescriptors/custom.googleapis.com/user/metrics/api.ratio.metric",
  "serviceLevelIndicator": {
    "requestBased": {
      "goodTotalRatio": {
        "goodServiceFilter": "metric.type=\"logging.googleapis.com/user/api.requests.success\" resource.type=\"k8s_pod\" resource.label.\"module_id\"=\"default\" metric.label.\"response_code\"=\"< 500\"",
        "totalServiceFilter": "metric.type=\"logging.googleapis.com/user/api.requests.total\" resource.type=\"k8s_pod\" resource.label.\"module_id\"=\"default\""
      }
    }
  },
  "goal": 0.98,
  "rollingPeriod": "2419200s",
  "displayName": "98% Successful requests in a rolling 28 day period"
}

A problem is, I don't know what to set for resource.type or resource.label. I've tried various combos and get various errors. I haven't found the documentation that shows what these two parameters are about.

Here on the API page, I don't know what to set for the 'parent'. I chose to try:

projects/PROJECT_ID/services/serviceLevelObjectiveId

Where serviceLevelObjectiveId is the ID of the custom service I created above in the SLO workflow, "API Availability".
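
As far as I can tell from the services.serviceLevelObjectives.create reference, the parent has the form projects/PROJECT_ID/services/SERVICE_ID, and serviceLevelObjectiveId is a separate, optional query parameter. A sketch of building the URL under that reading (the helper name and the SERVICE_ID placeholder are illustrative; the service ID may differ from the display name 'API Availability', and listing projects/PROJECT_ID/services should show it):

```python
from urllib.parse import quote

API_ROOT = "https://monitoring.googleapis.com/v3"


def slo_create_url(project_id: str, service_id: str) -> str:
    """Return the POST URL for services.serviceLevelObjectives.create."""
    # Percent-encode each path segment so unusual IDs cannot break the URL.
    parent = (
        f"projects/{quote(project_id, safe='')}"
        f"/services/{quote(service_id, safe='')}"
    )
    return f"{API_ROOT}/{parent}/serviceLevelObjectives"


# Both arguments are placeholders, not real identifiers.
print(slo_create_url("PROJECT_ID", "SERVICE_ID"))
```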

The error I get back from this attempt is

{
  "error": {
    "code": 400,
    "message": "Invalid SLO definition: provided filter string \"metric.type=\"logging.googleapis.com/user/api.requests.success\" resource.type=\"k8s_pod\" resource.label.\"module_id\"=\"default\" metric.label.\"response_code\"=\"\u003c 500\"\" parses to 0 resource types and must parse to 1.",
    "status": "INVALID_ARGUMENT"
  }
}
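
One sanity check that can be done locally is to compare the resource type named in the SLO filter with the one the log-based metric is written against. The sketch below is only a regular-expression check on the strings, not the parser the API uses. It shows that the metric's log filter selects k8s_container while the SLO filter asks for k8s_pod; whether that mismatch is what produces "parses to 0 resource types" is an assumption.

```python
import re

# Matches resource.type="VALUE" and captures VALUE.
RESOURCE_TYPE = re.compile(r'resource\.type\s*=\s*"([^"]+)"')


def resource_types(filter_text: str) -> set[str]:
    """Return every resource.type value named in a filter string."""
    return set(RESOURCE_TYPE.findall(filter_text))


# Filter of the api.requests.total log-based metric, copied from above.
log_metric_filter = '''
resource.type="k8s_container"
resource.labels.container_name="web"
-jsonPayload.req.url="/api/healthcheck"
'''

# Start of the goodServiceFilter that was sent to the API.
slo_filter = (
    'metric.type="logging.googleapis.com/user/api.requests.success" '
    'resource.type="k8s_pod" resource.label."module_id"="default"'
)

print(resource_types(log_metric_filter))  # {'k8s_container'}
print(resource_types(slo_filter))         # {'k8s_pod'}
print(resource_types(log_metric_filter) == resource_types(slo_filter))  # False
```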

So ... am I on a right track here? Is it a matter of getting the right incantation? Setting resource.type and resource.label correctly?

Or is this a dead end?

Is it possible to create an SLO which uses a request-based SLI built from custom log metrics?

Thanks in advance!


Hello @kallen ,

The error message you encountered indicates an issue with the SLO definition you set up. It states that the filter string provided for the SLO is not correctly formed.

  1. The filter string attempts to select time series by a specific metric type and resource type.
  2. The issue arises from the way the filter string is constructed, particularly the resource label "module_id" and the metric label "response_code".
  3. The error message specifies that the filter string must parse to exactly 1 resource type, but it currently parses to 0.

To resolve the error and successfully define the SLO, you may try to adjust the filter string so that it correctly parses to 1 resource type.
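
As a starting point, here is a sketch of an adjusted request body. It assumes the time series for both log-based metrics are written against the k8s_container resource, since that is the resource.type in the log filters of both metrics, and it drops the module_id and response_code labels, which do not appear in the metric definitions shown. It also leaves out the name field, which in the original request pointed at a metric descriptor rather than an SLO. These are assumptions to try, not a confirmed fix.

```python
import json


def slo_body(resource_type: str = "k8s_container") -> dict:
    """Build a request-based SLO body from the two log-based metrics."""

    def metric_filter(metric_name: str) -> str:
        # One metric type and one resource type per filter, nothing else.
        return (
            f'metric.type="logging.googleapis.com/user/{metric_name}" '
            f'resource.type="{resource_type}"'
        )

    return {
        "displayName": "98% Successful requests in a rolling 28 day period",
        "goal": 0.98,
        "rollingPeriod": f"{28 * 24 * 60 * 60}s",  # 28 days in seconds
        "serviceLevelIndicator": {
            "requestBased": {
                "goodTotalRatio": {
                    "goodServiceFilter": metric_filter("api.requests.success"),
                    "totalServiceFilter": metric_filter("api.requests.total"),
                }
            }
        },
    }


print(json.dumps(slo_body(), indent=2))
```

If the API still rejects the filters, changing the resource_type argument is the first thing to vary.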

 
