tl;dr - In GCP is it possible to create an SLO which uses a request-based SLI built from custom log metrics? I have two custom log metrics: one that counts the number of successful requests and another that counts the total requests. Put the successes as the numerator and the total as the denominator and we have an availability ratio. I haven't been able to make this work in GCP.
Here are the two logging metrics:
$ gcloud logging metrics describe api.requests.success
createTime: '2024-03-05T23:53:41.655463779Z'
description: Count of successful api requests
filter: |-
  jsonPayload.res.statusCode<"500"
  resource.type="k8s_container"
  resource.labels.container_name="web"
  -jsonPayload.req.url="/api/healthcheck"
metricDescriptor:
  description: Count of successful api requests
  metricKind: DELTA
  name: projects/PROJECT_ID/metricDescriptors/logging.googleapis.com/user/api.requests.success
  type: logging.googleapis.com/user/api.requests.success
  unit: '1'
  valueType: INT64
name: api.requests.success
updateTime: '2024-03-06T00:54:11.010808453Z'

$ gcloud logging metrics describe api.requests.total
createTime: '2024-03-05T23:54:55.495377712Z'
description: Total requests to the api
filter: |-
  resource.type="k8s_container"
  resource.labels.container_name="web"
  -jsonPayload.req.url="/api/healthcheck"
metricDescriptor:
  description: Total requests to the api
  metricKind: DELTA
  name: projects/PROJECT_ID/metricDescriptors/logging.googleapis.com/user/api.requests.total
  type: logging.googleapis.com/user/api.requests.total
  unit: '1'
  valueType: INT64
name: api.requests.total
updateTime: '2024-03-06T00:54:26.469904996Z'
In metrics explorer I can create a ratio of these two logging metrics, and I can save a chart with this data and put it on a dashboard.
Then I try to create an SLO using these two metrics. In the SLO Overview console page, I define a service. Instead of choosing from the Service Candidates tab, I hit the Custom Service tab. From there I name it 'API Availability'. Then I hit the 'create SLO' button.
Next, I try to set the SLI. That page says "Default availability and latency metrics are not available for Custom services. You can configure your own custom availability or latency SLI using the 'other metric' option." Going with 'Other', and choosing 'Request-based', I hit continue.
On the Define SLI details page, I'm hoping to be able to create a ratio of success/total similar to how I did in the Metrics Explorer page. Here I choose a Performance Metric. In the search field I choose my api.requests.success metric. And this is what I see. I can't also choose the api.requests.total metric to create the ratio. How do I do this? Is it possible?
I search around the internet for answers. It's difficult to find any. I found a post by Yuri Grinshteyn that looks like it may hold the answer. https://medium.com/google-cloud/slos-with-stackdriver-service-monitoring-62f193147b3f
Using the API here, https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.metricDescriptors/create, I created a custom metric. I fired it off via curl, and got back a 200:
{
  "name": "projects/PROJECT_ID/metricDescriptors/custom.googleapis.com/user/metrics/api.ratio.metric",
  "metricKind": "GAUGE",
  "valueType": "DOUBLE",
  "unit": "1",
  "description": "The proportion of successful API requests.",
  "displayName": "API Availability",
  "type": "custom.googleapis.com/user/metrics/api.ratio.metric"
}
Continuing to follow Yuri's example, I try to create the SLO via the API. This uses my custom metric and the two log based metrics.
{
  "name": "projects/PROJECT_ID/metricDescriptors/custom.googleapis.com/user/metrics/api.ratio.metric",
  "serviceLevelIndicator": {
    "requestBased": {
      "goodTotalRatio": {
        "goodServiceFilter": "metric.type=\"logging.googleapis.com/user/api.requests.success\" resource.type=\"k8s_pod\" resource.label.\"module_id\"=\"default\" metric.label.\"response_code\"=\"< 500\"",
        "totalServiceFilter": "metric.type=\"logging.googleapis.com/user/api.requests.total\" resource.type=\"k8s_pod\" resource.label.\"module_id\"=\"default\""
      }
    }
  },
  "goal": 0.98,
  "rollingPeriod": "2419200s",
  "displayName": "98% Successful requests in a rolling 28 day period"
}
One problem is that I don't know what to set for resource.type or resource.label. I've tried various combinations and get various errors. I haven't found documentation that explains what these two parameters are for.
Here on the API page, I don't know what to set for the 'parent'. I chose to try:
projects/PROJECT_ID/services/serviceLevelObjectiveId
Where serviceLevelObjectiveId is the ID of the custom service I created above in the SLO workflow, "API Availability".
The error I get back from this attempt is
{
  "error": {
    "code": 400,
    "message": "Invalid SLO definition: provided filter string \"metric.type=\"logging.googleapis.com/user/api.requests.success\" resource.type=\"k8s_pod\" resource.label.\"module_id\"=\"default\" metric.label.\"response_code\"=\"\u003c 500\"\" parses to 0 resource types and must parse to 1.",
    "status": "INVALID_ARGUMENT"
  }
}
So ... am I on the right track here? Is it a matter of getting the right incantation? Setting resource.type and resource.label correctly?
Or is this a dead end?
Is it possible to create an SLO which uses a request-based SLI built from custom log metrics?
Thanks in advance!
Hello @kallen ,
The error message points to a problem with the SLO definition itself rather than with the request: the API could not resolve the filter string you provided to a monitored resource type. Each filter must parse to exactly one resource type, and yours parsed to zero.
To resolve the error and successfully define the SLO, try adjusting the filter strings so that each one parses to exactly one resource type.
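As a rough illustration of what an adjusted request could look like, here is a minimal Python sketch that builds the request body and the URL to POST it to. It rests on several assumptions that are not confirmed in this thread: that the two log-based metrics are written against the k8s_container resource type (the one named in the metric filters in the question), that filters naming only the metric type and the resource type are enough, and that SERVICE_ID is the ID of the custom service rather than its display name. The helper names slo_create_url, series_filter and build_slo_body are hypothetical. The sketch leaves out the "name" field and the module_id and response_code labels, on the assumption that the name is assigned by the server and that these metrics define no such labels.

```python
import json

API_ROOT = "https://monitoring.googleapis.com/v3"


def slo_create_url(project_id: str, service_id: str) -> str:
    """URL to POST the SLO to; the parent is projects/<project>/services/<service>."""
    return f"{API_ROOT}/projects/{project_id}/services/{service_id}/serviceLevelObjectives"


def series_filter(metric_type: str, resource_type: str) -> str:
    """Monitoring filter that names one metric type and one resource type."""
    return f'metric.type="{metric_type}" resource.type="{resource_type}"'


def build_slo_body(good_metric: str, total_metric: str, resource_type: str,
                   goal: float, rolling_days: int, display_name: str) -> dict:
    """Request-based SLO body using a good/total ratio of two metrics."""
    return {
        "displayName": display_name,
        "goal": goal,
        # The API expects the rolling period as a duration in seconds.
        "rollingPeriod": f"{rolling_days * 86400}s",
        "serviceLevelIndicator": {
            "requestBased": {
                "goodTotalRatio": {
                    "goodServiceFilter": series_filter(good_metric, resource_type),
                    "totalServiceFilter": series_filter(total_metric, resource_type),
                }
            }
        },
    }


if __name__ == "__main__":
    body = build_slo_body(
        good_metric="logging.googleapis.com/user/api.requests.success",
        total_metric="logging.googleapis.com/user/api.requests.total",
        resource_type="k8s_container",  # assumption: matches the log entries' resource
        goal=0.98,
        rolling_days=28,
        display_name="98% Successful requests in a rolling 28 day period",
    )
    # PROJECT_ID and SERVICE_ID are placeholders to fill in.
    print(slo_create_url("PROJECT_ID", "SERVICE_ID"))
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent with curl in the same way as the earlier attempt. If the API still rejects the filters, the resource type is the first assumption to revisit.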