Elevated service errors

Symptom

A CustomerElevatedServiceErrors alert is firing.

It indicates that a service is returning a high proportion of server errors (5xx responses). The alert fires when all of the following are true:

  • the service is running on a production cluster

  • the service is receiving more than 6 requests per minute

  • more than 10% of the responses from the service have been 5xx for the last 10 minutes
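
The three conditions can be checked by hand for a single service. The Python sketch below is a simplification and makes several assumptions. It evaluates one instant only, whereas the real alert also requires the condition to hold for the configured delay. The function and parameter names are hypothetical. It also assumes that the recording rules store per-second rates, as Prometheus `rate()` does. Under that assumption, 6 requests per minute is 0.1 requests per second, which matches the `> 0.1` threshold in the alert expression.

```python
import re

# Cluster-name matcher copied from the alert expression. PromQL fully anchors
# label regexes, so re.fullmatch mirrors that behaviour.
PROD_CLUSTER = re.compile(r".+(prod|production)")

# Assumption: rates are per second, so 6 requests/minute == 0.1 requests/second.
REQUEST_RATE_THRESHOLD = 0.1
# More than 10% of responses are 5xx.
ERROR_RATIO_THRESHOLD = 0.1


def alert_condition(cluster_name: str, requests_per_sec: float, errors_per_sec: float) -> bool:
    """Hypothetical helper: True if the alert expression holds for one sample."""
    if not PROD_CLUSTER.fullmatch(cluster_name):
        return False  # not a production cluster
    if requests_per_sec <= REQUEST_RATE_THRESHOLD:
        return False  # too little traffic for the alert to apply
    return errors_per_sec / requests_per_sec > ERROR_RATIO_THRESHOLD
```

For example, a service on `eu-prod` serving 0.5 requests per second with 0.1 errors per second has a 20% error ratio and satisfies the condition. The same traffic on `eu-staging` does not, because the cluster name does not match.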

These alerts apply to any service used by an ingress (a Magnolia service, frontend service, redirect service or any other service) running on a production cluster.
CustomerElevatedServiceErrors alerts are sent to subscribers via email.

Observations

Here are the details of the alert:

Alert: CustomerElevatedServiceErrors

Table 1. CustomerElevatedServiceErrors details

Expression:   service:requests:rate15m{k8s_cluster_name=~".+(prod|production)"} > 0.1 AND service:errors:rate15m{k8s_cluster_name=~".+(prod|production)"} / service:requests:rate15m{k8s_cluster_name=~".+(prod|production)"} > 0.1

Delay:        10 minutes

Labels:       team: customer

Annotations:  summary, description, tenant, cluster_id, cluster_name, namespace, service
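
To see which services are currently close to or above the threshold, you can run the error-ratio part of the expression against Prometheus. The Python sketch below uses the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The server URL is a hypothetical placeholder. The sketch also assumes that the `service:errors:rate15m` and `service:requests:rate15m` recording rules are available on that server, and that the series carry labels such as `namespace` and `service`.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the Prometheus server that evaluates the alert rules.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"

# Same production-cluster selector and ratio as in the alert expression.
PROD = '{k8s_cluster_name=~".+(prod|production)"}'
ERROR_RATIO_QUERY = f"service:errors:rate15m{PROD} / service:requests:rate15m{PROD}"


def build_query_url(base_url: str, query: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def parse_vector(body: str) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each sample value is a [timestamp, "string value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in payload["data"]["result"]]


def error_ratios(base_url: str = PROMETHEUS_URL) -> list[tuple[dict, float]]:
    """Fetch the current 5xx ratio per service, highest ratio first."""
    url = build_query_url(base_url, ERROR_RATIO_QUERY)
    with urllib.request.urlopen(url, timeout=10) as resp:
        rows = parse_vector(resp.read().decode("utf-8"))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Any entry whose value is above 0.1 has an error ratio over 10%. Its labels identify the service and namespace to investigate.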

Solutions

This section provides solutions that should help resolve the issue in most cases.

Investigate cause for elevated service errors

Unfortunately there is no single fix for elevated service errors alerts, because service errors have many possible causes.

Possible causes for the elevated service errors:

  • The ingress is misconfigured: its backend does not reference the expected service (check the ingress configuration)

  • No pods for the service are running: pods are restarting or crash-looping (check the pod status with kubectl or Rancher)

  • No pods for the service are running: the deployment / daemonset / statefulset is scaled down to 0 pods (check the deployment / daemonset / statefulset with kubectl or Rancher)

  • Pods are running but return errors (check the pod logs only as a last resort: logs are often verbose and difficult to interpret)
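
The first three checks can be partly automated from the JSON output of `kubectl get ... -o json`. The Python sketch below is a starting point rather than a complete diagnosis. It assumes the `networking.k8s.io/v1` Ingress schema and that `kubectl` is configured for the affected cluster. All function names are hypothetical. DaemonSets have no `spec.replicas` field, so `scaled_to_zero` only covers Deployments and StatefulSets.

```python
import json
import subprocess


def kubectl_json(*args: str) -> dict:
    """Run kubectl with JSON output and return the parsed document."""
    result = subprocess.run(["kubectl", *args, "-o", "json"],
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def missing_backends(ingresses: dict, services: dict) -> set[str]:
    """Service names referenced by ingress backends for which no Service exists."""
    existing = {svc["metadata"]["name"] for svc in services.get("items", [])}
    wanted = set()
    for ing in ingresses.get("items", []):
        spec = ing.get("spec", {})
        default = spec.get("defaultBackend", {}).get("service", {}).get("name")
        if default:
            wanted.add(default)
        for rule in spec.get("rules", []):
            for path in rule.get("http", {}).get("paths", []):
                name = path.get("backend", {}).get("service", {}).get("name")
                if name:
                    wanted.add(name)
    return wanted - existing


def unhealthy_pods(pods: dict) -> dict[str, str]:
    """Pods with a crash-looping container, or pods not in the Running phase."""
    problems = {}
    for pod in pods.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        for cs in status.get("containerStatuses", []):
            reason = cs.get("state", {}).get("waiting", {}).get("reason")
            if reason == "CrashLoopBackOff":
                problems[name] = f"{cs['name']}: CrashLoopBackOff ({cs.get('restartCount', 0)} restarts)"
        if name not in problems and status.get("phase") != "Running":
            problems[name] = f"phase {status.get('phase')}"
    return problems


def scaled_to_zero(workloads: dict) -> list[str]:
    """Deployments or StatefulSets whose spec.replicas is 0."""
    return [w["metadata"]["name"] for w in workloads.get("items", [])
            if w.get("spec", {}).get("replicas") == 0]
```

For a namespace `ns`, `missing_backends(kubectl_json("get", "ingress", "-n", ns), kubectl_json("get", "services", "-n", ns))` lists ingress backends that point at a missing service. `unhealthy_pods(kubectl_json("get", "pods", "-n", ns))` and `scaled_to_zero(kubectl_json("get", "deployments", "-n", ns))` cover the other two checks. The logic is split into pure functions so that it can also be applied to saved JSON output.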
