Prometheus is unreachable

If a Prometheus instance installed with osm-edge can’t be reached, perform the following steps to identify and resolve any issues.

  1. Verify a Prometheus Pod exists.

    When installed with osm install --set=osm.deployPrometheus=true, a Prometheus Pod named something like osm-prometheus-5794755b9f-rnvlr should exist in the same namespace as the other osm-edge control plane components, which is osm-system by default.

    If no such Pod is found, use helm to verify that the osm-edge Helm chart was installed with the osm.deployPrometheus parameter set to true:

    $ helm get values -a <mesh name> -n <osm-edge namespace>
    

    If the parameter is set to anything but true, reinstall osm-edge with the --set=osm.deployPrometheus=true flag on osm install.

  2. Verify the Prometheus Pod is healthy.

    The Prometheus Pod identified above should be both in a Running state and have all containers ready, as shown in the kubectl get output:

    $ # Assuming osm-edge is installed in the osm-system namespace:
    $ kubectl get pods -n osm-system -l app=osm-prometheus
    NAME                              READY   STATUS    RESTARTS   AGE
    osm-prometheus-5794755b9f-67p6r   1/1     Running   0          27m
    

    If the Pod is not Running or not all of its containers are ready, use kubectl describe to look for other potential issues:

    $ # Assuming osm-edge is installed in the osm-system namespace:
    $ kubectl describe pods -n osm-system -l app=osm-prometheus
    

    Once the Prometheus Pod is found to be healthy, Prometheus should be reachable.
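
The health check in step 2 can also be scripted. The following is a minimal sketch, not part of osm-edge: unhealthy_pods is a hypothetical helper name, and it assumes the default column layout (NAME, READY, STATUS) of kubectl get pods with the header row suppressed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with osm-edge).
# Reads `kubectl get pods --no-headers` output on stdin and prints every Pod
# that is not Running or does not have all of its containers ready.
# Returns 0 only when at least one Pod was seen and every Pod was healthy.
unhealthy_pods() {
  awk '
    {
      seen = 1
      split($2, ready, "/")              # READY column looks like "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1 ": READY=" $2 " STATUS=" $3
        bad = 1
      }
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}
```

Assuming osm-edge is installed in the osm-system namespace, the helper could be fed with kubectl get pods -n osm-system -l app=osm-prometheus --no-headers | unhealthy_pods. A non-zero exit status with no output means no Prometheus Pod was found at all, which points back to step 1.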

Metrics are not showing up in Prometheus

If Prometheus is found not to be scraping metrics for any Pods, perform the following steps to identify and resolve any issues.

  1. Verify application Pods are working as expected.

    If workloads running in the mesh are not functioning properly, metrics scraped from those Pods may not look correct. For example, if metrics showing traffic to Service A from Service B are missing, ensure the services are communicating successfully.

    To help further troubleshoot these kinds of issues, see the traffic troubleshooting guide.

  2. Verify the Pods whose metrics are missing have a Pipy sidecar injected.

    Only Pods with a Pipy sidecar container are expected to have their metrics scraped by Prometheus. Ensure each Pod is running a container from an image with flomesh/pipy in its name:

    $ kubectl get po -n <pod namespace> <pod name> -o jsonpath='{.spec.containers[*].image}'
    mynamespace/myapp:v1.0.0 flomesh/pipy:0.50.0
    
  3. Verify the proxy’s endpoint being scraped by Prometheus is working as expected.

    Each Pipy proxy exposes an HTTP endpoint that shows metrics generated by that proxy and is scraped by Prometheus. Check to see if the expected metrics are shown by making a request to the endpoint directly.

    For each Pod whose metrics are missing, use kubectl to forward the Pipy proxy admin interface port and check the metrics:

    $ kubectl port-forward -n <pod namespace> <pod name> 15000
    

    Go to http://localhost:15000/stats/prometheus in a browser to check the metrics generated by that Pod. If Prometheus does not seem to be accounting for these metrics, move on to the next step to ensure Prometheus is configured properly.

  4. Verify the intended namespaces have been enrolled in metrics collection.

    For each namespace that contains Pods which should have metrics scraped, ensure the namespace is monitored by the intended osm-edge instance with osm mesh list.

    Next, check to make sure the namespace is annotated with openservicemesh.io/metrics: enabled:

    $ # Assuming osm-edge is installed in the osm-system namespace:
    $ kubectl get namespace <namespace> -o jsonpath='{.metadata.annotations.openservicemesh\.io/metrics}'
    enabled
    

    If no such annotation exists on the namespace or it has a different value, fix it with osm:

    $ osm metrics enable --namespace <namespace>
    Metrics successfully enabled in namespace [<namespace>]
    
  5. If custom metrics are not being scraped, verify they have been enabled.

    Custom metrics are currently disabled by default and are enabled when the osm.featureFlags.enableWASMStats parameter is set to true. Verify the current osm-edge instance has this parameter set for a mesh named <osm-mesh-name> in the <osm-namespace> namespace:

    $ helm get values -a <osm-mesh-name> -n <osm-namespace>
    

    Note: replace <osm-mesh-name> with the name of the osm mesh and <osm-namespace> with the namespace where osm was installed.

    If osm.featureFlags.enableWASMStats is set to anything but true, reinstall osm-edge and pass --set osm.featureFlags.enableWASMStats=true to osm install.
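
The checks in steps 2, 4, and 5 can be wrapped in small shell functions for repeated use. This is only a sketch: the function names are hypothetical, each function reads the output of the corresponding command above on stdin, and helm_flag_true assumes the YAML-style output of helm get values -a and matches on the leaf key name only, ignoring nesting.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with osm-edge). Each one reads the output
# of a command from the steps above on stdin and reports via its exit status.

# Step 2: succeeds if the image list contains an image with "flomesh/pipy"
# in its name.
has_pipy_sidecar() {
  grep 'flomesh/pipy' >/dev/null
}

# Step 4: succeeds only if the annotation value is exactly "enabled".
metrics_enabled() {
  local value
  read -r value || true    # jsonpath output has no trailing newline
  [ "$value" = "enabled" ]
}

# Step 5: succeeds if the values contain a line of the form "<key>: true".
# Heuristic: only the leaf key name ($1) is matched, not its full path.
helm_flag_true() {
  grep -E "^[[:space:]]*$1:[[:space:]]*true[[:space:]]*\$" >/dev/null
}
```

For example, piping the kubectl get po ... -o jsonpath='{.spec.containers[*].image}' command from step 2 into has_pipy_sidecar, the kubectl get namespace command from step 4 into metrics_enabled, or helm get values -a <osm-mesh-name> -n <osm-namespace> into helm_flag_true enableWASMStats gives an exit status of 0 when the check passes and non-zero otherwise.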