Skip to content

Launchpad Documentation

Monitoring

Instance Monitoring¶

This page describes monitoring options for Open edX instances running on the cluster.

Overview¶

Instance health and metrics can be observed via:

Kubernetes - Pod status, restarts, resource usage
Prometheus - If the cluster has Prometheus and scrape configs for instance namespaces
Grafana - Dashboards built on Prometheus (or other datasources)
Application health - LMS/CMS endpoints, Celery workers, and custom checks

Kubernetes¶

Basic health:

kubectl get pods -n <instance-name>
kubectl describe pod -n <instance-name> <pod-name>
kubectl top pods -n <instance-name>

Check events and readiness/liveness probe status in kubectl describe.

Prometheus and Grafana¶

If the cluster has Prometheus and Grafana enabled (see Cluster Monitoring):

Ensure ServiceMonitors or Prometheus scrape configs include the instance namespace and relevant services.
Use or create Grafana dashboards for Open edX (e.g. request rate, latency, Celery queue length, Redis).

Metrics depend on what is exposed (e.g. Django metrics, Redis, Celery) and how Prometheus is configured to scrape them.

Alerts¶

Define alerts in Alertmanager (or your alerting system) for:

Pod crashes or not ready
High error rate or latency
Queue backlog (Celery)
Database or Redis connectivity

Configure receivers (email, Slack, etc.) in the cluster’s Alertmanager configuration.

Instances Overview - Instance lifecycle
Cluster Monitoring - Cluster-level Prometheus and Grafana
Configuration - Instance and app settings
Debugging - Troubleshooting instance issues

See Also¶

Tracking Logs - Accessing logs
Logging - Log aggregation
Infrastructure Overview - Harmony and monitoring
Auto-scaling - HPA and metrics