Instance Debugging¶
This page summarizes common issues and debugging steps for Open edX instances on the cluster.
Pod Not Starting or Crashing¶
- Describe the pod -
kubectl describe pod -n <instance-name> <pod-name>for events, restarts, and probe failures. - Logs -
kubectl logs -n <instance-name> <pod-name> --previousif the container has restarted. - Image pull - Ensure the image exists in the registry and that
imagePullSecretsare set (Launchpad configures registry credentials for the namespace). - Init jobs - Drydock init jobs (e.g. migrations) run before the main app; check init container logs if the main container never starts.
ArgoCD Out of Sync¶
- In ArgoCD UI, check the application status and diff.
- Common causes: manifest errors, missing ConfigMaps/Secrets, or source repo not updated.
- Fix manifests in the cluster repo and sync, or resolve reported errors (e.g. missing image, wrong namespace).
Database or Storage Connectivity¶
- Verify Secrets in the instance namespace contain the correct credentials (e.g. MySQL, MongoDB, S3).
- Check that provision workflows completed and that the instance config (and Tutor env) point to the right hosts and bucket names.
- Test connectivity from a pod:
kubectl exec -n <instance-name> <pod-name> -- ...and run a simple client (e.g.mysql,mongosh) or curl to the storage endpoint.
Image Not Found or Pull Errors¶
- Confirm the image name and tag in
config.ymlmatch what was built and pushed (e.g. by Picasso). - Ensure the namespace has the correct
imagePullSecrets(Launchpad sets this on instance creation and duringlaunchpad_install_argo). - Check registry credentials (e.g.
LAUNCHPAD_DOCKER_REGISTRY_CREDENTIALS) in the environment that created the instance or runs the workflows.
Slow or Failing Sync¶
- Check ArgoCD application sync status and any error messages.
- Ensure the source path in the Application manifest exists and has valid Kubernetes YAML.
- For large repos, sync can take longer; check ArgoCD logs if needed.
Getting More Help¶
- Collect:
kubectl describeandkubectl logsfor failing pods, ArgoCD application status, and relevant config snippets. - See Provisioning and Deprovisioning for workflow-related issues.
- See Configuration for config and secrets.
Related Documentation¶
- Instances Overview - Instance lifecycle
- Provisioning - Instance creation and workflows
- Deprovisioning - Instance deletion and cleanup
- Configuration - Config and secrets
See Also¶
- Tracking Logs - Viewing pod logs
- Instance Monitoring - Metrics and health
- Infrastructure Overview - ArgoCD and Argo Workflows
- User Guides Overview - Task-based guides