Infrastructure Deprovisioning¶
Deprovisioning is removing the Kubernetes cluster and all associated resources when they are no longer needed. This includes uninstalling GitOps tools (ArgoCD and Argo Workflows), destroying infrastructure resources, and cleaning up configuration files.
The deprovisioning process should be performed in reverse order of provisioning:
- Remove All Instances: Delete all Open edX instances first
- Uninstall GitOps Tools: Remove ArgoCD and Argo Workflows
- Destroy Infrastructure: Destroy Kubernetes cluster and supporting resources using Terraform/OpenTofu
- Clean Up Configuration: Remove local cluster configuration files
Warning: Deprovisioning is a destructive operation that permanently removes all resources and data. Ensure you have backups of any important data before proceeding.
Prerequisites¶
Before deprovisioning infrastructure, ensure:
- All Instances Removed: All Open edX instances have been deleted
- Backups Verified: Any required backups have been created and verified
- Access Credentials: Valid credentials for cloud provider and Kubernetes cluster
- Terraform/OpenTofu State: Access to Terraform state file (for infrastructure destruction)
Deprovisioning Steps¶
Step 1: Verify No Active Instances¶
Before deprovisioning infrastructure, ensure all Open edX instances have been removed:
# List all namespaces (excluding system namespaces)
kubectl get namespaces --field-selector metadata.name!=kube-system,metadata.name!=kube-public,metadata.name!=kube-node-lease
# Check for ArgoCD applications
kubectl get applications -n argocd
# Verify no instance namespaces exist
kubectl get namespaces | grep -v "kube-\|argocd\|argo\|default"
If any instances remain, delete them first using launchpad_delete_instance, the GitHub Actions workflow, or manually.
Step 2: Uninstall ArgoCD and Argo Workflows¶
Remove the GitOps tools from the cluster. While there's no dedicated uninstall command, you can remove them manually:
Uninstall ArgoCD:
# Delete ArgoCD namespace (this removes all ArgoCD resources)
kubectl delete namespace argocd
# Wait for namespace to be fully deleted
kubectl wait --for=delete namespace/argocd --timeout=300s
Uninstall Argo Workflows:
# Delete workflow templates first
kubectl delete clusterworkflowtemplates --all
# Delete Argo Workflows namespace
kubectl delete namespace argo
# Wait for namespace to be fully deleted
kubectl wait --for=delete namespace/argo --timeout=300s
Verify Removal:
# Verify namespaces are gone
kubectl get namespace argocd # Should return "not found"
kubectl get namespace argo # Should return "not found"
# Verify no ArgoCD resources remain
kubectl get all -n argocd # Should return "not found"
# Verify no Argo Workflows resources remain
kubectl get all -n argo # Should return "not found"
Note: If namespaces are stuck in "Terminating" state, see the Troubleshooting section.
Step 3: Clean Up Remaining Resources¶
Remove any remaining resources that might prevent infrastructure destruction:
We are listing the PVs and PVCs in order to keep a record of what was used by the cluster. In case the cluster deletion goes sideways, we have easier job identifying dangling resources.
# List persistent volumes
kubectl get pv
# List persistent volume claims
kubectl get pvc --all-namespaces
# List storage classes
kubectl get storageclass
# List custom resource definitions
kubectl get crd
Delete selective resources (replace <name> and <namespace> with actual values):
# Delete a persistent volume claim (typically required before deleting its bound PV)
kubectl delete pvc <pvc-name> -n <namespace>
# Delete a persistent volume
kubectl delete pv <pv-name>
# Delete a storage class (only if not in use)
kubectl delete storageclass <storageclass-name>
# Delete a custom resource definition (removes the CRD and all instances of that resource)
kubectl delete crd <crd-name>
Remove Custom Resources (if needed):
# Remove ArgoCD CRDs (if not removed with namespace)
kubectl delete crd applications.argoproj.io
kubectl delete crd application sets.argoproj.io
kubectl delete crd appprojects.argoproj.io
# Remove Argo Workflows CRDs (if not removed with namespace)
kubectl delete crd clusterworkflowtemplates.argoproj.io
kubectl delete crd cronworkflows.argoproj.io
kubectl delete crd workflows.argoproj.io
kubectl delete crd workflowtemplates.argoproj.io
kubectl delete crd workfloweventbindings.argoproj.io
Step 4: Destroy Infrastructure¶
Destroy the infrastructure using Terraform or OpenTofu. This will remove the Kubernetes cluster and all associated cloud resources.
Navigate to Infrastructure Directory:
cd launchpad-production-cluster/infrastructure-aws # or infrastructure-digitalocean
Configure Backend Credentials:
See the provisioning guide on how to setup the backend credentials.
Review Destruction Plan:
# Review what will be destroyed
tofu plan -destroy
Destroy Infrastructure:
# Destroy all infrastructure
tofu destroy
What Gets Destroyed:
- Kubernetes Cluster: EKS or DOKS cluster
- Node Groups: All worker nodes
- Databases: Managed MySQL and MongoDB instances
- Storage: S3 buckets or DigitalOcean Spaces
- Networking: Load balancers, VPC resources (depending on configuration)
- Other Resources: Any other resources created by Terraform modules
Important Notes:
- Some resources may take time to destroy (especially databases and storage)
- External databases and storage will be destroyed
- Review the destruction plan carefully before confirming
Step 5: Clean Up Local Configuration¶
After infrastructure is destroyed, you can optionally remove local cluster configuration:
Remove Cluster Directory:
# Navigate to parent directory
cd ../..
# Remove cluster configuration directory
rm -rf launchpad-production-cluster
Remove kubeconfig:
# Remove cluster-specific kubeconfig
rm ~/.kube/config-cluster
# Or remove from main kubeconfig if merged
kubectl config delete-context <cluster-context-name>
kubectl config delete-cluster <cluster-name>
Note: You may want to keep the cluster configuration directory for reference or to recreate the cluster later.
Troubleshooting¶
Namespace Stuck in Terminating State¶
If ArgoCD or Argo Workflows namespaces are stuck in "Terminating" state:
Check for Blocking Resources:
# List all resources in the namespace
kubectl api-resources --verbs=list --namespaced -o name | \
xargs -n 1 kubectl get --show-kind --ignore-not-found -n argocd
# Check for finalizers
kubectl get namespace argocd -o yaml | grep finalizers
Force Remove Finalizers:
# Edit namespace to remove finalizers
kubectl patch namespace argocd \
-p '{"metadata":{"finalizers":[]}}' \
--type=merge
Force Delete Resources:
If specific resources are blocking deletion:
# Find resources with finalizers
kubectl get all -n argocd -o yaml | grep -A 5 finalizers
# Remove finalizers from specific resource
kubectl patch <resource-type> <resource-name> -n argocd \
-p '{"metadata":{"finalizers":[]}}' \
--type=merge
Infrastructure Destruction Issues¶
Terraform/OpenTofu Errors:
- State Lock: If state is locked, check for other running Terraform processes
- Resource Dependencies: Some resources may have dependencies preventing deletion
- Provider Timeouts: Large resources may take longer than default timeouts
Common Solutions:
# Force unlock state (use with caution)
tofu force-unlock <lock-id>
# Destroy specific resources first
tofu destroy -target=<resource-address>
# Increase timeout for slow resources
export TF_CLI_ARGS="-timeout=30m"
tofu destroy
Resources Not Destroyed:
Some resources may not be destroyed if:
- They're managed outside of Terraform
- They have deletion protection enabled
- They're shared resources used by other clusters
Manually remove these resources through the cloud provider console if needed.
Kubernetes Cluster Access Issues¶
Cannot Access Cluster During Destruction:
- If the cluster API server is already destroyed, you cannot use
kubectlcommands - Some resources may be destroyed automatically by the cloud provider
- Check cloud provider console for remaining resources
Orphaned Resources:
After cluster destruction, some resources may remain:
- Load balancers
- Persistent volumes (if not properly cleaned up)
- Security groups
- IAM roles and policies
Manually clean these up through the cloud provider console.
State File Issues¶
State File Not Found:
- Verify backend configuration is correct
- Check that state file exists in the backend storage
- Ensure backend credentials have read access
State File Corruption:
# Backup state file first
tofu state pull > state-backup.json
# Try to refresh state
tofu refresh
# If refresh fails, you may need to import resources or recreate state
Partial Destruction¶
If destruction partially succeeds:
- Review Remaining Resources: Check what resources still exist
- Manual Cleanup: Remove remaining resources through cloud provider console
- Update State: Use
tofu state rmto remove destroyed resources from state - Retry Destruction: Run
tofu destroyagain to clean up remaining resources
Data Recovery¶
If you need to recover data after deprovisioning:
- Backups: Check if Velero or other backup solutions created backups
- Database Snapshots: Some cloud providers maintain database snapshots
- Storage Buckets: If buckets weren't destroyed, data may still be accessible
- Volume Snapshots: Check for volume snapshots in your cloud provider
Note: Recovery may not be possible if backups weren't configured or have been deleted.
Best Practices¶
Before Deprovisioning¶
- Create Backups: Ensure all important data is backed up
- Document Configuration: Save cluster configuration for future reference
- Verify Dependencies: Check that no other systems depend on this infrastructure
During Deprovisioning¶
- Follow Order: Remove instances → Uninstall tools → Destroy infrastructure
- Monitor Progress: Watch for errors and address them promptly
- Verify Removal: Confirm resources are actually destroyed, not just marked for deletion
- Keep Logs: Save logs of the deprovisioning process for troubleshooting
After Deprovisioning¶
- Verify Cleanup: Check cloud provider console for any remaining resources
- Update Documentation: Note that infrastructure has been deprovisioned
- Archive Configuration: Keep cluster configuration in version control for reference
- Review Costs: Verify that cloud provider billing reflects the destroyed resources
Next Steps¶
After successfully deprovisioning infrastructure:
- Verify Billing: Confirm cloud provider billing reflects destroyed resources
- Archive Configuration: Keep cluster configuration for future reference
- Update Documentation: Document that the cluster has been deprovisioned
- Clean Up Credentials: Rotate or remove any credentials that were used for this cluster
Related Documentation¶
- Infrastructure Overview - Core components and architecture
- Provisioning - How infrastructure is provisioned
- Cluster Overview - Cluster operations
- Instance Deprovisioning - Deleting Open edX instances
See Also¶
- Cluster Backup - Backup before deprovisioning
- Cluster Restore - Restore procedures
- Custom Resources - Resources removed during deprovisioning