Infrastructure Deprovisioning¶

Deprovisioning removes the Kubernetes cluster and all associated resources once they are no longer needed. It covers uninstalling the GitOps tools (ArgoCD and Argo Workflows), destroying infrastructure resources, and cleaning up configuration files.

The deprovisioning process should be performed in reverse order of provisioning:

Remove All Instances: Delete all Open edX instances first
Uninstall GitOps Tools: Remove ArgoCD and Argo Workflows
Destroy Infrastructure: Destroy Kubernetes cluster and supporting resources using Terraform/OpenTofu
Clean Up Configuration: Remove local cluster configuration files

Warning: Deprovisioning is a destructive operation that permanently removes all resources and data. Ensure you have backups of any important data before proceeding.

Prerequisites¶

Before deprovisioning infrastructure, ensure:

All Instances Removed: All Open edX instances have been deleted
Backups Verified: Any required backups have been created and verified
Access Credentials: Valid credentials for cloud provider and Kubernetes cluster
Terraform/OpenTofu State: Access to Terraform state file (for infrastructure destruction)

Deprovisioning Steps¶

Step 1: Verify No Active Instances¶

Before deprovisioning infrastructure, ensure all Open edX instances have been removed:

# List all namespaces (excluding system namespaces)
kubectl get namespaces --field-selector metadata.name!=kube-system,metadata.name!=kube-public,metadata.name!=kube-node-lease

# Check for ArgoCD applications
kubectl get applications -n argocd

# Verify no instance namespaces exist
kubectl get namespaces | grep -v "kube-\|argocd\|argo\|default"

If any instances remain, delete them first using launchpad_delete_instance, the GitHub Actions workflow, or manually.
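The checks above can be combined into a small guard that fails while instance namespaces remain. This is a minimal sketch, not part of the Launchpad tooling: the function names `instance_namespaces` and `assert_no_instances` are hypothetical, and it assumes that every namespace other than `kube-*`, `default`, `argocd`, and `argo` belongs to an instance.

```shell
#!/bin/sh
# Hypothetical helper: reads namespace names from stdin (with or without the
# "namespace/" prefix printed by `-o name`) and prints the ones that are not
# system or GitOps namespaces.
instance_namespaces() {
  sed 's|^namespace/||' | grep -Ev '^(kube-.*|default|argocd|argo)$'
}

# Hypothetical guard: returns non-zero if any instance namespace remains.
assert_no_instances() {
  remaining=$(kubectl get namespaces -o name | instance_namespaces || true)
  if [ -n "$remaining" ]; then
    echo "Instance namespaces still present:" >&2
    echo "$remaining" >&2
    return 1
  fi
  echo "No instance namespaces found."
}

# Example: assert_no_instances && echo "Safe to continue with Step 2"
```

Unlike a substring match, the anchored pattern keeps namespaces such as `argo-events` in the output, so they are reviewed rather than silently skipped.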

Step 2: Uninstall ArgoCD and Argo Workflows¶

Remove the GitOps tools from the cluster. While there's no dedicated uninstall command, you can remove them manually:

Uninstall ArgoCD:

# Delete ArgoCD namespace (this removes all ArgoCD resources)
kubectl delete namespace argocd

# Wait for namespace to be fully deleted
kubectl wait --for=delete namespace/argocd --timeout=300s

Uninstall Argo Workflows:

# Delete workflow templates first
kubectl delete clusterworkflowtemplates --all

# Delete Argo Workflows namespace
kubectl delete namespace argo

# Wait for namespace to be fully deleted
kubectl wait --for=delete namespace/argo --timeout=300s

Verify Removal:

# Verify namespaces are gone
kubectl get namespace argocd  # Should return "not found"
kubectl get namespace argo    # Should return "not found"

# Verify no ArgoCD resources remain
kubectl get all -n argocd  # Should return "No resources found in argocd namespace."

# Verify no Argo Workflows resources remain
kubectl get all -n argo  # Should return "No resources found in argo namespace."

Note: If namespaces are stuck in "Terminating" state, see the Troubleshooting section.
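For scripted runs, the verification can rely on exit codes instead of reading the output by eye. The sketch below assumes the two namespace names used in this guide; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the named namespace no longer exists.
namespace_gone() {
  ! kubectl get namespace "$1" >/dev/null 2>&1
}

# Hypothetical check: reports the state of both GitOps namespaces and returns
# non-zero if either one is still present (for example, stuck in Terminating).
verify_gitops_removed() {
  status=0
  for ns in argocd argo; do
    if namespace_gone "$ns"; then
      echo "$ns: removed"
    else
      echo "$ns: still present" >&2
      status=1
    fi
  done
  return "$status"
}

# Example: verify_gitops_removed || echo "See the Troubleshooting section"
```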

Step 3: Clean Up Remaining Resources¶

Remove any remaining resources that might prevent infrastructure destruction:

First list the PVs and PVCs to keep a record of what the cluster used. If cluster deletion fails partway, this record makes it easier to identify dangling resources.

# List persistent volumes
kubectl get pv

# List persistent volume claims
kubectl get pvc --all-namespaces

# List storage classes
kubectl get storageclass

# List custom resource definitions
kubectl get crd
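Because the purpose of these listings is to keep a record, it can help to write them to files rather than only to the terminal. A minimal sketch, assuming a hypothetical helper name and output directory layout:

```shell
#!/bin/sh
# Hypothetical helper: saves the storage and CRD listings to a directory
# (default: a timestamped directory under the current one) and prints its path.
record_cluster_resources() {
  out_dir=${1:-cluster-record-$(date +%Y%m%d-%H%M%S)}
  mkdir -p "$out_dir" || return 1
  kubectl get pv -o wide > "$out_dir/pv.txt"
  kubectl get pvc --all-namespaces -o wide > "$out_dir/pvc.txt"
  kubectl get storageclass > "$out_dir/storageclass.txt"
  kubectl get crd > "$out_dir/crd.txt"
  echo "$out_dir"
}

# Example: record_cluster_resources ./pre-destroy-record
```

Store the resulting directory outside the cluster configuration directory if you plan to delete that directory in Step 5.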

Selectively delete resources (replace the <...> placeholders with actual values):

# Delete a persistent volume claim (typically required before deleting its bound PV)
kubectl delete pvc <pvc-name> -n <namespace>

# Delete a persistent volume
kubectl delete pv <pv-name>

# Delete a storage class (only if not in use)
kubectl delete storageclass <storageclass-name>

# Delete a custom resource definition (removes the CRD and all instances of that resource)
kubectl delete crd <crd-name>

Remove Custom Resources (if needed):

# Remove ArgoCD CRDs (CRDs are cluster-scoped, so deleting the namespace does not remove them)
kubectl delete crd applications.argoproj.io
kubectl delete crd applicationsets.argoproj.io
kubectl delete crd appprojects.argoproj.io

# Remove Argo Workflows CRDs (also cluster-scoped)
kubectl delete crd clusterworkflowtemplates.argoproj.io
kubectl delete crd cronworkflows.argoproj.io
kubectl delete crd workflows.argoproj.io
kubectl delete crd workflowtemplates.argoproj.io
kubectl delete crd workfloweventbindings.argoproj.io
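Instead of naming each CRD, the ones in the `argoproj.io` API group can be discovered and deleted in a loop. This is a sketch with hypothetical function names; it assumes no other Argo project (such as Argo Rollouts or Argo Events, which share the same API group) is installed on the cluster.

```shell
#!/bin/sh
# Hypothetical filter: reads CRD names from stdin and prints only those in
# the argoproj.io API group.
argo_crds() {
  grep -E '\.argoproj\.io$' || true
}

# Hypothetical helper: deletes every argoproj.io CRD found on the cluster.
# Deleting a CRD also deletes all custom resources of that type.
delete_argo_crds() {
  kubectl get crd -o name | argo_crds | while read -r crd; do
    kubectl delete "$crd"
  done
}

# Preview first: kubectl get crd -o name | argo_crds
# Then delete:   delete_argo_crds
```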

Step 4: Destroy Infrastructure¶

Destroy the infrastructure using Terraform or OpenTofu. This will remove the Kubernetes cluster and all associated cloud resources.

Navigate to Infrastructure Directory:

cd launchpad-production-cluster/infrastructure-aws  # or infrastructure-digitalocean

Configure Backend Credentials:

See the provisioning guide for how to set up the backend credentials.

Review Destruction Plan:

# Review what will be destroyed
tofu plan -destroy

Destroy Infrastructure:

# Destroy all infrastructure
tofu destroy

What Gets Destroyed:

Kubernetes Cluster: EKS or DOKS cluster
Node Groups: All worker nodes
Databases: Managed MySQL and MongoDB instances
Storage: S3 buckets or DigitalOcean Spaces
Networking: Load balancers, VPC resources (depending on configuration)
Other Resources: Any other resources created by Terraform modules

Important Notes:

Some resources may take time to destroy (especially databases and storage)
Managed databases and storage buckets are destroyed together with the data they hold
Review the destruction plan carefully before confirming
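To make sure that what gets destroyed is exactly what was reviewed, the destroy plan can be saved to a file and then applied. The wrapper below is a sketch; the function name and the default plan file name `destroy.tfplan` are assumptions.

```shell
#!/bin/sh
# Hypothetical wrapper: saves the destroy plan, shows it, and applies exactly
# that plan only after the operator types "yes".
destroy_with_saved_plan() {
  plan_file=${1:-destroy.tfplan}
  tofu plan -destroy -out="$plan_file" || return 1
  tofu show "$plan_file" || return 1
  printf 'Apply this destroy plan? Type "yes" to continue: '
  read -r answer
  if [ "$answer" != "yes" ]; then
    echo "Aborted; nothing was destroyed."
    return 1
  fi
  tofu apply "$plan_file"
}

# Example: destroy_with_saved_plan
```

Applying a saved plan skips the interactive prompt that `tofu destroy` normally shows, which is why the wrapper asks for confirmation itself.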

Step 5: Clean Up Local Configuration¶

After infrastructure is destroyed, you can optionally remove local cluster configuration:

Remove Cluster Directory:

# Navigate to parent directory
cd ../..

# Remove cluster configuration directory
rm -rf launchpad-production-cluster

Remove kubeconfig:

# Remove cluster-specific kubeconfig
rm ~/.kube/config-cluster

# Or remove from main kubeconfig if merged
kubectl config delete-context <cluster-context-name>
kubectl config delete-cluster <cluster-name>
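When the cleanup is scripted, checking that the context exists before deleting it avoids an error on repeated runs. A minimal sketch with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: deletes a kubeconfig context only if it exists.
remove_context_if_present() {
  context=$1
  if kubectl config get-contexts -o name | grep -Fxq "$context"; then
    kubectl config delete-context "$context"
  else
    echo "Context $context not found; nothing to remove."
  fi
}

# Example: remove_context_if_present <cluster-context-name>
```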

Note: You may want to keep the cluster configuration directory for reference or to recreate the cluster later.

Troubleshooting¶

Namespace Stuck in Terminating State¶

If ArgoCD or Argo Workflows namespaces are stuck in "Terminating" state:

Check for Blocking Resources:

# List all resources in the namespace
kubectl api-resources --verbs=list --namespaced -o name | \
  xargs -n 1 kubectl get --show-kind --ignore-not-found -n argocd

# Check for finalizers
kubectl get namespace argocd -o yaml | grep finalizers

Force Remove Finalizers:

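# Namespace finalizers live in spec.finalizers and can only be cleared
# through the finalize subresource (this variant requires jq)
kubectl get namespace argocd -o json \
  | jq '.spec.finalizers = []' \
  | kubectl replace --raw "/api/v1/namespaces/argocd/finalize" -f -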

Force Delete Resources:

If specific resources are blocking deletion:

# Find resources with finalizers
kubectl get all -n argocd -o yaml | grep -A 5 finalizers

# Remove finalizers from specific resource
kubectl patch <resource-type> <resource-name> -n argocd \
  -p '{"metadata":{"finalizers":[]}}' \
  --type=merge
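Finalizers are often held by custom resources, which `kubectl get all` does not list. The sketch below walks every namespaced resource type and prints only the objects that carry finalizers. The function names are hypothetical, and it relies on `custom-columns` printing `<none>` for a missing field.

```shell
#!/bin/sh
# Hypothetical filter: expects "KIND NAME FINALIZERS" rows on stdin and keeps
# the rows whose finalizers column is not "<none>", printed as KIND/NAME.
with_finalizers() {
  awk 'NF >= 3 && $3 != "<none>" { print $1 "/" $2 }'
}

# Hypothetical helper: lists objects in a namespace that still have finalizers,
# including custom resources.
list_blocking_resources() {
  ns=$1
  kubectl api-resources --verbs=list --namespaced -o name | while read -r kind; do
    kubectl get "$kind" -n "$ns" --ignore-not-found --no-headers \
      -o custom-columns=KIND:.kind,NAME:.metadata.name,FINALIZERS:.metadata.finalizers \
      2>/dev/null
  done | with_finalizers
}

# Example: list_blocking_resources argocd
```

Each line of output names an object that can then be patched with the command above.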

Infrastructure Destruction Issues¶

Terraform/OpenTofu Errors:

State Lock: If state is locked, check for other running Terraform processes
Resource Dependencies: Some resources may have dependencies preventing deletion
Provider Timeouts: Large resources may take longer than default timeouts

Common Solutions:

# Force unlock state (use with caution)
tofu force-unlock <lock-id>

# Destroy specific resources first
tofu destroy -target=<resource-address>

# There is no CLI flag for operation timeouts; where the provider supports it,
# raise them in the resource's timeouts block (for example delete = "30m"), then retry
tofu destroy

Resources Not Destroyed:

Some resources may not be destroyed if:

They're managed outside of Terraform
They have deletion protection enabled
They're shared resources used by other clusters

Manually remove these resources through the cloud provider console if needed.

Kubernetes Cluster Access Issues¶

Cannot Access Cluster During Destruction:

If the cluster API server is already destroyed, you cannot use kubectl commands
Some resources may be destroyed automatically by the cloud provider
Check cloud provider console for remaining resources

Orphaned Resources:

After cluster destruction, some resources may remain:

Load balancers
Persistent volumes (if not properly cleaned up)
Security groups
IAM roles and policies

Manually clean these up through the cloud provider console.

State File Issues¶

State File Not Found:

Verify backend configuration is correct
Check that state file exists in the backend storage
Ensure backend credentials have read access

State File Corruption:

# Backup state file first
tofu state pull > state-backup.json

# Try to refresh state
tofu refresh

# If refresh fails, you may need to import resources or recreate state
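A state backup is only useful if it actually contains the state, so a scripted backup can check that the file is not empty before continuing. A minimal sketch, with a hypothetical function name and file naming scheme:

```shell
#!/bin/sh
# Hypothetical helper: pulls the current state into a timestamped file (or the
# file given as the first argument) and fails if the result is empty.
backup_state() {
  backup_file=${1:-state-backup-$(date +%Y%m%d-%H%M%S).json}
  tofu state pull > "$backup_file" || return 1
  if [ ! -s "$backup_file" ]; then
    echo "State backup $backup_file is empty" >&2
    return 1
  fi
  echo "$backup_file"
}

# Example: backup_state && tofu refresh
```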

Partial Destruction¶

If destruction partially succeeds:

Review Remaining Resources: Check what resources still exist
Manual Cleanup: Remove remaining resources through cloud provider console
Update State: Use tofu state rm to remove destroyed resources from state
Retry Destruction: Run tofu destroy again to clean up remaining resources
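The steps above can be supported by a quick check of what the state still tracks. This is a sketch with hypothetical function names; it only reads the state and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: prints how many resources are still tracked in state.
remaining_resources() {
  tofu state list | grep -c . || true
}

# Hypothetical check: succeeds when the state is empty, otherwise lists what
# is left and returns non-zero.
report_remaining() {
  count=$(remaining_resources)
  if [ "$count" -eq 0 ]; then
    echo "State is empty; destruction is complete."
  else
    echo "$count resource(s) still tracked in state:"
    tofu state list
    return 1
  fi
}

# Example: report_remaining || tofu destroy
```

For a resource that was already removed manually in the cloud provider console, `tofu state rm <resource-address>` drops it from state so that the next `tofu destroy` does not fail on it.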

Data Recovery¶

If you need to recover data after deprovisioning:

Backups: Check if Velero or other backup solutions created backups
Database Snapshots: Some cloud providers maintain database snapshots
Storage Buckets: If buckets weren't destroyed, data may still be accessible
Volume Snapshots: Check for volume snapshots in your cloud provider

Note: Recovery may not be possible if backups weren't configured or have been deleted.

Best Practices¶

Before Deprovisioning¶

Create Backups: Ensure all important data is backed up
Document Configuration: Save cluster configuration for future reference
Verify Dependencies: Check that no other systems depend on this infrastructure

During Deprovisioning¶

Follow Order: Remove instances → Uninstall tools → Destroy infrastructure
Monitor Progress: Watch for errors and address them promptly
Verify Removal: Confirm resources are actually destroyed, not just marked for deletion
Keep Logs: Save logs of the deprovisioning process for troubleshooting

After Deprovisioning¶

Verify Cleanup: Check cloud provider console for any remaining resources
Update Documentation: Note that infrastructure has been deprovisioned
Archive Configuration: Keep cluster configuration in version control for reference
Review Costs: Verify that cloud provider billing reflects the destroyed resources

Next Steps¶

After successfully deprovisioning infrastructure:

Verify Billing: Confirm cloud provider billing reflects destroyed resources
Archive Configuration: Keep cluster configuration for future reference
Update Documentation: Document that the cluster has been deprovisioned
Clean Up Credentials: Rotate or remove any credentials that were used for this cluster

Infrastructure Overview - Core components and architecture
Provisioning - How infrastructure is provisioned
Cluster Overview - Cluster operations
Instance Deprovisioning - Deleting Open edX instances