Syncing the data in the staging instance of DataWarehouse

How to restore a production backup into the staging instance of DataWarehouse

Problem

You want to load fresh data into the staging instance of DataWarehouse.

Steps

  1. Log into the staging OpenShift project on the command line, e.g. via ocp-sso-token as below, or by running the command from the “Copy login command” page behind the user menu in the top-right corner of the OpenShift web console.

    ocp-sso-token <STAGING_CLUSTER_URL> \
      --namespace cki--external \
      --context mpp_preprod_external
    
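To confirm the login took before going further, a quick sanity check (a sketch; the guard makes it a no-op on machines without the oc CLI):

```shell
# Print the logged-in user for the new context; skip silently if `oc` is missing
if command -v oc >/dev/null 2>&1; then
  oc --context mpp_preprod_external whoami
fi
```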
  2. Scale the staging DataWarehouse StatefulSet down to zero, so that nothing accesses the database during the restore, via

    oc --context mpp_preprod_external \
        scale --replicas 0 statefulsets/datawarehouse-webservice-staging
    
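Before continuing, it can help to wait until the pods are actually gone. A sketch, assuming the same context as above; note that an empty jsonpath result means the replicas field was omitted, i.e. zero:

```shell
# Wait until the StatefulSet reports zero replicas (sketch; assumes the
# mpp_preprod_external context from step 1)
wait_for_scale_down() {
  local replicas
  while :; do
    replicas=$(oc --context mpp_preprod_external get statefulset \
        datawarehouse-webservice-staging -o jsonpath='{.status.replicas}')
    # An omitted field comes back empty, which also means zero replicas
    [[ "${replicas:-0}" == "0" ]] && break
    echo "still ${replicas} replica(s), waiting..."
    sleep 5
  done
}

# Only run where `oc` is actually available
if command -v oc >/dev/null 2>&1; then
  wait_for_scale_down
fi
```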
  3. Get the name of one of the finished DataWarehouse backup pods, and spawn a temporary debug pod for it via

    pod_name=$(oc --context mpp_preprod_external get pod -o name | grep datawarehouse-backup | head -n 1)
    if [[ -z "$pod_name" ]]; then
      echo "No datawarehouse-backup pod found!" >&2
    else
      oc --context mpp_preprod_external debug "$pod_name"
    fi
    
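The grep/head pipeline above just picks the first pod whose name contains datawarehouse-backup. A self-contained illustration on hypothetical pod names (the names are made up; real ones come from `oc get pod -o name`):

```shell
# Hypothetical `oc get pod -o name` output, for illustration only
sample='pod/datawarehouse-webservice-staging-0
pod/datawarehouse-backup-29012345-abcde
pod/datawarehouse-backup-29012346-fghij'

# Same selection logic as in step 3
pod_name=$(printf '%s\n' "$sample" | grep datawarehouse-backup | head -n 1)
echo "$pod_name"   # → pod/datawarehouse-backup-29012345-abcde
```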
  4. Inside the debug pod, modify the bucket configuration in the BUCKET_ARR_CKI_PROD_DW_BACKUPS_STAGING environment variable to point to the production bucket with

    echo "Before: ${BUCKET_ARR_CKI_PROD_DW_BACKUPS_STAGING}"
    BUCKET_ARR_CKI_PROD_DW_BACKUPS_STAGING="${BUCKET_ARR_CKI_PROD_DW_BACKUPS_STAGING%staging/}production/"
    echo "After:  ${BUCKET_ARR_CKI_PROD_DW_BACKUPS_STAGING}"
    
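The `${var%staging/}` expansion strips a trailing `staging/` before `production/` is appended. A self-contained illustration with a made-up bucket value (the real value comes from the pod environment):

```shell
# Hypothetical bucket spec ending in "staging/" (assumption for illustration)
bucket='https://s3.example.com/dw-backups/staging/'

# Strip the trailing "staging/" suffix, then append "production/"
bucket="${bucket%staging/}production/"
echo "$bucket"   # → https://s3.example.com/dw-backups/production/
```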
  5. Run cki_deployment_pgsql_restore.sh and select the backup to restore, or pass the --latest option to automatically pick the most recent one.
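A concrete invocation inside the debug pod might look like this (guarded so the line is a no-op where the script is not on the PATH):

```shell
# Restore the newest backup non-interactively; drop --latest to pick from a list
if command -v cki_deployment_pgsql_restore.sh >/dev/null 2>&1; then
  cki_deployment_pgsql_restore.sh --latest
fi
```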

  6. Drink some water while you wait. It should take a few seconds to download the backup, but will take about 30 minutes to restore it.

  7. Scale the staging DataWarehouse StatefulSet back up via

    oc --context mpp_preprod_external \
        scale --replicas 2 statefulsets/datawarehouse-webservice-staging
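To confirm the service comes back up, `oc rollout status` can watch the StatefulSet until all replicas are ready (a sketch; skipped where `oc` is unavailable):

```shell
# Block until the StatefulSet rollout reports all replicas ready
if command -v oc >/dev/null 2>&1; then
  oc --context mpp_preprod_external \
      rollout status statefulset/datawarehouse-webservice-staging
fi
```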