Working with Kubernetes
Problem
You want to access and work with any of the Kubernetes-based environments used for deployment in cee/deployment-all.
Setup
The easiest way to get started with a Kubernetes environment is to access it via the URLs of the Web console.
You can also manage Kubernetes resources using the command line tools. On
Fedora, install the tools with dnf install origin-clients.
There are two options for authentication:
- From the Web console, find your name at the top right and click it. Choose Copy login command from the drop-down menu and paste the command into your terminal. You will be logged in automatically with a one-time token.
- Install and run ocp-sso-token.
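Either way, a quick sanity check after logging in might look like this; the token, server URL, and project name are placeholders, not real values:
oc login --token=sha256~REDACTED --server=https://api.example.com:6443  # the pasted login command
oc whoami                                                               # confirm the authenticated user
oc project my-project                                                   # switch to the project you work in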
OpenShift peculiarities to keep in mind
OpenShift has some strict rules for containers to maintain security:
- No packages can be installed once the container is running. All packages that you need for the container must be installed into the container image itself (see the sketch after this list).
- You cannot be root inside the container and your Linux capabilities are highly restricted. For example, ICMP pings and chown are not allowed.
- Each container starts with an arbitrary UID/GID pair. The pair is different per project, but constant across invocations. Some applications, like git and ansible, have issues with arbitrary UID/GID pairs, but there are workarounds for this. See Handling arbitrary UIDs and GIDs below.
- The default resource allotments set by the namespace LimitRange might be very low, and it might be necessary to explicitly specify how much RAM and CPU the container is allowed to use. Some applications may work with the defaults, but you may experience strange issues or abrupt container restarts from out-of-memory errors. See Resource allocation for details.
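As an illustration of the first and third rules, here is a minimal Containerfile sketch; the base image and package list are examples only, not the actual CKI images:
FROM quay.io/fedora/fedora:latest
# Everything has to be installed at build time; dnf will not work at runtime
RUN dnf install -y git ansible && dnf clean all
# Prepare for an arbitrary UID: the container runs with GID 0, so giving
# group 0 owner-equivalent permissions keeps the app directory writable
RUN mkdir -p /app && chgrp -R 0 /app && chmod -R g=u /app
USER 1001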
Watching a running container
You can watch the logs from a deployment or container using the oc command
line tools. This can be very helpful if you are rapidly iterating on a
DeploymentConfig and trying to see if the container runs properly.
Here's an example that monitors the slack-bot deployment: oc logs -f dc/slack-bot. This will tail the logs as the container runs.
Occasionally, the connection between you and OpenShift will drop. You can keep monitoring the logs indefinitely with a loop like this:
while true; do oc logs --tail 5 -f dc/slack-bot; done
This will force a reconnection each time it disconnects.
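To watch pod status changes alongside the logs, a sketch that assumes the pods carry the deploymentconfig label OpenShift adds to DC-managed pods:
oc get pods -l deploymentconfig=slack-bot --watch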
Handling arbitrary UIDs and GIDs
When a container starts in OpenShift, it is assigned an arbitrary UID/GID
pair. This provides additional security for the host underneath the
container. However, it can make some applications misbehave because calls to id or groups will fail or return strange information.
The following changes are implemented in the CKI project to make it easier to handle these changing UID/GID combinations.
Writable /etc/passwd and /etc/group
The cleanup include file used during container image builds ensures that container images have writable /etc/passwd and /etc/group files. As the arbitrary UID always runs with GID 0, giving the group the same permissions as the owner makes these files writable by the container user:
# Make everybody happy again with arbitrary UID/GID in OpenShift
RUN chmod g=u /etc/passwd /etc/group
Current user/group added to /etc/passwd
The default CKI container image entry point script and cronjob template run the
following commands very early after container startup to ensure the current
user can be found in /etc/passwd:
if [ -w '/etc/passwd' ] && ! id -nu > /dev/null 2>&1; then
echo "cki:x:$(id -u):$(id -g):,,,:${HOME}:/bin/bash" >> /etc/passwd;
fi
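To verify this inside a running container, something like the following should print cki once the entry point has patched /etc/passwd; the dc/slack-bot name is reused from the example above:
oc exec dc/slack-bot -- id -nu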
Resource allocation
By default, the namespace LimitRange might set very low RAM and CPU quotas for each container, and most applications will require higher limits to work properly.
In most cases, caring about Requests and Limits for CPU and memory should be
good enough. While requests and limits are specified on a container level, they
are applied on a Pod level as max(...init containers, sum(containers)), i.e. the larger of the highest init container value and the sum over all regular containers.
Limits are strictly enforced, i.e. Pods can never use more resources than specified. For CPU, cgroups are used to throttle resource consumption. For memory, containers are OOM-killed when exceeding the specified limit.
Requests are used for scheduling decisions, i.e. the total request for all Pods on a node cannot exceed the available resources on that node. Also keep in mind that some Pods on a node might not specify resource requests at all. For resource-hungry Pods, make sure that nodes are available that have enough resources to run the Pod.
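One way to set requests and limits from the command line is oc set resources; the target and the values below are illustrative only:
oc set resources dc/slack-bot --requests=cpu=100m,memory=256Mi --limits=cpu=500m,memory=512Mi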
Cron jobs
Recurring jobs are deployed as CronJobs. Cron jobs are not visible in the standard OpenShift Application Console, but can be found in the OpenShift Cluster Console.
To get a list of all cron jobs from the command line, you can use
$ oc get cronjob
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
acme-update-cluster-routes-daily 40 4 * * * False 0 11h 63d
cronjobs-acme-certs-daily 30 4 * * * False 0 11h 35d
cronjobs-acme-patch-remote-daily 40 4 * * * False 0 11h 35d
...
As an example, the cronjobs-acme-certs-daily CronJob spawns a Job which spawns a Pod with the actual containers. To get a list of everything related to one schedule, you can use something like
$ oc get job,pod -l schedule_job=cronjobs-acme-certs-daily
NAME COMPLETIONS DURATION AGE
job.batch/cronjobs-acme-certs-daily-1630989000 1/1 22s 2d11h
job.batch/cronjobs-acme-certs-daily-1631075400 1/1 22s 35h
job.batch/cronjobs-acme-certs-daily-1631161800 1/1 24s 11h
NAME READY STATUS RESTARTS AGE
pod/cronjobs-acme-certs-daily-1630989000-rk4cp 0/1 Completed 0 2d11h
pod/cronjobs-acme-certs-daily-1631075400-267zw 0/1 Completed 0 35h
pod/cronjobs-acme-certs-daily-1631161800-2m8wh 0/1 Completed 0 11h
To see the output of a schedule, you can use oc logs with the Job or the Pod, for example:
$ oc logs job.batch/cronjobs-acme-certs-daily-1631161800
...
Checking registration
...
$ oc logs pod/cronjobs-acme-certs-daily-1631161800-2m8wh
...
Checking registration
...
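To debug a schedule without waiting for the next run, you can spawn a one-off Job directly from the CronJob; the job name here is arbitrary:
$ oc create job acme-certs-manual --from=cronjob/cronjobs-acme-certs-daily
$ oc logs -f job/acme-certs-manual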