Kubernetes logging as easy as 1..2..3!


Faiq Raza
January 16, 2017

If you're considering Kubernetes, or have already started using it, one thing you absolutely cannot get around is having proper logging throughout your system, so you can debug your applications when unexpected errors occur.

At Wercker, we use 3 technologies to set up our logging infrastructure:

  • AWS ElasticSearch Service to quickly and easily run logging aggregation infrastructure with minimal overhead above what we already manage
  • Fluentd to collect and aggregate our logs
  • Last but certainly not least, Wercker to bring everything together and deploy our brand spanking new logging infrastructure!

Let's walk through the steps we're going to take. The first thing we need to do is set up our ElasticSearch Service on AWS.

  1. Log on to the AWS console and click ElasticSearch Service
  2. Click "Create a new domain"
  3. Configure your instance type, disk, and permissions. For this tutorial, we'll be using IAM-based credentials for security. If you're not familiar with IAM users, you can read more in AWS's IAM documentation. Here's an example config that I made specifically for this tutorial:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::694210971302:user/wercker-demo-ess"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:694210971302:domain/test-demo/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:694210971302:domain/test-demo/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "XXX.XX.XXX.XX/32"
        }
      }
    }
  ]
}

The second statement also allows us to reach the domain from our office IP, so we can use Kibana.

  4. Boom! You're done!
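One thing that can quietly ruin step 3: a stray comma or unbalanced brace in the policy document. If you want to sanity-check a policy before pasting it into the console, a quick Python sketch does the trick (shortened policy, purely illustrative; this is not an AWS API call):

```python
import json

# Shortened version of the policy above, purely for illustration.
policy = '''
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::694210971302:user/wercker-demo-ess"},
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:694210971302:domain/test-demo/*"
    }
  ]
}
'''

doc = json.loads(policy)  # raises ValueError if the JSON is malformed
for stmt in doc["Statement"]:
    # Every statement needs an effect, an action, and a resource.
    for key in ("Effect", "Action", "Resource"):
        assert key in stmt, "statement is missing " + key
print("policy parses cleanly")
```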

Next, we're going to configure Fluentd to gather the logs we're interested in and deliver them to our brand new ElasticSearch domain. All of this is configured through the td-agent configuration file. For this tutorial, we're going to modify the fluentd configuration from the Kubernetes repository. Here's the configuration that we use at Wercker:

# Do not directly collect fluentd's own logs to avoid infinite loops.
<match fluent.**>
  type null
</match>


# Example:
# {"log":"[info:2016-02-16T16:04:05.930-08:00] Some log text here\n","stream":"stdout","time":"2016-02-17T00:04:05.931087621Z"}
<source>
  type tail
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag kubernetes.*
  format json
  read_from_head true
</source>

# Example:
# 2015-12-21 23:17:22,066 [salt.state       ][INFO    ] Completed state [net.ipv4.ip_forward] at time 23:17:22.066081

# Example:
# Dec 21 23:17:22 gke-foo-1-1-4b5cbd14-node-4eoj startupscript: Finished running startup script /var/run/google.startup.script
<source>
  type tail
  format syslog
  path /var/log/startupscript.log
  pos_file /var/log/es-startupscript.log.pos
  tag startupscript
</source>

# Examples:
# time="2016-02-04T06:51:03.053580605Z" level=info msg="GET /containers/json"
# time="2016-02-04T07:53:57.505612354Z" level=error msg="HTTP Error" err="No such image: -f" statusCode=404
<source>
  type tail
  format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=(?<status_code>\d+))?/
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  path /var/log/docker.log
  pos_file /var/log/es-docker.log.pos
  tag docker
</source>

# Example:
# 2016/02/04 06:52:38 filePurge: successfully removed file /var/etcd/data/member/wal/00000000000006d0-00000000010a23d1.wal
<source>
  type tail
  # Not parsing this, because it doesn't have anything particularly useful to
  # parse out of it (like severities).
  format none
  path /var/log/etcd.log
  pos_file /var/log/es-etcd.log.pos
  tag etcd
</source>

# Multi-line parsing is required for all the kube logs because very large log
# statements, such as those that include entire object bodies, get split into
# multiple lines by glog.

# Example:
# I0204 07:32:30.020537    3368 server.go:1048] POST /stats/container/: (13.972191ms) 200 [[Go-http-client/1.1] 10.244.1.3:40537]
<source>
  type tail
  format multiline
  format_firstline /^\w\d{4}/
  format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
  time_format %m%d %H:%M:%S.%N
  path /var/log/kubelet.log
  pos_file /var/log/es-kubelet.log.pos
  tag kubelet
</source>

# Example:
# I0204 07:00:19.604280       5 handlers.go:131] GET /api/v1/nodes: (1.624207ms) 200 [[kube-controller-manager/v1.1.3 (linux/amd64) kubernetes/6a81b50] 127.0.0.1:38266]
<source>
  type tail
  format multiline
  format_firstline /^\w\d{4}/
  format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
  time_format %m%d %H:%M:%S.%N
  path /var/log/kube-apiserver.log
  pos_file /var/log/es-kube-apiserver.log.pos
  tag kube-apiserver
</source>

# Example:
# I0204 06:55:31.872680       5 servicecontroller.go:277] LB already exists and doesn't need update for service kube-system/kube-ui
<source>
  type tail
  format multiline
  format_firstline /^\w\d{4}/
  format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
  time_format %m%d %H:%M:%S.%N
  path /var/log/kube-controller-manager.log
  pos_file /var/log/es-kube-controller-manager.log.pos
  tag kube-controller-manager
</source>

# Example:
# W0204 06:49:18.239674       7 reflector.go:245] pkg/scheduler/factory/factory.go:193: watch of *api.Service ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [2578313/2577886]) [2579312]
<source>
  type tail
  format multiline
  format_firstline /^\w\d{4}/
  format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
  time_format %m%d %H:%M:%S.%N
  path /var/log/kube-scheduler.log
  pos_file /var/log/es-kube-scheduler.log.pos
  tag kube-scheduler
</source>

<filter kubernetes.var.log.containers.**>
  type kubernetes_metadata
</filter>

<match **>
  type aws-elasticsearch-service
  type_name "access_log"
  log_level info
  logstash_format true
  include_tag_key true
  flush_interval 60s

  buffer_type memory
  buffer_chunk_limit 256m
  buffer_queue_limit 128

  <endpoint>
    url https://${AWS_ES_ENDPOINT}
    region ${AWS_ES_REGION}
    access_key_id ${AWS_ACCESS_KEY_ID}
    secret_access_key ${AWS_SECRET_ACCESS_KEY}
  </endpoint>
</match>
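fluentd's format regexes are Ruby Regexp patterns, which can be fiddly to get right. A rough Python translation of the docker.log pattern above (named groups become `(?P<name>...)` in Python) lets you check it against the example lines from the config comments:

```python
import re

# Python translation of the docker.log format regex from the config above.
pattern = re.compile(
    r'^time="(?P<time>[^)]*)" level=(?P<severity>[^ ]*) msg="(?P<message>[^"]*)"'
    r'( err="(?P<error>[^"]*)")?( statusCode=(?P<status_code>\d+))?'
)

info = pattern.match(
    'time="2016-02-04T06:51:03.053580605Z" level=info msg="GET /containers/json"')
error = pattern.match(
    'time="2016-02-04T07:53:57.505612354Z" level=error msg="HTTP Error" '
    'err="No such image: -f" statusCode=404')

assert info.group("severity") == "info"
assert error.group("error") == "No such image: -f"
assert error.group("status_code") == "404"
```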

As you can see, a lot of the configuration is similar to what is provided in the Kubernetes repository, except for this bit here:

<match **>
  type aws-elasticsearch-service
  type_name "access_log"
  log_level info
  logstash_format true
  include_tag_key true
  flush_interval 60s

  buffer_type memory
  buffer_chunk_limit 256m
  buffer_queue_limit 128

  <endpoint>
    url https://${AWS_ES_ENDPOINT}
    region ${AWS_ES_REGION}
    access_key_id ${AWS_ACCESS_KEY_ID}
    secret_access_key ${AWS_SECRET_ACCESS_KEY}
  </endpoint>
</match>

We're specifying 2 things here:

  1. We want to use a memory buffer for fluentd.
  2. We want to use the aws-elasticsearch-service output plugin, along with the IAM credentials, URL, and region that fluentd needs in order to reach our ElasticSearch domain.
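One thing to keep an eye on with the memory buffer: buffer_chunk_limit times buffer_queue_limit is the ceiling on how much log data fluentd will hold in RAM if ElasticSearch becomes unreachable. With the settings above, that works out to:

```python
# Worst-case memory the buffer settings above allow fluentd to hold:
chunk_limit_mb = 256   # buffer_chunk_limit 256m
queue_limit = 128      # buffer_queue_limit 128

max_buffer_mb = chunk_limit_mb * queue_limit
print(max_buffer_mb, "MB")  # 32768 MB, i.e. 32 GB
```

Tune these numbers to the memory you're actually willing to give the fluentd containers.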

We also need to properly configure our Kubernetes deployment file. Here is the file we're using at Wercker:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: loghard-fluentd
  namespace: devops
  labels:
    app: loghard-fluentd
spec:
  template:
    metadata:
      labels:
        app: loghard-fluentd
    spec:
      imagePullSecrets:
        - name: quay-readonly
      containers:
        - name: loghard-fluentd
          imagePullPolicy: Always
          image: quay.io/wercker/loghard-fluentd:${WERCKER_GIT_BRANCH}-${WERCKER_GIT_COMMIT}-${DEPLOY_ENV}
          command: ["fluentd"]
          ports:
            - containerPort: 24224
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
          resources:
            limits:
              memory: 200Mi
              cpu: 100m
      terminationGracePeriodSeconds: 30
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers

Now that we have all of that set up as templates, we need to fill them with the correct values. To do this, we'll set up a Wercker pipeline called build that fills in the values from environment variables. The pipeline is pretty straightforward: we add our values, then forward the files along to the next pipelines by moving them to the WERCKER_OUTPUT_DIR.

build:
  box:
    id: alpine
    cmd: /bin/sh
  steps:
    - bash-template:
        cwd: deployment
    - script:
        name: forward config
        code: |
          mv deployment/td-agent.conf $WERCKER_OUTPUT_DIR
          mv deployment/loghard-fluentd-deploy.yml $WERCKER_OUTPUT_DIR
          cat $WERCKER_OUTPUT_DIR/loghard-fluentd-deploy.yml
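If you're curious what the bash-template step is doing under the hood, it's essentially ${VAR} substitution from the environment. A rough Python equivalent (the endpoint value here is an example, not our real domain):

```python
import os
from string import Template

# Example values only; in the real pipeline these come from Wercker env vars.
os.environ["AWS_ES_ENDPOINT"] = "search-test-demo-abc123.us-east-1.es.amazonaws.com"
os.environ["AWS_ES_REGION"] = "us-east-1"

# The ${VAR} placeholders from td-agent.conf get filled from the environment.
snippet = Template("""\
<endpoint>
  url https://${AWS_ES_ENDPOINT}
  region ${AWS_ES_REGION}
</endpoint>
""")

rendered = snippet.substitute(os.environ)
print(rendered)
```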

We need to build a proper container for fluentd to use our config, and for that we'll use everybody's favorite build tool: Wercker!

To do this, let's set up a Wercker pipeline called fluentd-box:

fluentd-box:
  box:
    id: alpine:3.4
    cmd: /bin/sh
  steps:
    - script:
        name: install fluentd
        code: |
          apk --no-cache --update add \
                            build-base \
                            ca-certificates \
                            ruby \
                            ruby-irb \
                            ruby-dev \
                            curl \
                            make \
                            g++

          echo 'gem: --no-document' >> /etc/gemrc
          gem install oj
          gem install fluentd -v 0.12.29
          rm -rf /tmp/* /var/tmp/* /var/cache/apk/*
          adduser -D -g '' -u 1000 -h /home/fluent fluent

    - script:
        name: install plugins
        code: |
          fluent-gem install fluent-plugin-aws-elasticsearch-service
          fluent-gem install --no-document fluent-plugin-kubernetes_metadata_filter -v 0.24.0

    - script:
        name: move binary and conf
        code: |
          mkdir -p /etc/fluent
          mkdir -p /var/fluentd/buffer/
          mv td-agent.conf /etc/fluent/fluent.conf

    - script:
        name: prune
        code: |
          rm -rf /pipeline

    - internal/docker-push:
        username: $DOCKER_USERNAME
        password: $DOCKER_PASSWORD
        repository: quay.io/wercker/fluentd-box
        registry: https://quay.io
        tag: $WERCKER_GIT_BRANCH-$WERCKER_GIT_COMMIT-$DEPLOY_ENV

There's a lot going on in this pipeline, so let's break it down step by step. In the first step we install fluentd's dependencies, mostly Ruby and a few build tools the install process needs. Next, we install fluentd itself. Finally, we clean up the container and add a user for fluentd to run as.

Now we install the plugins that allow us to communicate with AWS and to add Kubernetes metadata (labels, pod names, etc.) to our logs.
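To give you an idea of what that metadata buys you, here's an illustrative sketch of a log record before and after enrichment (the field names approximate the kubernetes_metadata filter's output, and the pod name is made up):

```python
# A raw docker log record, as produced by the json tail source.
record = {"log": "some log text\n", "stream": "stdout"}

# Roughly the kind of metadata the filter attaches after querying the
# API server (field names are approximate; pod name is hypothetical).
metadata = {
    "kubernetes": {
        "namespace_name": "devops",
        "pod_name": "loghard-fluentd-x1y2z",
        "container_name": "loghard-fluentd",
        "labels": {"app": "loghard-fluentd"},
    }
}

enriched = dict(record, **metadata)
assert enriched["kubernetes"]["labels"]["app"] == "loghard-fluentd"
```

This is what makes it possible to slice your logs by namespace, pod, or label once they land in Kibana.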

With the next step, we create the directories and files for fluentd.

Finally, we remove the Wercker-specific pipeline folder so the container image contains only what it needs, then push the image to our registry using Wercker's internal/docker-push step.

Now that we have a container configured to run fluentd our way, we deploy it to our Kubernetes cluster, and to do this we use...you guessed it... a Wercker pipeline! Let's call this one deploy-kube-fluentd:

deploy-kube-fluentd:
  box:
    id: alpine
    cmd: /bin/sh
  steps:
    - kubectl:
        server: $KUBE_ENDPOINT
        token: $KUBE_TOKEN
        insecure-skip-tls-verify: true
        validate: false
        command: delete daemonset loghard-fluentd --namespace devops || true
    - kubectl:
        server: $KUBE_ENDPOINT
        token: $KUBE_TOKEN
        insecure-skip-tls-verify: true
        command: --validate=false apply -f loghard-fluentd-deploy.yml

Here we're using the kubectl step to interact with Kubernetes. We delete the existing DaemonSet and then apply the new version.

Wham! Now you'll start seeing logs from your system!

 

Like Wercker?

We’re hiring! Check out the careers page for open positions in Amsterdam, London and San Francisco.

As usual, if you want to stay in the loop follow us on twitter @wercker or hop on our public slack channel. If it’s your first time using Wercker, be sure to tweet out your #greenbuilds, and we’ll send you some swag!

 

Topics: Kubernetes, Containers