Storing GitHub Org Auditlogs in Elasticsearch

I needed to generate an alert when someone overrode a Branch Protection setting. To do this I decided to pull some of the GitHub Auditlog into Elasticsearch.

There’s a GitHub API client written in sh, called ok.sh, which can be found here. At the time it didn’t support querying the Org Auditlog, so I PR’d that support here.
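
Under the hood that ok.sh function wraps GitHub’s org audit log REST endpoint. Roughly, the equivalent raw call looks like this (a sketch; <ORGNAME> is a placeholder and the token needs permission to read the org’s audit log):

# Sketch of the underlying API call; -G puts the url-encoded phrase into the query string
curl -G \
  -H "Authorization: token ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  --data-urlencode "phrase=created:>=$(date +%Y-%m-%d)" \
  "https://api.github.com/orgs/<ORGNAME>/audit-log"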

Once the PR was in place, I wrote a Dockerfile to create a container to deploy on Kubernetes.

FROM alpine:3.12

RUN apk add --no-cache curl jq

COPY ok.sh/ok.sh /
COPY submit.sh /

CMD ["/ok.sh"]

submit.sh is a small bit of shell that actually submits the results to an Elasticsearch instance via Logstash. It uses jq to add some ECS-style fields (such as event.*) and to nest the original auditlog entry under the github.* object, then uses curl to POST each line to Logstash.

#!/bin/sh

# Wrap each auditlog entry in ECS-style event.* fields and nest the original
# entry under github.auditlog, one compact JSON object per line.
JSONL=$(jq -c '.[] | { "event": { "kind":"event", "category":"configuration", "type":"change", "module":"github", "dataset":"github.auditlog", "provider":"auditlog" }, "github": { "auditlog": . } }' /auditlog.log)

# POST each line to Logstash individually. The trailing newline from printf
# ensures the final line is not dropped by read.
printf "%s\n" "$JSONL" |
while IFS= read -r line
do
  printf "%s" "$line" | /usr/bin/curl -H "Content-Type: application/json" "${LOGSTASH_ENDPOINT}" --data-binary @-
done
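
For illustration, a single transformed line ends up shaped like this (the auditlog fields shown are a made-up example; @timestamp and _document_id are the two fields the Logstash configuration below relies on):

{
  "event": {
    "kind": "event",
    "category": "configuration",
    "type": "change",
    "module": "github",
    "dataset": "github.auditlog",
    "provider": "auditlog"
  },
  "github": {
    "auditlog": {
      "@timestamp": 1596549236650,
      "_document_id": "AbCd1234EfGh5678",
      "action": "protected_branch.policy_override",
      "actor": "some-user",
      "org": "<ORGNAME>",
      "repo": "<ORGNAME>/some-repo"
    }
  }
}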

Note: You’ll want to create an index template and mappings, but I won’t get into that here.

I then created a Kubernetes CronJob to deploy with Helmfile (a sketch of the Helmfile wiring follows the manifest).

resources:
- apiVersion: batch/v1beta1
  kind: CronJob
  metadata:
    name: cron-ghauditlog
  spec:
    schedule: "5 * * * *"
    failedJobsHistoryLimit: 1
    successfulJobsHistoryLimit: 1
    jobTemplate:
      spec:
        template:
          metadata:
            labels:
              app: cron-ghauditlog
          spec:
            nodeSelector:
              kubernetes.io/os: linux
            imagePullSecrets:
            - name: <SECRET>
            containers:
            - name: ghaudit
              image: <IMAGE>
              command:
                - 'sh'
                - '-c'
                - '/ok.sh -jv org_auditlog <ORGNAME> phrase="created:>=$(date +%Y-%m-%d)" >> /auditlog.log && /submit.sh'
              env:
              - name: GITHUB_TOKEN
                value: <TOKEN>
              - name: LOGSTASH_ENDPOINT
                value: <ENDPOINT>
            restartPolicy: OnFailure
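
The top-level resources: key above is the shape expected by a "raw"-style chart, so a hypothetical helmfile.yaml wiring it up could look like this (the chart choice and file names are assumptions, not taken from the original setup):

repositories:
  - name: incubator
    url: https://charts.helm.sh/incubator

releases:
  - name: cron-ghauditlog
    chart: incubator/raw
    values:
      - ./cron-ghauditlog.yaml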

Each time the k8s CronJob runs, it pulls the full day’s worth of auditlogs and writes them to /auditlog.log, which is then read by submit.sh and submitted to Logstash.

Logstash has only one filter, which parses the Auditlog timestamp into the required @timestamp field.

Logstash filter:

filter {
  date {
    match => [ "[github][auditlog][@timestamp]", "UNIX_MS", "UNIX" ]
  }
}

The last piece was making sure we didn’t end up with loads of duplicate documents in Elasticsearch. Because I pull all the data for the current day repeatedly, I get duplicate events on every run; but each event comes with a predictable document ID direct from GitHub, so I just re-use that as my Elasticsearch document ID and take advantage of Logstash’s upsert support.

elasticsearch {
  hosts => [ "<HOST>" ]
  manage_template => false
  index => "<INDEX>"
  document_id => "%{[github][auditlog][_document_id]}"
  doc_as_upsert => true
  action => "update"
  user => "<USER>"
  password => "<PASSWORD>"
}

With the GitHub auditlog now being stored in Elasticsearch, I can create an appropriate ILM policy to manage the data lifecycle and retain it for as long as I want.
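
As a minimal sketch, a retention-only ILM policy could be created like this (the policy name and the 365d retention are placeholders; <HOST>, <USER> and <PASSWORD> match the Logstash output above):

curl -X PUT -u "<USER>:<PASSWORD>" -H "Content-Type: application/json" \
  "https://<HOST>/_ilm/policy/github-auditlog" -d '
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'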
