Storing GitHub Org Auditlogs in Elasticsearch
I needed to generate an alert when someone overrode a Branch Protection setting. To do this, I decided to pull some of the GitHub Auditlog into Elasticsearch.
There’s a GitHub API client written in sh, called ok.sh, which can be found here. At the time it didn’t support querying the Org Auditlog, so I PR’d that here.
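With that endpoint available, the client can be exercised locally before containerising anything. A rough invocation, mirroring the command used in the CronJob further down (the token and org name are placeholders), looks like this:
# -j returns raw JSON, -v is verbose; the phrase filter limits results to events created today
export GITHUB_TOKEN=<TOKEN>
./ok.sh -jv org_auditlog <ORGNAME> phrase="created:>=$(date +%Y-%m-%d)"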
Once the PR was in place, I wrote a Dockerfile to create a container to deploy on Kubernetes.
FROM alpine:3.12
# curl and jq are needed by both ok.sh and submit.sh
RUN apk add --no-cache curl jq
# ok.sh provides the GitHub API client, submit.sh ships the results to Logstash
COPY ok.sh/ok.sh /
COPY submit.sh /
CMD ["/ok.sh"]
submit.sh is a small bit of shell that actually submits the results to an Elasticsearch instance via Logstash. It uses jq to add some ECS-style fields (such as event.*) and to nest the original auditlog under the github.* object, then uses curl to POST each line to Logstash.
#!/bin/sh
# Wrap each auditlog entry in ECS-style event.* fields and nest the original entry under github.auditlog
JSONL=$(jq -c '.[] | { "event": { "kind":"event", "category":"configuration", "type":"change", "module":"github", "dataset":"github.auditlog", "provider":"auditlog" }, "github": { "auditlog": . }}' /auditlog.log)

# POST each resulting line to Logstash
printf "%s\n" "$JSONL" |
while IFS= read -r line
do
  printf "%s" "$line" | /usr/bin/curl -s -H "Content-type: application/json" "${LOGSTASH_ENDPOINT}" --data-binary @-
done
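To sanity-check the transformation, you can push a single hand-written entry through the same jq filter. The action and actor values below are invented, while @timestamp and _document_id mirror fields GitHub actually returns:
# Feed one made-up auditlog entry through the same filter used in submit.sh
echo '[{ "@timestamp": 1609459200000, "_document_id": "AbC123", "action": "protected_branch.destroy", "actor": "octocat" }]' |
  jq -c '.[] | { "event": { "kind":"event", "category":"configuration", "type":"change", "module":"github", "dataset":"github.auditlog", "provider":"auditlog" }, "github": { "auditlog": . } }'
# => {"event":{"kind":"event",...},"github":{"auditlog":{"@timestamp":1609459200000,"_document_id":"AbC123",...}}}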
Note: You’ll want to create an index template and mappings, but I won’t get into that here.
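That said, a rough sketch of what such a template might look like (the index pattern, field names, and types here are assumptions, not my real mapping) can be pushed with curl:
# Minimal index template sketch; adjust the pattern and mappings to your index
curl -s -u <USER>:<PASSWORD> -X PUT "<HOST>/_template/github-auditlog" \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "index_patterns": ["<INDEX>"],
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "event": { "properties": { "kind": { "type": "keyword" }, "dataset": { "type": "keyword" } } },
        "github": { "properties": { "auditlog": { "type": "object", "dynamic": true } } }
      }
    }
  }'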
I then created a Kubernetes CronJob, deployed using Helmfile:
resources:
  - apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: cron-ghauditlog
    spec:
      schedule: "5 * * * *"
      failedJobsHistoryLimit: 1
      successfulJobsHistoryLimit: 1
      jobTemplate:
        spec:
          template:
            metadata:
              labels:
                app: cron-ghauditlog
            spec:
              nodeSelector:
                kubernetes.io/os: linux
              imagePullSecrets:
                - name: <SECRET>
              containers:
                - name: ghaudit
                  image: <IMAGE>
                  command:
                    - 'sh'
                    - '-c'
                    - '/ok.sh -jv org_auditlog <ORGNAME> phrase="created:>=$(date +%Y-%m-%d)" >> /auditlog.log && /submit.sh'
                  env:
                    - name: GITHUB_TOKEN
                      value: <TOKEN>
                    - name: LOGSTASH_ENDPOINT
                      value: <ENDPOINT>
              restartPolicy: OnFailure
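After a helmfile apply, kubectl can confirm the CronJob exists and, if you don’t want to wait for the schedule, kick off a one-off run; the names below match the manifest above:
helmfile apply
kubectl get cronjob cron-ghauditlog
# Trigger a manual run from the CronJob template
kubectl create job --from=cronjob/cron-ghauditlog ghauditlog-manual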
Each time the k8s CronJob runs, it pulls the full day’s worth of auditlogs and writes them to a file, /auditlog.log, which is then read by submit.sh and submitted to Logstash.
Logstash has only one filter, which parses the Auditlog timestamp into the required @timestamp field.
Logstash filter:
filter {
  date {
    match => [ "[github][auditlog][@timestamp]", "UNIX_MS", "UNIX" ]
  }
}
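Assuming the pipeline is fronted by an HTTP input (which is what submit.sh POSTs to), the conversion can be sanity-checked by sending a single hand-crafted document; the epoch-milliseconds timestamp and document id below are made up:
# One test document; after the date filter, @timestamp in Elasticsearch should be 2021-01-01T00:00:00Z
curl -s -H "Content-type: application/json" "${LOGSTASH_ENDPOINT}" --data-binary \
  '{ "event": { "dataset": "github.auditlog" }, "github": { "auditlog": { "@timestamp": 1609459200000, "_document_id": "test-0001" } } }'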
The last piece needed was to make sure we didn’t end up with loads of duplicate documents in Elasticsearch. Because I’m pulling all of the current day’s data on every run, I get the same events repeatedly, but because each event comes with a predictable document id direct from GitHub, I just re-use that ID as my Elasticsearch document ID and take advantage of Logstash’s upsert support in the Elasticsearch output:
output {
  elasticsearch {
    hosts => [ "<HOST>" ]
    manage_template => false
    index => "<INDEX>"
    document_id => "%{[github][auditlog][_document_id]}"
    doc_as_upsert => true
    action => "update"
    user => "<USER>"
    password => "<PASSWORD>"
  }
}
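Once a few runs have gone through, a quick count against the index is an easy way to confirm documents are arriving and that re-ingesting the same day isn’t inflating the count; the placeholders match those in the output block:
# Document count should stay flat when the same day's events are pulled again
curl -s -u <USER>:<PASSWORD> "<HOST>/<INDEX>/_count?pretty"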
With the GitHub auditlog now stored in Elasticsearch, I can create an appropriate ILM policy to manage the data lifecycle and retain it for as long as I want.
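As a minimal sketch (the policy name and the one-year retention below are arbitrary choices, not necessarily what I run), an ILM policy that simply deletes old data can be created with a single API call and then attached to the index or its template:
# Delete-only ILM policy sketch; name and retention period are placeholders
curl -s -u <USER>:<PASSWORD> -X PUT "<HOST>/_ilm/policy/github-auditlog" \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "policy": {
      "phases": {
        "delete": { "min_age": "365d", "actions": { "delete": {} } }
      }
    }
  }'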