Gabriel Vasseur

Feb 28, 2022

ES-Choreographer

This is the documentation for the ES-Choreographer app on Splunkbase. The app offers various frameworks to help manage and improve correlation searches in Splunk Enterprise Security:

user-friendly peer review and change tracking system for correlation searches
simple in-splunk task management system
framework to encourage compliance with best practices
morning checks checks dashboard stub

The app was demonstrated in SEC1441A at .conf21: "How We Maintain Our Correlations in Splunk Enterprise Security at Thales UK" pdf/mp4

Installation and setup

For installation, see the instructions on Splunkbase.

Peer Reviews

Every time you review a correlation search, ES Choreographer stores the definition of the search (SPL, earliest, latest, notable details, etc.). This allows the app to compute changes not only since the last review, but also between any two past reviews, or between any past review and the current state.
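ES Choreographer does this inside Splunk; the snippet below is only a minimal Python sketch of the idea, assuming a search definition can be flattened into a dict of setting names to values. Each review stores a copy of the definition, and the standard library's difflib compares any two stored states (or a stored state and the live one). `ReviewHistory`, `render` and `diff` are hypothetical names, not part of the app.

```python
import difflib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def render(definition: Dict[str, str]) -> List[str]:
    # One "key = value" line per setting, sorted so diffs are stable.
    return [f"{key} = {definition[key]}" for key in sorted(definition)]


@dataclass
class ReviewHistory:
    """Hypothetical per-search history: one snapshot stored per review."""
    snapshots: List[Tuple[str, Dict[str, str]]] = field(default_factory=list)

    def record_review(self, review_date: str, definition: Dict[str, str]) -> None:
        # Copy the definition so later edits to the live search don't rewrite history.
        self.snapshots.append((review_date, dict(definition)))

    def snapshot(self, review_date: str) -> Dict[str, str]:
        # Return the definition as it was at the given review.
        for date, definition in self.snapshots:
            if date == review_date:
                return definition
        raise KeyError(review_date)


def diff(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Unified diff between two states; an empty list when nothing changed."""
    return list(difflib.unified_diff(render(before), render(after),
                                     fromfile="before", tofile="after",
                                     lineterm=""))
```

Picking a "before" snapshot and either an "after" snapshot or the live definition ("now") and passing both to `diff` mirrors what the Peer Review History dashboard displays.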

Clicking the "Set for peer review" button on the Status dashboard lists all searches pending review, i.e. those that differ from the way they were the last time they were reviewed. Clicking the review_status cell opens the Peer Review dashboard for that particular search.

Peer Review dashboard

The Peer Review dashboard offers:

some context on the activity since the last review: DONE or TODO comments added or updated, as well as a list of confirmed contributors. Contributors are Splunk users logged in to Splunk Web who used the "edit correlation search" page in Enterprise Security. Keep in mind there are other ways to change correlations (such as upgrades, deployments, direct edits of .conf files, etc.) that cannot be attributed this way.
an HTML-rendered coloured diff highlighting exactly what has changed
any change to the Best Practices status for that correlation
current pending TODOs for this correlation and a text box to add more. This can be used to raise review comments.
buttons to either pass or fail the review. The only difference between a failed review and a passed review is that a correlation with a failed review will still be listed on the Status dashboard when set for peer review, even if there were no changes to the correlation. You may also decide to have a scheduled search email admins about failed reviews.
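The rules for which searches appear when the Status dashboard is set for peer review can be summarised as a small predicate. This is a hedged Python sketch, not the app's logic: the function name, its arguments, and the "passed"/"failed" labels are assumptions.

```python
from typing import Dict, Optional


def is_pending_review(current: Dict[str, str],
                      last_reviewed: Optional[Dict[str, str]],
                      last_result: Optional[str]) -> bool:
    """Hypothetical rule for the "set for peer review" list.

    last_reviewed is the snapshot stored at the last review (None if never
    reviewed); last_result is "passed" or "failed" (assumed labels).
    """
    if last_reviewed is None:
        # Never reviewed: e.g. every correlation right after installing the app.
        return True
    if current != last_reviewed:
        # Changed since the last review.
        return True
    # Unchanged, but a failed review keeps the search on the list.
    return last_result == "failed"
```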

Note: in an ideal world, peer reviews would occur *before* a change is deployed in production. This would be very difficult to achieve in Splunk, so all peer reviews happen AFTER the fact. Better late than never.

Peer Review History dashboard

The Peer Review History dashboard is simple: pick the date of the "before" review you want to consider, then pick either the date of the "after" review or "now" to consider the current state of the correlation. The dashboard then shows you the diff between the two states of that correlation.

Bulk Peer Review dashboard

The Bulk Peer Review dashboard is accessible from the Status dashboard by clicking the "?" and then the Bulk Peer Review button.

The Bulk Peer Review dashboard does not show changes and therefore does not allow an actual peer review, but it does allow you to set the review status of many correlations at once. This is useful in some unusual circumstances:

the first time you install the app, every single correlation will show as pending a review. You might be happy to put in the work to review them one by one as part of a big push in your development efforts, but most likely you will want to use the bulk peer review to set a starting point and decide to only peer review changes from then on.
if you upgrade Enterprise Security, or install or upgrade an app with correlations such as "ES Content Updates", many correlations might show as pending review. You might be happy going through them one by one to review the changes, but this can be very tedious. If you know your environment, it's usually possible to differentiate between the changes brought by the upgrade / new app and the changes due to your team's active development efforts. The Bulk Peer Review dashboard offers many ways to granularly narrow down the list of pending searches to the ones you are not interested in peer reviewing properly. A click of a button then marks them as no longer pending review, and the Status dashboard becomes usable once more.
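As an illustration of narrowing the list down and then setting a new starting point, here is a minimal Python sketch. It assumes a pending search is identified by its name and the app that ships it; the app names, the glob filter, and the `keep_for_review` exclusion are hypothetical stand-ins for the dashboard's own filters.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


@dataclass
class PendingSearch:
    name: str  # correlation search name
    app: str   # app that ships the search (hypothetical example values below)


def select_for_bulk_review(pending: Iterable[PendingSearch],
                           app_patterns: Iterable[str],
                           keep_for_review: Iterable[str] = ()) -> List[PendingSearch]:
    """Pick pending searches from matching apps, except those still worth a real review."""
    patterns = list(app_patterns)
    keep = set(keep_for_review)
    return [s for s in pending
            if any(fnmatchcase(s.app, p) for p in patterns) and s.name not in keep]


def mark_reviewed(baseline: Dict[str, dict], selected: Iterable[PendingSearch],
                  current: Dict[str, dict]) -> None:
    # The current definition becomes the new reference point, so only
    # later changes will show the search as pending again.
    for s in selected:
        baseline[s.name] = dict(current[s.name])
```

The `keep_for_review` list reflects the point above: searches your own team has been tweaking stay pending so they still get a proper review.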

Task management system

ES Choreographer comes with a basic task management system. This integrates with the peer review system and Incident Review in Enterprise Security.

TODO & DONE dashboard

The TODO & DONE dashboard allows you to raise comments against a specific correlation search.

A TODO comment is something to do regarding a correlation search, e.g. "review threshold" or "include field XYZ in notable". TODOs can be cancelled or marked as done. A TODO that hasn't yet been cancelled or marked as done is considered pending. When set for TODOs, the Status dashboard lists all correlations with pending TODOs.
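The TODO lifecycle described above (pending until cancelled or marked as done) can be modelled as a tiny state machine. A minimal Python sketch; the class and function names are hypothetical, and the rule that a closed TODO cannot be closed again is an assumption of this sketch, not something the app documents.

```python
from enum import Enum
from typing import Iterable, List


class TodoStatus(Enum):
    PENDING = "pending"
    DONE = "done"
    CANCELLED = "cancelled"


class Todo:
    def __init__(self, correlation: str, text: str):
        self.correlation = correlation
        self.text = text
        self.status = TodoStatus.PENDING  # every new TODO starts pending

    def mark_done(self) -> None:
        self._close(TodoStatus.DONE)

    def cancel(self) -> None:
        self._close(TodoStatus.CANCELLED)

    def _close(self, new_status: TodoStatus) -> None:
        # Assumption: only a pending TODO can be closed.
        if self.status is not TodoStatus.PENDING:
            raise ValueError(f"TODO already {self.status.value}")
        self.status = new_status


def correlations_with_pending_todos(todos: Iterable[Todo]) -> List[str]:
    """What the Status dashboard lists when set for TODOs (sorted for stable output)."""
    return sorted({t.correlation for t in todos if t.status is TodoStatus.PENDING})
```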

A DONE comment is just an FYI. If developers do something to a correlation, they might choose to create a DONE comment to explain or justify their change. Depending on your workflow, this can be a courtesy to the peer reviewer, but it is completely optional and cannot be enforced.

Assignments

The TODO & DONE dashboard allows you to assign the correlation to a member of your team. TODOs are not assigned, correlations are: if a correlation has 2 TODOs, it is not possible to assign one TODO to one team member and the other TODO to another. The dashboard also allows you to unassign a correlation, or "kick it into the long grass", which assigns it to "later". A scheduled search in the background periodically unassigns correlations that no longer have any pending TODOs.
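The key point is that assignments hang off the correlation, not the TODO. Below is a minimal Python sketch of that model and of the background clean-up, assuming assignments are a simple mapping from correlation name to owner and pending TODOs are counted per correlation; all names here are hypothetical.

```python
from typing import Dict, List

LATER = "later"  # owner used when a correlation is "kicked into the long grass"


def assign(assignments: Dict[str, str], correlation: str, owner: str) -> None:
    # Keyed by correlation, not by TODO: one owner covers all of its TODOs.
    assignments[correlation] = owner


def unassign(assignments: Dict[str, str], correlation: str) -> None:
    assignments.pop(correlation, None)


def sweep_finished(assignments: Dict[str, str], pending_todos: Dict[str, int]) -> List[str]:
    """Sketch of the background search: unassign correlations with no pending TODOs."""
    finished = sorted(c for c in assignments if pending_todos.get(c, 0) == 0)
    for c in finished:
        del assignments[c]
    return finished
```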

You will have to edit the source of the dashboard to customise the assignment buttons to your team.

When set for TODOs, the Status dashboard lists all correlations with pending TODOs. The owner column shows the user the correlation is assigned to. This gives your team a view of the pending workload. Assigning a correlation to "later" (by kicking it into the long grass) makes it possible to distinguish between correlations with new TODOs that haven't yet been assigned and need to be discussed, and correlations that have pending TODOs but have been deemed less of a priority.
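To make the three kinds of owner value concrete, here is a hedged Python sketch that groups the pending workload the way the owner column lets you read it. The `"(unassigned)"` label and the input shapes (pending TODO counts and an assignment mapping) are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List

UNASSIGNED = "(unassigned)"  # hypothetical label for correlations nobody owns yet


def workload_view(pending_todos: Dict[str, int],
                  assignments: Dict[str, str]) -> Dict[str, List[str]]:
    """Group correlations with pending TODOs by owner.

    UNASSIGNED holds new TODOs still to be discussed; "later" holds the ones
    kicked into the long grass; any other key is a team member's queue.
    """
    view: Dict[str, List[str]] = defaultdict(list)
    for correlation in sorted(pending_todos):
        if pending_todos[correlation] > 0:
            view[assignments.get(correlation, UNASSIGNED)].append(correlation)
    return dict(view)
```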

IR-linked TODOs

TODOs can be linked to a specific notable in Incident Review. This is particularly helpful if the TODO relates to an issue demonstrated in a notable. When a TODO is linked to a notable, ES Choreographer's dashboards make it easy to find that notable again at any later time. This gives you an example of the problem that needs fixing, so it can be assessed when triaging the TODO or reproduced by the developer for testing.

For instance, an analyst could have an issue with a notable: either that particular instance is deemed to be noise and should be filtered out, or maybe a useful bit of information for their investigation was buried in raw logs and could instead be included in the notable itself to make the next investigation or triage much quicker. In this case, the user can choose to use the "! add TODO" workflow. This opens the "IR-linked TODOs" dashboard, which shows existing pending TODOs for that correlation (including the ones linked to IR notables) and allows the user to add a new TODO that will be linked to that notable.

A TODO can only be linked to one notable, and vice versa. It is not possible to link a TODO to a notable after the TODO has been created: the only way is to create the TODO via the IR workflow.
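The two constraints above (one-to-one, and link only at creation) can be enforced with an immutable record plus a registry keyed by notable. A minimal Python sketch; `LinkedTodo`, `TodoRegistry` and the notable ID format are hypothetical, and TODO status is left out to keep the example focused on the link.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LinkedTodo:
    correlation: str
    text: str
    # Set only at creation time (via the IR workflow); frozen=True prevents
    # linking an existing TODO to a notable afterwards.
    notable_id: Optional[str] = None


class TodoRegistry:
    def __init__(self) -> None:
        self.todos: List[LinkedTodo] = []
        self._by_notable: Dict[str, LinkedTodo] = {}

    def create(self, correlation: str, text: str,
               notable_id: Optional[str] = None) -> LinkedTodo:
        if notable_id is not None and notable_id in self._by_notable:
            # One notable can back at most one TODO.
            raise ValueError(f"notable {notable_id} already has a TODO")
        todo = LinkedTodo(correlation, text, notable_id)
        self.todos.append(todo)
        if notable_id is not None:
            self._by_notable[notable_id] = todo
        return todo

    def todo_for_notable(self, notable_id: str) -> Optional[LinkedTodo]:
        # Lets a dashboard jump from a notable back to its TODO.
        return self._by_notable.get(notable_id)
```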

Note about Risk-Based Alerting

If you practise RBA, you will have correlations that trigger on generic circumstances such as "high risk user/asset". In some cases the RBA correlation itself needs some work (for instance, its threshold needs increasing) and it's fine to raise the linked TODO against that correlation. In many cases, however, the problem will lie with one of the contributors (for instance, one risk-raising correlation is too noisy and is causing many entities to breach the risk threshold). In these situations, it is better to raise the TODO against the contributor.

The IR-linked TODOs dashboard includes code to deal with this: when a notable is raised by an RBA correlation, it can show the user the list of contributing correlations and the user can select the one they want the TODO to be raised against. Unfortunately, the logic is specific to your own implementation of RBA and the dashboard's code will need customisation to work.
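Because the contributor logic depends on your RBA implementation, the following is only a hedged Python sketch of one possible approach: total up the risk each correlation raised against the entity in the notable and rank them, so the user can pick the noisiest contributor. The event keys (`risk_object`, `source` as the correlation name, `risk_score`) are assumptions loosely modelled on typical risk events; adapt them to your own data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def contributing_correlations(risk_events: Iterable[Dict[str, object]],
                              risk_object: str) -> List[Tuple[str, float]]:
    """Rank the correlations that raised risk against one entity, highest
    total risk first.

    risk_events: dicts with assumed keys "risk_object", "source"
    (the correlation name) and "risk_score".
    """
    totals: Dict[str, float] = defaultdict(float)
    for event in risk_events:
        if event["risk_object"] == risk_object:
            totals[str(event["source"])] += float(event["risk_score"])
    # Descending total score, then name, for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```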

Best Practices

Best practices are a set of desirable criteria that all your correlations should aspire to satisfy. What your best practices are exactly, only you can say. Defining them and, more importantly, having a dashboard that automatically assesses your correlations against them so that you know where you stand is a key concept of ES Choreographer. That said, you can also ignore it entirely and just focus on the peer review and/or TODO frameworks (or vice versa).

ES Choreographer comes with a set of reasonable best practices to get you started; you will need to dive under the hood if you want to remove, customise, or add to them.

The Best Practices dashboard lists your correlations and reports on how they measure up against your best practices. Clicking on a line in the table will bring up a lot of information about that particular correlation and how the best practices are evaluated. There is a lot of built-in documentation in the dashboard that you can access by clicking the question marks ("?") in the section headers.

Keyness and exceptions

ES Choreographer introduces the entirely optional notion of keyness for correlations to help prioritise your work. Keyness ranges from 1 (absolutely key) to 4 (not very key at all). For instance, you could agree on a list of 10 or 20 of your most important correlations and give them a keyness of 1 and set yourself the goal to achieve 100% best practices compliance for all correlations with a keyness of 1.
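A goal such as "100% compliance for keyness 1" boils down to a percentage per keyness level. A minimal Python sketch, assuming each correlation has already been reduced to a single compliant/non-compliant flag (the input shape and function name are hypothetical):

```python
from typing import Dict, Iterable, Tuple


def compliance_by_keyness(results: Iterable[Tuple[str, int, bool]]) -> Dict[int, float]:
    """results: (correlation, keyness, compliant) tuples, keyness from 1 to 4.
    Returns {keyness: percentage of compliant correlations}."""
    totals: Dict[int, int] = {}
    compliant: Dict[int, int] = {}
    for _, keyness, ok in results:
        totals[keyness] = totals.get(keyness, 0) + 1
        compliant[keyness] = compliant.get(keyness, 0) + (1 if ok else 0)
    return {k: 100.0 * compliant[k] / totals[k] for k in sorted(totals)}
```

With this shape, checking the goal for your key correlations is simply `compliance_by_keyness(results).get(1) == 100.0`.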

It doesn't always make sense for all best practices to be satisfied by all correlations. In a lot of cases, the best practice evaluation can be made clever enough to automatically recognise this and set the status of the best practice to "N/A" where needed. But in some cases exceptions are needed.

Both keyness and exceptions are defined in a lookup that can be edited either by clicking the "?" at the top of the Best Practices dashboard and following the link, or by clicking a key cell in the main table of the Best Practices dashboard or the Status dashboard. To create an exception, locate the relevant cell (the row is the correlation, the column is the best practice) and simply type "EXEMPT".
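The interplay between automatic "N/A" detection and manual "EXEMPT" entries can be sketched as follows. This is an illustration in Python, not the app's evaluation macro: the two example checks, the definition fields they read, and the PASS/FAIL/N/A/EXEMPT output labels are all hypothetical. The exceptions dict mirrors the lookup's layout, with correlations as rows and best practices as columns.

```python
from typing import Callable, Dict, Optional

# A check returns True (pass), False (fail) or None (not applicable).
Check = Callable[[dict], Optional[bool]]

# Hypothetical example best practices.
EXAMPLE_CHECKS: Dict[str, Check] = {
    "has_description": lambda d: bool(d.get("description")),
    # Only meaningful for searches that create notables, otherwise N/A.
    "notable_has_drilldown": lambda d: (bool(d.get("drilldown_search"))
                                        if d.get("creates_notable") else None),
}


def evaluate(correlation: str, definition: dict, checks: Dict[str, Check],
             exceptions: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Evaluate every best practice for one correlation.
    exceptions mirrors the lookup: exceptions[correlation][practice] == "EXEMPT"."""
    row = exceptions.get(correlation, {})
    statuses: Dict[str, str] = {}
    for practice, check in checks.items():
        if row.get(practice) == "EXEMPT":
            # A manual exception overrides the automatic evaluation.
            statuses[practice] = "EXEMPT"
            continue
        result = check(definition)
        statuses[practice] = "N/A" if result is None else ("PASS" if result else "FAIL")
    return statuses
```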

Integration

The Status dashboard can also be used to assess the degree of compliance of your correlations with your best practices. The Peer Review dashboard also keeps track of the status of best practices so you can see if changes are affecting the best practices compliance positively or negatively.

Best Practices Evolution

The Best Practices Evolution dashboard keeps track of what the best practices compliance was yesterday evening and shows you if any correlation has seen any improvement or degradation since. It is particularly useful if you are tweaking the macro that evaluates the best practices, effectively moving the goalposts, and you want to check the impact of your change.

There is also a scheduled search that reports daily on best practices evolution.
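Conceptually, the evolution report compares two snapshots of per-correlation scores. A minimal Python sketch, assuming each snapshot maps a correlation name to a numeric compliance score (for instance the percentage of best practices passed); correlations present in only one snapshot are simply ignored here, which is a simplification of this sketch.

```python
from typing import Dict, List, Tuple


def evolution(yesterday: Dict[str, int],
              today: Dict[str, int]) -> Dict[str, List[Tuple[str, int]]]:
    """Report correlations whose compliance score moved between two snapshots."""
    report: Dict[str, List[Tuple[str, int]]] = {"improved": [], "degraded": []}
    for name in sorted(set(yesterday) & set(today)):
        delta = today[name] - yesterday[name]
        if delta > 0:
            report["improved"].append((name, delta))
        elif delta < 0:
            report["degraded"].append((name, delta))
    return report
```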

Suggested workflows

There are several levels at which you can use ES Choreographer in your team.

Level 1 - Using peer reviews

This requires almost no setup investment.

Let your developers tweak correlations the way they have always done but add this:

Every morning, open the Status dashboard, click "set for peer review" (this can be bookmarked for convenience). If any correlation is pending review, a click on the review_status cell will open the peer review dashboard for that correlation in a new tab.

You can make this a group activity, to make everyone aware of all changes. Or have a rota with a designated peer reviewer every day/week. Or just encourage every admin to review as time allows. Do not peer review your own changes!

Everybody can learn from the changes others are making and the comments the peer reviewers are giving.

Level 2 - Using TODOs

This requires some setup investment: tweaking the TODO dashboard assignment buttons and optionally the IR-linked TODO dashboard's RBA logic.

The suggested workflow goes as follows:

Use the TODO & DONE dashboard or the workflow in Incident Review in Enterprise Security to raise TODOs as issues occur.

Once a week, have a meeting with your Enterprise Security admins where you open the Status dashboard and click "set for TODOs" and for each correlation:
- discuss if there is anything new / blocking
- click one of the TODOs to open the TODO dashboard in a new tab, where TODOs can be added, cancelled or marked as done, and/or the correlation assigned
- if there are TODOs linked to notables, the notables can be quickly found in IR by either clicking the specific comment in the TODO dashboard or clicking a specific IR timestamp value in the Status dashboard. Having an example of the issue greatly helps support the discussion
As time allows in the Enterprise Security admins' day-to-day job:
- open Status dashboard and click "my TODOs"
- click on the title to open the Enterprise Security edit form OR click on the best practices compliance to open the Best Practices dashboard (see below)
- start working
- at any point, the Status dashboard can be refreshed (use the "quick correlation data refresh" link) and the peer review dashboards consulted to double check changes (or indeed have them peer reviewed)

Level 3 - Using the Best Practices framework

This potentially requires a significant investment in defining your own Best Practices and understanding how to tweak ES Choreographer to your needs, unless you're happy with the out-of-the-box Best Practices.

The suggested workflow goes as follows:

open Best Practices dashboard
find relevant correlation (scroll and/or use filters) and click anywhere in the table (except the "key" cell)
use "edit rule" to open the Enterprise Security edit form and "open in search" to open the current SPL in a new search bar
do work and save it in Enterprise Security
tab back to the Best Practices dashboard and click the "refresh data" link next to the "?" at the top. This will update and refresh the whole dashboard.
check if happy with the impact, rinse and repeat.

Digest emails

You might want to enable the "Correlation Search comment activity" saved search and edit it so that it sends an email to your Enterprise Security admins. This will send emails with a summary of any TODO, DONE or review activity.

Same thing for the "Report on correlation searches Best Practices score evolution" search. This helps avoid unexpected degradation to Best Practices compliance.

KV stores backup and restore

KV stores are used on top of a summary index to keep the state of the comments, reviews and owners. A couple of scheduled searches take regular backups of these KV stores: one for each day of the week, and one on every 1st, 8th, 15th, 22nd and 29th of the month.
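One way to read this schedule is as a set of rolling backup slots: a weekday slot overwritten every week, plus a day-of-month slot overwritten each month. The Python sketch below works under that assumption; the slot names are hypothetical and not the app's actual lookup or backup names.

```python
import datetime
from typing import List

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")
MONTHLY_DAYS = (1, 8, 15, 22, 29)


def backup_slots(day: datetime.date) -> List[str]:
    """Hypothetical slot names overwritten by the backups taken on a given day."""
    # Weekday slot: overwritten 7 days later.
    slots = [f"weekday_{WEEKDAYS[day.weekday()]}"]
    if day.day in MONTHLY_DAYS:
        # Day-of-month slot: overwritten when that day of the month comes round again.
        slots.append(f"monthday_{day.day:02d}")
    return slots
```

Under this reading, at most 7 + 5 = 12 backups exist at any time, and the oldest is roughly a month old (longer for the 29th slot around February).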

If something were to happen to your KV stores, you can restore them from any of the backups. The restore dashboard can be accessed from the status dashboard by clicking the "?". Follow the instructions on the dashboard.

Morning checks and morning checks checks

Another notion in ES Choreographer is the importance of not only having regular end-to-end harmless tests for your correlations (called morning checks), but also having a dashboard reporting on whether such tests were successful: the morning checks checks dashboard. As this is highly specific to your organisation, all ES Choreographer has to offer is a dashboard stub.
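What a filled-in morning checks checks dashboard might compute can be sketched in a few lines of Python. This assumes, hypothetically, that a morning check counts as successful when today's harmless test caused its correlation to produce a notable; how you detect that is specific to your environment.

```python
from typing import Dict, Set


def morning_checks_checks(expected: Set[str], fired_today: Set[str]) -> Dict[str, str]:
    """expected: correlations that have a morning check (a harmless daily test).
    fired_today: correlations that produced a notable from today's test.
    Returns "OK" or "MISSING" per expected correlation."""
    return {name: ("OK" if name in fired_today else "MISSING")
            for name in sorted(expected)}
```

Any "MISSING" entry points at a correlation, or a data source feeding it, that may be silently broken.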

More information on the concept can be found in this post or in SEC1441A at .conf21: "How We Maintain Our Correlations in Splunk Enterprise Security at Thales UK" pdf/mp4