Gabriel Vasseur

RBA: Aggregate user & system risks!

Since RBA is all about aggregating security events that are related to the same entity, Assets & Identities normalisation is crucial to its success. Beyond basic normalisation for user and system risk objects, you also want to aggregate risks associated with a user with the risks associated with the assets they own (e.g. their laptop). We present a way to do that in Splunk.

 

The Idea

 

The whole implementation is a lot to take in, with many subtle details, so let's first explain the what and the why.

 

Identity Normalisation

 

Say you have a correlation search set up to raise a "user" risk:



That's great, but the user field could contain a variety of values:

 

  • a user name, e.g. GVasseur

  • a user name with a domain name, e.g. MyDomain\GVASSEUR

  • an admin account, e.g. GVasseurADM

  • an email address: gvasseur@company.com

 

These are all different though they relate to the same entity: gvasseur. For RBA to do its aggregation magic, they need to be normalised (including casing). And at the same time we can enrich the data with information about the user:

 

  • normalisation is modifying the content of the user field, but we don't want to lose information, hence the "user_original" field.

  • for email addresses, the field is likely to be called something like "recipient" or "sender" instead of "user", and it would be weird for it to be normalised to a non-email value. See the implementation details below for what this means in practice.
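To make the idea concrete outside of Splunk, here is a minimal Python sketch of the normalisation logic. It is only an illustration, not the macro itself: the IDENTITIES table and the normalise_user function are hypothetical stand-ins for the ES identities lookup, and the entries are made up.

```python
import re

# Hypothetical identity table: every known identifier (lower-cased) maps to
# the person's canonical id. In the article this role is played by the ES
# identities lookup; the entries here are invented for illustration.
IDENTITIES = {
    "gvasseur": "gvasseur",
    "gvasseuradm": "gvasseur",
    "gvasseur@company.com": "gvasseur",
}


def normalise_user(raw):
    """Return the normalised value, the original value and the canonical id."""
    value = raw.strip().lower()
    # Drop a leading "DOMAIN\" prefix if there is one.
    value = re.sub(r"^[^\\]+\\", "", value)
    user_id = IDENTITIES.get(value)  # None when the identity is unknown
    is_email = "@" in value
    # Emails keep their (lower-cased) value; other known values become the id.
    user = value if is_email or user_id is None else user_id
    return {"user": user, "user_original": raw, "user_id": user_id}
```

All four variants from the list above end up with user_id set to gvasseur, while the email keeps its own value in user and nothing is lost thanks to user_original.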

 

This normalisation needs to happen consistently in every correlation search that raises a user risk. The searches can do so by calling a macro that wraps around Enterprise Security's `get_identity4events`.

 

To support the macro it's a good idea to ingest AD account data into the identities framework and merge it with HR data. This is what makes it possible to link admin or test accounts to the right person.

 

Assets Normalisation

 

It's a similar story with system risk objects, but it gets more complicated.

 

So, you have a correlation search set up to raise a "system" risk:



That's great, but the dest field could contain a variety of values:


  • a hostname, e.g. GabsLaptop

  • a hostname with a domain name, e.g. GabsLaptop.domain.name

  • an unknown IP, e.g. 10.100.200.300, where the dest_nt_host field is also populated and happens to hold a hostname, e.g. GabsLaptop

  • a DHCP IP, e.g. 10.10.1.2

  • a VPN IP, e.g. 10.20.1.2

  • (if the asset was a server with a fixed IP, dest could also be a known CMDB IP but this is not applicable in our laptop example)


These are all different though they relate to the same entity: GabsLaptop and, ultimately, the user gvasseur who owns it. Again, we need to normalise and enrich.

 

  • As with identities, normalisation is modifying the results, but we don't want to lose information, hence the "dest_original" field.

  • The DHCP IP is converted to a hostname thanks to a timed lookup: it will return the hostname associated with this IP at 7:27 this morning, even if the correlation triggers much later and the IP has since moved on.

  • Similar thing for the VPN IP, but the VPN logs only associate an IP with the username that logged in, not a hostname. So in this case dest hasn't changed and we don't know much about the system, but notice the potentially very useful dest_vpn_user field.
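If timed lookups are new to you, the following Python sketch shows the principle: find the most recent lease for the IP at or before the time of the event. It is a simplified model, not Splunk's exact matching rules, and the LEASES data and host_at function are hypothetical.

```python
from bisect import bisect_right

# Hypothetical DHCP lease history: (epoch seconds, ip, hostname), sorted by time.
LEASES = [
    (1000, "10.10.1.2", "GabsLaptop"),
    (5000, "10.10.1.2", "OtherLaptop"),
]


def host_at(ip, event_time, leases=LEASES):
    """Return the hostname that held `ip` at `event_time`, or None if unknown."""
    history = [(t, host) for t, lease_ip, host in leases if lease_ip == ip]
    times = [t for t, _ in history]
    # Number of leases recorded at or before the event time.
    idx = bisect_right(times, event_time)
    return history[idx - 1][1] if idx else None
```

An event at time 1200 resolves to GabsLaptop even if the lookup runs after time 5000, when the IP has already moved on to another machine.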

 

This normalisation needs to happen consistently in every correlation search that raises a system risk. The searches can do so by calling a macro that wraps around ES's `get_asset`.

 

To support the macro you'll need scheduled searches to update timed lookups for DHCP and VPN data every minute or two.

 

Combining users with their machines' risks

 

If possible, it's good practice to get your correlations to raise risks against both a user and a system. Any potential duplicates will be taken care of with my deduplication technique (see my earlier RBA post) so there's no harm done. In some cases, the data the correlation search is mining will contain only either a username or a system name. Or it might contain both, but user might be something generic like "NT Authority\System" or "root". This is where combining risks raised against a user with the risks raised against their system(s) becomes useful.


To achieve this, we'll see if we can map system risks back to a user. If a system risk is raised, we want it to effectively become a user risk if we can identify who is behind it (e.g. the asset owner OR the account logged in). Because we still want each correlation to raise multiple risks (typically one for user and one for dest, but there could be more), we definitely can't implement this logic in the SPL of the correlations.


Thankfully we can influence the next step of the process, which is when the risk events get accelerated into the Risk data model.


We can have automated lookups that look up the system risk object to find out either its owner or the VPN user logged in behind it. With calculated fields in the Risk data model we can select whichever value is relevant (if existing) and assign the new risk type appropriately.
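As a rough illustration of that selection logic, here is a small Python sketch. The function name, its parameters and the order of preference (VPN user first, then asset owner) are my assumptions for the example; in Splunk this would be expressed as calculated fields in the data model.

```python
def effective_risk(risk_object, risk_object_type, owner=None, vpn_user=None):
    """Return the (object, type) pair a risk should be aggregated under."""
    if risk_object_type == "system":
        # Assumed preference: the VPN user if known, otherwise the asset owner.
        person = vpn_user or owner
        if person:
            return person.lower(), "user"
    # User risks, and system risks with nobody behind them, are left untouched.
    return risk_object, risk_object_type
```

A system risk on GabsLaptop with a known owner becomes a user risk on gvasseur, while a server with no owner stays a system risk.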


We then need to make sure any risk alert or risk overview dashboard is using these new Risk data model calculated fields rather than risk_object and risk_object_type.


Conclusion


Before we look into the implementation details, let's summarise the idea.


With this approach you can consistently raise risks to the same entity (say gvasseur) whether the original event involved:

  • their main identifier (gvasseur) or any variants of it (e.g. GVasseur, MyDomain\GVASSEUR, etc...)

  • their other accounts (e.g. test or admin accounts, e.g. GVasseurADM)

  • their laptop hostname (GabsLaptop) or any variant (e.g. GabsLaptop.domain.name, fixed IP, etc...)

  • a DHCP IP assigned to their laptop

  • a VPN IP assigned to their username

 

This will enable your RBA implementation to aggregate events further than ever. Make sure you use my deduplication technique as now you have the potential to get many duplicates!

 

The approach is not hiding information. For each risk contribution, it's still possible to see what the original system risk object was.

 

A number of aspects of this approach require a lot of consistency in how the correlation searches are developed and maintained. See my ES Choreographer app and its Best Practices dashboard to help with this.


Implementation Details

 

Disclaimer: only use for inspiration. Do not use any part of this unless you fully understand it and know it makes sense in your environment.


This article assumes you're comfortable with Splunk and Enterprise Security's administration and concepts. That said, please reach out if something isn't as clear as it could be.

 

Identity ingest

 

At a minimum you should regularly ingest (twice a day?) your employee data from HR into the ES identities framework. How to do this is beyond the scope of this article.

 

On top of this, you should ingest your accounts information. You can do this by querying AD, for instance with the ldap search app (I believe it is https://splunkbase.splunk.com/app/1151). Again, this is beyond the scope of this article.

 

You then have to merge both of these feeds together: the accounts information and the HR data. Beware! There be dragons:

 

If you have a neat company, every account will consistently be created with something like a unique employee ID and this will match a field in the HR data. ES's entitymerge will automagically merge the data for you, but it might be overzealous, for instance if, following an acquisition, two distinct people have the same email address local part (e.g. John Smith has jsmith@company1.com and Jonathan Smith has jsmith@company2.com). In practice you'll have to scrutinise what ends up in identity_lookup_expanded and tweak entitymerge's behaviour or what you feed it. Again, way beyond the scope of this article.

 

You might want to add an "id" field to your identities lookup. This is because the identity field contains a mix of identifiers for the employee such as employee number, their various accounts (e.g. gvasseur, gvasseurADM, gvasseur_test...), their email, etc, in any order. What we need is a "canonical" id for the employee, which probably should be their main username (e.g. gvasseur).
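To show what the merged result could look like, here is a hedged Python sketch that joins HR rows and AD account rows on an employee ID and picks the main username as the canonical id. Every field name in it (employee_id, email, account, primary) is hypothetical; your feeds will differ, and in ES the merging itself is done by entitymerge rather than by code like this.

```python
def build_identities(hr_rows, ad_rows):
    """Merge HR and AD rows on employee_id and return {employee_id: record}."""
    identities = {}
    for row in hr_rows:
        identities[row["employee_id"]] = {
            "id": None,
            "identity": [row["employee_id"], row["email"].lower()],
        }
    for row in ad_rows:
        record = identities.get(row["employee_id"])
        if record is None:
            continue  # account with no HR match: needs a manual review
        account = row["account"].lower()
        record["identity"].append(account)
        if row.get("primary"):
            record["id"] = account  # the main username becomes the canonical id
    return identities
```

The identity list keeps every identifier for matching, while id gives one predictable value to aggregate on.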

 

The id field will need to be added in Enterprise Security > Configure > Data Enrichment > Asset and Identity Management > Identity Fields.


Identity lookup macro


Here is the SPL for the macro. It takes a single argument called "username".



Identity Normalisation


Notes about the identity macro above:


ES's macro annoyingly only works if used with one of user, src_user, host_owner, orig_host_owner, src_owner, dest_owner, or dvc_owner. To make it work with $username$ being any arbitrary field, we're moving the content of the $username$ field to "user" before calling ES's macro on "user" and then moving the results back. Before we do that we need to move the content of "user" and all the "user_*" fields to some temporary backup and then move it back at the very end so we're not losing any information.
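The back-up, swap and restore dance is easier to follow in a general-purpose language. This Python sketch mimics it on a dictionary standing in for an event; fake_lookup is a made-up stand-in for ES's macro and only knows about the "user" field, just like the real one.

```python
def fake_lookup(event):
    """Stand-in for ES's macro: it only ever looks at the "user" field."""
    out = dict(event)
    out["user_original"] = event["user"]
    out["user"] = event["user"].lower()
    return out


def call_on_field(event, field, lookup=fake_lookup):
    """Run `lookup` against an arbitrary field without losing "user" data."""
    # 1. Back up "user" and every "user_*" field so nothing is lost.
    backup = {k: v for k, v in event.items() if k == "user" or k.startswith("user_")}
    work = {k: v for k, v in event.items() if k not in backup}
    # 2. Present the target field's value as "user" and call the lookup.
    work["user"] = event.get(field)
    enriched = lookup(work)
    # 3. Move the results back under the target field's name.
    result = {k: v for k, v in enriched.items()
              if k != "user" and not k.startswith("user_")}
    for key, value in enriched.items():
        if key == "user":
            result[field] = value
        elif key.startswith("user_"):
            result[field + key[len("user"):]] = value
    # 4. Restore the original "user" fields, unless the target was "user" itself.
    if field != "user":
        result.update(backup)
    return result
```

Calling call_on_field on an event with a sender field yields sender and sender_original, and the pre-existing user and user_* fields come back exactly as they were.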


If the correlation has any logic that needs the original field (e.g. checking if it's an admin account), this needs to happen BEFORE normalisation OR use the user_original field instead. If you have a risk factor that does that automatically for you after the fact, it won't work any more with this approach.


For email risk objects, you need a bit more SPL in your search. That's because we chose not to rewrite the original email with the user's username, as recipient="gvasseur" would look wrong. The SPL you need is something like:

| eval risk_object=if( isnotnull(user_id), user_id, user )

Or you could systematically raise a risk against both user and user_id (to cover the cases where the user lookup failed).


Multivalue fields are ignored. This is because they are really messy to look up.


DHCP IPs


Here's a search you can schedule every minute or two over the last 24 hours:



Where COMPANY_dhcp is a KV Store with a time field called _time and 2 string fields called host and ip.

 

COMPANY_dhcp is also the name of a lookup definition linked to the COMPANY_dhcp KV Store and where the "configure time-based lookup" checkbox is checked (leave everything else as default).

 

Because this search is scheduled every minute or two, it needs to be as efficient as possible. If it weren't for the snapshot lookup (which is only needed if you want to merge user and system risks), it could run over the last hour or two only. Even over 24 hours though, it's still faster to append to the KV store than to recreate it from scratch. Unfortunately this means that COMPANY_dhcp will grow indefinitely. Therefore you'll need another search that runs once a day at night and does:



(In case you're wondering, the last line is to prevent the search head's job holding on to results that are not worth keeping)
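The split between a frequent cheap append and a nightly prune can be sketched in Python as follows. This models the KV store as a plain list and is not the SPL of the scheduled searches; the function names are hypothetical, and skipping rows that are already present is my assumption rather than something the searches necessarily do.

```python
DAY = 24 * 60 * 60  # retention window in seconds


def append_new(store, rows):
    """Frequent job: append only the rows that are not already in the store."""
    seen = {(r["_time"], r["ip"], r["host"]) for r in store}
    store.extend(r for r in rows if (r["_time"], r["ip"], r["host"]) not in seen)
    return store


def prune(store, now, max_age=DAY):
    """Nightly job: drop rows older than max_age so the store stops growing."""
    store[:] = [r for r in store if r["_time"] >= now - max_age]
    return store
```

The append job never rewrites existing rows, which is what keeps it fast, and the prune job is the only place where the whole collection is rewritten.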

 

VPN IPs

 

Similar search for VPN IPs:



Where COMPANY_vpn is a KV Store with a time field called _time and 3 string fields called internal_ip, src and user.

 

COMPANY_vpn is also the name of a time-based lookup definition linked to the COMPANY_vpn KV Store.

 

The same considerations apply as for the DHCP search. So you'll need another search that runs once a day at night. You can combine it with the DHCP one and kill 2 birds with one stone:



ES Assets Configuration

 

When ingesting your assets information into ES, make sure the "owner" field gets populated with the canonical way to refer to the user (e.g. "gvasseur" instead of "Gabriel Vasseur" or an employee number).


Go to Enterprise Security > Configure > Data Enrichment > Asset and Identity Management > Asset Lookups. Add both COMPANY_dhcp_snapshot.csv and COMPANY_vpn_snapshot.csv.


You also need to add a few fields in the next tab (Asset Fields): 

  • is_dynamic_ip

  • vpn_src

  • vpn_user


This is required to combine user and machine risks. The asset lookup macro below expects the assets database to include these details.


Assets lookup macro


Here is the SPL for the macro.



Assets normalisation considerations

 

DHCP and VPN resolution relies on timed lookups, and this affects performance. You will notice it if you call the macro on a search that returns thousands of results. It's better to filter and group as much as possible before you look up the assets.


To keep the performance impact within reason we limited the DHCP and VPN history to the last 24 hours. You could limit it further and it will help.


On the other hand, you might want the convenience of being able to look up temporary IPs much further back in the past than the last 24 hours (e.g. for a monthly report or a long-term investigation). To enable this, you can have another set of KV stores and keep them up to date with the same saved search. For instance where the VPN search does:



You could do:



You can then create another version of the lookup macro that uses this _far version instead of the normal one. You also need to alter the maintenance search to keep the _far versions under control, but you can set them to keep the last month or even more.

  

Combining user and system risks

 

We're getting there! Before we go further we need to learn a bit more about how ES works. No action in this section, just knowledge.

 

In the props.conf of the SA-ThreatIntelligence app, there are a couple of calculated fields that kick in if "source" ends with "- Rule". They will trigger on risk events generated by correlations. All they do is:



 Then the SA-IdentityManagement app has three automatic lookups:

* LOOKUP-zy-identity_lookup_expanded-_risk_user

* LOOKUP-zu-asset_lookup_by_str-_risk_system

* LOOKUP-zv-asset_lookup_by_cidr-_risk_system

 

These look up the _risk_user and _risk_system fields in the Assets & Identities lookups. Because we've added fields such as vpn_user to the ES assets framework, they will output fields such as risk_object_vpn_user.

 

Because this is automatic, it happens every time the risk index is searched. Try it:



This means these new fields will be available when the Risk data model gets accelerated.


The Risk data model

 

This is where we need to do our next tweak.

 

The Risk data model defines a calculated field called normalized_risk_object as:



I guess its purpose is to help out in case you haven't been diligently looking up assets and identities in your correlation searches. It takes the asset or identity fields provided by the automatic lookups and selects the first value they contain. The identity field for instance will contain a mixture of employee number, main account, other accounts and email address, in any order. That might not yield very consistent or clear results (e.g. a username for some users, an employee ID for others, etc). This is why we added an id field to our identity lookup. The story is similar for assets, where we just want to use nt_host consistently.

 

The natural next step would be to customise the normalized_risk_object but frustratingly ES has some enforcement that will cancel your changes. So the solution is to create entirely new fields:



This is where the magic happens, and what we've been working towards all this time!


There is one major caveat: for cases where the risk_object is a VPN IP this will be time sensitive. If the correlation runs long enough after the events that the VPN IP has moved on, then the VPN user will be wrong. In practice we haven't found this to be enough of a problem to do something about it yet. A workaround would be to add dest_vpn_user as a user risk to the correlation search configuration.

 

The final step is to tweak our risk alerts.

 

RBA Risk Alerts

 

Whatever risk notable rules you have already will mostly work the same but they need to be tweaked to use the new data model fields.

 

One question is: do we just want to use the new fields and trust them fully or do we want to use BOTH the traditional fields and the new ones? I believe using both is better. That way you don't lose anything that you used to have and you benefit from the new fields. The new fields should be better in most cases but there can always be issues with assets and identities being out of date or corrected halfway through an attack in which case the new fields might actually dilute the risks rather than group them better.

 

Of course the problem with keeping both is that we'll need some clever extra deduplication, because you might end up with two alerts in Incident Review: one against the system, and one against the user that is a superset of the one against the system.
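To illustrate what that extra deduplication has to achieve, here is a minimal Python sketch that drops any alert whose contributing risk events are a strict subset of another alert's. It is not my deduplication technique from the earlier post, only a picture of the superset problem; the function name and the data shape are hypothetical.

```python
def drop_subsumed(alerts):
    """alerts maps an entity to the set of risk event ids behind its alert.

    Returns only the alerts that are not a strict subset of another alert.
    """
    kept = {}
    for entity, events in alerts.items():
        subsumed = any(
            other != entity and events < other_events
            for other, other_events in alerts.items()
        )
        if not subsumed:
            kept[entity] = events
    return kept
```

With an alert on GabsLaptop covering events 1 and 2, and an alert on gvasseur covering events 1, 2 and 3, only the gvasseur alert survives.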

 

Here is what we would suggest as a start to your risk alerts:



Following this should be the SPL from my deduplication technique, and then whatever logic you want to apply (e.g. total risk score above a threshold).

 

If you genuinely read this article in full, kudos for making it to the end! This is quite dense in information and details. It'll take time to digest. Please reach out if you have any suggestions for improvements.


Addendum: an alternative to using snapshots for dynamic IPs


We've built this system over time and as I'm writing this up I realise there might be a better approach here.


The idea of this alternative is to make the most of the work done at the time the correlation runs and raise multiple risks systematically. So where you used to have a rule raising a system risk against dest, for instance:


You would now want to raise 3 risks related to dest (but only one being a system risk):


The downside is you would have to be very consistent with how all the correlations raise risks. Also you might end up with risk alert notables alerting about different facets of the same entity that are subsets of each other, a bit like what would happen if we didn't add the extra deduplication logic mentioned above. This time though it might be harder to deal with.


The upside is you can simplify some of the approach described in this article as there is no longer a need to maintain snapshot CSV lookups for DHCP and VPN:


  • The lookup generator searches can now be simplified by removing anything after the "outputlookup append=true" commands. The rest is still needed as the KV store collections are still required and should still be updated, but you can now reduce the time window to something like the last 2 hours and it will have less of a performance impact.

  • The snapshot CSV lookups do not need to be added to ES's assets inputs.

  • The get_asset macro wrapper could be simplified a little when it comes to managing dynamic IPs.


I believe everything else should remain the same, including the changes to the Risk data model and the risk alert rules.


As an added bonus this approach does not have the potential accuracy issue for VPN IPs if the risk is created much later than the event that triggered it.


If you choose this approach, let me know how you get on!


©2021 by Gabriel Vasseur. Proudly created with Wix.com
