Your first graph neural network: Detecting suspicious logins with link prediction

Graphistry - Leo Meyerovich, Alex Morrise, Tanmoy Sarkar

Infosec Jupyterthon 2022, December 2022

Alert on & visualize anomalous identity events
- Demo dataset: 1.6B Windows events over 58 days => logins by 12K users over 14K systems
- Adapt to any identity system with logins
- => Can we identify accounts & computers acting anomalously? Resources being oddly accessed?
- => Can we spot the red team?
- => Operations: Identity incident alerting + identity data investigations
- Community/contact for help handling bigger-than-memory & additional features
- Techniques explored: Graph AI
  - RGCN (primary): powerful with tweaking and in a pipeline
  - UMAP (secondary): surprisingly effective with little tweaking
- Runs on both CPU + multi-GPU
- Tools: PyGraphistry[AI], DGL + PyTorch, and NVIDIA RAPIDS / umap-learn

1. Graphs are awesome

"Defenders think in lists. Attackers think in graphs. As long as this is true, attackers win."
Network graphs & event graphs & kill chains & ..: Honeypot
Today: Two techniques for the graph AI era, focusing on identity graphs
=> Caught 96% of red team’s logins (400+ out of millions) with only 10% FPs
Graph neural networks (GNNs) + UMAP

2. Graphs for identity data

Sample attacks:
- Fake account
- Account takeover: Malware, credential stuffing, …
- Insider threat: Helpdesk, rogue admin, …
- Abnormal resource access patterns

Data & user activities (UEBA):
- Entity resolution: You, your assets, your contexts, ..
- Authentication
- Authorization
- 💰💰💰 Did I mention zero-trust identity protection? 💰💰💰💰

Goals: Empower
- Identity detection
- Identity investigation
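
As a minimal sketch of what a graph over identity data can look like, the snippet below collapses raw login events into typed, weighted user-to-computer edges. The field names (`user`, `computer`, `auth_type`) and the sample rows are hypothetical and are not the schema of the demo dataset.

```python
from collections import Counter

# Hypothetical login events; field names and values are illustrative only.
events = [
    {"user": "alice", "computer": "ws-01", "auth_type": "interactive"},
    {"user": "alice", "computer": "ws-01", "auth_type": "interactive"},
    {"user": "alice", "computer": "srv-db", "auth_type": "remote_desktop"},
    {"user": "bob", "computer": "ws-02", "auth_type": "interactive"},
    {"user": "bob", "computer": "srv-db", "auth_type": "network"},
]


def build_identity_graph(events):
    """Collapse login events into node sets and counted, typed edges."""
    # Each distinct (user, auth_type, computer) triple becomes one edge;
    # the count records how often that login pattern was observed.
    edge_counts = Counter(
        (e["user"], e["auth_type"], e["computer"]) for e in events
    )
    users = {e["user"] for e in events}
    computers = {e["computer"] for e in events}
    return users, computers, edge_counts


users, computers, edge_counts = build_identity_graph(events)
for (user, auth_type, computer), n in sorted(edge_counts.items()):
    print(f"{user} -[{auth_type} x{n}]-> {computer}")
```

Keeping the authentication type on the edge is what later lets a relational model treat, say, remote desktop differently from a regular login.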

3. AI era of graph: GNNs + UMAP

GNNs: Science’s Breakthrough of 2021 - example
Combines network thinking (interesting connectivity) with tabular (time, $, etc. features)
Primitives:
- Classify nodes (“bot”)
- Predict links (“recommendation”, “violation”) <– TODAY
- Classify graphs (“motif mining”)
Compose into tools:
- Anomaly detection <– today
- abuse scoring
- feeding into combined methods: today we’re looking for graph shapes, but temporal patterns are useful too (RNN)
- if a model does well at one task, there is a good chance it can be reused on others
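
To make the step from link prediction to anomaly detection concrete, here is a minimal sketch: score each observed login edge and flag the ones the model finds unlikely. It uses a DistMult-style scorer as a stand-in, since the outline does not say which scoring function is used. The embeddings are hand-set and hypothetical (a trained model would learn them), as is the threshold.

```python
import math

# Hypothetical 2-d embeddings; in practice a trained GNN would produce these.
node_emb = {
    "alice": [1.0, 0.2],
    "bob": [0.1, 1.0],
    "ws-01": [1.2, 0.0],
    "srv-db": [0.0, 1.5],
}
rel_emb = {"login": [1.0, 1.0]}


def link_probability(src, rel, dst):
    """DistMult-style score sum(h_src * w_rel * h_dst), squashed by a sigmoid."""
    score = sum(
        s * r * d
        for s, r, d in zip(node_emb[src], rel_emb[rel], node_emb[dst])
    )
    return 1.0 / (1.0 + math.exp(-score))


def anomalous_edges(edges, threshold=0.6):
    """Return the observed edges whose predicted probability is below threshold."""
    return [e for e in edges if link_probability(*e) < threshold]


observed = [
    ("alice", "login", "ws-01"),
    ("bob", "login", "srv-db"),
    ("alice", "login", "srv-db"),
]
flagged = anomalous_edges(observed)
print(flagged)
```

With these toy values only the `alice` to `srv-db` login falls below the threshold: an edge that was observed but that the model would not have predicted is the anomaly signal.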

4. RGCNs - Relational graph convolutional networks

Twitter botnet example

GNN - Graph neural network: Label prop
- “if all their friends are bots, …”
- multiple dimensions: bytes, region, …
GCNs - Graph convolutional network: Multiple layers
- “even if we know little about them, but their friends..”
- shallow!
RGCNs - Relational GCNs:
- multiple relationship types - follow vs block vs …
- ex: remote desktop vs regular login
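
The relational idea above can be sketched as a single RGCN layer in plain Python: each node combines its own features with the mean of its incoming neighbors' features, using a separate weight matrix per relationship type. This follows the standard RGCN update with per-relation mean normalization and omits refinements such as basis decomposition. The node names, features, and weights are hand-set and hypothetical; a real model would learn the weights (for example with DGL + PyTorch).

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rgcn_layer(h, edges, W_rel, W_self):
    """Apply one RGCN layer.

    h:      {node: feature vector}
    edges:  list of (src, relation, dst); messages flow src -> dst
    W_rel:  {relation: weight matrix}, one matrix per relationship type
    W_self: weight matrix applied to the node's own features
    """
    # Group incoming neighbors by (dst, relation) so each relation
    # can be averaged separately.
    incoming = {}
    for src, rel, dst in edges:
        incoming.setdefault((dst, rel), []).append(src)

    out = {}
    for node, feat in h.items():
        total = matvec(W_self, feat)
        for rel, W in W_rel.items():
            nbrs = incoming.get((node, rel), [])
            for src in nbrs:
                msg = matvec(W, h[src])
                total = [t + m / len(nbrs) for t, m in zip(total, msg)]
        out[node] = [max(0.0, t) for t in total]  # ReLU
    return out


h = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "srv-db": [0.0, 0.0]}
edges = [
    ("alice", "login", "srv-db"),
    ("bob", "remote_desktop", "srv-db"),
]
identity = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {
    "login": identity,
    "remote_desktop": [[2.0, 0.0], [0.0, 2.0]],  # weighted differently
}
out = rgcn_layer(h, edges, W_rel, W_self=identity)
print(out["srv-db"])
```

Because `remote_desktop` has its own weight matrix, the same kind of neighbor contributes differently depending on how it connected, which a plain GCN with a single shared matrix cannot express.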

Watch the GNN YouTube videos listed at the end for theoretical intuitions

5. Try it yourself

See:

SSH logs RGCN anomaly detector in a few cells: simple-ssh-logs-rgcn-anomaly-detector.ipynb
In-depth RGCN: advanced-identity-protection-40m.ipynb

6. Taking it to production

Watch the repo / contact to join us on:

Daily batch / real-time alerting => Splunk
Scaling & autonomous operation
Tuning: Time data, common FPs (new IPs, ..), …
Use for correlation ID generation for investigation context (see tomorrow’s UMAP talk!)
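
One way to read the tuning item on common false positives is sketched below: logins from IPs the model never saw in training tend to score as anomalous simply because they are new, so they can be routed to a separate queue instead of the main alert stream. This is an assumption about how such tuning might look, not the pipeline described here. The alert keys (`src_ip`, `user`, `score`) and the threshold are hypothetical.

```python
def triage_alerts(alerts, known_ips, threshold=0.9):
    """Split scored logins into actionable alerts and new-entity notices.

    alerts:    list of dicts with hypothetical keys 'src_ip', 'user', 'score'
               (higher score = more anomalous)
    known_ips: set of IPs seen during the training window
    """
    actionable, new_entity = [], []
    for alert in alerts:
        if alert["score"] < threshold:
            continue  # not anomalous enough to surface at all
        if alert["src_ip"] not in known_ips:
            new_entity.append(alert)  # likely flagged only because the IP is new
        else:
            actionable.append(alert)
    return actionable, new_entity


alerts = [
    {"src_ip": "10.0.0.5", "user": "alice", "score": 0.95},
    {"src_ip": "10.9.9.9", "user": "bob", "score": 0.97},
    {"src_ip": "10.0.0.7", "user": "carol", "score": 0.40},
]
actionable, new_entity = triage_alerts(alerts, known_ips={"10.0.0.5", "10.0.0.7"})
print(len(actionable), len(new_entity))
```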

Next steps

SSH logs RGCN anomaly detector in a few cells: simple-ssh-logs-rgcn-anomaly-detector.ipynb
In-depth RGCN: advanced-identity-protection-40m.ipynb
UMAP demo for 97% alert volume reduction & alert correlation
PyGraphistry (py, oss) + Graphistry Hub (free)
- Dashboarding with graph-app-kit (containerized, gpu, graph Streamlit)
Happy to help:
- Join our Slack
- email and let’s chat! info@graphistry.com

Resources

PyGraphistry[AI]
What is graph intelligence
GNN Videos:
- GCN - https://www.youtube.com/watch?v=2KRAOZIULzw
- RGCN - https://www.youtube.com/watch?v=wJQQFUcHO5U
- Euler (combining RNN + GNN) - https://www.youtube.com/watch?v=1t124vguwJ8
