Your first graph neural network: Detecting suspicious logins with link prediction#
Graphistry - Leo Meyerovich, Alex Morrise, Tanmoy Sarkar
Infosec Jupyterthon 2022, December 2022
Alert on & visualize anomalous identity events
* Demo dataset: 1.6B Windows events over 58 days => logins by 12K users over 14K systems
* Adapt to any identity system with logins
* => Can we identify accounts & computers acting anomalously? Resources being oddly accessed?
* => Can we spot the red team?
* => Operations: Identity incident alerting + identity data investigations
* Community/contact for help handling bigger-than-memory & additional features
* Techniques explored: Graph AI
  * RGCN (primary) - powerful with tweaking and in a pipeline
  * UMAP (secondary) - surprisingly effective with little tweaking
* Runs on both CPU + multi-GPU
* Tools: PyGraphistry[AI], DGL + PyTorch, and NVIDIA RAPIDS / umap-learn
1. Graphs are awesome#
Defenders think in lists. Attackers think in graphs. As long as this is true, attackers win.
Network graphs & event graphs & kill chains & ..: Honeypot
Today: Two techniques for the graph AI era, focusing on identity graphs
=> Caught 96% of the red team’s logins (400+ out of millions) with only 10% false positives (FPs)
Graph neural networks (GNNs) + UMAP
2. Graphs for identity data#
Sample attacks
* Fake account
* Account takeover: Malware, credential stuffing, …
* Insider threat: Helpdesk, rogue admin, …
* Abnormal resource access patterns
Data & user activities (UEBA):
- Entity resolution: You, your assets, your contexts, ..
- Authentication
- Authorization
- 💰💰💰 Did I mention zero-trust identity protection? 💰💰💰💰
Goals: Empower
* Identity detection
* Identity investigation
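To make the identity graph idea concrete, here is a minimal sketch of turning raw login events into a typed, weighted edge list (user => computer). The event tuples, field names, and the `build_login_graph` helper are illustrative assumptions, not the schema of the demo dataset or a PyGraphistry API.

```python
from collections import Counter

# Hypothetical login events: (user, computer, logon_type).
# Names and values are made up for illustration only.
events = [
    ("alice", "ws-01", "interactive"),
    ("alice", "ws-01", "interactive"),
    ("alice", "srv-db", "remote_desktop"),
    ("bob", "ws-02", "interactive"),
    ("bob", "srv-db", "interactive"),
]

def build_login_graph(events):
    """Collapse raw events into a node set plus weighted, typed edges."""
    # (user, computer, type) -> number of logins seen
    edge_counts = Counter(events)
    nodes = {u for u, _, _ in events} | {c for _, c, _ in events}
    edges = [
        {"src": u, "dst": c, "type": t, "count": n}
        for (u, c, t), n in sorted(edge_counts.items())
    ]
    return nodes, edges

nodes, edges = build_login_graph(events)
for e in edges:
    print(e)
```

Keeping the logon type on each edge is what later lets a relational model treat, say, remote desktop differently from a regular login.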
3. AI era of graph: GNNs + UMAP#
GNNs: Science’s Breakthrough of 2021 - example
Combines network thinking (interesting connectivity) with tabular (time, $, etc. features)
Primitives:
Classify nodes (“bot”)
Predict links (“recommendation”, “violation”) <– TODAY
Classify graphs (“motif mining”)
Compose into tools:
Anomaly detection <– today
Abuse scoring
Feeding into combined methods: today we’re looking for graph shapes, but temporal is cool too (RNN)
If a model does well at one task, there is a good chance it can be reused on others
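Link prediction can be turned into anomaly detection by scoring every observed edge and flagging the ones the model considers unlikely. The sketch below assumes node and relation embeddings already exist (the values here are hand-picked, not trained) and uses DistMult as one common scoring choice; the talk does not state which scorer it uses, and the `0.6` threshold is arbitrary.

```python
import math

# Hypothetical 2-d embeddings, as if produced by an already-trained model.
node_emb = {
    "alice": [1.0, 0.2],
    "bob": [0.1, 1.0],
    "ws-01": [1.2, 0.1],
    "srv-db": [0.0, 1.5],
}
# One diagonal relation vector per edge type (the DistMult parameterization).
rel_emb = {"login": [1.0, 1.0]}

def edge_probability(src, rel, dst):
    """DistMult score squashed to (0, 1): sigmoid(sum_k s_k * r_k * d_k)."""
    score = sum(
        s * r * d
        for s, r, d in zip(node_emb[src], rel_emb[rel], node_emb[dst])
    )
    return 1.0 / (1.0 + math.exp(-score))

def flag_anomalies(edges, threshold=0.6):
    """Return the observed edges whose predicted probability is below threshold."""
    return [e for e in edges if edge_probability(*e) < threshold]

observed = [
    ("alice", "login", "ws-01"),
    ("bob", "login", "srv-db"),
    ("alice", "login", "srv-db"),
]
suspicious = flag_anomalies(observed)
print(suspicious)
```

In this toy setup only the alice => srv-db login falls below the threshold, because her embedding has little overlap with that server's.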
4. RGCNs - Relational graph convolutional networks#
GNN - Graph neural network: Label propagation
“if all their friends are bots, …”
multiple dimensions: bytes, region, …
GCNs - Graph convolutional networks: Multiple layers
“even if we know little about them, their friends ..”
shallow!
RGCNs - Relational GCNs:
multiple relationship types - follow vs block vs …
ex: remote desktop vs regular login
Watch the 2 YouTube videos at the end for theoretical intuitions
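As a rough illustration of the relational idea, one RGCN layer updates each node as ReLU(W_self · h_i + Σ_r mean over r-neighbors j of W_r · h_j), with a separate weight matrix per relationship type. The pure-Python sketch below follows that standard formulation with hand-set weights; the function names and toy data are assumptions for illustration, and a real pipeline would use a library implementation such as DGL + PyTorch instead.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rgcn_layer(h, edges, W_rel, W_self):
    """One RGCN layer.

    h:      {node: feature vector}
    edges:  [(src, rel, dst)], messages flow src -> dst
            (add reversed edges if information should flow both ways)
    W_rel:  {rel: weight matrix}, one per relationship type
    W_self: weight matrix for the node's own features
    """
    # Group incoming neighbors by (dst, rel) so each relation is averaged separately.
    incoming = {}
    for src, rel, dst in edges:
        incoming.setdefault((dst, rel), []).append(src)

    out = {node: matvec(W_self, feat) for node, feat in h.items()}
    for (dst, rel), srcs in incoming.items():
        for src in srcs:
            msg = matvec(W_rel[rel], h[src])
            out[dst] = [o + m / len(srcs) for o, m in zip(out[dst], msg)]
    # ReLU non-linearity
    return {node: [max(0.0, v) for v in vec] for node, vec in out.items()}

identity = [[1.0, 0.0], [0.0, 1.0]]
h = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "srv": [0.0, 0.0]}
edges = [
    ("alice", "login", "srv"),
    ("bob", "login", "srv"),
    ("bob", "rdp", "srv"),
]
W_rel = {"login": identity, "rdp": [[2.0, 0.0], [0.0, 2.0]]}

h_next = rgcn_layer(h, edges, W_rel, identity)
print(h_next["srv"])
```

Here the server's new features mix the average of its regular-login neighbors with a differently weighted remote-desktop message, which is the "remote desktop vs regular login" distinction in miniature.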
5. Try it yourself#
See:
SSH logs RGCN anomaly detector in a few cells: simple-ssh-logs-rgcn-anomaly-detector.ipynb
In-depth RGCN: advanced-identity-protection-40m.ipynb
6. Taking it to production#
Watch the repo / contact to join us on:
Daily batch / real-time alerting => Splunk
Scaling & autonomous operation
Tuning: Time data, common FPs (new IPs, ..), …
Use for correlation ID generation for investigation context (see tomorrow’s UMAP talk!)
Next steps#
SSH logs RGCN anomaly detector in a few cells: simple-ssh-logs-rgcn-anomaly-detector.ipynb
In-depth RGCN: advanced-identity-protection-40m.ipynb
UMAP demo for 97% alert volume reduction & alert correlation
PyGraphistry (py, oss) + Graphistry Hub (free)
Dashboarding with graph-app-kit (containerized, gpu, graph Streamlit)
Happy to help:
email and let’s chat! info@graphistry.com
Resource#
GNN Videos:
Euler (combining RNN + GNN) - https://www.youtube.com/watch?v=1t124vguwJ8