Splunk<>Graphistry
Graphistry brings modern visual analytics to event data in Splunk. The full platform is intended for enterprise teams, while this tutorial shares visibility techniques for researchers and hunters.
To use:
* Read along, and start the prebuilt visualizations by clicking on them
* Plug in your Graphistry API key and Splunk credentials to run it yourself
Further reading:
* UI Guide: https://hub.graphistry.com/docs/ui/index/
* Python client tutorials & demos: graphistry/pygraphistry
* Graphistry API Key: https://www.graphistry.com/api-request
* DoD / VAST challenges: https://www.cs.umd.edu/hcil/varepository/benchmarks.php
0. Configure
# Splunk connection settings: replace with your own
SPLUNK = {
    'host': 'MY.SPLUNK.com',
    'scheme': 'https',
    'port': 8089,
    'username': 'MY_SPLUNK_USER',
    'password': 'MY_SPLUNK_PWD'
}
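Hard-coding a password in a notebook makes it easy to leak. Here is a minimal alternative sketch that builds the same `SPLUNK` dict from the environment. It assumes you export environment variables with the hypothetical names `SPLUNK_HOST`, `SPLUNK_USER`, `SPLUNK_PWD`, and optionally `SPLUNK_PORT`. These names are illustrative and are not a Splunk or Graphistry convention.

```python
import os

def splunk_settings_from_env(env=os.environ):
    # Build the client.connect(**SPLUNK) settings from environment variables.
    # The variable names are hypothetical; rename them to suit your setup.
    return {
        'host': env['SPLUNK_HOST'],
        'scheme': 'https',
        'port': int(env.get('SPLUNK_PORT', '8089')),  # 8089 is Splunk's default management port
        'username': env['SPLUNK_USER'],
        'password': env['SPLUNK_PWD'],
    }

# Usage, after exporting the variables in your shell:
# SPLUNK = splunk_settings_from_env()
```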
1. Imports
import pandas as pd
Graphistry
!pip install graphistry
import graphistry
graphistry.__version__
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure
u'0.9.56'
Splunk
# !pip install splunk-sdk
import splunklib
#Connect to Splunk. Replace settings with your own setup.
import splunklib.client as client
import splunklib.results as results
service = client.connect(**SPLUNK)
def extend(o, override):
    # Return a copy of o updated with override (neither input dict is modified)
    merged = dict(o)
    merged.update(override)
    return merged

STEP = 10000  # rows fetched per results page

def splunkToPandas(qry, overrides={}):
    # Run a blocking search, then page through its results into one DataFrame
    kwargs_blockingsearch = extend({
        "count": 0,
        "earliest_time": "2010-01-24T07:20:38.000-05:00",
        "latest_time": "now",
        "search_mode": "normal",
        "exec_mode": "blocking"
    }, overrides)
    job = service.jobs.create(qry, **kwargs_blockingsearch)
    print("Search results:\n")
    resultCount = int(job["resultCount"])
    print('results', resultCount)
    frames = []
    offset = 0
    while offset < resultCount:
        print("fetching:", offset, '-', offset + STEP)
        # Request one page of the finished job's results
        blocksearch_results = job.results(count=STEP, offset=offset)
        reader = results.ResultsReader(blocksearch_results)
        # ResultsReader also yields diagnostic Message objects; keep only result rows (dicts)
        rows = [x for x in reader if isinstance(x, dict)]
        frames.append(pd.DataFrame(rows))
        offset += STEP
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
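The paging loop in `splunkToPandas` requests fixed-size windows of `STEP` rows until `offset` passes `resultCount`. The pure-Python sketch below reproduces that arithmetic, so you can check how many `job.results` calls a search will need. The helper name `page_windows` is hypothetical.

```python
def page_windows(result_count, step=10000):
    # Return the (offset, count) pairs the paging loop requests,
    # one per page, until offset reaches result_count.
    windows = []
    offset = 0
    while offset < result_count:
        windows.append((offset, step))
        offset += step
    return windows

print(page_windows(5035))   # [(0, 10000)]: one page, matching "fetching: 0 - 10000"
print(page_windows(25000))  # [(0, 10000), (10000, 10000), (20000, 10000)]
```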
2. Get data
# Pull source/destination IPs and protocol; sample_ratio=1000 asks Splunk to return roughly 1 in 1,000 events
query = 'search index="vast" srcip=* destip=* | rename destip AS dest_ip, srcip AS src_ip | fields dest_ip _time src_ip protocol | eval time=_time | fields - _* '
%time df = splunkToPandas(query, {"sample_ratio": 1000})
#df = pd.concat([ splunkToPandas('search index="vast" | head 10'), splunkToPandas('search index="vast" | head 10') ], ignore_index=True)
print('results', len(df))
df.sample(5)
Search results:
results 5035
fetching: 0 - 10000
CPU times: user 4.95 s, sys: 13.3 ms, total: 4.96 s
Wall time: 7.92 s
results 5035
index | dest_ip | src_ip | protocol | time |
---|---|---|---|---|
4324 | 10.138.235.111 | 172.30.0.4 | TCP | 1505519752 |
2806 | 10.0.3.5 | 10.12.15.152 | TCP | 1505519767 |
2630 | 10.0.4.5 | 10.12.15.152 | TCP | 1505519769 |
20 | 10.0.4.7 | 10.6.6.7 | TCP | 1505519795 |
866 | 10.0.2.8 | 10.17.15.10 | TCP | 1505519787 |
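The `time` column holds Unix epoch seconds copied from Splunk's `_time`. Because `ResultsReader` returns field values as strings, the column arrives as text. The stdlib sketch below turns one value into a readable UTC timestamp; the helper name `epoch_to_utc` is hypothetical. For a whole DataFrame column, `pd.to_datetime(df['time'].astype(float), unit='s')` does the same conversion.

```python
from datetime import datetime, timezone

def epoch_to_utc(value):
    # _time is seconds since 1970-01-01 UTC; accept str, int, or float input
    return datetime.fromtimestamp(float(value), tz=timezone.utc)

print(epoch_to_utc('1505519752').isoformat())  # 2017-09-15T23:55:52+00:00
```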
3. Visualize!
A) Simple IP<>IP: 1326 nodes, 253K edges
graphistry.bind(source='src_ip', destination='dest_ip').edges(df).plot()
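`bind(source=..., destination=...)` tells Graphistry which columns are the edge endpoints. With no node table supplied, the nodes are, as we read the default behavior, the distinct values that appear in either column. The stdlib sketch below performs that derivation on the five sampled rows above and counts each node's degree. The helper `edges_to_nodes` is hypothetical.

```python
from collections import Counter

def edges_to_nodes(edges):
    # edges: list of (src, dst) pairs. Returns every distinct endpoint with its degree
    # (number of edge endpoints touching it), i.e. the node list implied by the edges.
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return degree

sample_edges = [  # (src_ip, dest_ip) pairs from the df.sample(5) output
    ('172.30.0.4', '10.138.235.111'),
    ('10.12.15.152', '10.0.3.5'),
    ('10.12.15.152', '10.0.4.5'),
    ('10.6.6.7', '10.0.4.7'),
    ('10.17.15.10', '10.0.2.8'),
]
ip_degree = edges_to_nodes(sample_edges)
print(len(sample_edges), 'edges,', len(ip_degree), 'nodes')  # 5 edges, 9 nodes
print(ip_degree['10.12.15.152'])                             # 2
```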
B) IP<>IP + srcip<>protocol: 1328 nodes, 506K edges
def make_edges(df, src, dst):
    # Copy the events and add generic 'src'/'dst' columns taken from the chosen fields
    out = df.copy()
    out['src'] = df[src]
    out['dst'] = df[dst]
    return out
ip2ip = make_edges(df, 'src_ip', 'dest_ip')
srcip2protocol = make_edges(df, 'src_ip', 'protocol')
combined = pd.concat([ip2ip, srcip2protocol], ignore_index=True)
combined.sample(6)
index | dest_ip | src_ip | protocol | time | src | dst |
---|---|---|---|---|---|---|
6889 | 10.0.3.5 | 10.13.77.49 | TCP | 1505519777 | 10.13.77.49 | TCP |
3440 | 10.0.2.6 | 10.12.15.152 | TCP | 1505519761 | 10.12.15.152 | 10.0.2.6 |
6396 | 10.0.4.5 | 10.138.235.111 | TCP | 1505519782 | 10.138.235.111 | TCP |
1394 | 10.0.4.5 | 10.138.235.111 | TCP | 1505519782 | 10.138.235.111 | 10.0.4.5 |
5975 | 10.0.2.7 | 10.17.15.10 | TCP | 1505519786 | 10.17.15.10 | TCP |
8683 | 10.0.2.4 | 10.12.15.152 | TCP | 1505519759 | 10.12.15.152 | TCP |
graphistry.bind(source='src', destination='dst').edges(combined).plot()
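`make_edges` plus `pd.concat` stacks two edge lists that share the generic `src`/`dst` columns. Every event therefore contributes one IP->IP edge and one IP->protocol edge, which doubles the edge count (253K vs. 506K in the headings). Below is a dependency-free sketch of the same stacking over a list of event dicts; the helper `stack_edges` is hypothetical.

```python
def stack_edges(events, pairs):
    # For each (src_field, dst_field) pair, emit one {'src', 'dst'} edge per event,
    # concatenating the per-pair edge lists in order (like pd.concat above).
    edges = []
    for src_field, dst_field in pairs:
        for event in events:
            edges.append({'src': event[src_field], 'dst': event[dst_field]})
    return edges

events = [
    {'src_ip': '10.12.15.152', 'dest_ip': '10.0.2.6', 'protocol': 'TCP'},
    {'src_ip': '10.138.235.111', 'dest_ip': '10.0.4.5', 'protocol': 'TCP'},
]
edges = stack_edges(events, [('src_ip', 'dest_ip'), ('src_ip', 'protocol')])
print(len(edges))  # 4: two edges per event
```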
C) All<>All via Hypergraph: 254K nodes, 760K edges
hg = graphistry.hypergraph(df, entity_types=[ 'src_ip', 'dest_ip', 'protocol'] )
print(list(hg.keys()))
hg['graph'].plot()
('# links', 15105)
('# event entities', 5035)
('# attrib entities', 170)
['entities', 'nodes', 'edges', 'events', 'graph']
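The printed counts follow from the hypergraph construction. Each row becomes an event node, and each distinct value in an `entity_types` column becomes an attribute node. Each filled cell becomes a link from its event to that attribute, so 5035 rows × 3 columns gives the 15105 links shown. The stdlib sketch below does that bookkeeping. It assumes, as our reading rather than something the source confirms, that attribute nodes are keyed by both column and value, so the same IP seen as `src_ip` and as `dest_ip` counts as two entities. `hypergraph_counts` is a hypothetical helper, not the pygraphistry implementation.

```python
def hypergraph_counts(rows, entity_types):
    # rows: list of dicts, one per event. Each event links to one attribute node per
    # non-empty entity_types cell; attribute nodes are keyed by (column, value).
    links = []
    attributes = set()
    for event_id, row in enumerate(rows):
        for col in entity_types:
            value = row.get(col)
            if value is None or value == '':
                continue  # skip missing values
            attributes.add((col, value))
            links.append((event_id, (col, value)))
    return {'# links': len(links), '# event entities': len(rows),
            '# attrib entities': len(attributes)}

rows = [
    {'src_ip': '10.12.15.152', 'dest_ip': '10.0.2.6', 'protocol': 'TCP'},
    {'src_ip': '10.12.15.152', 'dest_ip': '10.0.4.5', 'protocol': 'TCP'},
]
print(hypergraph_counts(rows, ['src_ip', 'dest_ip', 'protocol']))
# {'# links': 6, '# event entities': 2, '# attrib entities': 4}
```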
Node Colors
nodes = pd.concat([
df[['src_ip']].rename(columns={'src_ip': 'id'}).assign(orig_col='src_ip'),
df[['dest_ip']].rename(columns={'dest_ip': 'id'}).assign(orig_col='dest_ip') ],
ignore_index=True).drop_duplicates(['id'])
#see https://hub.graphistry.com/docs/api/api-color-palettes/
col2color = {
"src_ip": 90005,
"dest_ip": 46005
}
nodes_with_color = nodes.assign(color=nodes['orig_col'].map(col2color))
nodes_with_color.sample(3)
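`drop_duplicates(['id'])` keeps the first occurrence, and the `src_ip` rows are concatenated first. An IP seen as both source and destination is therefore colored as a source. The stdlib sketch below applies the same first-wins labeling and palette lookup. `color_nodes` is a hypothetical helper; the palette codes are the ones from `col2color`.

```python
def color_nodes(edges, col2color):
    # edges: list of (src_ip, dest_ip). Label each IP by the first column it appears in,
    # scanning all sources before all destinations (the pd.concat order), then map to a palette code.
    first_col = {}
    for col, idx in (('src_ip', 0), ('dest_ip', 1)):
        for edge in edges:
            first_col.setdefault(edge[idx], col)  # keep the first label seen
    return {ip: col2color[col] for ip, col in first_col.items()}

col2color = {'src_ip': 90005, 'dest_ip': 46005}
colors = color_nodes([('10.12.15.152', '10.0.2.6'), ('10.0.2.6', '10.0.4.5')], col2color)
print(colors)
# {'10.12.15.152': 90005, '10.0.2.6': 90005, '10.0.4.5': 46005}
```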
index | id | orig_col | color |
---|---|---|---|
4383 | 172.30.0.3 | src_ip | 90005 |
9403 | 10.0.0.42 | dest_ip | 46005 |
4206 | 172.30.0.4 | src_ip | 90005 |
(graphistry
    .bind(source='src_ip', destination='dest_ip')
    .edges(df)
    .nodes(nodes_with_color)
    .bind(node='id', point_color='color')
    .plot())