Splunk<> Graphistry#

Graphistry brings modern visual analytics to event data in Splunk. The full platform is intended for enterprise teams, while this tutorial shares visibility techniques for researchers and hunters.

To use:

* Read along and start the prebuilt visualizations by clicking on them
* Plug in your Graphistry API Key & Splunk credentials to run it on your own data

Further reading:

* UI Guide: https://hub.graphistry.com/docs/ui/index/
* Python client tutorials & demos: graphistry/pygraphistry
* Graphistry API Key: https://www.graphistry.com/api-request
* DoD / VAST challenges: https://www.cs.umd.edu/hcil/varepository/benchmarks.php

0. Configure#

[ ]:

#splunk
SPLUNK = {
    'host': 'MY.SPLUNK.com',
    'scheme': 'https',
    'port': 8089,
    'username': 'MY_SPLUNK_USER',
    'password': 'MY_SPLUNK_PWD'
}
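
Hardcoded credentials are easy to leak when a notebook is shared. As a minimal sketch, the same `SPLUNK` dictionary could be assembled from environment variables instead. The variable names (`SPLUNK_HOST`, `SPLUNK_SCHEME`, `SPLUNK_PORT`, `SPLUNK_USERNAME`, `SPLUNK_PASSWORD`) and the helper `splunk_config_from_env` are assumptions chosen for illustration; neither Splunk nor Graphistry defines them.

```python
import os

def splunk_config_from_env(env=None):
    """Build the SPLUNK settings dict from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return {
        'host': env.get('SPLUNK_HOST', 'MY.SPLUNK.com'),
        'scheme': env.get('SPLUNK_SCHEME', 'https'),
        'port': int(env.get('SPLUNK_PORT', '8089')),
        'username': env['SPLUNK_USERNAME'],  # required: raises KeyError if unset
        'password': env['SPLUNK_PASSWORD'],  # required: raises KeyError if unset
    }

# An explicit mapping stands in for os.environ so the example runs anywhere
example = splunk_config_from_env({
    'SPLUNK_HOST': 'splunk.example.com',
    'SPLUNK_USERNAME': 'analyst',
    'SPLUNK_PASSWORD': 'not-a-real-password',
})
```

In a notebook, `SPLUNK = splunk_config_from_env()` would then replace the literal dictionary.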

1. Imports#

[ ]:

import pandas as pd

Graphistry#

[ ]:

!pip install graphistry

import graphistry
graphistry.__version__

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options: https://pygraphistry.readthedocs.io/en/latest/server/register.html

Requirement already satisfied: graphistry in /usr/local/lib/python2.7/dist-packages (0.9.56)
Requirement already satisfied: pandas>=0.17.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (0.22.0)
Requirement already satisfied: numpy in /usr/local/lib/python2.7/dist-packages (from graphistry) (1.14.6)
Requirement already satisfied: requests in /usr/local/lib/python2.7/dist-packages (from graphistry) (2.18.4)
Requirement already satisfied: future>=0.15.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (0.16.0)
Requirement already satisfied: protobuf>=2.6.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (3.6.1)
Requirement already satisfied: python-dateutil in /usr/local/lib/python2.7/dist-packages (from pandas>=0.17.0->graphistry) (2.5.3)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (2.6)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (1.22)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (2018.8.24)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (3.0.4)
Requirement already satisfied: six>=1.9 in /usr/local/lib/python2.7/dist-packages (from protobuf>=2.6.0->graphistry) (1.11.0)
Requirement already satisfied: setuptools in /usr/local/lib/python2.7/dist-packages (from protobuf>=2.6.0->graphistry) (39.1.0)

u'0.9.56'

Splunk#

[ ]:

# !pip install splunk-sdk

import splunklib

[ ]:

#Connect to Splunk. Replace settings with your own setup.
import splunklib.client as client
import splunklib.results as results

service = client.connect(**SPLUNK)

[ ]:

def extend(o, override):
    """Return a copy of o with the entries of override applied on top."""
    out = dict(o)
    out.update(override)
    return out

STEP = 10000

def splunkToPandas(qry, overrides=None):
    # Search settings; entries in overrides replace these defaults
    kwargs_blockingsearch = extend({
        "count": 0,
        "earliest_time": "2010-01-24T07:20:38.000-05:00",
        "latest_time": "now",
        "search_mode": "normal",
        "exec_mode": "blocking"
    }, overrides or {})
    # In blocking mode, create() returns once the search has finished
    job = service.jobs.create(qry, **kwargs_blockingsearch)

    print("Search results:\n")
    resultCount = int(job["resultCount"])
    offset = 0

    print('results %s' % resultCount)
    frames = []
    while offset < resultCount:
        print('fetching: %s - %s' % (offset, offset + STEP))
        kwargs_paginate = extend(kwargs_blockingsearch,
                                 {"count": STEP,
                                  "offset": offset})

        # Fetch one page of results and convert it to a DataFrame
        blocksearch_results = job.results(**kwargs_paginate)
        reader = results.ResultsReader(blocksearch_results)
        # The reader can also yield diagnostic Message objects; keep only result rows
        frames.append(pd.DataFrame([x for x in reader if isinstance(x, dict)]))
        offset += STEP
    # Returns None when the search produced no results
    return pd.concat(frames, ignore_index=True) if frames else None
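
The helper above does two things that are easy to check in isolation: it merges caller overrides into default search options, and it walks the result set in windows of `STEP` rows. The sketch below shows only that arithmetic, with no Splunk calls. `merge_options` and `page_windows` are hypothetical names introduced for the example.

```python
def merge_options(defaults, overrides=None):
    """Return a new dict of defaults plus overrides; neither input is modified."""
    merged = dict(defaults)
    merged.update(overrides or {})
    return merged

def page_windows(total, step):
    """Yield (offset, count) pairs that cover `total` results in chunks of `step`."""
    offset = 0
    while offset < total:
        yield offset, step
        offset += step

defaults = {"count": 0, "exec_mode": "blocking"}
paged = merge_options(defaults, {"count": 10000, "offset": 0})

# 25000 results at 10000 per page need three requests
windows = list(page_windows(25000, 10000))
print(windows)
```

The last window may ask for more rows than remain; the loop simply stops once the offset reaches the total.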

2. Get data#

[ ]:

query = 'search index="vast" srcip=* destip=* | rename destip -> dest_ip, srcip -> src_ip | fields dest_ip _time src_ip protocol | eval time=_time | fields - _* '
%time df = splunkToPandas(query, {"sample_ratio": 1000})

#df = splunkToPandas('search index="vast" | head 10')
#df = pd.concat([ splunkToPandas('search index="vast" | head 10'), splunkToPandas('search index="vast" | head 10') ], ignore_index=True)

print('results %s' % len(df))

df.sample(5)

Search results:

results 5035
fetching: 0 - 10000
CPU times: user 4.95 s, sys: 13.3 ms, total: 4.96 s
Wall time: 7.92 s
results 5035

	dest_ip	src_ip	protocol	time
4324	10.138.235.111	172.30.0.4	TCP	1505519752
2806	10.0.3.5	10.12.15.152	TCP	1505519767
2630	10.0.4.5	10.12.15.152	TCP	1505519769
20	10.0.4.7	10.6.6.7	TCP	1505519795
866	10.0.2.8	10.17.15.10	TCP	1505519787
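
The `time` column holds epoch seconds copied from Splunk's `_time` field. A minimal Python 3 sketch of turning one such value into a readable UTC timestamp follows. It assumes the values arrive as strings, possibly with a fractional part, and `epoch_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def epoch_to_utc(value):
    """Convert an epoch-seconds value (string or number) to an aware UTC datetime."""
    return datetime.fromtimestamp(float(value), tz=timezone.utc)

# First timestamp from the sample output above
stamp = epoch_to_utc('1505519752')
print(stamp.isoformat())
```

On a whole DataFrame, `pd.to_datetime` with `unit='s'` on the numerically converted column does the same job in one call.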

3. Visualize!#

A) Simple IP<>IP: 1326 nodes, 253K edges#

[ ]:

graphistry.bind(source='src_ip', destination='dest_ip').edges(df).plot()

B) IP<>IP + srcip<>protocol: 1328 nodes, 506K edges#

[ ]:

def make_edges(df, src, dst):
  out = df.copy()
  out['src'] = df[src]
  out['dst'] = df[dst]
  return out



ip2ip = make_edges(df, 'src_ip', 'dest_ip')
srcip2protocol = make_edges(df, 'src_ip', 'protocol')

combined = pd.concat([ip2ip, srcip2protocol], ignore_index=True)
combined.sample(6)

	dest_ip	src_ip	protocol	time	src	dst
6889	10.0.3.5	10.13.77.49	TCP	1505519777	10.13.77.49	TCP
3440	10.0.2.6	10.12.15.152	TCP	1505519761	10.12.15.152	10.0.2.6
6396	10.0.4.5	10.138.235.111	TCP	1505519782	10.138.235.111	TCP
1394	10.0.4.5	10.138.235.111	TCP	1505519782	10.138.235.111	10.0.4.5
5975	10.0.2.7	10.17.15.10	TCP	1505519786	10.17.15.10	TCP
8683	10.0.2.4	10.12.15.152	TCP	1505519759	10.12.15.152	TCP

[ ]:

graphistry.bind(source='src', destination='dst').edges(combined).plot()
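
The heading's counts follow from how the combined table is built: concatenating two edge lists of equal length doubles the edge count, and each distinct protocol value adds one node. A plain-Python sketch of that bookkeeping, using three rows from the sample output, is shown below. `make_edge_list` is a hypothetical stand-in for `make_edges` that works on dictionaries rather than a DataFrame.

```python
rows = [
    {'src_ip': '10.12.15.152', 'dest_ip': '10.0.3.5', 'protocol': 'TCP'},
    {'src_ip': '10.12.15.152', 'dest_ip': '10.0.4.5', 'protocol': 'TCP'},
    {'src_ip': '10.6.6.7', 'dest_ip': '10.0.4.7', 'protocol': 'TCP'},
]

def make_edge_list(rows, src, dst):
    """Return (source, destination) pairs taken from two fields of each row."""
    return [(row[src], row[dst]) for row in rows]

ip2ip = make_edge_list(rows, 'src_ip', 'dest_ip')
srcip2protocol = make_edge_list(rows, 'src_ip', 'protocol')
combined = ip2ip + srcip2protocol

# Collect the distinct endpoints of each edge list
ip_nodes = {node for edge in ip2ip for node in edge}
all_nodes = {node for edge in combined for node in edge}

print(len(ip2ip), len(combined))     # edge count doubles
print(len(ip_nodes), len(all_nodes)) # one extra node per distinct protocol
```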

C) All<>All via Hypergraph: 254K nodes, 760K edges#

[ ]:

hg = graphistry.hypergraph(df, entity_types=[ 'src_ip', 'dest_ip', 'protocol'] )
print(list(hg.keys()))
hg['graph'].plot()

('# links', 15105)
('# event entities', 5035)
('# attrib entities', 170)
['entities', 'nodes', 'edges', 'events', 'graph']
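
The printed counts are consistent with one link per event per entity column: 5035 events × 3 columns = 15105 links. The sketch below reproduces that shape on three rows from the sample output, in plain Python. It illustrates the structure only and is not PyGraphistry's implementation; the helper `hypergraph_counts` and the `column::value` id scheme are assumptions made for the example.

```python
rows = [
    {'src_ip': '10.12.15.152', 'dest_ip': '10.0.3.5', 'protocol': 'TCP'},
    {'src_ip': '10.12.15.152', 'dest_ip': '10.0.4.5', 'protocol': 'TCP'},
    {'src_ip': '10.6.6.7', 'dest_ip': '10.0.4.7', 'protocol': 'TCP'},
]

def hypergraph_counts(rows, entity_types):
    """Count event nodes, attribute nodes, and links for a row-per-event table."""
    attrib_nodes = set()
    links = []
    for i, row in enumerate(rows):
        event_id = 'event::%d' % i          # one node per row
        for col in entity_types:
            value = row.get(col)
            if value is None:
                continue                     # missing values create no link
            entity_id = '%s::%s' % (col, value)  # one node per distinct (column, value)
            attrib_nodes.add(entity_id)
            links.append((event_id, entity_id))
    return {
        'links': len(links),
        'event entities': len(rows),
        'attrib entities': len(attrib_nodes),
    }

counts = hypergraph_counts(rows, ['src_ip', 'dest_ip', 'protocol'])
print(counts)
```

Under this scheme the same address seen as a source and as a destination becomes two separate attribute nodes, because the column name is part of the id.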

Node Colors#

[ ]:

nodes = pd.concat([
    df[['src_ip']].rename(columns={'src_ip': 'id'}).assign(orig_col='src_ip'),
    df[['dest_ip']].rename(columns={'dest_ip': 'id'}).assign(orig_col='dest_ip') ],
    ignore_index=True).drop_duplicates(['id'])

#see https://hub.graphistry.com/docs/api/api-color-palettes/
col2color = {
    "src_ip": 90005,
    "dest_ip": 46005
}

# Look up each node's palette code from the column it was first seen in
nodes_with_color = nodes.assign(color=nodes['orig_col'].map(col2color))

nodes_with_color.sample(3)

	id	orig_col	color
4383	172.30.0.3	src_ip	90005
9403	10.0.0.42	dest_ip	46005
4206	172.30.0.4	src_ip	90005

[ ]:

graphistry.bind(source='src_ip', destination='dest_ip').edges(df).nodes(nodes_with_color).bind(node='id', point_color='color').plot()
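
Because the node table stacks the `src_ip` rows before the `dest_ip` rows and then drops duplicate ids, an address that appears in both columns keeps the `src_ip` label and color (pandas `drop_duplicates` keeps the first occurrence by default). A small plain-Python sketch of that first-seen rule follows. `first_seen_roles` is a hypothetical helper, and the two rows reuse addresses from the sample output.

```python
col2color = {'src_ip': 90005, 'dest_ip': 46005}

rows = [
    {'src_ip': '172.30.0.4', 'dest_ip': '10.138.235.111'},
    {'src_ip': '10.138.235.111', 'dest_ip': '10.0.4.5'},
]

def first_seen_roles(rows, columns):
    """Map each id to the first column (in the given order) in which it appears."""
    roles = {}
    for col in columns:            # all src_ip values are visited before dest_ip
        for row in rows:
            roles.setdefault(row[col], col)
    return roles

roles = first_seen_roles(rows, ['src_ip', 'dest_ip'])
colors = {node: col2color[role] for node, role in roles.items()}
print(colors)
```

Here `10.138.235.111` is both a destination and a source, and it ends up with the `src_ip` color.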
