.. safekit documentation master file, created by
   sphinx-quickstart on Thu Jan  5 17:42:22 2017.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. papers

.. _Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams: https://aaai.org/ocs/index.php/WS/AAAIW17/paper/viewFile/15126/14668
.. _Recurrent Neural Network Language Models for Open Vocabulary Event-Level Cyber Anomaly Detection: https://arxiv.org/abs/1712.00557
.. _Install tensorflow: https://www.tensorflow.org/install


Authors
=======

- Aaron Tuor (aaron.tuor@pnnl.gov)
- Ryan Baerwolf (rdbaerwolf@gmail.com)
- Robin Cosbey (rcosbey@live.com)
- Nick Knowles (knowles.nick@gmail.com)
- Elliot Skomski (elliottskomski@gmail.com)
- Sam Kaplan (samuelpkaplan@gmail.com)
- Brian Hutchinson (brian.hutchinson@wwu.edu)
- Nicole Nichols (nicole.nichols@pnnl.gov)
- Sean Robinson (greatblondelf@gmail.com)
- Rob Jasper (robert.jasper@pnnl.gov)

About Safekit
=============
Safekit is a Python software package for anomaly detection from multivariate streams,
developed for the **AIMSAFE** (Analysis in Motion Stream Adaptive Foraging for Evidence) project at Pacific Northwest National Laboratory.
The models in this package are described in the following papers:

- `Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams`_
- `Recurrent Neural Network Language Models for Open Vocabulary Event-Level Cyber Anomaly Detection`_


The toolkit is written in Python using the TensorFlow deep learning
library and NumPy.

Dependencies
============

Dependencies required for installation:

- TensorFlow 1.0 or above
- NumPy
- SciPy
- scikit-learn
- Matplotlib
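
As a quick sanity check before installing, the short sketch below reports which of the dependencies listed above cannot be imported in the active environment.
It is not part of safekit: the helper name ``missing_modules`` and the ``REQUIRED_MODULES`` list are illustrative assumptions, and the only library facts relied on are the usual import names (scikit-learn is imported as ``sklearn``).

.. code-block:: python

    import importlib

    # Import names for the dependencies listed above.
    # Assumption: standard import names; scikit-learn is imported as "sklearn".
    REQUIRED_MODULES = ["tensorflow", "numpy", "scipy", "sklearn", "matplotlib"]


    def missing_modules(names):
        """Return the names that cannot be imported, in the order given.

        Hypothetical helper for illustration; safekit does not provide it.
        """
        missing = []
        for name in names:
            try:
                importlib.import_module(name)
            except ImportError:
                missing.append(name)
        return missing


    if __name__ == "__main__":
        missing = missing_modules(REQUIRED_MODULES)
        if missing:
            print("Missing dependencies: " + ", ".join(missing))
        else:
            import tensorflow as tf
            # Compare this against the 1.0 minimum stated above.
            print("All dependencies found; TensorFlow version " + tf.__version__)

Run the script inside the virtual environment you intend to use; an empty result from ``missing_modules`` only shows that the packages import, not that their versions are compatible.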

Installation
=============

A virtual environment is recommended for installation. Make sure that TensorFlow 1.0 or above is installed in your virtual environment.

`Install tensorflow`_

From the terminal in your activated virtual environment:

.. code-block:: bash

    (venv)$ git clone https://github.com/hutchresearch/safekit.git
    (venv)$ cd safekit/
    (venv)$ python setup.py develop

To test your installation, run the following from the top-level directory of the repository:

.. code-block:: bash

    $ tar -xjvf data_examples.tar.bz2
    $ python test/agg_tests.py data_examples/lanl/agg_feats data_examples/cert/agg_feats test.log
    $ python test/lanl_lm_tests.py data_examples/lanl/lm_feats/ test.log

Each of these two tests should take about 10 to 15 minutes, depending on the processing capability of your system.
The tests range over many different model configurations, so they also serve as a fairly comprehensive tutorial on the functionality of the code base.

Tutorials
=========

Jupyter notebooks of these tutorials are located in ``safekit/examples/``.

- `LANL language model data preparation <../../LANL_LM_data.html>`_
- `Simple language model <../../simple_lm.html>`_
- `DNN aggregate model <../../dnn_agg.html>`_

Core Modules
============


.. toctree::
   :maxdepth: 2

   tf_ops.rst
   batch.rst
   graph_training_utils.rst
   util.rst

Models and Feature Derivation
=============================

.. toctree::
   :maxdepth: 2

   models.rst
   features.rst

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`