features

Modules for feature derivation of various data sets.

merge_streams

class merge_streams.Merge(filepath='./', file_list=['short_t_toy_auth.txt', 'short_t_toy_proc.txt'], sort_column='time', date_format='int', delimiter=',')

Merges CSV files on the fly. Calling a Merge object returns a generator that interleaves lines from a collection of files, ordered by the values in the column named by the sort_column parameter.

Assumes:
  1. Each file is ordered by ascending values of the sort column.
  2. Each file has a header line containing a column whose name matches the <sort_column> parameter.
  3. All files to merge are in the folder given by the <filepath> parameter.
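The first two assumptions can be checked before merging. The following is a minimal sketch, not part of merge_streams: the helper name check_assumptions is hypothetical, and it assumes integer time stamps (date_format='int') and plain splitting of each line on the delimiter.

```python
def check_assumptions(lines, sort_column="time", delimiter=","):
    """Hypothetical helper: check assumptions 1 and 2 for one file's lines.

    Assumes integer time stamps (date_format='int') and that fields are
    obtained by plain splitting on the delimiter.
    """
    rows = iter(lines)
    header = next(rows).rstrip("\n").split(delimiter)
    if sort_column not in header:
        return False  # assumption 2 violated: no sort column in the header
    index = header.index(sort_column)
    previous = None
    for line in rows:
        if not line.strip():
            continue  # ignore blank lines
        value = int(line.rstrip("\n").split(delimiter)[index])
        if previous is not None and value < previous:
            return False  # assumption 1 violated: values go backwards
        previous = value
    return True
```

Because the function accepts any iterable of lines, an open file object can be passed directly, e.g. check_assumptions(open_file, sort_column='time').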
The generator operates as follows:
  1. Upon initialization, aligned lists are built of the open files, the file names, the file headers, and the first non-header line of each file (split on the delimiter, with the file's integer index, its event_type, appended).
  2. When the Merge object is called, the list of pending lines is sorted by the time stamp in the <sort_column> column, parsed according to <date_format>.
  3. The line (split on the delimiter) with the earliest time stamp is returned, along with the name of the file it came from (looked up via the appended event_type int).
  4. That line is replaced with the next line from the same file (again identified by the appended event_type int).
  5. If that file has no lines left, it is closed and its entries are removed from the lists (identified by the appended event_type int).
  6. The generator finishes once all files are exhausted.
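The steps above amount to a k-way merge. The following is a minimal sketch of the same steps, not the module's actual code. Assumptions: merge_lines is a hypothetical name; it takes already-open text streams keyed by file name instead of reading filepath and file_list; it yields (fields, file_name) pairs with the event_type stripped; and a parse callable (int by default) stands in for date_format.

```python
import io


def merge_lines(files, sort_column="time", delimiter=",", parse=int):
    """Hypothetical sketch: yield (fields, file_name) in ascending sort order.

    files maps a file name to an open text stream whose first line is a header.
    """
    names, handles, sort_index = {}, {}, {}
    current = []  # pending lines: fields split on delimiter + [event_type]

    def read_next(event_type):
        # Next data line of this file with event_type appended, or None at EOF.
        line = handles[event_type].readline()
        if not line:
            return None
        return line.rstrip("\n").split(delimiter) + [event_type]

    # Step 1: build per-file lookups and read each file's first data line.
    for event_type, (name, handle) in enumerate(files.items()):
        header = handle.readline().rstrip("\n").split(delimiter)
        names[event_type], handles[event_type] = name, handle
        sort_index[event_type] = header.index(sort_column)
        first = read_next(event_type)
        if first is None:
            handle.close()  # header-only file: nothing to merge
        else:
            current.append(first)

    def key(fields):
        # The appended event_type tells us where this file's sort column is.
        return parse(fields[sort_index[fields[-1]]])

    while current:  # Step 6: stop once every file is exhausted.
        current.sort(key=key)  # Step 2: order pending lines by time stamp.
        fields = current.pop(0)  # Step 3: take the earliest line.
        event_type = fields[-1]
        yield fields[:-1], names[event_type]
        replacement = read_next(event_type)  # Step 4: refill from same file.
        if replacement is None:
            # Step 5: file exhausted, so close it and drop its entries.
            handles.pop(event_type).close()
            del names[event_type], sort_index[event_type]
        else:
            current.append(replacement)
```

Re-sorting the pending list on every step costs O(k log k) for k files; keeping the pending lines in a heap (for example with Python's heapq module) would reduce each step to O(log k).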
Parameters:
  • filepath – Path to the folder containing the files to merge.
  • file_list – List of names of the files to merge.
  • sort_column – Name of the column whose values determine the merged ordering of log lines.
  • date_format – Any format string accepted by datetime.strptime, or ‘int’ for simple integer time stamps.
  • delimiter – Delimiter separating CSV columns, e.g. ‘,’ or ‘ ’ (a space).
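The date_format parameter decides how sort_column values become comparable keys. A minimal sketch, assuming the conversion is done per value; to_sort_key is a hypothetical name, not part of merge_streams.

```python
from datetime import datetime


def to_sort_key(value, date_format="int"):
    """Hypothetical helper: turn a raw sort_column string into a sortable key."""
    if date_format == "int":
        return int(value)  # integer time stamps compare numerically
    return datetime.strptime(value, date_format)  # any strptime format string
```

Converting before comparing matters: as raw strings "9" sorts after "10", while to_sort_key("9") < to_sort_key("10") holds as expected.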
headers
Returns: A list of headers (split on the delimiter) from the files being merged.
next_event(event_type)
Parameters: event_type – Integer associated with the file to read from.
Returns: The next event from the file associated with event_type, i.e. the next line split on the delimiter, with event_type appended.
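The behaviour of next_event can be sketched as follows. This is a hypothetical stand-in, not the real Merge class: EventReader holds open streams (headers already consumed) indexed by event_type, and returning None at end of file is an assumption, since the documentation does not say what happens then.

```python
class EventReader:
    """Hypothetical stand-in holding open files indexed by event_type."""

    def __init__(self, files, delimiter=","):
        self.files = files  # list of open text streams, headers already read
        self.delimiter = delimiter

    def next_event(self, event_type):
        # Read the next line of the file associated with event_type,
        # split it on the delimiter, and append event_type to the fields.
        line = self.files[event_type].readline()
        if not line:
            return None  # assumption: None signals an exhausted file
        return line.rstrip("\n").split(self.delimiter) + [event_type]
```

Appending event_type to the returned fields is what lets a caller trace a line back to its source file after the pending lines have been re-sorted.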