features
Modules for feature derivation of various data sets.
merge_streams
class merge_streams.Merge(filepath='./', file_list=['short_t_toy_auth.txt', 'short_t_toy_proc.txt'], sort_column='time', date_format='int', delimiter=',')
Live merging of csv files. Calling a Merge object runs a generator function that interleaves lines from a collection of files, ordered by the sort_column parameter.
- Assumes:
- Individual files are ordered by ascending sort column values.
- Each file has a header row with one column named the same as the <sort_column> parameter.
- The files to merge are all in the folder specified by the <filepath> parameter.
- The generator operates as follows:
- Upon initialization, aligned lists of files, file names, file headers, and the first non-header line of each file are constructed. Each line is split on the delimiter and has an integer event_type index appended to identify its source file.
- When the Merge object is called, the list of lines is sorted by the time stamp specified by the <sort_column> and <date_format> parameters.
- The line (split on the delimiter) with the earliest time stamp is returned along with the name of the file it came from (determined by the appended event_type int).
- The returned line is replaced by the next line from the same file (determined by the appended event_type int).
- If that file has no lines left, it is closed and the list entries associated with it are removed (determined by the appended event_type int).
- Generation concludes when all files are exhausted.
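The steps above can be sketched as a small stand-alone generator. This is an illustration of the described behaviour, not the Merge source: the function name merge_lines and its named_streams argument are hypothetical, it reads from already-open file-like objects rather than from filepath and file_list, it assumes integer time stamps, and it tracks the source file by list position instead of an appended event_type int.

```python
import io


def merge_lines(named_streams, sort_column="time", delimiter=","):
    """Yield (row, name) pairs from several sorted CSV streams in ascending order.

    named_streams is a list of (name, file-like object) pairs. These are
    hypothetical stand-ins for the files Merge opens itself.
    """
    names, streams, key_index, current = [], [], [], []
    for name, stream in named_streams:
        header = stream.readline().rstrip("\n").split(delimiter)
        first = stream.readline()
        if not first.strip():  # header-only stream: nothing to merge
            continue
        names.append(name)
        streams.append(stream)
        key_index.append(header.index(sort_column))
        current.append(first.rstrip("\n").split(delimiter))

    while current:
        # Pick the stream whose pending row has the earliest time stamp.
        i = min(range(len(current)), key=lambda j: int(current[j][key_index[j]]))
        yield current[i], names[i]
        nxt = streams[i].readline()
        if nxt.strip():
            # Replace the emitted row with the next row from the same stream.
            current[i] = nxt.rstrip("\n").split(delimiter)
        else:
            # Stream exhausted: drop every aligned list entry for it.
            for aligned in (names, streams, key_index, current):
                del aligned[i]


auth = io.StringIO("time,user\n1,a\n4,b\n")
proc = io.StringIO("time,pid\n2,10\n3,11\n5,12\n")
merged = list(merge_lines([("auth", auth), ("proc", proc)]))
```

Because each input is already sorted, only one pending row per file has to be held in memory at any time, which is what makes the merge "live".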
Parameters:
- filepath – Path to folder with files to merge.
- file_list – List of names of files to merge.
- sort_column – Column to sort lines of files on for sequential ordering of log lines.
- date_format – Can be any format string which makes sense to datetime.strptime or ‘int’ for simple integer time stamps.
- delimiter – Delimiter of csv columns, e.g. ‘,’, ‘ ‘ …
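As an illustration of how sort_column values might be turned into comparable keys under the two kinds of date_format, here is a minimal sketch. The helper make_sort_key is hypothetical and is not part of merge_streams. It only shows the distinction between 'int' and a datetime.strptime format string.

```python
from datetime import datetime


def make_sort_key(date_format="int"):
    """Return a function that converts a raw time-stamp string to a sortable key."""
    if date_format == "int":
        # Simple integer time stamps, e.g. "42".
        return int
    # Any other value is treated as a datetime.strptime format string.
    return lambda stamp: datetime.strptime(stamp, date_format)


int_key = make_sort_key()
date_key = make_sort_key("%Y-%m-%d %H:%M:%S")
```

Converting before comparing matters because raw strings sort lexicographically, so "10" would be ordered before "9".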
headers
Returns: A list of headers (split by delimiter) from the files being merged.