This guide provides a quick start for using the qFeature package. We begin with installation instructions in ‘Getting Started’. Following installation is a basic diagram of the ‘Package Structure’ to illustrate the hierarchy of functions. Finally, we conclude with details on the four ‘Core Functions’ contained in the package, complete with follow-along examples.

## Description

The qFeature package is designed to extract features from continuous and discrete variables contained within a time series dataset. These features can then be used as inputs to multivariate statistical procedures like clustering, dimensionality reduction, and classification. This is a high-speed implementation of the feature extraction methods of the Morning Report Algorithms developed by Brett Amidan and Tom Ferryman (Amidan and Ferryman 2005).

## Getting Started

The first thing you will need to do is install the qFeature package. Installation instructions are provided in the README.md file of the package repository. Once the package has been installed, the library can be loaded so that the functions can be used.

library(qFeature)

A complete list of the functions contained within the qFeature package can be seen using:

help(package = qFeature)

## Package Structure

In what follows, we explain the four core functions of qFeature: ddply_getFeatures(), getFeatures(), discFeatures(), and fitQ.

### discFeatures()

#### Description

This function is intended for use on a time series variable with discrete states. It calculates the percentage of time spent in each state and counts the transitions in the variable from one state to another.

#### How to use discFeatures()

Using discFeatures() is most easily demonstrated with a simple example. Let’s begin by creating a small dataset with 2 discrete states (TRUE/FALSE).

discData <- c("TRUE", "FALSE", "FALSE", NA, "TRUE", "TRUE", NA, NA, "TRUE", "FALSE", "TRUE", "FALSE", "TRUE")
discData
##  [1] "TRUE"  "FALSE" "FALSE" NA      "TRUE"  "TRUE"  NA      NA
##  [9] "TRUE"  "FALSE" "TRUE"  "FALSE" "TRUE"

Now we have a small data set of length 13 that contains two discrete states (TRUE and FALSE) and 3 missing values stored as NA. Now if we apply the discFeatures() function to our dataset we can see what happens.

discFeatures(discData)
##        percent.FALSE         percent.TRUE num_trans.FALSE_TRUE
##                  0.4                  0.6                  3.0
## num_trans.TRUE_FALSE
##                  3.0

As you can see, the percentage calculations are made without consideration of the missing values: 40% of the non-missing data are FALSE and 60% are TRUE. We also get information about transitions: the value changed from FALSE to TRUE 3 times and from TRUE to FALSE 3 times. Go ahead and count the transitions yourself just to make sure you agree with the output. The transitions matter because this function is intended for time series data, where a change from one state to another can be meaningful.
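To see where those numbers come from, here is a minimal base-R sketch (not the package's implementation) of the two computations discFeatures() performs: state percentages over the non-missing values, and counts of transitions between consecutive non-missing states.

```r
discData <- c("TRUE", "FALSE", "FALSE", NA, "TRUE", "TRUE", NA, NA,
              "TRUE", "FALSE", "TRUE", "FALSE", "TRUE")

x <- discData[!is.na(discData)]      # percentages ignore missing values

percents <- table(x) / length(x)     # percent.FALSE = 0.4, percent.TRUE = 0.6

from <- x[-length(x)]                # consecutive non-missing pairs
to   <- x[-1]
sum(from == "FALSE" & to == "TRUE")  # num_trans.FALSE_TRUE = 3
sum(from == "TRUE" & to == "FALSE")  # num_trans.TRUE_FALSE = 3
```

Note that transitions are counted across the NA gaps, which is consistent with the output above.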

### fitQ()

#### Description

Fits a moving quadratic (or simple linear) regression model over a series using a 2-sided window. It dynamically accounts for the incomplete windows which are caused by missing values and which occur at the beginning and end of the series. This function is used to extract data from continuous variables.

#### How to use fitQ()

For your vector of response measures, a window is initially centered at the point indexed by start. The argument x1 defines the width of the window, which extends equally in both directions about the center point (which is why the length of x1 must be odd). min.window defines how many points are required to fit a model and thus produce a value in our signature. Once the window characteristics are defined, we can choose one of two regression models:

Linear: $$y=b_0+b_1 x_1+\epsilon$$

or

Quadratic: $$y=b_0+b_1 x_1+b_2 x_1^2+\epsilon$$

After our initial linear or quadratic model is fit, several features are extracted from the regression and denoted as a, b, c, and d which are defined below.

a: The estimated intercepts

b: The estimated linear coefficients

c: The estimated quadratic coefficients. These are NA if linear.only = TRUE

d: The root mean squared error (RMSE)

The window moves through the data by an increment of skip (default 1), and the regression model is refit using the data contained in each new window. This iterates over the entire vector of the response measure, producing a signature of features with a corresponding a, b, c, and d for each regression that was fit.
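To make the mechanics concrete, here is a minimal base-R sketch of the moving-window quadratic fit described above (`movingQuad` is a hypothetical helper written for this guide, not the package's high-speed implementation):

```r
# Sketch of a moving 2-sided quadratic fit: for each window center,
# fit y = b0 + b1*x1 + b2*x1^2 and keep the coefficients and RMSE.
movingQuad <- function(y, x1 = -3:3, min.window = 4) {
  half <- (length(x1) - 1) / 2
  n <- length(y)
  out <- data.frame(a = rep(NA_real_, n), b = NA_real_,
                    c = NA_real_, d = NA_real_)
  for (i in seq_len(n)) {
    idx <- (i - half):(i + half)        # 2-sided window around point i
    keep <- idx >= 1 & idx <= n         # drop positions off the ends
    yi <- y[idx[keep]]
    xi <- x1[keep]
    xi <- xi[!is.na(yi)]                # drop missing responses
    yi <- yi[!is.na(yi)]
    if (length(yi) >= min.window) {     # enough points to fit a model?
      fit <- lm(yi ~ xi + I(xi^2))
      out[i, c("a", "b", "c")] <- coef(fit)
      out$d[i] <- summary(fit)$sigma    # residual standard error (RMSE)
    }
  }
  out
}
```

Under these assumptions, running this sketch on the Example 1 data below reproduces the window-by-window intercepts, coefficients, and RMSE values that fitQ() reports.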

##### Illustration

The illustration below provides a simple demonstration of how fitQ() fits a series of regression models to finite windows of data. The gray points denote data that fall outside the current window being used to fit the regression model. In this example a window of x1 = -3:3 is used with min.window = 4. Unless otherwise specified, the first window will be centered on the first data point and, likewise, the last window will be centered on the last data point.

Now let’s take a look at a few quick R examples that illustrate the function at work, as well as how the function reacts to several potential issues in the data.

##### Example 1

We begin by creating our first sample data set to help understand where the features are coming from.

set.seed(10)
fitqDataEx1 <- rnorm(7, 5, 1)
fitqDataEx1
## [1] 5.018746 4.815747 3.628669 4.400832 5.294545 5.389794 3.791924

We now have a small vector of length 7 containing randomly generated numbers. Now we can pass the vector into our fitQ() function and take a look at the output.

fitQ(y = fitqDataEx1, x1 = -3:3, min.window = 4)
## $a
## [1] 5.165912 4.321421 4.232649 4.635257 4.795443 5.028309 3.836657
##
## $b
## [1] -1.03545313 -0.57956943 -0.05914872 -0.03094635 -0.05480118 -0.65967154
## [7] -2.04183511
##
## $c
## [1]  0.243790387  0.296618854  0.175257194 -0.003805034 -0.085028875
## [6] -0.395609306 -0.622895830
##
## $d
## [1] 0.6581464 0.4737011 0.5094792 0.8563527 0.9110471 0.4057781 0.2000512
##
## attr(,"class")
## [1] "fitQ" "list"

It may not be readily apparent, but each of the 4 extracted features contains the same number of values, and that number equals the length of the vector y (when skip = 1). This is because the center of the first window is on the first point and the center advances by an increment of skip (default 1) through the entire vector y.

Furthermore, the first values of a, b, c, and d are all drawn from the same window, as are the second, third, and so on. We can illustrate this by manually fitting the 2nd window and comparing the results. Note that at the second window only 5 points fall inside the window and are used in the regression, as can be seen in the illustration above. For this reason we fit the regression to the first 5 points of our dataset and the corresponding section of the window.

y <- fitqDataEx1[1:5]
x1 <- c(-1:3)
summary(lm(y ~ x1 + I(x1^2)))
##
## Call:
## lm(formula = y ~ x1 + I(x1^2))
##
## Residuals:
##        1        2        3        4        5
## -0.17886  0.49433 -0.40980  0.05207  0.04226
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   4.3214     0.2887  14.969  0.00443 **
## x1           -0.5796     0.2942  -1.970  0.18765
## I(x1^2)       0.2966     0.1266   2.343  0.14387
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4737 on 2 degrees of freedom
## Multiple R-squared:  0.7332, Adjusted R-squared:  0.4665
## F-statistic: 2.749 on 2 and 2 DF,  p-value: 0.2668

As you can see (aside from rounding):

• Coefficient estimate for the intercept is equal to the second value of a
• Coefficient estimate for x1 is equal to the second value of b
• Coefficient estimate of I(x1^2) is equal to the second value of c
• Residual standard error is equal to the second value of d

Hopefully the first example helps to solidify how the feature extraction is being populated. Let’s continue by taking a look at a few issues you might encounter.

##### Example 2

First begin by taking a look at the illustration above and ask yourself “When the first window is centered on the first point, what happens if our min.window = 5 instead?”. Let’s take a look at an example to demonstrate this scenario. We begin by creating another data set.

set.seed(20)
fitqDataEx2 <- rnorm(7, 5, 1)
fitqDataEx2
## [1] 6.162685 4.414076 6.785465 3.667406 4.553433 5.569606 2.110282

We now have a small vector of length 7 containing randomly generated numbers. Now we can pass the vector into our fitQ() function with a min.window=5 and see how the function responds.

fitQ(y = fitqDataEx2, x1 = -3:3, min.window = 5)
## $a
## [1]       NA 5.528857 4.897691 5.100493 4.913496 3.831878       NA
##
## $b
## [1]         NA -0.3650638 -0.3026263 -0.4313635 -0.6333119 -0.8237286
## [7]         NA
##
## $c
## [1]          NA -0.01572677  0.14075827 -0.08716049 -0.22529714 -0.03945601
## [7]          NA
##
## $d
## [1]       NA 1.614705 1.386009 1.521011 1.630093 1.897100       NA
##
## attr(,"class")
## [1] "fitQ" "list"

As you can see in the output, an NA is produced for the first value of each of the features as well as for the last. This is because we are requiring 5 points to fit a regression, and the first and last windows each contain only 4 points (refer to Window 1 and Window 10 of the illustration if you need to convince yourself). No regression is fit, there are no features to extract, and the function returns NAs.

##### Example 3

So what happens when the function encounters missing values in the data? We begin by creating another data set, this time with missing values.

set.seed(30)
fitqDataEx3 <- rnorm(15, 5, 1)
fitqDataEx3[c(5,7, 9, 10)] <- NA
fitqDataEx3
##  [1] 3.711482 4.652311 4.478371 6.273473       NA 3.488692       NA
##  [8] 4.239204       NA       NA 3.976728 3.180602 4.332210 4.940702
## [15] 5.880166

We now have an example data vector of length 15 with 11 non-missing and 4 missing values. We can now implement the function and see what happens.

fitQ(y = fitqDataEx3, x1 = -3:3, min.window = 4)
## $a
##  [1] 3.865672 4.189739 5.337928 5.371472 4.791795 4.395873       NA
##  [8]       NA 4.391305 3.629989 3.538460 3.696237 4.043787 4.809632
## [15] 5.836461
##
## $b
##  [1]  0.11049855  0.53763516  0.32774826 -0.26065985 -0.15740980
##  [6] -0.23641293          NA          NA -0.03461775 -0.13110603
## [11]  0.09938414  0.13840282  0.55669758  0.97499234  0.71161019
##
## $c
##  [1]  0.21356830  0.21356830 -0.29420406 -0.29420406 -0.04757267
##  [6]  0.01100045          NA          NA -0.11501634  0.09628955
## [11]  0.11647796  0.20914738  0.20914738  0.20914738 -0.05303605
##
## $d
##  [1] 0.6895608 0.6895608 0.9035960 0.9035960 1.3103955 1.8098212        NA
##  [8]        NA 0.1557777 0.7259587 0.5212704 0.4599708 0.4599708 0.4599708
## [15] 0.1954521
##
## attr(,"class")
## [1] "fitQ" "list"

We can see that NA values were produced in each of the features that correspond with the 7th and 8th windows fit to the data. Let’s investigate the data in those windows to see why.

Window7 <- fitqDataEx3[4:10]
Window7
## [1] 6.273473       NA 3.488692       NA 4.239204       NA       NA
Window8 <- fitqDataEx3[5:11]
Window8
## [1]       NA 3.488692       NA 4.239204       NA       NA 3.976728

As you can see, both Window7 and Window8 contain only 3 non-missing data points, which is fewer than min.window = 4, so no regressions could be fit and NA values were produced in the feature extraction.

### getFeatures()

#### Description

The getFeatures() function is the main workhorse of the qFeature package. It is called on a data frame, where fitQ() is applied to the continuous variables and discFeatures() is applied to the discrete (or categorical) variables.

#### How to use getFeatures()

At this point it is important that you understand what is happening in both the fitQ() and the discFeatures() functions, because getFeatures() is simply an aggregate of the two. Rather than dealing with a single vector at a time, this function is capable of dealing with a data frame consisting of multiple continuous and/or discrete variables. The only additional functionality introduced by the getFeatures() function is the ability to output desired summary statistics of the features extracted from the regression models.

Let’s take a look at an example of the getFeatures() function. We begin by creating a data frame that consists of 2 continuous and 2 discrete state variables.

set.seed(10)
cont1 <- rnorm(10,9,1)
cont2 <- runif(10,0,10)
disc1 <- c("T", "F", "F",
           "T", "T", "T",
           "F", "T", "F",
           "T")
disc2 <- c("blue", "red", "yellow",
           "yellow", "blue", "red",
           "blue", "red", "yellow",
           "blue")

getFeaturesEx <- data.frame(cont1, cont2, disc1, disc2)
getFeaturesEx
##       cont1    cont2 disc1  disc2
## 1  9.018746 8.647212     T   blue
## 2  8.815747 6.153524     F    red
## 3  7.628669 7.751099     F yellow
## 4  8.400832 3.555687     T yellow
## 5  9.294545 4.058500     T   blue
## 6  9.389794 7.066469     T    red
## 7  7.791924 8.382877     F   blue
## 8  8.636324 2.395891     T    red
## 9  7.373327 7.707715     F yellow
## 10 8.743522 3.558977     T   blue

We learned earlier that fitQ() can handle a continuous vector and discFeatures() can extract information from a discrete state variable, but let’s push the whole data frame into getFeatures().

outGetFeatures <- getFeatures(getFeaturesEx, cont = 1:2, disc = 3:4,
                              stats = c("mean", "sd"), fitQargs = list(x1 = -3:3))
outGetFeatures
##                cont1.a.mean                  cont1.a.sd
##                 -0.11625763                  0.51336981
##                cont1.b.mean                  cont1.b.sd
##                 -0.09775823                  0.37153955
##                cont1.c.mean                  cont1.c.sd
##                  0.12170029                  0.23103515
##                cont1.d.mean                  cont1.d.sd
##                  1.00407293                  0.21753467
##                cont2.a.mean                  cont2.a.sd
##                 -0.05175897                  0.35596310
##                cont2.b.mean                  cont2.b.sd
##                 -0.19217181                  0.20769032
##                cont2.c.mean                  cont2.c.sd
##                  0.01532974                  0.12476572
##                cont2.d.mean                  cont2.d.sd
##                  1.05163189                  0.28870859
##             disc1.percent.F             disc1.percent.T
##                  0.40000000                  0.60000000
##         disc1.num_trans.F_T         disc1.num_trans.T_F
##                  3.00000000                  3.00000000
##          disc2.percent.blue           disc2.percent.red
##                  0.40000000                  0.30000000
##        disc2.percent.yellow    disc2.num_trans.blue_red
##                  0.30000000                  3.00000000
## disc2.num_trans.blue_yellow    disc2.num_trans.red_blue
##                  0.00000000                  1.00000000
##  disc2.num_trans.red_yellow disc2.num_trans.yellow_blue
##                  2.00000000                  2.00000000
##  disc2.num_trans.yellow_red
##                  0.00000000

##### Continuous Case

It should be apparent that this output differs from what you saw in fitQ(). Once you start looking across multiple variables it no longer makes sense to report a long string of features; instead we report summary statistics of the features, named according to the pattern:

[variable].[feature].[stat]

• [variable] Identifies the continuous variable of interest from the data
  • cont1
  • cont2
• [feature] Identifies the feature
  • a
  • b
  • c
  • d
• [stat] Identifies the summary statistic being applied to all values of the specified feature
  • mean
  • sd

In our example we include mean and standard deviation for all the values extracted for that feature. There are many more summary statistics that could be included that are listed in the getFeatures() help page.
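As a sketch of how these summary names are assembled (illustrative feature values, not the package's code), each statistic is applied to each extracted feature vector and the pieces are pasted together:

```r
# Toy feature vectors standing in for the a and b output of fitQ()
feats <- list(a = c(5.17, 4.32, 4.23),
              b = c(-1.04, -0.58, -0.06))

# Apply each stat to each feature; unlist() builds names like "a.mean"
out <- unlist(lapply(feats, function(v) c(mean = mean(v), sd = sd(v))))

# Prefix with the variable name to get [variable].[feature].[stat]
names(out) <- paste("cont1", names(out), sep = ".")
out  # cont1.a.mean, cont1.a.sd, cont1.b.mean, cont1.b.sd
```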

##### Discrete Case

The discrete variables are presented almost identically to what we saw from the discFeatures() function. The only real difference is that we now have a variable identifier to discriminate between discrete variables. Other than that, we still have a summary of the percentage of time spent at each state as well as a count of each type of transition present in the data. The names follow the pattern:

[variable].[frequency].[from]_[to]

• [variable] Identifies the discrete variable of interest from the data
  • disc1
  • disc2
• [frequency] Either the percent of time at a given state or a count of the transitions from one state to another
  • percent
  • num_trans
• [from] Identifies the prior discrete state of the variable
  • F/T
  • red/blue/yellow
• [to] Identifies the subsequent discrete state of the variable
  • F/T
  • red/blue/yellow

### ddply_getFeatures()

#### Description

The ddply_getFeatures() function is simply a wrapper that allows the getFeatures() function to be implemented for each “group” in a data frame, where the groups are defined by the unique combinations
of the values of one or more categorical variables. The replication of getFeatures() for each group is carried out by ddply() from the plyr package.

The importance of this wrapper is that it facilitates processing unique subsets in the data and allows for parallel processing.
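The grouping behavior can be sketched in base R with split() standing in for plyr::ddply() (toy data and a mean-only "feature" for illustration; this is not the package's actual computation):

```r
# Toy data frame: 2 subjects x 2 phases, one continuous variable
set.seed(1)
d <- data.frame(subject = rep(1:2, each = 6),
                phase   = rep(c("e", "f"), times = 6),
                contA   = rnorm(12))

# Split into one piece per unique subject/phase combination
groups <- split(d, list(d$subject, d$phase))

# Apply a feature summary to each piece and bind the rows back together
res <- do.call(rbind, lapply(groups, function(g)
  data.frame(subject = g$subject[1], phase = g$phase[1],
             contA.mean = mean(g$contA))))

nrow(res)  # 2 subjects x 2 phases = 4 rows, one per combination
```

ddply_getFeatures() follows this same split-apply-combine pattern, but applies the full getFeatures() extraction to each group and can distribute the groups across parallel jobs.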

#### How to use ddply_getFeatures()

For this example let’s use a dataset built into the qFeature package.

data(demoData)
str(demoData)
## 'data.frame':    628 obs. of  11 variables:
##  $ subject: Factor w/ 7 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ phase  : Factor w/ 3 levels "e","f","g": 1 1 1 1 1 1 1 1 1 1 ...
##  $ contA  : num  670 790 501 657 629 ...
##  $ contB  : num  0 41.6 65.9 83.2 96.6 ...
##  $ contC  : num  181.2 245.4 51.4 93.7 18.6 ...
##  $ contD  : num  5 8 72 80 200 216 343 384 648 900 ...
##  $ contE  : num  3.74 12.19 13.53 46.97 40.58 ...
##  $ discA  : int  4 1 2 4 3 3 4 3 4 4 ...
##  $ discB  : logi  TRUE FALSE FALSE FALSE TRUE TRUE ...
##  $ discC  : Factor w/ 3 levels "j","k","l": 3 2 1 3 3 1 3 3 2 3 ...
##  $ discD  : chr  "x" "y" "y" "z" ...

Now that we have a data set, let’s pretend we are interested in every combination of subject and phase. At this point it is important to note that, in order to work properly, the package expects the data to be structured in two ways. First, any data set processed by this package must contain a field indicating the group to which each record belongs. Second, the package assumes that your data are presented in chronological order (from oldest to newest) within each group. Once you’ve ensured that your data are structured this way, you can process them with the ddply_getFeatures() function.

The output will be very similar to getFeatures(); however, instead of getting one value for each summary, we will end up with a set of values for each summary equal to the number of unique combinations. A quick calculation shows us that:

$$(\mbox{number of subjects}) \cdot (\mbox{number of phases}) = \mbox{number of combinations}$$

meaning that we will have $$(7) \cdot (3) = 21$$ values for each output variable. We can verify this by pushing our data into the ddply_getFeatures() function.

f <- ddply_getFeatures(demoData, c("subject", "phase"), cont = 3:4, disc = 8,
                       stats = c("mean", "sd"), fitQargs = list(x1 = -5:5),
                       nJobs = 2)

str(f)
## 'data.frame':    21 obs. of  34 variables:
##  $ subject            : Factor w/ 7 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 4 ...
##  $ phase              : Factor w/ 3 levels "e","f","g": 1 2 3 1 2 3 1 2 3 1 ...
##  $ contA.a.mean       : num  0.9594 -0.0963 0.6678 -0.2903 -0.5715 ...
##  $ contA.a.sd         : num  0.405 0.412 0.457 0.381 0.316 ...
##  $ contA.b.mean       : num  -0.05352 0.00514 0.07705 -0.09571 -0.02266 ...
##  $ contA.b.sd         : num  0.1473 0.1887 0.2799 0.2819 0.0589 ...
##  $ contA.c.mean       : num  0.00119 0.01584 -0.00711 0.00979 0.00112 ...
##  $ contA.c.sd         : num  0.037 0.0359 0.0695 0.0573 0.0237 ...
##  $ contA.d.mean       : num  1.077 0.777 0.977 0.73 0.785 ...
##  $ contA.d.sd         : num  0.18 0.159 0.469 0.208 0.207 ...
##  $ contB.a.mean       : num  0.883 -0.235 0.557 -0.138 -0.214 ...
##  $ contB.a.sd         : num  1.124 0.809 1.171 0.92 0.87 ...
##  $ contB.b.mean       : num  0.03379561516091365847 0.037014245176244277291 0.000000000000000000601 -0.002676778537393192763 -0.00589473771636578759 ...
##  $ contB.b.sd         : num  0.311 0.281 0.365 0.304 0.273 ...
##  $ contB.c.mean       : num  -0.0193 -0.0212 -0.0248 -0.022 -0.0187 ...
##  $ contB.c.sd         : num  0.02 0.0152 0.024 0.016 0.0165 ...
##  $ contB.d.mean       : num  0.0443 0.0495 0.0602 0.0476 0.0429 ...
##  $ contB.d.sd         : num  0.0409 0.0288 0.0483 0.0247 0.0309 ...
##  $ discA.percent.1    : num  0.25 0.25 0.27 0.258 0.257 ...
##  $ discA.percent.2    : num  0.25 0.25 0.243 0.258 0.257 ...
##  $ discA.percent.3    : num  0.25 0.25 0.243 0.258 0.257 ...
##  $ discA.percent.4    : num  0.25 0.25 0.243 0.226 0.229 ...
##  $ discA.num_trans.1_2: num  5 1 2 4 1 3 3 1 3 1 ...
##  $ discA.num_trans.1_3: num  1 3 2 3 2 3 4 1 2 2 ...
##  $ discA.num_trans.1_4: num  1 2 2 0 0 3 0 2 2 2 ...
##  $ discA.num_trans.2_1: num  3 1 3 2 2 3 2 1 4 1 ...
##  $ discA.num_trans.2_3: num  2 3 1 2 3 2 1 2 1 1 ...
##  $ discA.num_trans.2_4: num  1 1 3 1 3 2 3 2 1 3 ...
##  $ discA.num_trans.3_1: num  2 2 1 1 2 3 3 2 1 2 ...
##  $ discA.num_trans.3_2: num  0 2 1 2 2 1 0 1 2 2 ...
##  $ discA.num_trans.3_4: num  5 2 4 3 3 3 2 2 1 0 ...
##  $ discA.num_trans.4_1: num  3 3 1 3 0 2 1 2 1 2 ...
##  $ discA.num_trans.4_2: num  1 2 4 0 4 4 3 2 1 1 ...
##  $ discA.num_trans.4_3: num  4 0 4 1 2 2 1 2 2 1 ...

If you need a reminder of where all the displayed variables come from, read through the getFeatures() section above. We can clearly see that there are 21 values for each variable, corresponding to the 21 combinations of subject and phase. Here is a quick look at the data.

head(f)
##   subject phase contA.a.mean contA.a.sd contA.b.mean contA.b.sd
## 1       1     e   0.95935578  0.4049560 -0.053521124 0.14729155
## 2       1     f  -0.09634974  0.4122154  0.005139956 0.18871465
## 3       1     g   0.66780935  0.4570175  0.077047009 0.27991844
## 4       2     e  -0.29030386  0.3812480 -0.095714751 0.28194231
## 5       2     f  -0.57154431  0.3163018 -0.022664253 0.05891394
## 6       2     g  -0.33432391  0.2662339 -0.037518578 0.10858981
##    contA.c.mean contA.c.sd contA.d.mean contA.d.sd contB.a.mean contB.a.sd
## 1  0.0011891119 0.03701984    1.0773529 0.18018835    0.8828291  1.1243437
## 2  0.0158372006 0.03588238    0.7773663 0.15862451   -0.2349595  0.8088642
## 3 -0.0071132548 0.06948251    0.9774247 0.46886780    0.5571112  1.1712935
## 4  0.0097902953 0.05731774    0.7298378 0.20754398   -0.1379738  0.9195388
## 5  0.0011172144 0.02373771    0.7848196 0.20694008   -0.2138106  0.8699450
## 6 -0.0001564956 0.03137132    0.6432965 0.08199484    0.1598097  0.9701766
##                   contB.b.mean contB.b.sd contB.c.mean contB.c.sd
## 1  0.0337956151609136584701432  0.3107336  -0.01931491 0.02003349
## 2  0.0370142451762442772911221  0.2806856  -0.02115442 0.01515542
## 3  0.0000000000000000006007066  0.3646303  -0.02475077 0.02401877
## 4 -0.0026767785373931927625502  0.3038650  -0.02200054 0.01597915
## 5 -0.0058947377163657875900893  0.2725885  -0.01874485 0.01654354
## 6 -0.0008357828409793670657035  0.2744961  -0.01737797 0.01770307
##   contB.d.mean contB.d.sd discA.percent.1 discA.percent.2 discA.percent.3
## 1   0.04433838 0.04089909       0.2500000       0.2500000       0.2500000
## 2   0.04945053 0.02884678       0.2500000       0.2500000       0.2500000
## 3   0.06022609 0.04830885       0.2702703       0.2432432       0.2432432
## 4   0.04759370 0.02473617       0.2580645       0.2580645       0.2580645
## 5   0.04290807 0.03086693       0.2571429       0.2571429       0.2571429
## 6   0.03897107 0.03278651       0.2682927       0.2439024       0.2439024
##   discA.percent.4 discA.num_trans.1_2 discA.num_trans.1_3
## 1       0.2500000                   5                   1
## 2       0.2500000                   1                   3
## 3       0.2432432                   2                   2
## 4       0.2258065                   4                   3
## 5       0.2285714                   1                   2
## 6       0.2439024                   3                   3
##   discA.num_trans.1_4 discA.num_trans.2_1 discA.num_trans.2_3
## 1                   1                   3                   2
## 2                   2                   1                   3
## 3                   2                   3                   1
## 4                   0                   2                   2
## 5                   0                   2                   3
## 6                   3                   3                   2
##   discA.num_trans.2_4 discA.num_trans.3_1 discA.num_trans.3_2
## 1                   1                   2                   0
## 2                   1                   2                   2
## 3                   3                   1                   1
## 4                   1                   1                   2
## 5                   3                   2                   2
## 6                   2                   3                   1
##   discA.num_trans.3_4 discA.num_trans.4_1 discA.num_trans.4_2
## 1                   5                   3                   1
## 2                   2                   3                   2
## 3                   4                   1                   4
## 4                   3                   3                   0
## 5                   3                   0                   4
## 6                   3                   2                   4
##   discA.num_trans.4_3
## 1                   4
## 2                   0
## 3                   4
## 4                   1
## 5                   2
## 6                   2

Hopefully you now have the knowledge required to implement any of the core functions on your own data set. As you use the package, bear in mind that the output of the qFeature functions consists of features, or summary statistics, that describe the behavior of the time series. These features will likely need further graphical and statistical analysis, typically using multivariate statistical techniques like dimensionality reduction, clustering, and/or classification.

## References

• Amidan BG, Ferryman TA. 2005. “Atypical Event and Typical Pattern Detection within Complex Systems.” IEEE Aerospace Conference Proceedings, March 2005.

• A mathematical description of the algorithms in fitQ and discFeatures is available in the package documentation. After installing the qFeature package, this R command will show you where it is located:

file.path(path.package("qFeature"), "doc", "Explanation_of_qFeature_algorithms.pdf")