This guide is meant to provide a quick start for using the `qFeature` package. We begin with installation instructions in ‘Getting Started’. Following installation is a basic diagram of the ‘Package Structure’ to illustrate the hierarchy of functions. Finally, we conclude with details regarding the four ‘Core Functions’ contained within the package, complete with follow-along examples.

The `qFeature` package is designed to extract features from continuous and discrete variables contained within a time series dataset. These features can then be used as inputs to multivariate statistical procedures like clustering, dimensionality reduction, and classification. It is a high-speed implementation of the feature extraction methods of the Morning Report Algorithms developed by Brett Amidan and Tom Ferryman (Amidan and Ferryman 2005).

The first thing you will need to do is install the `qFeature` package. Installation instructions are provided in the README.md file of the package repository. Once the package has been installed, the library can be loaded so that the functions can be used.

`library(qFeature)`

A complete list of the functions contained within the `qFeature` package can be seen using:

`help(package = qFeature)`

In what follows, we explain the four core functions of `qFeature`: `ddply_getFeatures()`, `getFeatures()`, `discFeatures()`, and `fitQ()`.

This function is intended for use on a time series variable with discrete states. It calculates the percentage of time spent at each of those states and counts the transitions in the variable from one state to another.

Using `discFeatures()` is most easily demonstrated with a simple example. Let’s begin by creating a small dataset with 2 discrete states (TRUE/FALSE).

```
discData <- c("TRUE", "FALSE", "FALSE", NA, "TRUE", "TRUE", NA, NA, "TRUE", "FALSE", "TRUE", "FALSE", "TRUE")
discData
```

```
## [1] "TRUE" "FALSE" "FALSE" NA "TRUE" "TRUE" NA NA
## [9] "TRUE" "FALSE" "TRUE" "FALSE" "TRUE"
```

Now we have a small dataset of length 13 that contains two discrete states (TRUE and FALSE) and 3 missing values stored as NA. Now if we apply the `discFeatures()` function to our dataset, we can see what happens.

`discFeatures(discData)`

```
## percent.FALSE percent.TRUE num_trans.FALSE_TRUE
## 0.4 0.6 3.0
## num_trans.TRUE_FALSE
## 3.0
```

As you can see, the percentage calculations are made without consideration of the missing values and reflect that 40% of the non-missing data are FALSE and 60% are TRUE. We also get information about transitions: the value changed from FALSE to TRUE 3 times and from TRUE to FALSE 3 times. Go ahead and count the transitions yourself just to make sure you agree with the output. The transitions have significance because this function is intended to be used on time series data, where a transition from one state to another can be meaningful.
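To see where those transition counts come from, one way to tally them by hand is to drop the missing values and cross-tabulate each value against its successor. This is just a sketch of the idea using the `discData` vector from above, not the package’s internal code; the exact NA handling inside `discFeatures()` may differ.

```
# Drop missing values, then tabulate consecutive (from, to) pairs;
# the off-diagonal cells match the num_trans counts above
x <- discData[!is.na(discData)]
table(from = head(x, -1), to = tail(x, -1))
```

Note that this table also counts same-state pairs (e.g., TRUE followed by TRUE) on its diagonal, which `discFeatures()` does not report as transitions.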

The `fitQ()` function fits a moving quadratic (or simple linear) regression model over a series using a 2-sided window. It dynamically accounts for the incomplete windows caused by missing values and those that occur at the beginning and end of the series. This function is used to extract features from continuous variables.

For your vector of response measures, a window is fit to the data that is initially centered at whatever point is indexed by `start`. From there, `x1` defines the width of the window, which extends equally in both directions about the center point (which is why its length must be odd). `min.window` defines how many non-missing points are required in order to fit a model and subsequently produce a value in our signature. Once our window characteristics are defined, we can choose one of two regression models:

**Linear:** \(y=b_0+b_1 x_1+\epsilon\)

or

**Quadratic:** \(y=b_0+b_1 x_1+b_2 x_1^2+\epsilon\)

After our initial linear or quadratic model is fit, several features are extracted from the regression and denoted as **a**, **b**, **c**, and **d** which are defined below.

**a**: The estimated intercepts

**b**: The estimated linear coefficients

**c**: The estimated quadratic coefficients. These are NA if `linear.only = TRUE`

**d**: The root mean squared error (RMSE)

The window moves through the data by an increment of `skip` (default 1), fitting the regression model using the data contained in each new window. This iterates over the entire vector of the response measure to produce a signature of features, with a corresponding **a**, **b**, **c**, and **d** for each regression that was fit.

The illustration below provides a simple demonstration of how `fitQ()` fits a series of regression models to finite windows of data. The gray points denote data that fall outside the current window being used to fit the regression model. In this example, a window of `x1 = -3:3` is used with `min.window = 4`. Unless otherwise specified, the first window will be centered on the first data point and likewise the last window will be centered on the last data point.
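Before running the real function, it may help to see the windowing logic written out in plain R. The sketch below is an illustrative, unoptimized stand-in: the argument names mirror `fitQ()`, but the internals are our own assumption, not the package’s high-speed implementation.

```
# A slow, plain-R sketch of the moving-window fit performed by fitQ()
slowFitQ <- function(y, x1 = -3:3, min.window = 4) {
  n <- length(y)
  half <- (length(x1) - 1) / 2               # window half-width; length(x1) must be odd
  out <- matrix(NA_real_, nrow = n, ncol = 4,
                dimnames = list(NULL, c("a", "b", "c", "d")))
  for (i in seq_len(n)) {
    idx <- (i - half):(i + half)             # series indices covered by this window
    inside <- idx >= 1 & idx <= n            # trim the window at the series boundaries
    yw <- y[idx[inside]]
    xw <- x1[inside]
    if (sum(!is.na(yw)) >= min.window) {     # only fit when enough points remain
      fit <- lm(yw ~ xw + I(xw^2))
      out[i, ] <- c(coef(fit), summary(fit)$sigma)  # a, b, c, and RMSE (d)
    }
  }
  out
}
```

For simple cases like the examples below, this reproduces the values returned by `fitQ()`, though the real function additionally handles `skip`, `start`, and linear-only fits.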

Now let’s take a look at a few quick R examples that illustrate the function at work, as well as how the function reacts to several potential issues in the data.

We begin by creating our first sample data set to help understand where the features are coming from.

```
set.seed(10)
fitqDataEx1 <- rnorm(7, 5, 1)
fitqDataEx1
```

`## [1] 5.018746 4.815747 3.628669 4.400832 5.294545 5.389794 3.791924`

We now have a small vector of length 7 containing randomly generated numbers. Now we can pass the vector into our `fitQ()` function and take a look at the output.

`fitQ(y = fitqDataEx1, x1 = -3:3, min.window = 4)`

```
## $a
## [1] 5.165912 4.321421 4.232649 4.635257 4.795443 5.028309 3.836657
##
## $b
## [1] -1.03545313 -0.57956943 -0.05914872 -0.03094635 -0.05480118 -0.65967154
## [7] -2.04183511
##
## $c
## [1] 0.243790387 0.296618854 0.175257194 -0.003805034 -0.085028875
## [6] -0.395609306 -0.622895830
##
## $d
## [1] 0.6581464 0.4737011 0.5094792 0.8563527 0.9110471 0.4057781 0.2000512
##
## attr(,"class")
## [1] "fitQ" "list"
```

It may not be readily apparent, but each of the 4 extracted features has the same number of values, and that number equals the length of the vector `y` (when `skip = 1`). This is because the center of the first window is on the first point, and the center then moves through the entire vector of `y` by increments of `skip`, which has a default of 1.

Furthermore, the first values of **a**, **b**, **c**, and **d** are all drawn from the same window, as are the second, third, and so on. We can illustrate this by manually fitting the 2nd window and comparing the results. Note that for the second window, only 5 points fall inside the window and are used in the regression, as can be seen in the illustration above. For this reason, we fit the regression to the first 5 points of our dataset and the corresponding section of our window.

```
y <- fitqDataEx1[1:5]
x1 <- c(-1:3)
summary(lm(y ~ x1 + I(x1^2)))
```

```
##
## Call:
## lm(formula = y ~ x1 + I(x1^2))
##
## Residuals:
## 1 2 3 4 5
## -0.17886 0.49433 -0.40980 0.05207 0.04226
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.3214 0.2887 14.969 0.00443 **
## x1 -0.5796 0.2942 -1.970 0.18765
## I(x1^2) 0.2966 0.1266 2.343 0.14387
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4737 on 2 degrees of freedom
## Multiple R-squared: 0.7332, Adjusted R-squared: 0.4665
## F-statistic: 2.749 on 2 and 2 DF, p-value: 0.2668
```

As you can see (aside from rounding):

- Coefficient estimate for the intercept is equal to the second value of **a**
- Coefficient estimate for x1 is equal to the second value of **b**
- Coefficient estimate of I(x1^2) is equal to the second value of **c**
- Residual standard error is equal to the second value of **d**

Hopefully the first example helps to solidify how the feature extraction is being populated. Let’s continue by taking a look at a few issues you might encounter.

First, take a look at the illustration above and ask yourself: “When the first window is centered on the first point, what happens if our `min.window` = 5 instead?” Let’s take a look at an example to demonstrate this scenario. We begin by creating another data set.

```
set.seed(20)
fitqDataEx2 <- rnorm(7, 5, 1)
fitqDataEx2
```

`## [1] 6.162685 4.414076 6.785465 3.667406 4.553433 5.569606 2.110282`

We now have a small vector of length 7 containing randomly generated numbers. Now we can pass the vector into our `fitQ()` function with `min.window = 5` and see how the function responds.

`fitQ(y = fitqDataEx2, x1 = -3:3, min.window = 5)`

```
## $a
## [1] NA 5.528857 4.897691 5.100493 4.913496 3.831878 NA
##
## $b
## [1] NA -0.3650638 -0.3026263 -0.4313635 -0.6333119 -0.8237286
## [7] NA
##
## $c
## [1] NA -0.01572677 0.14075827 -0.08716049 -0.22529714 -0.03945601
## [7] NA
##
## $d
## [1] NA 1.614705 1.386009 1.521011 1.630093 1.897100 NA
##
## attr(,"class")
## [1] "fitQ" "list"
```

As you can see in the output, an NA is produced for the first and last value of each feature. This is because we require 5 points to fit a regression, and the first and last windows each contain only 4 points (refer to Window 1 and Window 10 of the illustration if you need to convince yourself); no regression can be fit, there are no features to extract, and so the function returns NAs.

So what happens when the function encounters missing values in the data? We begin by creating another data set, this time with missing values.

```
set.seed(30)
fitqDataEx3 <- rnorm(15, 5, 1)
fitqDataEx3[c(5,7, 9, 10)] <- NA
fitqDataEx3
```

```
## [1] 3.711482 4.652311 4.478371 6.273473 NA 3.488692 NA
## [8] 4.239204 NA NA 3.976728 3.180602 4.332210 4.940702
## [15] 5.880166
```

We now have an example data vector of length 15 with 11 non-missing and 4 missing values. We can now implement the function and see what happens.

`fitQ(y = fitqDataEx3, x1 = -3:3, min.window = 4)`

```
## $a
## [1] 3.865672 4.189739 5.337928 5.371472 4.791795 4.395873 NA
## [8] NA 4.391305 3.629989 3.538460 3.696237 4.043787 4.809632
## [15] 5.836461
##
## $b
## [1] 0.11049855 0.53763516 0.32774826 -0.26065985 -0.15740980
## [6] -0.23641293 NA NA -0.03461775 -0.13110603
## [11] 0.09938414 0.13840282 0.55669758 0.97499234 0.71161019
##
## $c
## [1] 0.21356830 0.21356830 -0.29420406 -0.29420406 -0.04757267
## [6] 0.01100045 NA NA -0.11501634 0.09628955
## [11] 0.11647796 0.20914738 0.20914738 0.20914738 -0.05303605
##
## $d
## [1] 0.6895608 0.6895608 0.9035960 0.9035960 1.3103955 1.8098212 NA
## [8] NA 0.1557777 0.7259587 0.5212704 0.4599708 0.4599708 0.4599708
## [15] 0.1954521
##
## attr(,"class")
## [1] "fitQ" "list"
```

We can see that NA values were produced in each of the features that correspond with the 7th and 8th windows fit to the data. Let’s investigate the data in those windows to see why.

```
Window7 <- fitqDataEx3[4:10]
Window7
```

`## [1] 6.273473 NA 3.488692 NA 4.239204 NA NA`

```
Window8 <- fitqDataEx3[5:11]
Window8
```

`## [1] NA 3.488692 NA 4.239204 NA NA 3.976728`

As you can see, the ranges of data points for both `Window7` and `Window8` contain only 3 non-missing values each, which made it impossible to fit regressions and consequently produced NA values in our feature extraction.

The `getFeatures()` function is the main workhorse of the `qFeature` package. It is called against a data frame, where `fitQ()` is applied to the continuous variables and `discFeatures()` is applied to the discrete (or categorical) variables.

At this point it is important that you understand what is happening in both the `fitQ()` and `discFeatures()` functions, because `getFeatures()` is simply an aggregate of the two. Rather than dealing with a single vector at a time, this function can handle a data frame consisting of multiple continuous and/or discrete variables. The only additional functionality introduced by `getFeatures()` is the ability to output desired summary statistics of the features extracted from the regression models.

Let’s take a look at an example of the `getFeatures()` function. We begin by creating a data frame that consists of 2 continuous and 2 discrete state variables.

```
set.seed(10)
cont1 <- rnorm(10,9,1)
cont2 <- runif(10,0,10)
disc1 <- c("T", "F", "F",
           "T", "T", "T",
           "F", "T", "F",
           "T")
disc2 <- c("blue", "red", "yellow",
           "yellow", "blue", "red",
           "blue", "red", "yellow",
           "blue")
getFeaturesEx <- data.frame(cont1, cont2, disc1, disc2)
getFeaturesEx
```

```
## cont1 cont2 disc1 disc2
## 1 9.018746 8.647212 T blue
## 2 8.815747 6.153524 F red
## 3 7.628669 7.751099 F yellow
## 4 8.400832 3.555687 T yellow
## 5 9.294545 4.058500 T blue
## 6 9.389794 7.066469 T red
## 7 7.791924 8.382877 F blue
## 8 8.636324 2.395891 T red
## 9 7.373327 7.707715 F yellow
## 10 8.743522 3.558977 T blue
```

We learned earlier that `fitQ()` can handle a continuous vector and `discFeatures()` can extract information from a discrete state variable, but let’s push the whole data frame into `getFeatures()`.

```
outGetFeatures <- getFeatures(getFeaturesEx, cont = 1:2, disc = 3:4,
                              stats = c("mean", "sd"), fitQargs = list(x1 = -3:3))
outGetFeatures
```

```
## cont1.a.mean cont1.a.sd
## -0.11625763 0.51336981
## cont1.b.mean cont1.b.sd
## -0.09775823 0.37153955
## cont1.c.mean cont1.c.sd
## 0.12170029 0.23103515
## cont1.d.mean cont1.d.sd
## 1.00407293 0.21753467
## cont2.a.mean cont2.a.sd
## -0.05175897 0.35596310
## cont2.b.mean cont2.b.sd
## -0.19217181 0.20769032
## cont2.c.mean cont2.c.sd
## 0.01532974 0.12476572
## cont2.d.mean cont2.d.sd
## 1.05163189 0.28870859
## disc1.percent.F disc1.percent.T
## 0.40000000 0.60000000
## disc1.num_trans.F_T disc1.num_trans.T_F
## 3.00000000 3.00000000
## disc2.percent.blue disc2.percent.red
## 0.40000000 0.30000000
## disc2.percent.yellow disc2.num_trans.blue_red
## 0.30000000 3.00000000
## disc2.num_trans.blue_yellow disc2.num_trans.red_blue
## 0.00000000 1.00000000
## disc2.num_trans.red_yellow disc2.num_trans.yellow_blue
## 2.00000000 2.00000000
## disc2.num_trans.yellow_red
## 0.00000000
```

It should be apparent that this output is different from what you saw in `fitQ()`. That is because once you start looking across multiple variables, it no longer makes sense to look at a long string of features. Instead we look at feature summary statistics, named according to the pattern:

**[variable].[feature].[stat]**

**[variable]**: identifies the continuous variable of interest from the data

- cont1
- cont2

**[feature]**: identifies the feature

- a
- b
- c
- d

**[stat]**: identifies the summary statistic being applied to all values of the specified feature

- mean
- sd

In our example we include the mean and standard deviation of all the values extracted for each feature. Many more summary statistics could be included; they are listed in the `getFeatures()` help page.

The discrete variables are presented almost identically to what we saw in the `discFeatures()` function. The only real difference is that we now have a variable identifier to discriminate between discrete variables. Other than that, we still have a summary of the percentage of time spent at each state, as well as a count of each type of transition present in the data.

**[variable].[frequency].[from]_[to]**

**[variable]**: identifies the discrete variable of interest from the data

- disc1
- disc2

**[frequency]**: either the percent of time at a given state or a count of the transitions from one state to another

- percent
- num_trans

**[from]**: identifies the prior discrete state of the variable

- F/T
- red/blue/yellow

**[to]**: identifies the posterior discrete state of the variable

- F/T
- red/blue/yellow
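Because `getFeatures()` returns a named vector, these naming conventions make it easy to pull out related summaries by pattern-matching on the names. A small sketch using the `outGetFeatures` object created above:

```
# Select all features for a single variable, or a single kind of summary
outGetFeatures[grep("^cont1\\.", names(outGetFeatures))]  # everything from cont1
outGetFeatures[grep("num_trans", names(outGetFeatures))]  # all transition counts
```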

The `ddply_getFeatures()` function is simply a wrapper that applies `getFeatures()` to each “group” in a data frame, where the groups are defined by the unique combinations of the values of one or more categorical variables. The replication of `getFeatures()` for each group is carried out by `ddply()` from the `plyr` package.

The importance of this wrapper is that it facilitates processing unique subsets in the data and allows for parallel processing.
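Conceptually, the wrapper behaves like the `plyr::ddply()` call sketched below. This is only an illustration of the relationship between the two functions; the coercion of the named vector to a one-row data frame is our assumption, and the real function also handles argument checking and the `nJobs` parallel option.

```
library(plyr)

# Roughly what ddply_getFeatures(demoData, c("subject", "phase"), ...) does:
# apply getFeatures() to each subject/phase subset and row-bind the results
f2 <- ddply(demoData, .variables = c("subject", "phase"),
            .fun = function(d) {
              as.data.frame(t(getFeatures(d, cont = 3:4, disc = 8,
                                          stats = c("mean", "sd"),
                                          fitQargs = list(x1 = -5:5))))
            })
```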

For this example, let’s use a dataset built into the `qFeature` package.

```
data(demoData)
str(demoData)
```

```
## 'data.frame': 628 obs. of 11 variables:
## $ subject: Factor w/ 7 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ phase : Factor w/ 3 levels "e","f","g": 1 1 1 1 1 1 1 1 1 1 ...
## $ contA : num 670 790 501 657 629 ...
## $ contB : num 0 41.6 65.9 83.2 96.6 ...
## $ contC : num 181.2 245.4 51.4 93.7 18.6 ...
## $ contD : num 5 8 72 80 200 216 343 384 648 900 ...
## $ contE : num 3.74 12.19 13.53 46.97 40.58 ...
## $ discA : int 4 1 2 4 3 3 4 3 4 4 ...
## $ discB : logi TRUE FALSE FALSE FALSE TRUE TRUE ...
## $ discC : Factor w/ 3 levels "j","k","l": 3 2 1 3 3 1 3 3 2 3 ...
## $ discD : chr "x" "y" "y" "z" ...
```

Now that we have a data set, let’s pretend we are interested in every combination of subject and phase. At this point it is important to note that, in order to work properly, the package expects the data to be structured in two ways. First, any data set processed by this package must contain a field indicating the group to which each record belongs. Second, the package assumes the data are in chronological order (from oldest to newest) within each group. Once you’ve ensured that your data are structured this way, you can process them with the `ddply_getFeatures()` function. The output is very similar to that of `getFeatures()`, except that instead of one value for each summary, we end up with a set of values for each summary equal to the number of unique combinations. A quick calculation shows us that:

\((\mbox{number of subjects}) \cdot (\mbox{number of phases}) = \mbox{number of combinations}\)

meaning that we will have:

\((7) \cdot (3) = 21\) values for each output variable.
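Since `subject` and `phase` are stored as factors in `demoData`, this arithmetic can be checked directly:

```
# Count the subject/phase combinations from the factor levels
nlevels(demoData$subject) * nlevels(demoData$phase)  # 7 * 3 = 21
```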

We can verify this by pushing our data into the `ddply_getFeatures()` function.

```
f <- ddply_getFeatures(demoData, c("subject", "phase"),
                       cont = 3:4, disc = 8, stats = c("mean", "sd"),
                       fitQargs = list(x1 = -5:5), nJobs = 2)
str(f)
```

```
## 'data.frame': 21 obs. of 34 variables:
## $ subject : Factor w/ 7 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 4 ...
## $ phase : Factor w/ 3 levels "e","f","g": 1 2 3 1 2 3 1 2 3 1 ...
## $ contA.a.mean : num 0.9594 -0.0963 0.6678 -0.2903 -0.5715 ...
## $ contA.a.sd : num 0.405 0.412 0.457 0.381 0.316 ...
## $ contA.b.mean : num -0.05352 0.00514 0.07705 -0.09571 -0.02266 ...
## $ contA.b.sd : num 0.1473 0.1887 0.2799 0.2819 0.0589 ...
## $ contA.c.mean : num 0.00119 0.01584 -0.00711 0.00979 0.00112 ...
## $ contA.c.sd : num 0.037 0.0359 0.0695 0.0573 0.0237 ...
## $ contA.d.mean : num 1.077 0.777 0.977 0.73 0.785 ...
## $ contA.d.sd : num 0.18 0.159 0.469 0.208 0.207 ...
## $ contB.a.mean : num 0.883 -0.235 0.557 -0.138 -0.214 ...
## $ contB.a.sd : num 1.124 0.809 1.171 0.92 0.87 ...
## $ contB.b.mean : num 0.03379561516091365847 0.037014245176244277291 0.000000000000000000601 -0.002676778537393192763 -0.00589473771636578759 ...
## $ contB.b.sd : num 0.311 0.281 0.365 0.304 0.273 ...
## $ contB.c.mean : num -0.0193 -0.0212 -0.0248 -0.022 -0.0187 ...
## $ contB.c.sd : num 0.02 0.0152 0.024 0.016 0.0165 ...
## $ contB.d.mean : num 0.0443 0.0495 0.0602 0.0476 0.0429 ...
## $ contB.d.sd : num 0.0409 0.0288 0.0483 0.0247 0.0309 ...
## $ discA.percent.1 : num 0.25 0.25 0.27 0.258 0.257 ...
## $ discA.percent.2 : num 0.25 0.25 0.243 0.258 0.257 ...
## $ discA.percent.3 : num 0.25 0.25 0.243 0.258 0.257 ...
## $ discA.percent.4 : num 0.25 0.25 0.243 0.226 0.229 ...
## $ discA.num_trans.1_2: num 5 1 2 4 1 3 3 1 3 1 ...
## $ discA.num_trans.1_3: num 1 3 2 3 2 3 4 1 2 2 ...
## $ discA.num_trans.1_4: num 1 2 2 0 0 3 0 2 2 2 ...
## $ discA.num_trans.2_1: num 3 1 3 2 2 3 2 1 4 1 ...
## $ discA.num_trans.2_3: num 2 3 1 2 3 2 1 2 1 1 ...
## $ discA.num_trans.2_4: num 1 1 3 1 3 2 3 2 1 3 ...
## $ discA.num_trans.3_1: num 2 2 1 1 2 3 3 2 1 2 ...
## $ discA.num_trans.3_2: num 0 2 1 2 2 1 0 1 2 2 ...
## $ discA.num_trans.3_4: num 5 2 4 3 3 3 2 2 1 0 ...
## $ discA.num_trans.4_1: num 3 3 1 3 0 2 1 2 1 2 ...
## $ discA.num_trans.4_2: num 1 2 4 0 4 4 3 2 1 1 ...
## $ discA.num_trans.4_3: num 4 0 4 1 2 2 1 2 2 1 ...
```

If you need a reminder of where all the displayed variables come from, you can read back through the `getFeatures()` section above. We can clearly see that there are 21 values for each variable, corresponding to the 21 combinations of subject and phase. Here is a quick look at the data.

`head(f)`

```
## subject phase contA.a.mean contA.a.sd contA.b.mean contA.b.sd
## 1 1 e 0.95935578 0.4049560 -0.053521124 0.14729155
## 2 1 f -0.09634974 0.4122154 0.005139956 0.18871465
## 3 1 g 0.66780935 0.4570175 0.077047009 0.27991844
## 4 2 e -0.29030386 0.3812480 -0.095714751 0.28194231
## 5 2 f -0.57154431 0.3163018 -0.022664253 0.05891394
## 6 2 g -0.33432391 0.2662339 -0.037518578 0.10858981
## contA.c.mean contA.c.sd contA.d.mean contA.d.sd contB.a.mean contB.a.sd
## 1 0.0011891119 0.03701984 1.0773529 0.18018835 0.8828291 1.1243437
## 2 0.0158372006 0.03588238 0.7773663 0.15862451 -0.2349595 0.8088642
## 3 -0.0071132548 0.06948251 0.9774247 0.46886780 0.5571112 1.1712935
## 4 0.0097902953 0.05731774 0.7298378 0.20754398 -0.1379738 0.9195388
## 5 0.0011172144 0.02373771 0.7848196 0.20694008 -0.2138106 0.8699450
## 6 -0.0001564956 0.03137132 0.6432965 0.08199484 0.1598097 0.9701766
## contB.b.mean contB.b.sd contB.c.mean contB.c.sd
## 1 0.0337956151609136584701432 0.3107336 -0.01931491 0.02003349
## 2 0.0370142451762442772911221 0.2806856 -0.02115442 0.01515542
## 3 0.0000000000000000006007066 0.3646303 -0.02475077 0.02401877
## 4 -0.0026767785373931927625502 0.3038650 -0.02200054 0.01597915
## 5 -0.0058947377163657875900893 0.2725885 -0.01874485 0.01654354
## 6 -0.0008357828409793670657035 0.2744961 -0.01737797 0.01770307
## contB.d.mean contB.d.sd discA.percent.1 discA.percent.2 discA.percent.3
## 1 0.04433838 0.04089909 0.2500000 0.2500000 0.2500000
## 2 0.04945053 0.02884678 0.2500000 0.2500000 0.2500000
## 3 0.06022609 0.04830885 0.2702703 0.2432432 0.2432432
## 4 0.04759370 0.02473617 0.2580645 0.2580645 0.2580645
## 5 0.04290807 0.03086693 0.2571429 0.2571429 0.2571429
## 6 0.03897107 0.03278651 0.2682927 0.2439024 0.2439024
## discA.percent.4 discA.num_trans.1_2 discA.num_trans.1_3
## 1 0.2500000 5 1
## 2 0.2500000 1 3
## 3 0.2432432 2 2
## 4 0.2258065 4 3
## 5 0.2285714 1 2
## 6 0.2439024 3 3
## discA.num_trans.1_4 discA.num_trans.2_1 discA.num_trans.2_3
## 1 1 3 2
## 2 2 1 3
## 3 2 3 1
## 4 0 2 2
## 5 0 2 3
## 6 3 3 2
## discA.num_trans.2_4 discA.num_trans.3_1 discA.num_trans.3_2
## 1 1 2 0
## 2 1 2 2
## 3 3 1 1
## 4 1 1 2
## 5 3 2 2
## 6 2 3 1
## discA.num_trans.3_4 discA.num_trans.4_1 discA.num_trans.4_2
## 1 5 3 1
## 2 2 3 2
## 3 4 1 4
## 4 3 3 0
## 5 3 0 4
## 6 3 2 4
## discA.num_trans.4_3
## 1 4
## 2 0
## 3 4
## 4 1
## 5 2
## 6 2
```

Hopefully you now have the knowledge required to implement any of the core functions successfully on your own data set. As you use the package, bear in mind that the outputs of the `qFeature` functions are features, or summary statistics, that describe the behavior of the time series. These features will likely need further graphical and statistical analysis, typically using multivariate statistical techniques like dimensionality reduction, clustering, and/or classification.

Amidan BG, Ferryman TA. 2005. “Atypical Event and Typical Pattern Detection within Complex Systems.” IEEE Aerospace Conference Proceedings, March 2005.

A mathematical description of the algorithms in `fitQ` and `discFeatures` is available here. Alternatively, after installing the `qFeature` package, the description is available locally, and this R command will show you where it is located:

`file.path(path.package("qFeature"), "doc", "Explanation_of_qFeature_algorithms.pdf")`