User guide

This page describes how to use the TSFrames package for timeseries data handling.

Installation

julia> using Pkg
julia> Pkg.add(url="https://github.com/xKDR/TSFrames.jl")

Constructing TSFrame objects

After installing TSFrames, load the package into your Julia session. Then create a basic TSFrame object.

julia> using TSFrames;
julia> ts = TSFrame(1:10)
10×1 TSFrame with Int64 Index
 Index  x1
 Int64  Int64
──────────────
     1      1
     2      2
     3      3
     4      4
     5      5
     6      6
     7      7
     8      8
     9      9
    10     10
julia> ts.coredata
10×2 DataFrame
 Row │ Index  x1
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2
   3 │     3      3
   4 │     4      4
   5 │     5      5
   6 │     6      6
   7 │     7      7
   8 │     8      8
   9 │     9      9
  10 │    10     10

The basic TSFrame constructor takes in a Vector of any type and automatically generates an index out of it (the Index column).

There are many ways to construct a TSFrame object. In real-world applications you would typically read in a CSV file or download a dataset as a DataFrame and then operate on it. A DataFrame can be converted to a TSFrame object directly.

julia> using CSV, DataFrames, TSFrames, Dates
julia> dates = Date(2007, 1, 1):Day(1):Date(2008, 03, 06)
Date("2007-01-01"):Dates.Day(1):Date("2008-03-06")
julia> ts = TSFrame(DataFrame(Index=dates, value=10*rand(431)))
431×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-01-01  9.73397
 2007-01-02  5.84181
 2007-01-03  0.65625
 2007-01-04  5.86312
 2007-01-05  5.51728
 2007-01-06  9.5246
 2007-01-07  8.78945
 2007-01-08  9.94281
     ⋮          ⋮
 2008-02-29  3.84641
 2008-03-01  6.17761
 2008-03-02  0.292995
 2008-03-03  4.71666
 2008-03-04  5.8359
 2008-03-05  6.2736
 2008-03-06  4.73226
      416 rows omitted

In the above example you generate a random DataFrame and convert it into a TSFrame object ts. The top line of the ts object tells you the number of rows (431 here) and the number of columns (1) along with the Type of Index (Dates.Date in the above example).

You can also fetch the number of rows and columns using the nr(ts), nc(ts), and size(ts) methods. Respectively, they return the number of rows, the number of columns, and a Tuple of both. A length(::TSFrame) method is also provided for convenience; it returns the number of rows of its argument.

julia> nr(ts)
431
julia> nc(ts)
1
julia> size(ts)
(431, 1)
julia> length(ts)
431

Names of data columns can be fetched using the names(ts) method, which returns a Vector{String} object. The Index column can be fetched as a Vector using the index(ts) method. It can also be fetched directly from the underlying coredata property of the TSFrame: ts.coredata[!, :Index].

julia> names(ts)
1-element Vector{String}:
 "value"
julia> index(ts)
431-element Vector{Date}:
 2007-01-01
 2007-01-02
 2007-01-03
 2007-01-04
 2007-01-05
 2007-01-06
 2007-01-07
 2007-01-08
 2007-01-09
 2007-01-10
 ⋮
 2008-02-27
 2008-02-28
 2008-02-29
 2008-03-01
 2008-03-02
 2008-03-03
 2008-03-04
 2008-03-05
 2008-03-06

A simpler way to read a CSV file is to pass TSFrame as the sink argument of the CSV.read function.

julia> ts = CSV.read(filename, TSFrame)
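To make the sink-based read concrete, here is a minimal sketch of a round trip through a temporary file. The file name, the column name price, and the sample values are made up for illustration. The sketch assumes the first column of the CSV file holds the index.

```julia
using CSV, DataFrames, TSFrames, Dates

# Hypothetical sample data; the first column (Index) becomes the TSFrame index.
df = DataFrame(Index = Date(2020, 1, 1):Day(1):Date(2020, 1, 5),
               price = [10.0, 10.5, 10.2, 10.8, 11.0])

# Write the table to a temporary CSV file.
path = joinpath(mktempdir(), "prices.csv")
CSV.write(path, df)

# Read it back, asking CSV.read to materialise the result as a TSFrame.
ts_csv = CSV.read(path, TSFrame)
```

Here ts_csv should have five rows, a Date index, and a single data column named price.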

Indexing and subsetting

One of the primary features of a timeseries package is to provide convenient ways to index or subset a dataset. TSFrames makes it easy to index a TSFrame object by providing multiple intuitive getindex methods, which work with the regular square brackets ([ ]).

julia> ts[1] # first row
1×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-01  9.73397
julia> ts[[3, 5], [1]] # third & fifth row, and first column
2×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-03  0.65625
 2007-01-05  5.51728
julia> ts[1:10, 1] # first 10 rows and the first column as a vector
10-element Vector{Float64}:
 9.733968587414477
 5.841805386771233
 0.6562504335550223
 5.86312330329454
 5.517283362909282
 9.52459601769881
 8.789451032375915
 9.942811691734047
 7.657974698226847
 6.626645288281524
julia> ts[1, [:value]] # using the column name
1×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-01  9.73397

Apart from integer-based row indexing and integer- or name-based column indexing, TSFrames provides special subsetting methods for the date and time types defined in the Dates module.

julia> ts[Date(2007, 1, 10)] # on January 10, 2007
1×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-10  6.62665
julia> ts[[Date(2007, 1, 10), Date(2007, 1, 11)]] # January 10, 11
2×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-01-10  6.62665
 2007-01-11  0.412182
julia> ts[Year(2007), Month(1)] # entire January 2007
31×1 TSFrame with Date Index
 Index       value
 Date        Float64
───────────────────────
 2007-01-01  9.73397
 2007-01-02  5.84181
 2007-01-03  0.65625
 2007-01-04  5.86312
 2007-01-05  5.51728
 2007-01-06  9.5246
 2007-01-07  8.78945
 2007-01-08  9.94281
     ⋮           ⋮
 2007-01-25  0.566132
 2007-01-26  3.73588
 2007-01-27  2.78977
 2007-01-28  2.45337
 2007-01-29  7.94426
 2007-01-30  8.3398
 2007-01-31  0.0861312
        16 rows omitted
julia> ts[Year(2007), Quarter(2)]
91×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-04-01  0.185652
 2007-04-02  6.86856
 2007-04-03  6.28615
 2007-04-04  3.76365
 2007-04-05  6.11929
 2007-04-06  0.417649
 2007-04-07  1.76431
 2007-04-08  4.17653
     ⋮          ⋮
 2007-06-24  3.0732
 2007-06-25  7.05813
 2007-06-26  9.68039
 2007-06-27  2.54848
 2007-06-28  8.50303
 2007-06-29  0.106081
 2007-06-30  8.33644
       76 rows omitted
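Because the examples above use random data, the row counts are easier to verify on a deterministic series. The following sketch uses a hypothetical TSFrame named ts_demo covering January to March 2007, with the day number as the value, and applies the same period-based subsetting.

```julia
using TSFrames, DataFrames, Dates

# Hypothetical daily series: 90 days (31 + 28 + 31), values 1 through 90.
dates = Date(2007, 1, 1):Day(1):Date(2007, 3, 31)
ts_demo = TSFrame(DataFrame(Index = dates, value = 1:length(dates)))

jan = ts_demo[Year(2007), Month(1)]     # every row in January 2007
q1  = ts_demo[Year(2007), Quarter(1)]   # every row in the first quarter of 2007
one = ts_demo[Date(2007, 2, 14)]        # the single row for February 14, 2007
```

January has 31 days, so jan should contain 31 rows, while q1 should contain all 90 rows. February 14 is the 45th day of the series, so one holds the value 45.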

Finally, one can also use the dot notation to get a column as a vector.

julia> ts.value # get the value column as a vector
431-element Vector{Float64}:
 9.733968587414477
 5.841805386771233
 0.6562504335550223
 5.86312330329454
 5.517283362909282
 9.52459601769881
 8.789451032375915
 9.942811691734047
 7.657974698226847
 6.626645288281524
 ⋮
 5.463912989813932
 2.660594968793716
 3.8464132041770838
 6.17760879866357
 0.2929947354114648
 4.716660788666809
 5.835901249618831
 6.273596284451427
 4.73225806397092

Summary statistics

The describe() method prints summary statistics of the TSFrame object. The output is a DataFrame which includes the number of missing values, data types of columns along with computed statistical values.

julia> TSFrames.describe(ts)
2×7 DataFrame
 Row │ variable  mean     min         median      max         nmissing  eltype ⋯
     │ Symbol    Union…   Any         Any         Any         Int64     DataTy ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ Index              2007-01-01  2007-08-04  2008-03-06         0  Date   ⋯
   2 │ value     4.72885  0.0141881   4.65582     9.99528            0  Float6
                                                                1 column omitted

Plotting

A TSFrame object can be plotted using the plot() function of the Plots package. The plotting functionality is provided through the RecipesBase package, so all the flexibility and functionality of the Plots package is available to users.

using Plots
plot(ts, size=(600,400); legend=false)

Applying a function over a period

The apply method aggregates a TSFrame object over a period type (Dates.Period) and returns the result of applying a function to each period. For example, to convert a daily timeseries to monthly frequency you may use the first(), last(), or Statistics.mean() functions with Dates.Month as the period.

julia> using Statistics
julia> ts_monthly = apply(ts, Month(1), last) # convert to monthly series using the last value for each month
15×1 TSFrame with Date Index
 Index       value_last
 Date        Float64
────────────────────────
 2007-01-01   0.0861312
 2007-02-01   7.66843
 2007-03-01   1.52158
 2007-04-01   8.76375
 2007-05-01   0.351843
 2007-06-01   8.33644
 2007-07-01   3.33115
 2007-08-01   1.15542
 2007-09-01   6.62634
 2007-10-01   6.27678
 2007-11-01   9.8769
 2007-12-01   9.27768
 2008-01-01   9.05281
 2008-02-01   3.84641
 2008-03-01   4.73226
julia> ts_weekly = apply(ts, Week(1), Statistics.std) # compute weekly standard deviation
62×1 TSFrame with Date Index
 Index       value_std
 Date        Float64
───────────────────────
 2007-01-01   3.18267
 2007-01-08   3.96743
 2007-01-15   2.75279
 2007-01-22   2.46398
 2007-01-29   3.29503
 2007-02-05   3.42123
 2007-02-12   2.97784
 2007-02-19   2.66173
     ⋮           ⋮
 2008-01-21   1.26202
 2008-01-28   3.37782
 2008-02-04   2.07292
 2008-02-11   2.63171
 2008-02-18   2.77352
 2008-02-25   2.9748
 2008-03-03   0.788581
        47 rows omitted
julia> apply(ts, Week(1), Statistics.std, last) # same as above but index contains the last date of the week
62×1 TSFrame with Date Index
 Index       value_std
 Date        Float64
───────────────────────
 2007-01-07   3.18267
 2007-01-14   3.96743
 2007-01-21   2.75279
 2007-01-28   2.46398
 2007-02-04   3.29503
 2007-02-11   3.42123
 2007-02-18   2.97784
 2007-02-25   2.66173
     ⋮           ⋮
 2008-01-27   1.26202
 2008-02-03   3.37782
 2008-02-10   2.07292
 2008-02-17   2.63171
 2008-02-24   2.77352
 2008-03-02   2.9748
 2008-03-06   0.788581
        47 rows omitted
julia> apply(ts, Week(1), Statistics.std, last, renamecols=false) # do not rename column
62×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-01-07  3.18267
 2007-01-14  3.96743
 2007-01-21  2.75279
 2007-01-28  2.46398
 2007-02-04  3.29503
 2007-02-11  3.42123
 2007-02-18  2.97784
 2007-02-25  2.66173
     ⋮          ⋮
 2008-01-27  1.26202
 2008-02-03  3.37782
 2008-02-10  2.07292
 2008-02-17  2.63171
 2008-02-24  2.77352
 2008-03-02  2.9748
 2008-03-06  0.788581
       47 rows omitted
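The behaviour of apply can be checked by hand on a small deterministic series. The sketch below uses a hypothetical TSFrame named ts_demo that spans January and February 2007, with the values 1.0 through 59.0. It relies only on the call forms already shown in this section.

```julia
using TSFrames, DataFrames, Dates, Statistics

# Hypothetical daily series: 31 days of January plus 28 days of February 2007.
dates = Date(2007, 1, 1):Day(1):Date(2007, 2, 28)
ts_demo = TSFrame(DataFrame(Index = dates, value = collect(1.0:59.0)))

# Last observation of each month; the index holds the first date of each period.
monthly_last = apply(ts_demo, Month(1), last)

# Monthly mean, indexed by the last date of each period, keeping the column name.
monthly_mean = apply(ts_demo, Month(1), mean, last; renamecols=false)
```

Under these assumptions, monthly_last should hold 31.0 and 59.0 in a column named value_last. monthly_mean should hold 16.0 and 45.5 in a column named value, indexed by 2007-01-31 and 2007-02-28.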

Joins: Row and column binding with other objects

TSFrames provides methods to join two TSFrame objects by columns: join (alias: cbind) or by rows: vcat (alias: rbind). Both methods align the two objects before merging, as described below.

join merges two datasets based on the Index values of both objects. Depending on the join strategy, the result contains index values from the left object only (jointype=:JoinLeft), from the right object only (jointype=:JoinRight), from the intersection of both objects (jointype=:JoinBoth), or from the union of both objects (jointype=:JoinAll). Wherever an index value is absent from one of the objects, missing is inserted in that object's columns.

julia> dates = collect(Date(2007,1,1):Day(1):Date(2007,1,30));
julia> ts2 = TSFrame(rand(length(dates)), dates)
30×1 TSFrame with Date Index
 Index       x1
 Date        Float64
────────────────────────
 2007-01-01  0.103587
 2007-01-02  0.0660796
 2007-01-03  0.211131
 2007-01-04  0.850458
 2007-01-05  0.157577
 2007-01-06  0.555865
 2007-01-07  0.5901
 2007-01-08  0.732587
     ⋮           ⋮
 2007-01-24  0.376363
 2007-01-25  0.536
 2007-01-26  0.544707
 2007-01-27  0.20214
 2007-01-28  0.612118
 2007-01-29  0.11242
 2007-01-30  0.535436
         15 rows omitted
julia> join(ts, ts2; jointype=:JoinAll) # cbind/join on Index column
431×2 TSFrame with Date Index
 Index       value     x1
 Date        Float64?  Float64?
────────────────────────────────────────
 2007-01-01  9.73397         0.103587
 2007-01-02  5.84181         0.0660796
 2007-01-03  0.65625         0.211131
 2007-01-04  5.86312         0.850458
 2007-01-05  5.51728         0.157577
 2007-01-06  9.5246          0.555865
 2007-01-07  8.78945         0.5901
 2007-01-08  9.94281         0.732587
     ⋮          ⋮             ⋮
 2008-02-29  3.84641   missing
 2008-03-01  6.17761   missing
 2008-03-02  0.292995  missing
 2008-03-03  4.71666   missing
 2008-03-04  5.8359    missing
 2008-03-05  6.2736    missing
 2008-03-06  4.73226   missing
                        416 rows omitted
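The effect of each join strategy is easiest to see on two short, partly overlapping series. The sketch below uses hypothetical objects named left and right, which share only January 4 and January 5, and counts the rows produced by three of the jointype values described above.

```julia
using TSFrames, DataFrames, Dates

# Hypothetical inputs: left covers Jan 1..5, right covers Jan 4..8.
left  = TSFrame(DataFrame(Index = Date(2007, 1, 1):Day(1):Date(2007, 1, 5), a = 1:5))
right = TSFrame(DataFrame(Index = Date(2007, 1, 4):Day(1):Date(2007, 1, 8), b = 11:15))

both      = join(left, right; jointype=:JoinBoth)   # dates present in both objects
outer     = join(left, right; jointype=:JoinAll)    # dates present in either object
only_left = join(left, right; jointype=:JoinLeft)   # dates of the left object only
```

With these inputs, both should have 2 rows, outer 8 rows, and only_left 5 rows. In outer, column b should be missing for January 1 to 3, where right has no data.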

vcat works similarly but merges two datasets by rows. It uses the strategy given in the colmerge argument to check certain conditions before doing the merge, and throws an error if the conditions are not satisfied.

colmerge accepts one of four strategies: setequal merges only if both objects have the same column names; orderequal merges only if both objects have the same column names in the same order; intersect merges only the columns common to both objects; and union merges even if the columns differ between the two objects, filling the unmatched columns with missing where necessary.

vcat does not deduplicate the index: if the same Index value occurs in both objects, both rows are kept in the resulting object. A vcat operation may therefore produce duplicate Index values, and later operations on such an object may give unexpected results or throw errors.

julia> dates = collect(Date(2008,4,1):Day(1):Date(2008,4,30));
julia> ts3 = TSFrame(DataFrame(values=rand(length(dates)), Index=dates))
30×1 TSFrame with Date Index
 Index       values
 Date        Float64
──────────────────────
 2008-04-01  0.630221
 2008-04-02  0.921043
 2008-04-03  0.653232
 2008-04-04  0.457713
 2008-04-05  0.607545
 2008-04-06  0.254335
 2008-04-07  0.126919
 2008-04-08  0.70725
     ⋮          ⋮
 2008-04-24  0.469532
 2008-04-25  0.391875
 2008-04-26  0.505043
 2008-04-27  0.780166
 2008-04-28  0.391329
 2008-04-29  0.483267
 2008-04-30  0.085829
       15 rows omitted
julia> vcat(ts, ts3) # do the merge
461×2 TSFrame with Date Index
 Index       value           values
 Date        Float64?        Float64?
────────────────────────────────────────────
 2007-01-01        9.73397   missing
 2007-01-02        5.84181   missing
 2007-01-03        0.65625   missing
 2007-01-04        5.86312   missing
 2007-01-05        5.51728   missing
 2007-01-06        9.5246    missing
 2007-01-07        8.78945   missing
 2007-01-08        9.94281   missing
     ⋮             ⋮               ⋮
 2008-04-24  missing               0.469532
 2008-04-25  missing               0.391875
 2008-04-26  missing               0.505043
 2008-04-27  missing               0.780166
 2008-04-28  missing               0.391329
 2008-04-29  missing               0.483267
 2008-04-30  missing               0.085829
                            446 rows omitted
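The example above relies on the default column-merging behaviour. The following sketch passes a strategy explicitly. It assumes that colmerge takes the strategy name as a Symbol (for example colmerge=:setequal), which is not spelled out on this page and should be checked against the vcat documentation. The objects jan and feb are hypothetical.

```julia
using TSFrames, DataFrames, Dates

# Two hypothetical series with the same column name and non-overlapping dates.
jan = TSFrame(DataFrame(Index = Date(2008, 1, 1):Day(1):Date(2008, 1, 3), value = [1.0, 2.0, 3.0]))
feb = TSFrame(DataFrame(Index = Date(2008, 2, 1):Day(1):Date(2008, 2, 3), value = [4.0, 5.0, 6.0]))

# Assumption: the strategy is given as a Symbol. :setequal requires both
# objects to have the same set of column names, which holds here.
stacked = vcat(jan, feb; colmerge=:setequal)
```

Since both inputs share the single column value, stacked should have six rows and one data column, with no missing values introduced.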

Rolling window operations

The rollapply method applies a function over a fixed-size rolling window on the dataset. In the example below, we compute the 10-day average of the dataset values on a rolling basis.

julia> rollapply(ts, mean, 10)
422×1 TSFrame with Date Index
 Index       rolling_value_mean
 Date        Float64
────────────────────────────────
 2007-01-10             7.01539
 2007-01-11             6.08321
 2007-01-12             5.65751
 2007-01-13             5.63156
 2007-01-14             5.17772
 2007-01-15             5.12919
 2007-01-16             4.61536
 2007-01-17             4.45436
     ⋮               ⋮
 2008-02-29             5.68936
 2008-03-01             5.71806
 2008-03-02             4.79015
 2008-03-03             4.62046
 2008-03-04             4.43225
 2008-03-05             4.57184
 2008-03-06             4.89511
                407 rows omitted
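A rolling window over n rows with window size w yields n - w + 1 results, which is why the 431-row series above produces 422 rows. The sketch below shows this on a hypothetical five-row series named ts_demo. It follows the argument order used on this page (object, function, window size); other TSFrames versions may expect a different order.

```julia
using TSFrames, DataFrames, Dates, Statistics

# Hypothetical five-day series with values 1.0 through 5.0.
ts_demo = TSFrame(DataFrame(Index = Date(2007, 1, 1):Day(1):Date(2007, 1, 5),
                            value = [1.0, 2.0, 3.0, 4.0, 5.0]))

# 3-day rolling mean: windows (1,2,3), (2,3,4), (3,4,5).
rolled = rollapply(ts_demo, mean, 3)
```

The result should have 5 - 3 + 1 = 3 rows with the means 2.0, 3.0, and 4.0, indexed by the last date of each window.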

Computing rolling difference and percent change

Similar to apply and rollapply, there are specific methods to compute differences and percent changes between rows of a TSFrame object. The diff method computes the difference between values in adjacent rows, inserting missing in the first row. pctchange computes the relative change between adjacent rows. As the output below shows, the result is expressed as a fraction of the previous value and is not multiplied by 100.

julia> diff(ts)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
────────────────────────────
 2007-01-01  missing
 2007-01-02       -3.89216
 2007-01-03       -5.18555
 2007-01-04        5.20687
 2007-01-05       -0.34584
 2007-01-06        4.00731
 2007-01-07       -0.735145
 2007-01-08        1.15336
     ⋮             ⋮
 2008-02-29        1.18582
 2008-03-01        2.3312
 2008-03-02       -5.88461
 2008-03-03        4.42367
 2008-03-04        1.11924
 2008-03-05        0.437695
 2008-03-06       -1.54134
            416 rows omitted
julia> pctchange(ts)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
─────────────────────────────
 2007-01-01  missing
 2007-01-02       -0.399854
 2007-01-03       -0.887663
 2007-01-04        7.93428
 2007-01-05       -0.0589856
 2007-01-06        0.72632
 2007-01-07       -0.0771838
 2007-01-08        0.131221
     ⋮              ⋮
 2008-02-29        0.445697
 2008-03-01        0.60607
 2008-03-02       -0.952571
 2008-03-03       15.0981
 2008-03-04        0.237295
 2008-03-05        0.0750004
 2008-03-06       -0.245687
             416 rows omitted
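The two computations can be verified on round numbers. The sketch below uses a hypothetical three-row series named ts_demo. For adjacent values x[t-1] and x[t], diff gives x[t] - x[t-1], and pctchange gives (x[t] - x[t-1]) / x[t-1], which matches the outputs shown above.

```julia
using TSFrames, DataFrames, Dates

# Hypothetical series: 100 -> 110 -> 99.
ts_demo = TSFrame(DataFrame(Index = Date(2007, 1, 1):Day(1):Date(2007, 1, 3),
                            value = [100.0, 110.0, 99.0]))

d = diff(ts_demo)        # value column: missing, 10.0, -11.0
p = pctchange(ts_demo)   # value column: missing, 0.1, -0.1 (as fractions)
```

The first row of each result is missing because it has no preceding row to compare against.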

Computing log of data values

julia> log.(ts)
431×1 TSFrame with Date Index
 Index       value_log
 Date        Float64
───────────────────────
 2007-01-01   2.27562
 2007-01-02   1.76504
 2007-01-03  -0.421213
 2007-01-04   1.76868
 2007-01-05   1.70789
 2007-01-06   2.25388
 2007-01-07   2.17355
 2007-01-08   2.29685
     ⋮           ⋮
 2008-02-29   1.34714
 2008-03-01   1.82093
 2008-03-02  -1.2276
 2008-03-03   1.5511
 2008-03-04   1.76403
 2008-03-05   1.83635
 2008-03-06   1.5544
       416 rows omitted

Creating lagged/leading series

lag() and lead() provide ways to lag or lead a series respectively by a fixed value, inserting missing where required.

julia> lag(ts, 2)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
────────────────────────────
 2007-01-01  missing
 2007-01-02  missing
 2007-01-03        9.73397
 2007-01-04        5.84181
 2007-01-05        0.65625
 2007-01-06        5.86312
 2007-01-07        5.51728
 2007-01-08        9.5246
     ⋮             ⋮
 2008-02-29        5.46391
 2008-03-01        2.66059
 2008-03-02        3.84641
 2008-03-03        6.17761
 2008-03-04        0.292995
 2008-03-05        4.71666
 2008-03-06        5.8359
            416 rows omitted
julia> lead(ts, 2)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
────────────────────────────
 2007-01-01        0.65625
 2007-01-02        5.86312
 2007-01-03        5.51728
 2007-01-04        9.5246
 2007-01-05        8.78945
 2007-01-06        9.94281
 2007-01-07        7.65797
 2007-01-08        6.62665
     ⋮             ⋮
 2008-02-29        0.292995
 2008-03-01        4.71666
 2008-03-02        5.8359
 2008-03-03        6.2736
 2008-03-04        4.73226
 2008-03-05  missing
 2008-03-06  missing
            416 rows omitted
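On a short integer series the shift is easy to read off. The sketch below uses a hypothetical four-row series named ts_demo and shifts it by one row in each direction. The index stays the same while the values move.

```julia
using TSFrames, DataFrames, Dates

# Hypothetical four-day series with values 1, 2, 3, 4.
ts_demo = TSFrame(DataFrame(Index = Date(2007, 1, 1):Day(1):Date(2007, 1, 4),
                            value = [1, 2, 3, 4]))

lagged = lag(ts_demo, 1)    # value column: missing, 1, 2, 3
led    = lead(ts_demo, 1)   # value column: 2, 3, 4, missing
```

Because the shifted columns contain missing, comparisons against expected vectors should use isequal rather than ==.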

Converting to Matrix and DataFrame

You can easily convert a TSFrame object into a Matrix or fetch the DataFrame for doing operations which are outside of the TSFrames scope.

julia> ts[:, 1] # convert column 1 to a vector of floats
431-element Vector{Float64}:
 9.733968587414477
 5.841805386771233
 0.6562504335550223
 5.86312330329454
 5.517283362909282
 9.52459601769881
 8.789451032375915
 9.942811691734047
 7.657974698226847
 6.626645288281524
 ⋮
 5.463912989813932
 2.660594968793716
 3.8464132041770838
 6.17760879866357
 0.2929947354114648
 4.716660788666809
 5.835901249618831
 6.273596284451427
 4.73225806397092
julia> Matrix(ts) # convert entire TSFrame into a Matrix
431×1 Matrix{Float64}:
 9.733968587414477
 5.841805386771233
 0.6562504335550223
 5.86312330329454
 5.517283362909282
 9.52459601769881
 8.789451032375915
 9.942811691734047
 7.657974698226847
 6.626645288281524
 ⋮
 5.463912989813932
 2.660594968793716
 3.8464132041770838
 6.17760879866357
 0.2929947354114648
 4.716660788666809
 5.835901249618831
 6.273596284451427
 4.73225806397092
julia> select(ts.coredata, :Index, :value, DataFrames.nrow) # use the underlying DataFrame for other operations
431×3 DataFrame
 Row │ Index       value     nrow
     │ Date        Float64   Int64
─────┼─────────────────────────────
   1 │ 2007-01-01  9.73397     431
   2 │ 2007-01-02  5.84181     431
   3 │ 2007-01-03  0.65625     431
   4 │ 2007-01-04  5.86312     431
   5 │ 2007-01-05  5.51728     431
   6 │ 2007-01-06  9.5246      431
   7 │ 2007-01-07  8.78945     431
   8 │ 2007-01-08  9.94281     431
  ⋮  │     ⋮          ⋮        ⋮
 425 │ 2008-02-29  3.84641     431
 426 │ 2008-03-01  6.17761     431
 427 │ 2008-03-02  0.292995    431
 428 │ 2008-03-03  4.71666     431
 429 │ 2008-03-04  5.8359      431
 430 │ 2008-03-05  6.2736      431
 431 │ 2008-03-06  4.73226     431
                   416 rows omitted
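The following sketch illustrates what each conversion keeps, using a hypothetical two-column TSFrame named ts_demo. As in the outputs above, Matrix contains only the data columns, whereas coredata is the full DataFrame including the Index column.

```julia
using TSFrames, DataFrames

# Hypothetical TSFrame with an integer index and two data columns.
ts_demo = TSFrame(DataFrame(Index = 1:3, a = [1.0, 2.0, 3.0], b = [4.0, 5.0, 6.0]))

m  = Matrix(ts_demo)     # 3×2 matrix of the data columns a and b
df = ts_demo.coredata    # underlying DataFrame with columns Index, a, b
```

Any DataFrames.jl operation can then be applied to df. Note that df is the underlying storage of ts_demo, so it is safer to copy it before modifying it in place.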

Writing TSFrame into a CSV file

Because a TSFrame is Tables.jl compatible, it can be passed directly to the CSV.write method for writing into a file. Alternatively, you can pass the underlying DataFrame, which is available through the coredata property.

julia> CSV.write("/tmp/demo_ts.csv", ts)
"/tmp/demo_ts.csv"

Broadcasting

Broadcasting can be used on a TSFrame object to apply a function to a subset of its columns.

julia> using TSFrames, DataFrames;

julia> ts = TSFrame(DataFrame(Index = [1, 2, 3, 4, 5], A = [10.1, 12.4, 42.4, 24.1, 242.5], B = [2, 4, 6, 8, 10]))
(5 x 2) TSFrame with Int64 Index

 Index  A        B     
 Int64  Float64  Int64 
───────────────────────
     1     10.1      2
     2     12.4      4
     3     42.4      6
     4     24.1      8
     5    242.5     10

julia> sin_A = sin.(ts[:, [:A]])    # get sin of column A
(5 x 1) TSFrame with Int64 Index

 Index  A_sin
 Int64  Float64
──────────────────
     1  -0.625071
     2  -0.165604
     3  -0.999934
     4  -0.858707
     5  -0.562466

julia> log_ts = log.(ts)    # take log of all columns
(5 x 2) TSFrame with Int64 Index

 Index  A_log    B_log
 Int64  Float64  Float64
──────────────────────────
     1  2.31254  0.693147
     2  2.5177   1.38629
     3  3.74715  1.79176
     4  3.18221  2.07944
     5  5.491    2.30259

julia> log_ts = log.(ts[:, [:A, :B]])   # can specify multiple columns
(5 x 2) TSFrame with Int64 Index

 Index  A_log    B_log
 Int64  Float64  Float64
──────────────────────────
     1  2.31254  0.693147
     2  2.5177   1.38629
     3  3.74715  1.79176
     4  3.18221  2.07944
     5  5.491    2.30259

Tables.jl Integration

TSFrame objects are Tables.jl compatible. This integration enables easy conversion between the TSFrame format and other formats which are Tables.jl compatible.

As an example, first consider the following code which converts a TSFrame object into a DataFrame, a TimeArray and a CSV file respectively.

julia> using TSFrames, TimeSeries, Dates, DataFrames, CSV;

julia> dates = Date(2018, 1, 1):Day(1):Date(2018, 12, 31)
Date("2018-01-01"):Day(1):Date("2018-12-31")

julia> ts = TSFrame(DataFrame(Index = dates, x1 = 1:365));

# conversion to DataFrames
julia> df = DataFrame(ts);

# conversion to TimeArray
julia> timeArray = TimeArray(ts, timestamp = :Index);

# writing to CSV
julia> CSV.write("ts.csv", ts);

Next, here is some code which converts a DataFrame, a TimeArray and a CSV file to a TSFrame object.

julia> using TSFrames, DataFrames, CSV, TimeSeries, Dates;

# converting DataFrame to TSFrame
julia> ts = TSFrame(DataFrame(Index=1:10, x1=1:10));

# converting from TimeArray to TSFrame
julia> dates = Date(2018, 1, 1):Day(1):Date(2018, 12, 31)
Date("2018-01-01"):Day(1):Date("2018-12-31")

julia> ta = TimeArray(dates, rand(length(dates)));

julia> ts = TSFrame(ta);

# converting from CSV to TSFrame
julia> CSV.read("ts.csv", TSFrame);

Note

This discussion warrants a note about how we've implemented the Tables.jl interfaces. Since TSFrame objects are nothing but a wrapper around a DataFrame, our implementations of these interfaces just call DataFrames.jl's implementations. Moreover, while constructing TSFrame objects out of other Tables.jl compatible types, our constructor first converts the input table to a DataFrame, and then converts the DataFrame to a TSFrame object.
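The wrapper relationship described in the note can be observed with a short round trip. The sketch below uses a hypothetical TSFrame named ts_demo. It converts it to a DataFrame with the DataFrame constructor shown earlier and then back to a TSFrame.

```julia
using TSFrames, DataFrames

# Hypothetical TSFrame with an integer index and one data column.
ts_demo = TSFrame(DataFrame(Index = 1:3, x1 = [10, 20, 30]))

df      = DataFrame(ts_demo)   # TSFrame -> DataFrame through the Tables.jl interface
ts_back = TSFrame(df)          # DataFrame -> TSFrame; the first column is the index
```

If the interface simply forwards to the wrapped DataFrame, as the note states, df should equal ts_demo.coredata and ts_back should carry the same index and values as ts_demo.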