User guide

This page describes how to use the TSFrames package for timeseries data handling.

Installation

julia> using Pkg
julia> Pkg.add(url="https://github.com/xKDR/TSFrames.jl")

Constructing TSFrame objects

After installing TSFrames, load the package into your Julia session and create a basic TSFrame object.

julia> using TSFrames;
julia> ts = TSFrame(1:10)
10×1 TSFrame with Int64 Index
 Index  x1
 Int64  Int64
──────────────
     1      1
     2      2
     3      3
     4      4
     5      5
     6      6
     7      7
     8      8
     9      9
    10     10
julia> ts.coredata
10×2 DataFrame
 Row │ Index  x1
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2
   3 │     3      3
   4 │     4      4
   5 │     5      5
   6 │     6      6
   7 │     7      7
   8 │     8      8
   9 │     9      9
  10 │    10     10

The basic TSFrame constructor takes a vector (or range) of values of any type. It automatically generates an integer index for those values, running from 1 to n, and stores it in the Index column.

There are many ways to construct a TSFrame object. In real-world applications you will typically read a CSV file or download a dataset as a DataFrame, then convert that DataFrame into a TSFrame object.

julia> using CSV, DataFrames, TSFrames, Dates
julia> dates = Date(2007, 1, 1):Day(1):Date(2008, 03, 06)
Date("2007-01-01"):Dates.Day(1):Date("2008-03-06")
julia> ts = TSFrame(DataFrame(Index=dates, value=10*rand(431)))
431×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-01-01  7.25833
 2007-01-02  6.2301
 2007-01-03  8.85535
 2007-01-04  7.5766
 2007-01-05  3.87126
 2007-01-06  7.66593
 2007-01-07  7.16428
 2007-01-08  7.17118
     ⋮          ⋮
 2008-02-29  3.01004
 2008-03-01  6.8098
 2008-03-02  1.11012
 2008-03-03  2.68883
 2008-03-04  6.20697
 2008-03-05  1.90958
 2008-03-06  0.563305
       416 rows omitted

In the example above, a DataFrame of random values is converted into the TSFrame object ts. The first line of the printed output shows three things: the number of rows (431 here), the number of data columns (1, since the Index column is not counted), and the type of the index (Dates.Date here).

You can also fetch the dimensions with the nr(ts), nc(ts), and size(ts) methods. They return, respectively, the number of rows, the number of columns, and a (rows, columns) tuple. For convenience, a length(::TSFrame) method is also provided; it returns the number of rows of its argument.

julia> nr(ts)
431
julia> nc(ts)
1
julia> size(ts)
(431, 1)
julia> length(ts)
431

The names of the data columns can be fetched with names(ts), which returns a Vector{String}. The Index column can be fetched as a Vector in two ways: with index(ts), or directly from the underlying coredata property of the TSFrame as ts.coredata[!, :Index].

julia> names(ts)
1-element Vector{String}:
 "value"
julia> index(ts)
431-element Vector{Date}:
 2007-01-01
 2007-01-02
 2007-01-03
 2007-01-04
 2007-01-05
 2007-01-06
 2007-01-07
 2007-01-08
 2007-01-09
 2007-01-10
 ⋮
 2008-02-27
 2008-02-28
 2008-02-29
 2008-03-01
 2008-03-02
 2008-03-03
 2008-03-04
 2008-03-05
 2008-03-06

A simpler way to read a CSV file is to pass TSFrame as the sink argument of the CSV.read function, where filename is the path to your file.

julia> ts = CSV.read(filename, TSFrame)

Indexing and subsetting

One of the primary features of a timeseries package is a convenient way to index or subset a dataset. TSFrames provides several getindex methods, so a TSFrame object can be indexed with the usual square brackets ([ ]).

julia> ts[1] # first row
1×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-01  7.25833
julia> ts[[3, 5], [1]] # third & fifth row, and first column
2×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-03  8.85535
 2007-01-05  3.87126
julia> ts[1:10, 1] # first 10 rows and the first column as a vector
10-element Vector{Float64}:
 7.258332624009416
 6.230103504021869
 8.85535138021424
 7.576598744880897
 3.8712596994448347
 7.66593282514858
 7.164284372408087
 7.171178863142268
 4.792669173100709
 9.2317370516512
julia> ts[1, [:value]] # using the column name
1×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-01  7.25833

Apart from integer-based row indexing and integer- or name-based column indexing, TSFrames provides special subsetting methods for the date and time types defined in the Dates module.

julia> ts[Date(2007, 1, 10)] # on January 10, 2007
1×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-10  9.23174
julia> ts[[Date(2007, 1, 10), Date(2007, 1, 11)]] # January 10, 11
2×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-10  9.23174
 2007-01-11  9.75278
julia> ts[Year(2007), Month(1)] # entire January 2007
31×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-01-01  7.25833
 2007-01-02  6.2301
 2007-01-03  8.85535
 2007-01-04  7.5766
 2007-01-05  3.87126
 2007-01-06  7.66593
 2007-01-07  7.16428
 2007-01-08  7.17118
     ⋮          ⋮
 2007-01-25  7.81285
 2007-01-26  4.57044
 2007-01-27  0.151343
 2007-01-28  3.64142
 2007-01-29  3.73798
 2007-01-30  2.82969
 2007-01-31  3.27046
       16 rows omitted
julia> ts[Year(2007), Quarter(2)]
91×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-04-01  6.64302
 2007-04-02  3.64921
 2007-04-03  5.79814
 2007-04-04  7.86544
 2007-04-05  8.8288
 2007-04-06  7.50561
 2007-04-07  4.27278
 2007-04-08  9.42488
     ⋮          ⋮
 2007-06-24  4.37624
 2007-06-25  6.99653
 2007-06-26  2.99642
 2007-06-27  4.20314
 2007-06-28  0.192793
 2007-06-29  7.41783
 2007-06-30  3.45677
       76 rows omitted
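The following minimal sketch makes the row selection explicit. It uses only the Dates standard library to find the row positions that each of the date-based calls above would keep. It works on a plain vector of dates with hypothetical stand-in values (1.0, 2.0, and so on) and is not how TSFrames implements these getindex methods.

```julia
using Dates

# Hypothetical stand-in data: the guide's daily date range with one value per date.
dates = collect(Date(2007, 1, 1):Day(1):Date(2008, 3, 6))
vals  = collect(1.0:length(dates))

# Row positions each date-based selection would keep.
rows_jan10 = findall(==(Date(2007, 1, 10)), dates)                          # one date
rows_jan   = findall(d -> year(d) == 2007 && month(d) == 1, dates)          # January 2007
rows_q2    = findall(d -> year(d) == 2007 && quarterofyear(d) == 2, dates)  # Q2 2007

# Subset the dates and values with those positions.
q2_dates, q2_vals = dates[rows_q2], vals[rows_q2]
```

The 31 and 91 positions found here match the row counts of the January and second-quarter outputs above.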

Finally, one can also use the dot notation to get a column as a vector.

julia> ts.value # get the value column as a vector
431-element Vector{Float64}:
 7.258332624009416
 6.230103504021869
 8.85535138021424
 7.576598744880897
 3.8712596994448347
 7.66593282514858
 7.164284372408087
 7.171178863142268
 4.792669173100709
 9.2317370516512
 ⋮
 3.55266729549483
 3.9072949930252934
 3.0100382998226047
 6.809797586357348
 1.110119654540147
 2.688834621049103
 6.206974750250688
 1.9095807288259603
 0.5633048726104051

Summary statistics

The describe() method computes summary statistics for every column of the TSFrame object, including the Index. It returns them as a DataFrame listing each column's mean, minimum, median, maximum, number of missing values, and element type.

julia> TSFrames.describe(ts)
2×7 DataFrame
 Row │ variable  mean     min         median      max         nmissing  eltype ⋯
     │ Symbol    Union…   Any         Any         Any         Int64     DataTy ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ Index              2007-01-01  2007-08-04  2008-03-06         0  Date   ⋯
   2 │ value     5.06463  0.0203895   4.8918      9.94985            0  Float6
                                                                1 column omitted
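For a single numeric column, the statistics in this table can be approximated by hand. The sketch below assumes missing values are skipped before the statistics are computed. The summarize helper is hypothetical and is not part of TSFrames or DataFrames. It is meant for numeric columns, since a mean is not defined for Date values, which is why the Index row above has no mean.

```julia
using Statistics

# Summary statistics for one numeric column, skipping missing values first.
# Field names loosely follow the columns of the describe() output above.
function summarize(x::AbstractVector)
    present = collect(skipmissing(x))
    return (mean = mean(present), min = minimum(present), median = median(present),
            max = maximum(present), nmissing = count(ismissing, x), eltype = eltype(x))
end

s = summarize([2.0, missing, 9.0, 4.0])
```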

Plotting

A TSFrame object can be plotted with the plot() function from the Plots package. Plotting support is implemented as a recipe using the RecipesBase package, so the usual features and keyword arguments of Plots are available.

using Plots
plot(ts, size=(600,400); legend=false)

Applying a function over a period

The apply method aggregates a TSFrame object over a period (a Dates.Period value such as Month(1) or Week(1)). It returns the result of applying a function to the values in each period. For example, to convert a daily timeseries to a monthly one, use first(), last(), or Statistics.mean() as the function and a Dates.Month period.

julia> using Statistics
julia> ts_monthly = apply(ts, Month(1), last) # convert to monthly series using the last value for each month
15×1 TSFrame with Date Index
 Index       value_last
 Date        Float64
────────────────────────
 2007-01-01  3.27046
 2007-02-01  9.88578
 2007-03-01  5.88414
 2007-04-01  1.04911
 2007-05-01  3.40525
 2007-06-01  3.45677
 2007-07-01  6.52679
 2007-08-01  2.57386
 2007-09-01  6.20267
 2007-10-01  8.22799
 2007-11-01  4.06871
 2007-12-01  9.14174
 2008-01-01  0.278678
 2008-02-01  3.01004
 2008-03-01  0.563305
julia> ts_weekly = apply(ts, Week(1), Statistics.std) # compute weekly standard deviation
62×1 TSFrame with Date Index
 Index       value_std
 Date        Float64
───────────────────────
 2007-01-01  1.56383
 2007-01-08  3.23831
 2007-01-15  2.02825
 2007-01-22  2.9861
 2007-01-29  3.00039
 2007-02-05  3.79806
 2007-02-12  0.917211
 2007-02-19  2.80889
     ⋮          ⋮
 2008-01-21  2.45679
 2008-01-28  3.28322
 2008-02-04  2.52387
 2008-02-11  2.73386
 2008-02-18  2.63127
 2008-02-25  2.75834
 2008-03-03  2.4089
       47 rows omitted
julia> apply(ts, Week(1), Statistics.std, last) # same as above but index contains the last date of the week
62×1 TSFrame with Date Index
 Index       value_std
 Date        Float64
───────────────────────
 2007-01-07  1.56383
 2007-01-14  3.23831
 2007-01-21  2.02825
 2007-01-28  2.9861
 2007-02-04  3.00039
 2007-02-11  3.79806
 2007-02-18  0.917211
 2007-02-25  2.80889
     ⋮          ⋮
 2008-01-27  2.45679
 2008-02-03  3.28322
 2008-02-10  2.52387
 2008-02-17  2.73386
 2008-02-24  2.63127
 2008-03-02  2.75834
 2008-03-06  2.4089
       47 rows omitted
julia> apply(ts, Week(1), Statistics.std, last, renamecols=false) # do not rename column
62×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-01-07  1.56383
 2007-01-14  3.23831
 2007-01-21  2.02825
 2007-01-28  2.9861
 2007-02-04  3.00039
 2007-02-11  3.79806
 2007-02-18  0.917211
 2007-02-25  2.80889
     ⋮          ⋮
 2008-01-27  2.45679
 2008-02-03  3.28322
 2008-02-10  2.52387
 2008-02-17  2.73386
 2008-02-24  2.63127
 2008-03-02  2.75834
 2008-03-06  2.4089
       47 rows omitted
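The grouping behind these calls can be sketched in plain Julia. The sketch rests on two assumptions, both consistent with the outputs above:

- Each date is assigned to its period with Dates.floor, which is why weekly groups start on Mondays such as 2007-01-01.
- The optional last argument only changes how each group is labelled.

The helpers group_rows and apply_sketch and the stand-in values are hypothetical, and TSFrames' actual implementation may differ.

```julia
using Dates, Statistics

# Hypothetical stand-in data: one value per day over the guide's date range.
dates = collect(Date(2007, 1, 1):Day(1):Date(2008, 3, 6))
vals  = collect(1.0:length(dates))

# Map each period start (the date floored to the period) to the row positions in it.
function group_rows(dates, period)
    groups = Dict{Date, Vector{Int}}()
    for (i, d) in enumerate(dates)
        push!(get!(groups, floor(d, period), Int[]), i)
    end
    return groups
end

# Apply `fun` within each period. Each group is labelled with its period start,
# or with the last date actually present in it when `label_last = true`.
function apply_sketch(dates, vals, period, fun; label_last = false)
    groups = group_rows(dates, period)
    starts = sort!(collect(keys(groups)))
    labels = [label_last ? dates[last(groups[s])] : s for s in starts]
    results = [fun(vals[groups[s]]) for s in starts]
    return labels, results
end

monthly_labels, monthly_last = apply_sketch(dates, vals, Month(1), last)
weekly_labels, weekly_std = apply_sketch(dates, vals, Week(1), std; label_last = true)
```

With these stand-in values, monthly_last[1] is 31.0, the value on 2007-01-31. The weekly labels run from 2007-01-07 to the partial final week ending 2008-03-06, matching the index printed for apply(ts, Week(1), Statistics.std, last).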

Joins: Row and column binding with other objects

TSFrames provides methods to combine two TSFrame objects by columns with join (alias: cbind) or by rows with vcat (alias: rbind). Both methods align or check their inputs before merging, as described below.

join merges two datasets based on the Index values of both objects. The join strategy determines which index values the result contains:

- jointype=:JoinLeft keeps only the index values of the left object.
- jointype=:JoinRight keeps only the index values of the right object.
- jointype=:JoinBoth keeps the intersection of both objects' index values.
- jointype=:JoinAll keeps the union of both objects' index values.

Wherever an index value is absent from one of the objects, that object's columns are filled with missing.

julia> dates = collect(Date(2007,1,1):Day(1):Date(2007,1,30));
julia> ts2 = TSFrame(rand(length(dates)), dates)
30×1 TSFrame with Date Index
 Index       x1
 Date        Float64
───────────────────────
 2007-01-01  0.0284396
 2007-01-02  0.333361
 2007-01-03  0.821846
 2007-01-04  0.498602
 2007-01-05  0.86588
 2007-01-06  0.100009
 2007-01-07  0.718234
 2007-01-08  0.264673
     ⋮          ⋮
 2007-01-24  0.576912
 2007-01-25  0.35489
 2007-01-26  0.990442
 2007-01-27  0.245079
 2007-01-28  0.872574
 2007-01-29  0.622879
 2007-01-30  0.0832784
       15 rows omitted
julia> join(ts, ts2; jointype=:JoinAll) # cbind/join on Index column
431×2 TSFrame with Date Index
 Index       value     x1
 Date        Float64?  Float64?
───────────────────────────────────────
 2007-01-01  7.25833   0.0284396
 2007-01-02  6.2301    0.333361
 2007-01-03  8.85535   0.821846
 2007-01-04  7.5766    0.498602
 2007-01-05  3.87126   0.86588
 2007-01-06  7.66593   0.100009
 2007-01-07  7.16428   0.718234
 2007-01-08  7.17118   0.264673
     ⋮          ⋮          ⋮
 2008-02-29  3.01004   missing
 2008-03-01  6.8098    missing
 2008-03-02  1.11012   missing
 2008-03-03  2.68883   missing
 2008-03-04  6.20697   missing
 2008-03-05  1.90958   missing
 2008-03-06  0.563305  missing
                  416 rows omitted
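The four jointype strategies differ only in which Index values survive. The remaining columns are then aligned to that index, with missing filling the gaps. The following sketch expresses this rule with Base set operations on two date vectors shaped like the indexes of ts and ts2. The helpers joined_index and align are hypothetical illustrations, not the TSFrames implementation.

```julia
using Dates

# Index values of the two objects in the example above.
left_idx  = collect(Date(2007, 1, 1):Day(1):Date(2008, 3, 6))   # like `ts`
right_idx = collect(Date(2007, 1, 1):Day(1):Date(2007, 1, 30))  # like `ts2`

# Which Index values each jointype keeps (a sketch of the rule only).
function joined_index(l, r, jointype::Symbol)
    jointype === :JoinLeft  && return l
    jointype === :JoinRight && return r
    jointype === :JoinBoth  && return sort(intersect(l, r))
    jointype === :JoinAll   && return sort(union(l, r))
    throw(ArgumentError("unknown jointype: $jointype"))
end

# Values of one input re-aligned to the joined index, `missing` where a date is absent.
function align(idx, src_idx, src_vals)
    lookup = Dict(zip(src_idx, src_vals))
    return [get(lookup, d, missing) for d in idx]
end

idx_all = joined_index(left_idx, right_idx, :JoinAll)
x1_col  = align(idx_all, right_idx, collect(1.0:length(right_idx)))  # stand-in values
```

With :JoinAll the joined index has 431 dates. The re-aligned column is missing for the 401 dates after 2007-01-30, as in the output above.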

vcat works similarly, but merges two datasets by rows. Before merging, it checks the column names of both objects according to the strategy passed in the colmerge argument, and throws an error if the required condition is not satisfied.

colmerge accepts four strategies:

- :setequal merges only if both objects have the same column names.
- :orderequal merges only if both objects have the same column names in the same order.
- :intersect keeps only the columns common to both objects.
- :union keeps every column of both objects, filling the cells of columns absent from one object with missing.

The example below uses the default, which produces the union of the columns.

vcat keeps every row of both objects, even when the two objects share Index values. The result may therefore contain duplicate Index values, and later operations on it may return unexpected results or throw errors.

julia> dates = collect(Date(2008,4,1):Day(1):Date(2008,4,30));
julia> ts3 = TSFrame(DataFrame(values=rand(length(dates)), Index=dates))
30×1 TSFrame with Date Index
 Index       values
 Date        Float64
──────────────────────
 2008-04-01  0.808905
 2008-04-02  0.504901
 2008-04-03  0.16766
 2008-04-04  0.118088
 2008-04-05  0.125239
 2008-04-06  0.884976
 2008-04-07  0.768058
 2008-04-08  0.856199
     ⋮          ⋮
 2008-04-24  0.012924
 2008-04-25  0.640004
 2008-04-26  0.494065
 2008-04-27  0.269375
 2008-04-28  0.509111
 2008-04-29  0.887845
 2008-04-30  0.787493
       15 rows omitted
julia> vcat(ts, ts3) # do the merge
461×2 TSFrame with Date Index
 Index       value     values
 Date        Float64?  Float64?
───────────────────────────────────────────
 2007-01-01  7.25833   missing
 2007-01-02  6.2301    missing
 2007-01-03  8.85535   missing
 2007-01-04  7.5766    missing
 2007-01-05  3.87126   missing
 2007-01-06  7.66593   missing
 2007-01-07  7.16428   missing
 2007-01-08  7.17118   missing
     ⋮          ⋮          ⋮
 2008-04-24  missing   0.012924
 2008-04-25  missing   0.640004
 2008-04-26  missing   0.494065
 2008-04-27  missing   0.269375
 2008-04-28  missing   0.509111
 2008-04-29  missing   0.887845
 2008-04-30  missing   0.787493
                  446 rows omitted
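The colmerge rules can be summarised as a function of the two column-name lists. The sketch below is a hypothetical helper that mirrors the descriptions above. It only computes the resulting column names and does not merge any data. A second hypothetical helper checks for the duplicate Index values that vcat can produce.

```julia
# Column names a row-wise merge keeps under each colmerge option (sketch only).
function merged_columns(a::Vector{String}, b::Vector{String}, colmerge::Symbol)
    if colmerge === :orderequal
        a == b || throw(ArgumentError("column names or order differ"))
        return a
    elseif colmerge === :setequal
        Set(a) == Set(b) || throw(ArgumentError("column names differ"))
        return a
    elseif colmerge === :intersect
        return filter(c -> c in b, a)
    elseif colmerge === :union
        return vcat(a, filter(c -> !(c in a), b))
    else
        throw(ArgumentError("unknown colmerge: $colmerge"))
    end
end

# vcat keeps every row, so overlapping indexes produce duplicate Index values.
has_duplicate_index(idx) = !allunique(idx)
```

For ts and ts3, whose columns are value and values, :union yields both columns, as in the output above, while :intersect yields none.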

Rolling window operations

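The rollapply method applies a function over a fixed-size rolling window on the dataset. The example below computes the 10-day rolling mean of the values. Each result is labelled with the last date of its window, so the output has 431 - 10 + 1 = 422 rows and starts on 2007-01-10.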

julia> rollapply(ts, mean, 10)
422×1 TSFrame with Date Index
 Index       rolling_value_mean
 Date        Float64
────────────────────────────────
 2007-01-10             6.98174
 2007-01-11             7.23119
 2007-01-12             7.07755
 2007-01-13             6.35762
 2007-01-14             5.79955
 2007-01-15             5.89467
 2007-01-16             5.60043
 2007-01-17             5.64809
     ⋮               ⋮
 2008-02-29             6.3016
 2008-03-01             6.03753
 2008-03-02             5.47355
 2008-03-03             5.27348
 2008-03-04             4.98942
 2008-03-05             4.33718
 2008-03-06             3.43852
                407 rows omitted
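A trailing rolling mean is easy to write out by hand, which helps explain the row count. With a window of 10, the first complete window ends on the 10th row, so 431 rows produce 422 results. This is a plain-Julia sketch: the rolling_mean helper and the stand-in values are hypothetical. Note that rollapply itself accepts any function, not only mean.

```julia
using Dates, Statistics

# Trailing-window mean of width w: each result averages the w values ending at
# row i and is labelled with dates[i], so n inputs give n - w + 1 outputs.
function rolling_mean(dates, x, w)
    ends = w:length(x)
    return dates[ends], [mean(x[i-w+1:i]) for i in ends]
end

dates = collect(Date(2007, 1, 1):Day(1):Date(2008, 3, 6))
vals  = collect(1.0:length(dates))   # hypothetical stand-in data
roll_dates, roll_means = rolling_mean(dates, vals, 10)
```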

Computing rolling difference and percent change

Similar to apply and rollapply, there are dedicated methods to compute differences and percent changes between rows of a TSFrame object. Both insert missing in the first row:

- The diff method computes the difference between the values in adjacent rows.
- pctchange computes the relative change between adjacent rows, expressed as a fraction (so 0.1 means a 10% increase).

julia> diff(ts)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
──────────────────────────────
 2007-01-01  missing
 2007-01-02       -1.02823
 2007-01-03        2.62525
 2007-01-04       -1.27875
 2007-01-05       -3.70534
 2007-01-06        3.79467
 2007-01-07       -0.501648
 2007-01-08        0.00689449
     ⋮              ⋮
 2008-02-29       -0.897257
 2008-03-01        3.79976
 2008-03-02       -5.69968
 2008-03-03        1.57871
 2008-03-04        3.51814
 2008-03-05       -4.29739
 2008-03-06       -1.34628
              416 rows omitted
julia> pctchange(ts)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
───────────────────────────────
 2007-01-01  missing
 2007-01-02  -0.141662
 2007-01-03   0.421381
 2007-01-04  -0.144405
 2007-01-05  -0.48905
 2007-01-06   0.980217
 2007-01-07  -0.0654387
 2007-01-08   0.000962342
     ⋮          ⋮
 2008-02-29  -0.229636
 2008-03-01   1.26236
 2008-03-02  -0.836982
 2008-03-03   1.42211
 2008-03-04   1.30843
 2008-03-05  -0.692349
 2008-03-06  -0.705011
       416 rows omitted
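Both computations can be written directly on a vector. The sketch below assumes pctchange divides each change by the previous value, which reproduces the outputs above. The helper names are hypothetical, and the inputs are the first three values of the value column.

```julia
# Adjacent-row difference and fractional change; row 1 has no predecessor.
diff_sketch(x)      = [i == 1 ? missing : x[i] - x[i-1] for i in eachindex(x)]
pctchange_sketch(x) = [i == 1 ? missing : x[i] / x[i-1] - 1 for i in eachindex(x)]

# First three values of the guide's `value` column.
x = [7.258332624009416, 6.230103504021869, 8.85535138021424]
d = diff_sketch(x)
p = pctchange_sketch(x)
```

Rounded, d[2] and p[2] agree with the -1.02823 and -0.141662 printed for 2007-01-02 above.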

Computing log of data values

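Broadcasting log over a TSFrame applies it to each data column and leaves the Index unchanged. The result column is renamed with a _log suffix.

julia> log.(ts)
431×1 TSFrame with Date Index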
 Index       value_log
 Date        Float64
───────────────────────
 2007-01-01   1.98215
 2007-01-02   1.82939
 2007-01-03   2.18102
 2007-01-04   2.02506
 2007-01-05   1.35358
 2007-01-06   2.03679
 2007-01-07   1.96911
 2007-01-08   1.97007
     ⋮           ⋮
 2008-02-29   1.10195
 2008-03-01   1.91836
 2008-03-02   0.104468
 2008-03-03   0.989108
 2008-03-04   1.82567
 2008-03-05   0.646884
 2008-03-06  -0.573934
       416 rows omitted

Creating lagged/leading series

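lag() and lead() shift a series by a fixed number of rows, inserting missing where no value is available. lag(ts, n) moves each value n rows later, so row t holds the value from row t-n. lead(ts, n) moves each value n rows earlier.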

julia> lag(ts, 2)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
───────────────────────────
 2007-01-01  missing
 2007-01-02  missing
 2007-01-03        7.25833
 2007-01-04        6.2301
 2007-01-05        8.85535
 2007-01-06        7.5766
 2007-01-07        3.87126
 2007-01-08        7.66593
     ⋮             ⋮
 2008-02-29        3.55267
 2008-03-01        3.90729
 2008-03-02        3.01004
 2008-03-03        6.8098
 2008-03-04        1.11012
 2008-03-05        2.68883
 2008-03-06        6.20697
           416 rows omitted
julia> lead(ts, 2)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
────────────────────────────
 2007-01-01  8.85535
 2007-01-02  7.5766
 2007-01-03  3.87126
 2007-01-04  7.66593
 2007-01-05  7.16428
 2007-01-06  7.17118
 2007-01-07  4.79267
 2007-01-08  9.23174
     ⋮          ⋮
 2008-02-29  1.11012
 2008-03-01  2.68883
 2008-03-02  6.20697
 2008-03-03  1.90958
 2008-03-04  0.563305
 2008-03-05  missing
 2008-03-06  missing
       416 rows omitted
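The shifting itself can be sketched on a plain vector. lag pads the start with missing and drops the last k values, while lead does the opposite, so the length never changes. The helpers lag_sketch and lead_sketch are hypothetical and assume a non-negative shift k.

```julia
# Shift a series by k >= 0 rows, padding the vacated positions with missing.
lag_sketch(x, k)  = vcat(fill(missing, k), x[1:end-k])
lead_sketch(x, k) = vcat(x[k+1:end], fill(missing, k))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
lagged  = lag_sketch(x, 2)
leading = lead_sketch(x, 2)
```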

Converting to Matrix and DataFrame

For operations outside the scope of TSFrames, you can convert a TSFrame object into a Vector or a Matrix, or work on its underlying DataFrame directly.

julia> ts[:, 1] # convert column 1 to a vector of floats
431-element Vector{Float64}:
 7.258332624009416
 6.230103504021869
 8.85535138021424
 7.576598744880897
 3.8712596994448347
 7.66593282514858
 7.164284372408087
 7.171178863142268
 4.792669173100709
 9.2317370516512
 ⋮
 3.55266729549483
 3.9072949930252934
 3.0100382998226047
 6.809797586357348
 1.110119654540147
 2.688834621049103
 6.206974750250688
 1.9095807288259603
 0.5633048726104051
julia> Matrix(ts) # convert entire TSFrame into a Matrix
431×1 Matrix{Float64}:
 7.258332624009416
 6.230103504021869
 8.85535138021424
 7.576598744880897
 3.8712596994448347
 7.66593282514858
 7.164284372408087
 7.171178863142268
 4.792669173100709
 9.2317370516512
 ⋮
 3.55266729549483
 3.9072949930252934
 3.0100382998226047
 6.809797586357348
 1.110119654540147
 2.688834621049103
 6.206974750250688
 1.9095807288259603
 0.5633048726104051
julia> select(ts.coredata, :Index, :value, DataFrames.nrow) # use the underlying DataFrame for other operations
431×3 DataFrame
 Row │ Index       value     nrow
     │ Date        Float64   Int64
─────┼─────────────────────────────
   1 │ 2007-01-01  7.25833     431
   2 │ 2007-01-02  6.2301      431
   3 │ 2007-01-03  8.85535     431
   4 │ 2007-01-04  7.5766      431
   5 │ 2007-01-05  3.87126     431
   6 │ 2007-01-06  7.66593     431
   7 │ 2007-01-07  7.16428     431
   8 │ 2007-01-08  7.17118     431
  ⋮  │     ⋮          ⋮        ⋮
 425 │ 2008-02-29  3.01004     431
 426 │ 2008-03-01  6.8098      431
 427 │ 2008-03-02  1.11012     431
 428 │ 2008-03-03  2.68883     431
 429 │ 2008-03-04  6.20697     431
 430 │ 2008-03-05  1.90958     431
 431 │ 2008-03-06  0.563305    431
                 416 rows omitted

Writing TSFrame into a CSV file

A TSFrame object can be written to a CSV file with the CSV.write method. Because TSFrame implements the Tables.jl interface, the object can be passed to CSV.write directly. Passing the underlying DataFrame, ts.coredata, also works.

julia> CSV.write("/tmp/demo_ts.csv", ts)
"/tmp/demo_ts.csv"

Broadcasting

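Broadcasting a function over a TSFrame object applies it elementwise to every data column. To restrict it to a subset of columns, index those columns first. The Index is left unchanged, and each result column is named after the original column with the function name appended (for example, A_sin).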

julia> using TSFrames, DataFrames;

julia> ts = TSFrame(DataFrame(Index = [1, 2, 3, 4, 5], A = [10.1, 12.4, 42.4, 24.1, 242.5], B = [2, 4, 6, 8, 10]))
(5 x 2) TSFrame with Int64 Index

 Index  A        B     
 Int64  Float64  Int64 
───────────────────────
     1     10.1      2
     2     12.4      4
     3     42.4      6
     4     24.1      8
     5    242.5     10

julia> sin_A = sin.(ts[:, [:A]])    # get sin of column A
(5 x 1) TSFrame with Int64 Index

 Index  A_sin
 Int64  Float64
──────────────────
     1  -0.625071
     2  -0.165604
     3  -0.999934
     4  -0.858707
     5  -0.562466

julia> log_ts = log.(ts)    # take log of all columns
(5 x 2) TSFrame with Int64 Index

 Index  A_log    B_log
 Int64  Float64  Float64
──────────────────────────
     1  2.31254  0.693147
     2  2.5177   1.38629
     3  3.74715  1.79176
     4  3.18221  2.07944
     5  5.491    2.30259

julia> log_ts = log.(ts[:, [:A, :B]])   # can specify multiple columns
(5 x 2) TSFrame with Int64 Index

 Index  A_log    B_log
 Int64  Float64  Float64
──────────────────────────
     1  2.31254  0.693147
     2  2.5177   1.38629
     3  3.74715  1.79176
     4  3.18221  2.07944
     5  5.491    2.30259

Tables.jl Integration

TSFrame objects are Tables.jl compatible. This integration enables easy conversion between the TSFrame format and other formats which are Tables.jl compatible.

As an example, first consider the following code which converts a TSFrame object into a DataFrame, a TimeArray and a CSV file respectively.

julia> using TSFrames, TimeSeries, Dates, DataFrames, CSV;

julia> dates = Date(2018, 1, 1):Day(1):Date(2018, 12, 31)
Date("2018-01-01"):Day(1):Date("2018-12-31")

julia> ts = TSFrame(DataFrame(Index = dates, x1 = 1:365));

# conversion to DataFrame
julia> df = DataFrame(ts);

# conversion to TimeArray
julia> timeArray = TimeArray(ts, timestamp = :Index);

# writing to CSV
julia> CSV.write("ts.csv", ts);

Next, here is some code which converts a DataFrame, a TimeArray and a CSV file to a TSFrame object.

julia> using TSFrames, DataFrames, CSV, TimeSeries, Dates;

# converting DataFrame to TSFrame
julia> ts = TSFrame(DataFrame(Index=1:10, x1=1:10));

# converting from TimeArray to TSFrame
julia> dates = Date(2018, 1, 1):Day(1):Date(2018, 12, 31)
Date("2018-01-01"):Day(1):Date("2018-12-31")

julia> ta = TimeArray(dates, rand(length(dates)));

julia> ts = TSFrame(ta);

# converting from CSV to TSFrame
julia> CSV.read("ts.csv", TSFrame);

Note

A note on how we have implemented the Tables.jl interfaces: since TSFrame objects are thin wrappers around a DataFrame, our implementations of these interfaces simply call the corresponding DataFrames.jl implementations. Likewise, when constructing a TSFrame from another Tables.jl-compatible type, our constructor first converts the input table to a DataFrame and then converts that DataFrame to a TSFrame object.