User guide
This page describes how to use the TSFrames package for timeseries data handling.
Installation
julia> using Pkg
julia> Pkg.add(url="https://github.com/xKDR/TSFrames.jl")
Constructing TSFrame objects
After installing TSFrames, load the package into your Julia session and create a basic TSFrame object.
julia> using TSFrames;
julia> ts = TSFrame(1:10)
10×1 TSFrame with Int64 Index
 Index  x1
 Int64  Int64
──────────────
     1      1
     2      2
     3      3
     4      4
     5      5
     6      6
     7      7
     8      8
     9      9
    10     10
julia> ts.coredata
10×2 DataFrame
 Row │ Index  x1
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2
   3 │     3      3
   4 │     4      4
   5 │     5      5
   6 │     6      6
   7 │     7      7
   8 │     8      8
   9 │     9      9
  10 │    10     10
The basic TSFrame constructor takes a Vector of any type and stores it as a data column (x1 here). It also automatically generates an integer index running from 1 to the length of the vector (the Index column).
There are many ways to construct a TSFrame object. In real-world applications you would typically read in a CSV file or download a dataset as a DataFrame, and then operate on it. A DataFrame is easily converted to a TSFrame object.
julia> using CSV, DataFrames, TSFrames, Dates
julia> dates = Date(2007, 1, 1):Day(1):Date(2008, 03, 06)
Date("2007-01-01"):Dates.Day(1):Date("2008-03-06")
julia> ts = TSFrame(DataFrame(Index=dates, value=10*rand(431)))
431×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-01-01  9.73397
 2007-01-02  5.84181
 2007-01-03  0.65625
 2007-01-04  5.86312
 2007-01-05  5.51728
 2007-01-06  9.5246
 2007-01-07  8.78945
 2007-01-08  9.94281
     ⋮          ⋮
 2008-02-29  3.84641
 2008-03-01  6.17761
 2008-03-02  0.292995
 2008-03-03  4.71666
 2008-03-04  5.8359
 2008-03-05  6.2736
 2008-03-06  4.73226
      416 rows omitted
In the above example you generate a random DataFrame and convert it into the TSFrame object ts. The top line of the printed ts object tells you three things: the number of rows (431 here), the number of columns (1), and the type of the Index (Dates.Date in the above example).
You can also fetch the number of rows and columns using the nr(ts), nc(ts), and size(ts) methods. These return, respectively, the number of rows, the number of columns, and a Tuple of the row and column counts. A length(::TSFrame) method is also provided for convenience; it returns the number of rows of its argument.
julia> nr(ts)
431
julia> nc(ts)
1
julia> size(ts)
(431, 1)
julia> length(ts)
431
Names of the data columns (excluding Index) can be fetched using the names(ts) method, which returns a Vector{String}. The Index column can be fetched as a Vector using the index(ts) method. It can also be accessed directly through the underlying coredata property of the TSFrame: ts.coredata[!, :Index].
julia> names(ts)
1-element Vector{String}:
 "value"
julia> index(ts)
431-element Vector{Date}:
 2007-01-01
 2007-01-02
 2007-01-03
 2007-01-04
 2007-01-05
 2007-01-06
 2007-01-07
 2007-01-08
 2007-01-09
 2007-01-10
 ⋮
 2008-02-27
 2008-02-28
 2008-02-29
 2008-03-01
 2008-03-02
 2008-03-03
 2008-03-04
 2008-03-05
 2008-03-06
Another, simpler way to read a CSV file is to pass TSFrame as a sink to the CSV.read function.
julia> ts = CSV.read(filename, TSFrame)  # filename is the path to your CSV file
Indexing and subsetting
One of the primary features of a timeseries package is to provide convenient ways to index or subset a dataset. TSFrames makes it easy to index a TSFrame object by providing multiple intuitive getindex methods, which work with the regular square brackets ([ ]).
julia> ts[1] # first row
1×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-01  9.73397
julia> ts[[3, 5], [1]] # third & fifth row, and first column
2×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-03  0.65625
 2007-01-05  5.51728
julia> ts[1:10, 1] # first 10 rows and the first column as a vector
10-element Vector{Float64}:
 9.733968587414477
 5.841805386771233
 0.6562504335550223
 5.86312330329454
 5.517283362909282
 9.52459601769881
 8.789451032375915
 9.942811691734047
 7.657974698226847
 6.626645288281524
julia> ts[1, [:value]] # using the column name
1×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-01  9.73397
Apart from integer-based row indexing and integer- or name-based column indexing, TSFrames provides special subsetting methods for the date and time types defined in the Dates module.
julia> ts[Date(2007, 1, 10)] # on January 10, 2007
1×1 TSFrame with Date Index
 Index       value
 Date        Float64
─────────────────────
 2007-01-10  6.62665
julia> ts[[Date(2007, 1, 10), Date(2007, 1, 11)]] # January 10, 11
2×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-01-10  6.62665
 2007-01-11  0.412182
julia> ts[Year(2007), Month(1)] # entire January 2007
31×1 TSFrame with Date Index
 Index       value
 Date        Float64
───────────────────────
 2007-01-01  9.73397
 2007-01-02  5.84181
 2007-01-03  0.65625
 2007-01-04  5.86312
 2007-01-05  5.51728
 2007-01-06  9.5246
 2007-01-07  8.78945
 2007-01-08  9.94281
     ⋮          ⋮
 2007-01-25  0.566132
 2007-01-26  3.73588
 2007-01-27  2.78977
 2007-01-28  2.45337
 2007-01-29  7.94426
 2007-01-30  8.3398
 2007-01-31  0.0861312
        16 rows omitted
julia> ts[Year(2007), Quarter(2)]
91×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-04-01  0.185652
 2007-04-02  6.86856
 2007-04-03  6.28615
 2007-04-04  3.76365
 2007-04-05  6.11929
 2007-04-06  0.417649
 2007-04-07  1.76431
 2007-04-08  4.17653
     ⋮          ⋮
 2007-06-24  3.0732
 2007-06-25  7.05813
 2007-06-26  9.68039
 2007-06-27  2.54848
 2007-06-28  8.50303
 2007-06-29  0.106081
 2007-06-30  8.33644
       76 rows omitted
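The Year/Month/Quarter selectors above keep every row whose date falls inside a calendar period. The following is a minimal plain-Julia sketch of that row selection on a bare vector of dates, using only the Dates standard library. period_rows is a hypothetical helper for illustration and is not how TSFrames implements getindex.

```julia
using Dates

# Hypothetical helper (not part of TSFrames): positions of the dates that fall
# in year `y` and, optionally, in month `m` or quarter `q` of that year.
function period_rows(dates::AbstractVector{Date}, y::Year;
                     m::Union{Month,Nothing} = nothing,
                     q::Union{Quarter,Nothing} = nothing)
    keep = Dates.year.(dates) .== Dates.value(y)
    if m !== nothing
        keep .&= Dates.month.(dates) .== Dates.value(m)
    end
    if q !== nothing
        keep .&= Dates.quarterofyear.(dates) .== Dates.value(q)
    end
    return findall(keep)
end

dates = collect(Date(2007, 1, 1):Day(1):Date(2008, 3, 6))
jan_2007 = period_rows(dates, Year(2007); m = Month(1))   # January 2007
q2_2007 = period_rows(dates, Year(2007); q = Quarter(2))  # April to June 2007
```

The row counts match the 31×1 and 91×1 results printed above.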
Finally, one can also use the dot notation to get a column as a vector.
julia> ts.value # get the value column as a vector
431-element Vector{Float64}:
 9.733968587414477
 5.841805386771233
 0.6562504335550223
 5.86312330329454
 5.517283362909282
 9.52459601769881
 8.789451032375915
 9.942811691734047
 7.657974698226847
 6.626645288281524
 ⋮
 5.463912989813932
 2.660594968793716
 3.8464132041770838
 6.17760879866357
 0.2929947354114648
 4.716660788666809
 5.835901249618831
 6.273596284451427
 4.73225806397092
Summary statistics
The describe() method computes summary statistics of a TSFrame object and returns them as a DataFrame. For each column it reports the mean, minimum, median, and maximum, along with the number of missing values and the element type.
julia> TSFrames.describe(ts)
2×7 DataFrame
 Row │ variable  mean     min         median      max         nmissing  eltype ⋯
     │ Symbol    Union…   Any         Any         Any         Int64     DataTy ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ Index              2007-01-01  2007-08-04  2008-03-06         0  Date   ⋯
   2 │ value     4.72885  0.0141881   4.65582     9.99528            0  Float6 ⋯
                                                              1 column omitted
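To make the describe columns concrete, here is a minimal sketch of the per-column statistics for a numeric column, computed while ignoring missing values. It uses only the Statistics standard library. column_summary is a hypothetical helper and is not the TSFrames or DataFrames implementation.

```julia
using Statistics

# Hypothetical helper (not the TSFrames/DataFrames implementation): the kind of
# statistics describe reports for a numeric column, skipping missing values.
function column_summary(v::AbstractVector)
    present = collect(skipmissing(v))
    return (mean = mean(present), min = minimum(present),
            median = median(present), max = maximum(present),
            nmissing = count(ismissing, v))
end

s = column_summary([4.0, missing, 1.0, 7.0])
```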
Plotting
A TSFrame object can be plotted using the plot() function of the Plots package. The plotting functionality is implemented with the RecipesBase package, so all the flexibility and functionality of the Plots package is available to users.
using Plots
plot(ts, size=(600,400); legend=false)
Applying a function over a period
The apply method aggregates a TSFrame object over a period (a Dates.Period such as Month(1) or Week(1)) and returns the output of applying a function to the values in each period. For example, to convert a daily timeseries to a monthly one, you may use the first(), last(), or Statistics.mean() function with Month(1) as the period.
julia> using Statistics
julia> ts_monthly = apply(ts, Month(1), last) # convert to monthly series using the last value for each month
15×1 TSFrame with Date Index
 Index       value_last
 Date        Float64
────────────────────────
 2007-01-01  0.0861312
 2007-02-01  7.66843
 2007-03-01  1.52158
 2007-04-01  8.76375
 2007-05-01  0.351843
 2007-06-01  8.33644
 2007-07-01  3.33115
 2007-08-01  1.15542
 2007-09-01  6.62634
 2007-10-01  6.27678
 2007-11-01  9.8769
 2007-12-01  9.27768
 2008-01-01  9.05281
 2008-02-01  3.84641
 2008-03-01  4.73226
julia> ts_weekly = apply(ts, Week(1), Statistics.std) # compute weekly standard deviation
62×1 TSFrame with Date Index
 Index       value_std
 Date        Float64
───────────────────────
 2007-01-01  3.18267
 2007-01-08  3.96743
 2007-01-15  2.75279
 2007-01-22  2.46398
 2007-01-29  3.29503
 2007-02-05  3.42123
 2007-02-12  2.97784
 2007-02-19  2.66173
     ⋮          ⋮
 2008-01-21  1.26202
 2008-01-28  3.37782
 2008-02-04  2.07292
 2008-02-11  2.63171
 2008-02-18  2.77352
 2008-02-25  2.9748
 2008-03-03  0.788581
       47 rows omitted
julia> apply(ts, Week(1), Statistics.std, last) # same as above but index contains the last date of the week
62×1 TSFrame with Date Index
 Index       value_std
 Date        Float64
───────────────────────
 2007-01-07  3.18267
 2007-01-14  3.96743
 2007-01-21  2.75279
 2007-01-28  2.46398
 2007-02-04  3.29503
 2007-02-11  3.42123
 2007-02-18  2.97784
 2007-02-25  2.66173
     ⋮          ⋮
 2008-01-27  1.26202
 2008-02-03  3.37782
 2008-02-10  2.07292
 2008-02-17  2.63171
 2008-02-24  2.77352
 2008-03-02  2.9748
 2008-03-06  0.788581
       47 rows omitted
julia> apply(ts, Week(1), Statistics.std, last, renamecols=false) # do not rename column
62×1 TSFrame with Date Index
 Index       value
 Date        Float64
──────────────────────
 2007-01-07  3.18267
 2007-01-14  3.96743
 2007-01-21  2.75279
 2007-01-28  2.46398
 2007-02-04  3.29503
 2007-02-11  3.42123
 2007-02-18  2.97784
 2007-02-25  2.66173
     ⋮          ⋮
 2008-01-27  1.26202
 2008-02-03  3.37782
 2008-02-10  2.07292
 2008-02-17  2.63171
 2008-02-24  2.77352
 2008-03-02  2.9748
 2008-03-06  0.788581
       47 rows omitted
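The idea behind apply can be sketched in plain Julia: map each date to the start of its period, group consecutive rows that share that start, and reduce each group with a function. The sketch below uses only the Dates standard library and assumes the dates are sorted in ascending order. aggregate_by_period is a hypothetical helper and is not the TSFrames implementation.

```julia
using Dates

# Hypothetical helper (not the TSFrames implementation): aggregate `vals` over
# calendar periods. `periodstart` maps a date to the first day of its period
# (e.g. Dates.firstdayofmonth or Dates.firstdayofweek), and `f` reduces the
# values of each period. Assumes `dates` is sorted in ascending order.
function aggregate_by_period(dates::AbstractVector{Date}, vals::AbstractVector,
                             periodstart::Function, f::Function)
    period_keys = Date[]
    results = []
    i = 1
    while i <= length(dates)
        key = periodstart(dates[i])
        j = i
        # advance j to the last row that belongs to the same period
        while j < length(dates) && periodstart(dates[j + 1]) == key
            j += 1
        end
        push!(period_keys, key)
        push!(results, f(vals[i:j]))
        i = j + 1
    end
    return period_keys, results
end

dates = collect(Date(2008, 1, 1):Day(1):Date(2008, 3, 6))
vals = collect(1:length(dates))
monthly_keys, monthly_last = aggregate_by_period(dates, vals, Dates.firstdayofmonth, last)
```

Here each period is labelled by its first day, like the Index of ts_monthly above. Passing Dates.firstdayofweek instead groups by Monday-starting weeks.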
Joins: Row and column binding with other objects
TSFrames provides methods to join two TSFrame objects either by columns, using join (alias: cbind), or by rows, using vcat (alias: rbind). Both methods apply some basic logic while doing the merge.
join merges two datasets based on the Index values of both objects. The join strategy determines which index values the final object contains:
- only those from the left object (jointype=:JoinLeft);
- only those from the right object (jointype=:JoinRight);
- the intersection of both objects (jointype=:JoinBoth);
- the union of both objects (jointype=:JoinAll).
In every case, missing values are inserted wherever an index value is absent from one of the objects.
julia> dates = collect(Date(2007,1,1):Day(1):Date(2007,1,30));
julia> ts2 = TSFrame(rand(length(dates)), dates)
30×1 TSFrame with Date Index
 Index       x1
 Date        Float64
────────────────────────
 2007-01-01  0.103587
 2007-01-02  0.0660796
 2007-01-03  0.211131
 2007-01-04  0.850458
 2007-01-05  0.157577
 2007-01-06  0.555865
 2007-01-07  0.5901
 2007-01-08  0.732587
     ⋮          ⋮
 2007-01-24  0.376363
 2007-01-25  0.536
 2007-01-26  0.544707
 2007-01-27  0.20214
 2007-01-28  0.612118
 2007-01-29  0.11242
 2007-01-30  0.535436
        15 rows omitted
julia> join(ts, ts2; jointype=:JoinAll) # cbind/join on Index column
431×2 TSFrame with Date Index
 Index       value     x1
 Date        Float64?  Float64?
────────────────────────────────
 2007-01-01  9.73397   0.103587
 2007-01-02  5.84181   0.0660796
 2007-01-03  0.65625   0.211131
 2007-01-04  5.86312   0.850458
 2007-01-05  5.51728   0.157577
 2007-01-06  9.5246    0.555865
 2007-01-07  8.78945   0.5901
 2007-01-08  9.94281   0.732587
     ⋮          ⋮         ⋮
 2008-02-29  3.84641   missing
 2008-03-01  6.17761   missing
 2008-03-02  0.292995  missing
 2008-03-03  4.71666   missing
 2008-03-04  5.8359    missing
 2008-03-05  6.2736    missing
 2008-03-06  4.73226   missing
               416 rows omitted
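The four join strategies amount to choosing the output index and then looking up each side's value for every index entry, inserting missing where a side has no row. The following is a minimal plain-Julia sketch of that logic on bare index/value vectors. join_columns is a hypothetical helper and is not the TSFrames implementation, which operates on whole TSFrame objects.

```julia
# Hypothetical sketch (not the TSFrames implementation): pick the output index
# according to `jointype`, then fetch each side's value for every index entry,
# inserting `missing` where that side has no row for it.
function join_columns(lidx, lvals, ridx, rvals, jointype::Symbol)
    idx = if jointype == :JoinLeft
        lidx
    elseif jointype == :JoinRight
        ridx
    elseif jointype == :JoinBoth
        sort(intersect(lidx, ridx))
    elseif jointype == :JoinAll
        sort(union(lidx, ridx))
    else
        throw(ArgumentError("unknown jointype $jointype"))
    end
    left = Dict(zip(lidx, lvals))
    right = Dict(zip(ridx, rvals))
    return idx, [get(left, i, missing) for i in idx], [get(right, i, missing) for i in idx]
end

idx, left_col, right_col = join_columns([1, 2, 3], [10, 20, 30],
                                        [2, 3, 4], [0.2, 0.3, 0.4], :JoinAll)
```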
vcat works similarly but merges two datasets by rows. Before doing the merge, it checks a condition on the columns, selected via its colmerge argument, and throws an error if the condition is not satisfied. colmerge can be set to one of:
- :setequal, which merges only if both objects have the same column names;
- :orderequal, which merges only if both objects have the same column names in the same order;
- :intersect, which merges only the columns common to both objects;
- :union, which merges even if the columns differ between the two objects, filling the resulting columns with missing where necessary.
Note that vcat does not deduplicate the Index. If the two objects share some Index values, all rows from both objects are kept in the result. A vcat operation may therefore produce duplicate Index values, and subsequent operations on the result may behave unexpectedly or throw errors.
julia> dates = collect(Date(2008,4,1):Day(1):Date(2008,4,30));
julia> ts3 = TSFrame(DataFrame(values=rand(length(dates)), Index=dates))
30×1 TSFrame with Date Index
 Index       values
 Date        Float64
──────────────────────
 2008-04-01  0.630221
 2008-04-02  0.921043
 2008-04-03  0.653232
 2008-04-04  0.457713
 2008-04-05  0.607545
 2008-04-06  0.254335
 2008-04-07  0.126919
 2008-04-08  0.70725
     ⋮          ⋮
 2008-04-24  0.469532
 2008-04-25  0.391875
 2008-04-26  0.505043
 2008-04-27  0.780166
 2008-04-28  0.391329
 2008-04-29  0.483267
 2008-04-30  0.085829
       15 rows omitted
julia> vcat(ts, ts3) # do the merge
461×2 TSFrame with Date Index
 Index       value     values
 Date        Float64?  Float64?
────────────────────────────────
 2007-01-01  9.73397   missing
 2007-01-02  5.84181   missing
 2007-01-03  0.65625   missing
 2007-01-04  5.86312   missing
 2007-01-05  5.51728   missing
 2007-01-06  9.5246    missing
 2007-01-07  8.78945   missing
 2007-01-08  9.94281   missing
     ⋮          ⋮         ⋮
 2008-04-24  missing   0.469532
 2008-04-25  missing   0.391875
 2008-04-26  missing   0.505043
 2008-04-27  missing   0.780166
 2008-04-28  missing   0.391329
 2008-04-29  missing   0.483267
 2008-04-30  missing   0.085829
               446 rows omitted
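The colmerge rules decide which columns the row-bound result will have, or refuse the merge. Below is a minimal plain-Julia sketch of those four rules applied to column-name vectors. merged_columns is a hypothetical helper, not part of TSFrames or DataFrames. With the columns of ts and ts3 it reproduces the two-column layout seen above.

```julia
# Hypothetical sketch (not the TSFrames/DataFrames implementation) of the
# colmerge rules: return the column names of the merged result, or throw
# an ArgumentError when the chosen rule is not satisfied.
function merged_columns(a::Vector{String}, b::Vector{String}, colmerge::Symbol)
    if colmerge == :orderequal
        a == b || throw(ArgumentError("columns differ or are in a different order"))
        return a
    elseif colmerge == :setequal
        issetequal(a, b) || throw(ArgumentError("column sets differ"))
        return a
    elseif colmerge == :intersect
        return intersect(a, b)   # only the shared columns are kept
    elseif colmerge == :union
        return union(a, b)       # columns absent on one side get `missing` values
    else
        throw(ArgumentError("unknown colmerge $colmerge"))
    end
end

cols = merged_columns(["value"], ["values"], :union)
```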
Rolling window operations
The rollapply method applies a function over a fixed-size rolling window on the dataset. In the example below, we compute the 10-day average of the values on a rolling basis. Because each window needs 10 rows, the result has 431 - 9 = 422 rows and starts at 2007-01-10.
julia> rollapply(ts, mean, 10)
422×1 TSFrame with Date Index
 Index       rolling_value_mean
 Date        Float64
────────────────────────────────
 2007-01-10  7.01539
 2007-01-11  6.08321
 2007-01-12  5.65751
 2007-01-13  5.63156
 2007-01-14  5.17772
 2007-01-15  5.12919
 2007-01-16  4.61536
 2007-01-17  4.45436
     ⋮          ⋮
 2008-02-29  5.68936
 2008-03-01  5.71806
 2008-03-02  4.79015
 2008-03-03  4.62046
 2008-03-04  4.43225
 2008-03-05  4.57184
 2008-03-06  4.89511
               407 rows omitted
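A fixed-size rolling window can be sketched in a few lines of plain Julia. The result at position i uses the window ending at row i, so a series of length n yields n - width + 1 values. This is the same length arithmetic that gives 422 rows above. rolling is a hypothetical helper, not the TSFrames implementation.

```julia
using Statistics

# Hypothetical sketch (not the TSFrames implementation): apply `f` to each
# window x[i-width+1:i]; the i-th result is aligned with the window's last row.
function rolling(f, x::AbstractVector, width::Integer)
    width >= 1 || throw(ArgumentError("width must be positive"))
    return [f(view(x, i-width+1:i)) for i in width:length(x)]
end

roll = rolling(mean, [1.0, 2.0, 3.0, 4.0, 5.0], 2)
```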
Computing rolling difference and percent change
Similar to apply and rollapply, there are specific methods to compute differences and percent changes of a TSFrame object. The diff method computes the difference between values in adjacent rows, inserting missing in the first row. pctchange computes the change between adjacent rows relative to the earlier value, (x[t] - x[t-1]) / x[t-1], expressed as a fraction; it also inserts missing in the first row.
julia> diff(ts)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
───────────────────────
 2007-01-01  missing
 2007-01-02  -3.89216
 2007-01-03  -5.18555
 2007-01-04   5.20687
 2007-01-05  -0.34584
 2007-01-06   4.00731
 2007-01-07  -0.735145
 2007-01-08   1.15336
     ⋮          ⋮
 2008-02-29   1.18582
 2008-03-01   2.3312
 2008-03-02  -5.88461
 2008-03-03   4.42367
 2008-03-04   1.11924
 2008-03-05   0.437695
 2008-03-06  -1.54134
      416 rows omitted
julia> pctchange(ts)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
────────────────────────
 2007-01-01  missing
 2007-01-02  -0.399854
 2007-01-03  -0.887663
 2007-01-04   7.93428
 2007-01-05  -0.0589856
 2007-01-06   0.72632
 2007-01-07  -0.0771838
 2007-01-08   0.131221
     ⋮          ⋮
 2008-02-29   0.445697
 2008-03-01   0.60607
 2008-03-02  -0.952571
 2008-03-03  15.0981
 2008-03-04   0.237295
 2008-03-05   0.0750004
 2008-03-06  -0.245687
      416 rows omitted
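Both operations compare each row with the one before it. For example, the second pctchange value above is 5.84181 / 9.73397 - 1 ≈ -0.399854. The following plain-Julia sketch shows that computation on a bare vector, with missing in the first position. adjacent_diff and adjacent_pctchange are hypothetical helpers, not the TSFrames implementation, and they assume a standard 1-based Vector.

```julia
# Hypothetical sketch (not the TSFrames implementation): differences and
# fractional changes between adjacent elements, `missing` in the first slot.
function adjacent_diff(x::AbstractVector)
    return [i == 1 ? missing : x[i] - x[i-1] for i in eachindex(x)]
end

function adjacent_pctchange(x::AbstractVector)
    return [i == 1 ? missing : (x[i] - x[i-1]) / x[i-1] for i in eachindex(x)]
end

d = adjacent_diff([2.0, 5.0, 4.0])       # [missing, 3.0, -1.0]
p = adjacent_pctchange([2.0, 5.0, 4.0])  # [missing, 1.5, -0.2]
```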
Computing log of data values
julia> log.(ts)
431×1 TSFrame with Date Index
 Index       value_log
 Date        Float64
───────────────────────
 2007-01-01   2.27562
 2007-01-02   1.76504
 2007-01-03  -0.421213
 2007-01-04   1.76868
 2007-01-05   1.70789
 2007-01-06   2.25388
 2007-01-07   2.17355
 2007-01-08   2.29685
     ⋮          ⋮
 2008-02-29   1.34714
 2008-03-01   1.82093
 2008-03-02  -1.2276
 2008-03-03   1.5511
 2008-03-04   1.76403
 2008-03-05   1.83635
 2008-03-06   1.5544
      416 rows omitted
Creating lagged/leading series
lag() and lead() shift a series backward or forward, respectively, by a fixed number of rows, inserting missing where required. The Index itself is unchanged; only the data values move.
julia> lag(ts, 2)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
───────────────────────
 2007-01-01  missing
 2007-01-02  missing
 2007-01-03  9.73397
 2007-01-04  5.84181
 2007-01-05  0.65625
 2007-01-06  5.86312
 2007-01-07  5.51728
 2007-01-08  9.5246
     ⋮          ⋮
 2008-02-29  5.46391
 2008-03-01  2.66059
 2008-03-02  3.84641
 2008-03-03  6.17761
 2008-03-04  0.292995
 2008-03-05  4.71666
 2008-03-06  5.8359
      416 rows omitted
julia> lead(ts, 2)
431×1 TSFrame with Date Index
 Index       value
 Date        Float64?
───────────────────────
 2007-01-01  0.65625
 2007-01-02  5.86312
 2007-01-03  5.51728
 2007-01-04  9.5246
 2007-01-05  8.78945
 2007-01-06  9.94281
 2007-01-07  7.65797
 2007-01-08  6.62665
     ⋮          ⋮
 2008-02-29  0.292995
 2008-03-01  4.71666
 2008-03-02  5.8359
 2008-03-03  6.2736
 2008-03-04  4.73226
 2008-03-05  missing
 2008-03-06  missing
      416 rows omitted
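Lagging and leading move the values relative to a fixed index and fill the vacated slots with missing, as the outputs above show. Here is a minimal plain-Julia sketch on a bare vector. lag_values and lead_values are hypothetical helpers, not the TSFrames lag/lead implementation, and they assume a non-negative shift k and a standard 1-based Vector.

```julia
# Hypothetical sketch (not the TSFrames implementation): shift values by `k`
# rows (k >= 0) while positions stay fixed, filling gaps with `missing`.
function lag_values(x::AbstractVector, k::Integer)
    n = length(x)
    return [i - k >= 1 ? x[i-k] : missing for i in 1:n]
end

function lead_values(x::AbstractVector, k::Integer)
    n = length(x)
    return [i + k <= n ? x[i+k] : missing for i in 1:n]
end

lagged = lag_values([1, 2, 3, 4], 2)  # [missing, missing, 1, 2]
led = lead_values([1, 2, 3, 4], 2)    # [3, 4, missing, missing]
```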
Converting to Matrix and DataFrame
You can easily convert a TSFrame object into a Matrix, or fetch its underlying DataFrame for operations outside the scope of TSFrames. Note that Matrix(ts) contains only the data columns, not the Index.
julia> ts[:, 1] # convert column 1 to a vector of floats
431-element Vector{Float64}:
 9.733968587414477
 5.841805386771233
 0.6562504335550223
 5.86312330329454
 5.517283362909282
 9.52459601769881
 8.789451032375915
 9.942811691734047
 7.657974698226847
 6.626645288281524
 ⋮
 5.463912989813932
 2.660594968793716
 3.8464132041770838
 6.17760879866357
 0.2929947354114648
 4.716660788666809
 5.835901249618831
 6.273596284451427
 4.73225806397092
julia> Matrix(ts) # convert entire TSFrame into a Matrix
431×1 Matrix{Float64}:
 9.733968587414477
 5.841805386771233
 0.6562504335550223
 5.86312330329454
 5.517283362909282
 9.52459601769881
 8.789451032375915
 9.942811691734047
 7.657974698226847
 6.626645288281524
 ⋮
 5.463912989813932
 2.660594968793716
 3.8464132041770838
 6.17760879866357
 0.2929947354114648
 4.716660788666809
 5.835901249618831
 6.273596284451427
 4.73225806397092
julia> select(ts.coredata, :Index, :value, DataFrames.nrow) # use the underlying DataFrame for other operations
431×3 DataFrame
 Row │ Index       value     nrow
     │ Date        Float64   Int64
─────┼─────────────────────────────
   1 │ 2007-01-01  9.73397     431
   2 │ 2007-01-02  5.84181     431
   3 │ 2007-01-03  0.65625     431
   4 │ 2007-01-04  5.86312     431
   5 │ 2007-01-05  5.51728     431
   6 │ 2007-01-06  9.5246      431
   7 │ 2007-01-07  8.78945     431
   8 │ 2007-01-08  9.94281     431
  ⋮  │     ⋮          ⋮        ⋮
 425 │ 2008-02-29  3.84641     431
 426 │ 2008-03-01  6.17761     431
 427 │ 2008-03-02  0.292995    431
 428 │ 2008-03-03  4.71666     431
 429 │ 2008-03-04  5.8359      431
 430 │ 2008-03-05  6.2736      431
 431 │ 2008-03-06  4.73226     431
                 416 rows omitted
Writing TSFrame into a CSV file
Because TSFrame objects are Tables.jl compatible, a TSFrame can be passed directly to the CSV.write method for writing into a file. Alternatively, you can pass the underlying coredata property, which is a DataFrame.
julia> CSV.write("/tmp/demo_ts.csv", ts)
"/tmp/demo_ts.csv"
Broadcasting
Broadcasting can be used on a TSFrame object to apply a function to a subset of its columns. The resulting columns are renamed with the function name as a suffix (for example, A_sin).
julia> using TSFrames, DataFrames;
julia> ts = TSFrame(DataFrame(Index = [1, 2, 3, 4, 5], A = [10.1, 12.4, 42.4, 24.1, 242.5], B = [2, 4, 6, 8, 10]))
(5 x 2) TSFrame with Int64 Index
Index A B
Int64 Float64 Int64
───────────────────────
1 10.1 2
2 12.4 4
3 42.4 6
4 24.1 8
5 242.5 10
julia> sin_A = sin.(ts[:, [:A]]) # get sin of column A
(5 x 1) TSFrame with Int64 Index
Index A_sin
Int64 Float64
──────────────────
1 -0.625071
2 -0.165604
3 -0.999934
4 -0.858707
5 -0.562466
julia> log_ts = log.(ts) # take log of all columns
(5 x 2) TSFrame with Int64 Index
Index A_log B_log
Int64 Float64 Float64
──────────────────────────
1 2.31254 0.693147
2 2.5177 1.38629
3 3.74715 1.79176
4 3.18221 2.07944
5 5.491 2.30259
julia> log_ts = log.(ts[:, [:A, :B]]) # can specify multiple columns
(5 x 2) TSFrame with Int64 Index
Index A_log B_log
Int64 Float64 Float64
──────────────────────────
1 2.31254 0.693147
2 2.5177 1.38629
3 3.74715 1.79176
4 3.18221 2.07944
5 5.491 2.30259
Tables.jl Integration
TSFrame objects are Tables.jl compatible. This integration enables easy conversion between the TSFrame format and other formats which are Tables.jl compatible.
As an example, first consider the following code, which converts a TSFrame object into a DataFrame, a TimeArray, and a CSV file, respectively.
julia> using TSFrames, TimeSeries, Dates, DataFrames, CSV;
julia> dates = Date(2018, 1, 1):Day(1):Date(2018, 12, 31)
Date("2018-01-01"):Day(1):Date("2018-12-31")
julia> ts = TSFrame(DataFrame(Index = dates, x1 = 1:365));
# conversion to DataFrame
julia> df = DataFrame(ts);
# conversion to TimeArray
julia> timeArray = TimeArray(ts, timestamp = :Index);
# writing to CSV
julia> CSV.write("ts.csv", ts);
Next, here is some code which converts a DataFrame, a TimeArray, and a CSV file to a TSFrame object.
julia> using TSFrames, DataFrames, CSV, TimeSeries, Dates;
# converting DataFrame to TSFrame
julia> ts = TSFrame(DataFrame(Index=1:10, x1=1:10));
# converting from TimeArray to TSFrame
julia> dates = Date(2018, 1, 1):Day(1):Date(2018, 12, 31)
Date("2018-01-01"):Day(1):Date("2018-12-31")
julia> ta = TimeArray(dates, rand(length(dates)));
julia> ts = TSFrame(ta);
# converting from CSV to TSFrame
julia> CSV.read("ts.csv", TSFrame);
This discussion warrants a note about how we have implemented the Tables.jl interfaces. Since TSFrame objects are simply wrappers around a DataFrame, our implementations of these interfaces just call DataFrames.jl's implementations. Moreover, when constructing TSFrame objects from other Tables.jl-compatible types, our constructor first converts the input table to a DataFrame and then converts that DataFrame to a TSFrame object.