R: merge two irregular time series
Asked Answered
S

4

17

I have two multivariate time series x and y, both covering approximately the same range in time (one starts two years before the other, but they end on the same date). Both series have missing observations in the form of empty columns next to the date column, and also in the sense that one of the series has several dates that are not found in the other, and vice versa.

I would like to create a data frame (or similar) with a column that lists all the dates found in x OR y, without duplicate dates. For each date (row), I would like to horizontally stack the observations from x next to the observations from y, with NA's filling the missing cells. Example:

>x
"1987-01-01"   7.1    NA   3
"1987-01-02"   5.2    5    2
"1987-01-06"   2.3    NA   9

>y
"1987-01-01"   55.3   66   45
"1987-01-03"   77.3   87   34

# result I would like
"1987-01-01"   7.1    NA   3   55.3   66   45
"1987-01-02"   5.2    5    2   NA     NA   NA
"1987-01-03"   NA     NA   NA  77.3   87   34
"1987-01-06"   2.3    NA   9   NA     NA   NA

What I have tried: with the zoo package, I've tried the merge.zoo method, but this seems to just stack the two series next to each other, with the dates (as numbers, e.g. "1987-01-02" shown as 6210) from each series appearing in two separate columns.

I've sat for hours getting almost nowhere, so all help is appreciated.

EDIT: some code included below as per suggestion from Soumendra

atcoa <- read.csv(file = "ATCOA_full_adj.csv", header = TRUE)
atcob <- read.csv(file = "ATCOB_full_adj.csv", header = TRUE)
atcoa$date <- as.Date(atcoa$date)
atcob$date <- as.Date(atcob$date)

# only number of observations and the observations themselves differ 
>str(atcoa)
'data.frame':   6151 obs. of  8 variables:
 $ date        :Class 'Date'  num [1:6151] 6210 6213 6215 6216 6217 ...
 $ max         : num  4.31 4.33 4.38 4.18 4.13 4.05 4.08 4.05 4.08 4.1 ...
 $ min         : num  4.28 4.31 4.28 4.13 4.05 3.95 3.97 3.95 4 4.02 ...
 $ close       : num  4.31 4.33 4.31 4.15 4.1 3.97 4 3.97 4.08 4.02 ...
 $ avg         : num  NA NA NA NA NA NA NA NA NA NA ...
 $ tot.vol     : int  877733 89724 889437 1927113 3050611 846525 1782774 1497998 2504466 5636999 ...
 $ turnover    : num  3762300 388900 3835900 8015900 12468100 ...
 $ transactions: int  12 9 24 17 31 26 34 35 37 33 ...

>atcoa[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
1 1987-01-02  4.31  4.28    4.31    NA    877733    3762300             12

# using timeSeries package
ts.atcoa <- timeSeries::as.timeSeries(atcoa, format = "%Y-%m-%d")
ts.atcob <- timeSeries::as.timeSeries(atcob, format = "%Y-%m-%d")

>str(ts.atcoa)
Time Series:          
 Name:               object
Data Matrix:        
 Dimension:          6151 7
 Column Names:       a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 Row Names:          1970-01-01 01:43:30  ...  1970-01-01 04:12:35
Positions:          
 Start:              1970-01-01 01:43:30
 End:                1970-01-01 04:12:35
With:               
 Format:             %Y-%m-%d %H:%M:%S
 FinCenter:          GMT
 Units:              a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 Title:              Time Series Object
 Documentation:      Wed Aug 17 13:00:50 2011

>ts.atcoa[1:1, ]
GMT
 a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 1970-01-01 01:43:30  4.31  4.28    4.31    NA    877733    3762300             12

# The following will create an object of class "data frame" and mode "list", which contains observations for the days mutual for the two series
>ts.atco <- timeSeries::merge(atcoa, atcob)  # produces same result as base::merge, apparently
>ts.atco[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions b.max b.min b.close b.avg b.tot.vol b.turnover b.transactions
1 1989-08-25  7.92  7.77    7.79    NA    269172    2119400             19  7.69  7.56    7.64    NA  81176693  593858000             12

EDIT: problem solved by (using zoo package)

atcoa <- read.zoo(read.csv(file = "ATCOA_full_adj.csv", header = TRUE))
atcob <- read.zoo(read.csv(file = "ATCOB_full_adj.csv", header = TRUE))

names(atcoa) <- c("a.max", "a.min", "a.close",
                   "a.avg", "a.tot.vol", "a.turnover", "a.transactions")
names(atcob) <- c("b.max", "b.min", "b.close",
                   "b.avg", "b.tot.vol", "b.turnover", "b.transactions")

atco <- merge.zoo(atcoa, atcob)

Thank you all for your help.

Stipendiary answered 17/8, 2011 at 7:52 Comment(2)
You didn't show what you actually tried but merge.zoo certainly does not just stack zoo objects. There is a help page on merge.zoo which includes documentation and many examples: ?merge.zoo . Also there are 5 vignettes (pdf documents) that come with zoo that give additional documentation and examples. See my post below for more info.Build
Suggest you do it like this: nms <- c("", "max", "min", "close", "avg", "tot.vol", "turnover", "transactions"); atcoa <- read.zoo("ATCOA_full_adj.dat", header = TRUE, col.names = paste("a", nms, sep = ".")) . Also since this does not appear to be a csv file, i.e. the fields are not separated by commas, it would be better not to use csv as the extension when naming the file.Build
B
11

Try this:

Lines.x <- '"1987-01-01"   7.1    NA   3
"1987-01-02"   5.2    5    2
"1987-01-06"   2.3    NA   9'

Lines.y <- '"1987-01-01"   55.3   66   45
"1987-01-03"   77.3   87   34'

library(zoo)
# in reality x might be in a file and might be read via: x <- read.zoo("x.dat")
# ditto for y. See ?read.zoo and the zoo-read vignette if you need other args too
x <- read.zoo(text = Lines.x)
y <- read.zoo(text = Lines.y)
merge(x,  y)

giving:

           V2.x V3.x V4.x V2.y V3.y V4.y
1987-01-01  7.1   NA    3 55.3   66   45
1987-01-02  5.2    5    2   NA   NA   NA
1987-01-03   NA   NA   NA 77.3   87   34
1987-01-06  2.3   NA    9   NA   NA   NA
Build answered 17/8, 2011 at 8:44 Comment(4)
For Lines.x and Lines.y, do you mean that these should be entered manually? The datasets are too large for that...Vaginitis
I think the comments of the answer above suggest that you use read.zoo to read in your files. x <- read.zoo("ATCOA_full_adj.csv") y <- read.zoo("ATCOB_full_adj.csv") merge(x,y)Lashandralashar
Ah, you are right. This solves the problem; I will update the post original post.Vaginitis
If you are using read.zoo you do not need to use read.csv as well as they both use read.table underneath. You can just write read.zoo("ATCOA_full_adj.csv", header=T). HTHLashandralashar
L
3

You can create a timeSeries (timeSeries library) object from your dates, merge them (timeSeries default merge behaviour is different from zoo and xts and does exactly what you are asking for) and then make zoo/xts objects out of the result in case you don't want to stay with timeSeries.

One quick way to test is the following, assuming you have two zoo objects zz1 and zz2 -

library(timeSeries)
as.zoo(merge(as.timeSeries(zz1), as.timeSeries(zz2)))

Compare the output of the above command with

merge(zz1, zz2)

You can also cbind -

cbind(zz1, zz2)

provided there are no shared columns with same names. Even if such column are there, you can choose the columns by which you cbind, and you will get a zoo object.

cbind(zz1[, 1:2], zz2[, 2:3]) #Assuming other columns are common
Leandraleandre answered 17/8, 2011 at 8:19 Comment(3)
Thank you. When I try your approach, I notice the following: when I use as.timeSeries on a data frame with dates in the format "%Y-%m-%d", the timeSeries object created will have the date format "%Y-%m-%d %H:%M:%S"; that I specify format="%Y-%m-%d" in the function seems to be ignored. If I then merge (using either base::merge or timeSeries::merge) two such timeSeries objects, I get a data frame with dates in the format "%Y-%m-%d", that contains only the dates that are mutual to both objects. However, the columns contain the information I want (like in the example). Any way around this?Vaginitis
Code has been included in the original post.Vaginitis
what is the difference between: as.zoo(merge(as.timeSeries(zz1), as.timeSeries(zz2))) and merge(as.timeSeries(zz1), as.timeSeries(zz2)) ? I like your answer.Unlade
C
2

here, i found a more generic aproach from stat.ethz.ch

a <- ts(1:10, start=c(2014,6), frequency=12)
b <- ts(1:12, start=c(2015,1), frequency=12)

library(zoo)
m <- merge(a = as.zoo(a), b = as.zoo(b))

to get a ts object back:

as.ts(m)
Carolyn answered 30/3, 2016 at 8:22 Comment(1)
is "m" at the end of 5th row a typo?Perversity
I
1

How about this:

## Generate unique sorted time values.
i <- sort(unique(c(index(x), index(y))))

## Empty data matrix.
v <- matrix(nrow=length(i), ncol=6, NA)

## Pull in data items.
v[match(index(x), i), 1:3] <- coredata(x)
v[match(index(y), i), 4:6] <- coredata(y)

## Build new zoo object.
d <- zoo(v, order.by=i)
Izolaiztaccihuatl answered 17/8, 2011 at 8:11 Comment(1)
You may be onto something here, with some modifications: if I skip the index(x) and index(y) in the assignment to i, I will get a vector of the unique dates (if I just use the indices, it will not work since they are unrelated between the two objects - I must compare the dates instead). I can use this to fill the first column of a matrix like you suggest. However, I'm not sure if the "Pull in data items" stage still works: instead I should possibly compare the date column of x and y to the first column of the matrix, and assign elements if there is a match in either.Vaginitis

© 2022 - 2024 — McMap. All rights reserved.