I am trying to do a join of a data.table with itself. The condition to join is based on the value of a column (not the key) being used to access a matrix. Each row has a date(in seconds) and records should only join with newest records (t1
cn<-unique(sdd$column)
mat<-matrix(data=0,nrow=lde,ncol=lde,dimnames=list(cn,cn))
I am struggling with the documentation FAQ (including the SQL to data.table analogy) and the Beginner's Guide for data.table and multiple similar questions in this forum but I can't find how to solve it.
I get stuck with notation problems.
My DT is sdd:
> colnames(sdd)
[1] "ID" "DelayStartTimeSeconds" "DelayEndTimeSeconds" "EquipmentID"
I have made several attempts such as:
sdd2<-sdd # to avoid problems with the names of variables
sdd[sdd2,eqXrossM[cbind(sdd.EquipmentID,sdd2.EquipmentID)]==1 & sdd.DelayEndTimeSeconds<sdd2.DelayStartTimeSeconds, distance:=sdd.DelayEndTimeSeconds-sdd2.DelayStartTimeSeconds][,distance:=sdd.DelayEndTimeSeconds<sdd2.DelayStartTimeSeconds] # that would be the whole thing to do generating a new column with the time difference
sdd[sdd2[ sdd.DelayEndTimeSeconds<sdd2.DelayStartTimeSeconds, distance:=sdd.DelayEndTimeSeconds<sdd2.DelayStartTimeSeconds]] #this is an approximation attempt.
I simply don't get the notation and the different examples seem to use different notations.
EDIT: Well, after a night of sleep some stuff is making sense... but other is still confusing just the same. For example:
sdd_x<-sdd[sdd2,i.DelayStartTimeSeconds>DelayEndTimeSeconds] # returns a vector as long as sdd: len
sdd_x<-sdd[sdd2,i.DelayStartTimeSeconds>DelayEndTimeSeconds & eqXrossM[i.EquipmentID,EquipmentID]==1] # returns a matrix len x len.
Why adding a new condition changes the type of output? I was expecting a case such as the matrix (that would require optimization) In addition the whole matrix is false what is not the expected value as the records are different. In fact for the second case, either the upper or lower diagonal should be TRUE.
Also, looks like the call to the matrix doesn't require using cbind as other answer to a similar question mentioned. Why that?
And my last discovery has been finding out the CJ() operator but trying to use the i. notation doesn't work here. This part doesn't seem too documented.
sdd[CJ(DASDelayID,DASDelayID),i.DelayStartTimeSeconds>DelayEndTimeSeconds]
Any help would be appreciated.
i.
to refer to thei-expression
data.table
, e.g.dt1[dt2, col.from.dt1 := i.col.from.dt2]
– Winded1:5 * matrix(1:25, nrow = 5)
- once you understand what happens there it should at least clear out some of your confusion (also search "R recycling" on google). – Winded