How to process panel data for use in a recurrent neural network (RNN)

I have been doing some research on recurrent neural networks, but I am having trouble understanding whether, and how, they could be used to analyze panel data (meaning cross-sectional data captured at different points in time for several subjects; see the sample data below). Most examples of RNNs I have seen deal with sequences of text rather than true panel data, so I'm not sure whether they are applicable to this type of data.

Sample data:

ID    TIME    Y     X1    X2    X3
1     1       5     3     0     10
1     2       5     2     2     6
1     3       6     6     3     11
2     1       2     2     7     2
2     2       3     3     1     19
2     3       3     8     6     1
3     1       7     0     2     0

If I want to predict Y at a particular time given the covariates X1, X2 and X3 (as well as their values in previous time periods), can this kind of sequence be evaluated by a recurrent neural network? If so, do you have any resources or ideas on how to turn this type of data into feature vectors and matching labels that can be passed to an RNN? (I'm using Python, but am open to other implementations.)
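One common way to shape data like the sample above for an RNN is a 3-D array of (subjects, time steps, features), with a matching label array and a mask marking which time steps were actually observed. The sketch below is only an illustration under some assumptions: the names `rows`, `panel_to_tensor` and `mask` are made up here, TIME is assumed to be an integer index starting at 1, and missing periods (such as subject 3, which has a single row in the sample) are zero-padded.

```python
import numpy as np

# Sample rows from the question: (ID, TIME, Y, X1, X2, X3)
rows = [
    (1, 1, 5, 3, 0, 10),
    (1, 2, 5, 2, 2, 6),
    (1, 3, 6, 6, 3, 11),
    (2, 1, 2, 2, 7, 2),
    (2, 2, 3, 3, 1, 19),
    (2, 3, 3, 8, 6, 1),
    (3, 1, 7, 0, 2, 0),
]


def panel_to_tensor(rows):
    """Turn long-format panel rows into padded (subject, time, feature) arrays."""
    ids = sorted({r[0] for r in rows})
    max_t = max(r[1] for r in rows)
    n_features = len(rows[0]) - 3  # everything after ID, TIME, Y
    index = {sid: i for i, sid in enumerate(ids)}

    X = np.zeros((len(ids), max_t, n_features))
    y = np.zeros((len(ids), max_t))
    mask = np.zeros((len(ids), max_t), dtype=bool)

    for sid, t, target, *features in rows:
        i, j = index[sid], t - 1  # assumption: TIME starts at 1
        X[i, j] = features
        y[i, j] = target
        mask[i, j] = True  # this (subject, time) cell was observed
    return X, y, mask


X, y, mask = panel_to_tensor(rows)
print(X.shape)  # (3, 3, 3): 3 subjects, 3 time steps, 3 covariates
```

The (batch, time, features) layout is what Keras recurrent layers accept, and PyTorch recurrent layers accept it when created with `batch_first=True`. The mask can be used to exclude padded time steps from the loss.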

Sulfide answered 12/10, 2016 at 20:59 Comment(3)
Did you find an answer to your question? - Shackle
@Shackle Did you find any implementations that can be used for panel data? - Backman
Any useful threads for this question yet? - Rejoinder

I was also looking into this question, and so far I have found only this paper, which seems to deal with it:

"Tensorial Recurrent Neural Networks for Longitudinal Data Analysis", Mingyuan Bai, Boyan Zhang and Junbin Gao, 2017.

I hope this helps.

Perlis answered 5/3, 2019 at 13:28 Comment(1)
The authors will not share the code used to produce the "empirical" results in the paper. - Shaun

tsai (based on fastai) offers a panel data preparation function, SlidingWindowPanel, which might be of use to you: https://timeseriesai.github.io/tsai/data.preparation.html#SlidingWindowPanel

FYI: it also has some great state-of-the-art algorithms for time series classification and regression.

Classical answered 22/6, 2021 at 14:56 Comment(0)

Please see this post.

It addresses your concern about neural networks and panel data.

Verbiage answered 6/1, 2021 at 17:12 Comment(0)

I see no reason why a neural network could not be trained on panel data. What a neural network does is map one set of values to another set of values with which it has a non-linear relationship. In a time series, the value at a particular instant depends on the values that occurred before it. For example, your pronunciation of a letter may vary depending on which letter you pronounced just before. For time series prediction, recurrent neural networks typically outperform feed-forward neural networks. How a time series is fed to a regular feed-forward network was illustrated in a picture. [Image]
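Since the picture is not available, here is a minimal sketch of what feeding a time series to a feed-forward network usually looks like: each input is a fixed window of the most recent observations flattened into one vector, and the label is the target at the end of the window. This is presumably what the picture showed, but that is an assumption. The function name `lagged_windows` and the parameter `n_lags` are hypothetical, and the numbers are the rows for ID 1 in the question's sample.

```python
import numpy as np


def lagged_windows(features, targets, n_lags):
    """Build flat input vectors from the last n_lags time steps of one subject."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    inputs, labels = [], []
    for t in range(n_lags - 1, len(features)):
        window = features[t - n_lags + 1 : t + 1]  # rows t-n_lags+1 .. t
        inputs.append(window.ravel())              # flatten to one vector
        labels.append(targets[t])                  # target at the window's end
    return np.array(inputs), np.array(labels)


# Covariates (X1, X2, X3) and Y for subject 1 in the sample data
features_1 = [[3, 0, 10], [2, 2, 6], [6, 3, 11]]
targets_1 = [5, 5, 6]

inputs, labels = lagged_windows(features_1, targets_1, n_lags=2)
print(inputs.shape)  # (2, 6): two windows, each 2 steps x 3 covariates
```

With panel data the windows would be built per subject and then stacked, so that no window mixes rows from two different IDs.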

In an RNN there is a feedback loop in the internal state of the network, which is why RNNs are better at predicting time series. In your example data, one thing to consider is whether the values of X1, X2 and X3 have an effect on Y, or vice versa. If they do not, you can treat X1, X2, X3 and Y as the same type of data, i.e. train on each series independently using the same network (subject to experimentation).

If your goal is to predict a value where the variables do affect one another, i.e. they are correlated, you can convert them into a single sequence in which each time step contains all the variables for that sample. Another approach might be to train four neural networks: the first three each model one time series with an RNN, and the last is a feed-forward network that takes the outputs for two of the series as inputs and maps them to the output for the third, repeated for all possible combinations. (This is also subject to experimentation, since the performance of a neural network model cannot be predicted reliably without trying it.)

Reading suggestion: look into "Granger causality"; it might help you a bit.
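The feedback loop in the internal state can be made concrete with a bare-bones forward pass of a simple (Elman-style) recurrent cell. This is a sketch only: the weights are random and untrained, the names `rnn_forward`, `W_in`, `W_rec` and `w_out` are made up for illustration, and a real model would come from a library such as Keras or PyTorch. Each time step holds all covariates of one subject, and every subject shares the same weights.

```python
import numpy as np


def rnn_forward(X, W_in, W_rec, w_out):
    """Run a simple recurrent cell over X of shape (subjects, steps, features)."""
    n_subjects, n_steps, _ = X.shape
    h = np.zeros((n_subjects, W_rec.shape[0]))  # hidden state, one row per subject
    preds = np.zeros((n_subjects, n_steps))
    for t in range(n_steps):
        # Feedback loop: the new state depends on the input AND the previous state
        h = np.tanh(X[:, t] @ W_in + h @ W_rec)
        preds[:, t] = h @ w_out  # one prediction of Y per subject and time step
    return preds


rng = np.random.default_rng(0)
n_features, n_hidden = 3, 4
W_in = rng.normal(scale=0.1, size=(n_features, n_hidden))
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
w_out = rng.normal(scale=0.1, size=n_hidden)

# Covariates (X1, X2, X3) for subjects 1 and 2 in the sample data
X = np.array([
    [[3, 0, 10], [2, 2, 6], [6, 3, 11]],
    [[2, 7, 2], [3, 1, 19], [8, 6, 1]],
])

preds = rnn_forward(X, W_in, W_rec, w_out)
print(preds.shape)  # (2, 3)
```

Because the hidden state is carried forward, changing a covariate at the first time step changes the prediction at the last one, which a feed-forward network applied to each row in isolation could not do.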

Evanne answered 12/10, 2016 at 22:17 Comment(3)
Thanks for the response. The part I'm not sure I get is that traditional panel data is different than time series in that it has both cross-sectional and time series components. I get that you could use an RNN to predict the next value in a time series (like what will be the stock price of Google tomorrow given Google's stock price for the last 180 days -- and you might have some Granger causality elements if you have other series that have a causal relationship mixed in). - Sulfide
But what about the case of true panel data above, which has both subjects (the ID column) and time series (the TIME column)? A scenario where you have data like this might be a health study in which you track various health measurements for 100 patients over 12 months. You would have a time series element (a patient with high blood pressure in each of the past 6 months of measurement will likely have high blood pressure in month 7) but also a cross-sectional element (patient 1 is different than patient 2 is different than patient 3, etc.). - Sulfide
Can you use RNNs to predict the state of multiple variables for EACH patient for month 13 in this example, or is that not possible with the RNN architecture? (Traditionally you would use either a fixed effects or random effects regression model to do what I'm talking about, but I figured there would be modern applications of machine learning to handle this type of problem; I'm just having a hard time finding an example.) - Sulfide
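One way the cross-sectional element is sometimes handled, loosely analogous to the subject dummies in a fixed effects regression, is to append a one-hot subject indicator to the covariates at every time step, so that a single shared network can learn a subject-specific offset. The sketch below is an assumption-laden illustration rather than an established recipe for this thread: `add_subject_dummies` is a hypothetical helper, and the shapes follow the health study example (here 3 patients, 12 months, 2 measurements).

```python
import numpy as np


def add_subject_dummies(X):
    """Append a one-hot subject indicator to every time step of X."""
    n_subjects, n_steps, _ = X.shape
    dummies = np.eye(n_subjects)  # row i is the indicator for subject i
    # Repeat each subject's indicator along the time axis
    tiled = np.repeat(dummies[:, None, :], n_steps, axis=1)
    return np.concatenate([X, tiled], axis=2)


X = np.zeros((3, 12, 2))  # placeholder measurements: 3 patients, 12 months
X_with_ids = add_subject_dummies(X)
print(X_with_ids.shape)  # (3, 12, 5): 2 measurements + 3 indicator columns
```

A limitation of this design is that it only covers subjects seen during training; with many subjects, a learned embedding of the ID is a common substitute for the one-hot columns.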
