I have this data:
simulated_states = c("A", "E", "B", "B", "A", "C", "D", "A", "B", "D", "A", "D",
"D", "E", "D", "D", "D", "E", "A", "A", "A", "B", "A", "C", "C",
"D", "A", "A", "D", "A", "D", "A", "A", "A", "C", "C", "D", "A",
"C", "C", "D", "E", "C", "C", "C", "E", "B", "A", "E", "E", "C",
"C", "D", "E", "C", "E", "E", "A", "E", "B", "A", "A", "E", "E",
"C", "E", "C", "C", "C", "D", "E", "D", "C", "D", "A", "B", "B",
"E", "B", "A", "E", "C", "C", "D", "B", "B", "A", "C", "B", "A",
"D", "A", "D", "E", "C", "D", "D", "A", "A", "C")
I know how to calculate one-step transition probabilities:
library(dplyr)  # provides %>%, group_by(), mutate() and ungroup()

calculate_transition_probs <- function(states) {
  # Pair each state with the one that follows it (the last state has no successor)
  transitions <- data.frame(
    from = head(states, -1),
    to   = tail(states, -1)
  )
  # Count every from/to combination; unobserved combinations get a count of 0
  transition_counts <- table(transitions)
  transition_df <- as.data.frame(transition_counts)
  colnames(transition_df) <- c("from", "to", "count")
  # Percentage of transitions out of each "from" state
  transition_df <- transition_df %>%
    group_by(from) %>%
    mutate(percent = count / sum(count) * 100) %>%
    ungroup()
  transition_df <- transition_df[, c("from", "to", "count", "percent")]
  transition_df <- transition_df[order(transition_df$from, transition_df$to), ]
  return(transition_df)
}
transition_probs <- calculate_transition_probs(simulated_states)
The result looks like this:
from to count percent
A A 7 26.923077
A B 3 11.538462
A C 6 23.076923
A D 5 19.230769
A E 5 19.230769
B A 7 58.333333
B B 3 25.000000
B C 0 0.000000
B D 1 8.333333
B E 1 8.333333
C A 0 0.000000
C B 1 4.545455
C C 9 40.909091
C D 9 40.909091
C E 3 13.636364
D A 9 42.857143
D B 1 4.761905
D C 1 4.761905
D D 4 19.047619
D E 6 28.571429
E A 2 11.111111
E B 4 22.222222
E C 7 38.888889
E D 2 11.111111
E E 3 16.666667
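The same one-step estimates can also be arranged as a row-normalised matrix (proportions rather than percentages) in base R, without dplyr. This is only a sketch: one_step_matrix is a hypothetical helper name and the short sequence is made up for illustration.

```r
# Base-R sketch of the one-step transition matrix (rows = current state,
# columns = next state, entries = proportions that sum to 1 across each row).
# Caveat: a state seen only as the very last element gets an all-NaN row.
one_step_matrix <- function(states) {
  lv   <- sort(unique(states))                    # every observed state
  from <- factor(head(states, -1), levels = lv)   # state at time t
  to   <- factor(tail(states, -1), levels = lv)   # state at time t + 1
  prop.table(table(from, to), margin = 1)         # normalise each row
}

# Toy sequence, made up for illustration (not the question's data).
p1 <- one_step_matrix(c("A", "B", "A", "A", "B"))
p1
```

Multiplying by 100 gives the percent column above; with the question's data, one_step_matrix(simulated_states) should reproduce the same values as proportions.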
Now, I want to extend this to n-step transition probabilities, meaning the probability of the next state given the sequence of the previous n states.
E.g.
- 2 steps: to = A given from = (A,A), to = A given from = (A,B), to = A given from = (A,C), ..., to = B given from = (A,B), to = B given from = (B,B), etc.
- 3 steps: to = A given from = (A,A,A), to = A given from = (A,B,A), etc.
- n steps: to = A given from = (A,A,A,...,A), etc.
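Each case in the list above amounts to sliding a window of length n + 1 along the sequence: the first n entries are the history and the last one is the next state. A minimal base-R sketch of just this windowing step, using stats::embed() on a made-up toy sequence:

```r
# Toy sequence, made up for illustration (not the question's data).
states <- c("A", "B", "A", "A", "B")
n <- 2  # length of the history to condition on

# embed() returns one row per window with the most recent value in column 1,
# so the columns are reversed to read in time order: from1, ..., fromn, to.
windows <- embed(states, n + 1)[, (n + 1):1, drop = FALSE]
colnames(windows) <- c(paste0("from", seq_len(n)), "to")
windows
# rows: (A, B, A), (B, A, A), (A, A, B)
```

Tabulating these rows, and dividing each count by the number of windows that share the same history, gives the conditional percentages.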
How can I write a function that does this for any n?
For example, for 5 steps, the output should look like this:
from1 from2 from3 from4 from5 to count percent
A A A A A A 0 0
A A A A A E 0 0
A A A A A B 0 0
A A A A A C 0 0
A A A A A D 0 0
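One possible way to build the full table in base R is sketched below, under these assumptions: states is a character vector, histories that never occur get a percent of 0 (as in the desired output) rather than NaN, and calc_nstep_transition_probs is a hypothetical name, not an existing function.

```r
# Sketch: counts and percentages of the next state given the previous n states
# (an order-n transition table). Assumes `states` is a character vector, n >= 1.
calc_nstep_transition_probs <- function(states, n) {
  stopifnot(n >= 1, length(states) > n)
  states <- as.character(states)
  lv <- sort(unique(states))                  # all observed states, sorted
  m  <- length(states) - n                    # number of complete windows
  from_cols <- paste0("from", seq_len(n))
  # Element k + 1 is the sequence shifted by k: from1, ..., fromn, then to.
  shifted <- lapply(0:n, function(k) factor(states[(1 + k):(m + k)], levels = lv))
  names(shifted) <- c(from_cols, "to")
  # (n + 1)-way table: every combination of states, including unseen ones (count 0).
  counts <- table(shifted)
  out <- as.data.frame(counts, responseName = "count", stringsAsFactors = FALSE)
  # Total count of each history, repeated on every row sharing that history.
  total <- ave(out$count, interaction(out[from_cols], drop = TRUE), FUN = sum)
  # Histories that never occur get 0 instead of NaN (0 / 0).
  out$percent <- ifelse(total == 0, 0, out$count / total * 100)
  out <- out[do.call(order, out[c(from_cols, "to")]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy example (made-up sequence): next state given the previous 2 states.
calc_nstep_transition_probs(c("A", "B", "A", "A", "B"), 2)
```

Called as calc_nstep_transition_probs(simulated_states, 5), this returns one row per combination of five previous states and a next state (5^6 = 15625 rows for five states), almost all with count 0, since a sequence of 100 states contains only 95 such windows. Rows are sorted alphabetically, so to runs A, B, C, D, E rather than the A, E, B, C, D order shown above; with n = 1 the counts should match calculate_transition_probs().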
mat, which looks very much like your transition matrix (although you don't actually say this). This matrix tells you the probabilities of transitioning from one state to another. What you are simulating after that isn't clear to me and (I think) irrelevant. The 2-step transition matrix is simply mat %*% mat, which gives the probabilities of going from each current state to each other state after 2 steps. – Embryotomy
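For the reading of "n steps" that the comment addresses (the probability of each state n transitions ahead given only the current state, in a time-homogeneous first-order chain), the n-step matrix is the n-th power of the one-step matrix. A minimal base-R sketch; n_step_matrix is a hypothetical helper and the 2-by-2 matrix holds made-up numbers:

```r
# Sketch: n-step transition matrix P^n by repeated multiplication.
# Assumes `mat` is a square, row-stochastic matrix (each row sums to 1).
n_step_matrix <- function(mat, n) {
  stopifnot(n >= 1, nrow(mat) == ncol(mat))
  result <- mat
  for (i in seq_len(n - 1)) result <- result %*% mat  # one more step each time
  result
}

# Made-up two-state chain, for illustration only.
mat <- matrix(c(0.9, 0.1,
                0.5, 0.5),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("A", "B"), c("A", "B")))
n_step_matrix(mat, 2)  # rows: A = (0.86, 0.14), B = (0.70, 0.30)
```

This gives P(state at t + n | state at t), which is a different quantity from the table of P(next state | previous n states) described in the question.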