You could use bootstrapping to estimate confidence intervals. MATLAB provides bootci
function in the Statistics toolbox. Here is an example:
%# generate a random cell array of 400 sequences of varying length
%# each containing indices from 1 to 5 corresponding to ACGTE
sequences = arrayfun(@(~) randi([1 5], [1 randi([500 1000])]), 1:400, ...
'UniformOutput',false)';
%# compute transition matrix from all sequences
trans = countFcn(sequences);
%# number of bootstrap samples to draw
Nboot = 1000;
%# estimate 95% confidence interval using bootstrapping
ci = bootci(Nboot, {@countFcn, sequences}, 'alpha',0.05);
ci = permute(ci, [2 3 1]);
We get:
>> trans %# 5x5 transition matrix: P_hat
trans =
0.19747 0.2019 0.19849 0.2049 0.19724
0.20068 0.19959 0.19811 0.20233 0.19928
0.19841 0.19798 0.2021 0.2012 0.20031
0.20077 0.19926 0.20084 0.19988 0.19926
0.19895 0.19915 0.19963 0.20139 0.20088
and two other similar matrices containing the lower and upper bounds of confidence intervals:
>> ci(:,:,1) %# CI lower bound
>> ci(:,:,2) %# CI upper bound
I am using the following function to compute the transition matrix from a set of sequences:
function trans = countFcn(seqs)
%# accumulate transition matrix from all sequences
trans = zeros(5,5);
for i=1:numel(seqs)
trans = trans + sparse(seqs{i}(1:end-1), seqs{i}(2:end), 1, 5,5);
end
%# normalize into proper probabilities
trans = bsxfun(@rdivide, trans, sum(trans,2));
end
As a bonus, we can use bootstrp
function to get the statistic computed from each bootstrap sample, which we use to show a histogram for each of the entries in the transition matrix:
%# compute multiple transition matrices using bootstrapping
stat = bootstrp(Nboot, @countFcn, sequences);
%# display histogram for each entry in the transition matrix
sub = reshape(1:5*5,5,5);
figure
for i=1:size(stat,2)
subplot(5,5,sub(i))
hist(stat(:,i))
end