How to calculate the conditional probability of an event?

I have an array similar to this: array = [A A B A C A B B B C C A A C]. I want to calculate p(C|A), p(C|B), and p(C|C). How can I do this with only this information? That is, I want to know the probability of C occurring immediately after a previous event A, B, or C.

Accepted Answer

William on 23 Apr 2021
Hello Myriam. It is not clear whether A, B, and C here are text characters, or if they have numeric values. If we assume they have numerical values, like A=1, B=2 and C=3, then you could use
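y = diff(array);     % difference between each element and the one before it
P_AC = sum(y==2);    % number of times A is followed by C (a count, not a probability)
P_BC = sum(y==1);    % intended: B followed by C, but this also counts A followed by B
P_CC = sum(y==0);    % intended: C followed by C, but this also counts A-A and B-B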
  1 Comment
Myriam Moss on 24 Apr 2021
Edited: Myriam Moss on 24 Apr 2021
Hi William. Thank you. They are characters.
I'm new to MATLAB, sorry. Could you explain your logic to me, please?

More Answers (2)

William on 25 Apr 2021
Actually, I believe that p(C|A) would be:
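N_AC = length(strfind(array, 'A C')); % number of times 'A' is followed by 'C'
y = strfind(array, 'A');
N_A = length(y);                      % number of times 'A' occurs
p_CA = N_AC/N_A;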
There is one further thing to consider, though. The very last element of array may be an 'A'. I don't think this should be counted in N_A, because we don't know whether it would have been followed by a 'C' or not. So, if the last element of array is 'A', we should reduce N_A by 1.
y = strfind(array, 'A');
N_A = length(y);
% The string might end with an 'A' or an 'A '
if ~isempty(y) && (y(end)==length(array) || y(end)==length(array)-1)
    N_A = N_A - 1; % the final 'A' has no successor, so do not count it
end
p_CA = N_AC/N_A;
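To put the pieces together, here is a minimal sketch that computes p(C|A), p(C|B) and p(C|C) in one pass. It assumes the space-separated character array from the question; the names seq, prevSym, nextSym, symbols and p_C_given are made up for this illustration. Because it pairs each element with its successor, the last element is never counted as a "previous" event, so the end-of-string adjustment is handled implicitly.

```matlab
array = 'A A B A C A B B B C C A A C';
seq = array(array ~= ' ');      % drop the spaces, leaving one character per event

prevSym = seq(1:end-1);         % every event that has a successor
nextSym = seq(2:end);           % the event that follows it

symbols = 'ABC';                % assumed set of possible events
p_C_given = zeros(1, numel(symbols));
for k = 1:numel(symbols)
    isPrev  = (prevSym == symbols(k));          % positions where the previous event is symbols(k)
    N_prev  = sum(isPrev);                      % how often symbols(k) has a successor
    N_prevC = sum(isPrev & (nextSym == 'C'));   % how often that successor is 'C'
    p_C_given(k) = N_prevC / N_prev;            % NaN if symbols(k) never has a successor
end
```

For the example array, p_C_given(1), p_C_given(2) and p_C_given(3) hold the estimates of p(C|A), p(C|B) and p(C|C), which work out to 2/6, 1/4 and 1/3.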

William on 25 Apr 2021
If A, B and C were variables with the values 1, 2 and 3, then in your example:
array = [1, 1, 2, 1, 3, 1, 2, 2, 2, 3, 3, 1, 1, 3]
The diff() function returns the difference between each value and the next value, so
diff(array) = [0, 1, -1, 2, -2, 1, 0, 0, 1, 0, -2, 0, 2]
Every time an A is followed by a C, the difference is 2. Every time a B is followed by a C, the difference is 1. So, I was suggesting that you count the number of times A is followed by C by counting the number of times that the value 2 appears in diff(array) with a statement like c = sum(diff(array) == 2). Unfortunately, I see now that this does not work correctly for the number of times B is followed by C, because this results in a value of 1 in diff(array), and a value of 1 is also produced when an A is followed by a B.
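If the values really were numeric, one way around the ambiguity of diff() would be to compare each element and its successor directly instead of looking at their difference. This is only a sketch under the assumed encoding A=1, B=2, C=3; the names prevVal and nextVal are introduced here for illustration.

```matlab
% Assumed numeric encoding: A = 1, B = 2, C = 3
array = [1, 1, 2, 1, 3, 1, 2, 2, 2, 3, 3, 1, 1, 3];

prevVal = array(1:end-1);   % each element that has a successor
nextVal = array(2:end);     % the element that follows it

N_AC = sum(prevVal == 1 & nextVal == 3);   % A followed by C
N_BC = sum(prevVal == 2 & nextVal == 3);   % B followed by C
N_CC = sum(prevVal == 3 & nextVal == 3);   % C followed by C
```

Each count now tests both members of the pair, so an A followed by a B can no longer be mistaken for a B followed by a C.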
Since you have said that A, B and C are characters, I assume that you mean that:
array = 'A A B A C A B B B C C A A C';
In this case, maybe a better solution would be:
y = strfind(array, 'A C');
N_AC = length(y);
y = strfind(array, 'B C');
N_BC = length(y);
  1 Comment
Myriam Moss on 25 Apr 2021
Thank you William! Now I have the number of times C appears after A and B.
If I define
y = strfind(array, 'C');
N_C = length(y);
then, if I want p(C|A), for example, I should do:
p_CA = N_AC/N_C, do you agree? :)
