How can I calculate the index of when the sequence starts to repeat?

3 views (last 30 days)
Lets say I have a vector: V = [1 1 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 0]. Its clear that the vector V starts repeating after the 21th element. Another eample is V = [0 1 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1]. Its clear that the vector V starts repeating after the 7th element. Its really annoying...I have been trying this for a while but couldn't it to work.

Accepted Answer

Voss
Voss on 18 Mar 2023
Edited: Voss on 18 Mar 2023
The idea here is to successively check whether V is the same as a repeated sequence of its first T elements. Starting with T = 1, incrementing T by 1 each time, and stopping when a T is found that generates V, you end up with the smallest T that generates V.
Note that numel(V) need not be a multiple of T; elements at the end of V that are not part of a full sequence of T elements (i.e., they are "left over") are also checked as to whether they match the first however many elements of V.
For instance:
  • V = [1 2 3 4 1 2 3 4] gives T = 4 because [1 2 3 4] can be repeated twice to form V, and no smaller sequence of the first elements of V can be repeated to form V.
  • V = [1 2 3 4 1 2 3 4 1 2 3] gives T = 4 because [1 2 3 4] can be repeated "2.75 times" if you like, i.e., the "left over" [1 2 3] at the end is the same as the start of V.
  • V = [1 2 3 4 5] gives T = 5. No sequence of first elements in V smaller than V itself can be repeated to generate V.
  • V = [2 2 2] gives T = 1.
  • V = [] gives T = 0.
V = [1 1 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 0];
find_period(V)
ans = 21
V = [0 1 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1];
find_period(V)
ans = 7
function T = find_period(V)
NV = numel(V); % number of elements in vector V
if NV == 0 % special case: V is empty
T = 0; % return T = 0 (an aribitrary convention I made up)
return
end
% try sequences until you find a sequence that generates V
% (note that T = NV will generate V eventually, if no
% smaller sequence is found first)
T = 1; % start with T = 1
while true
N = floor(NV/T); % N = number of times a sequence of T elements fits into N
if all(diff(reshape(V(1:T*N),[],N),1,2)==0,'all') ... % check that the first T*N elements of V are the same as V(1:T) repeated N times
&& isequal(V(T*N+1:NV),V(1:NV-T*N)) % check that the last "left over" elements of V are the same as the first elements of V
return % match found -> return T
end
T = T+1; % not found -> try the next T
end
end

More Answers (1)

John D'Errico
John D'Errico on 18 Mar 2023
Edited: John D'Errico on 18 Mar 2023
Why is it clear? It is not certain that it repeats at the points you have indicated.
For example, what is the repeating sequence here: [1 0 1 1 ]? Is it just the [1 1]? Or is it the entire sequence [1 0 1 1]? So lke this: [1 0 1 1 , 1 0 1 1 , 1 0 1 1 , 1 0 1 1]? Or, If I show you the sequence [1 0], is that the sequence? Is the repeating part just the [0]? Or is it just the beginning of a longer sequence?
I'll assume this is something like the problem of identifying a repetend sequence. For example, what is the binary representation of 1/19? This will be an infinitely repeating decimal. In base 2, the repeating string of bits is this 18 bit string: '000011010111100101'. So the binary representation of 1/19 is
'000011010111100101000011010111100101000011010111100101000011010111100101...'
How would I solve the problem? I solved the fully general case in base B in my code called repetend, found on the file exchange. (I'll post the link in a second.) But the point is, to start at the end.
V = [1 1 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 0];
You don't know where the repeating part starts. And in the case of a repetend, you can actually know what the possible maximum length of the reptend is, in terms of the Carmichael function of the denominator of the fraction. But if all you do is grab the last few digits off the end of the sequence, for example:
Rpossible = [0 0];
So assume the repeating sequence is of length 2. If that is the case, then
loc = strfind(V,Rpossible)
loc = 1×12
10 14 15 18 19 20 31 35 36 39 40 41
If that is the repeating part of the sequence, then we would expect to see the beginning of that string at locations 2 elements apart at the end. And we don't. That tells us to consider if the repeating string has length 3.
replen = 3;
Rpossible = V(end-replen+1:end)
Rpossible = 1×3
0 0 0
loc = strfind(V,Rpossible)
loc = 1×6
14 18 19 35 39 40
Again, at the end, we see that string appears at location 39 and 40. So the repeating string is of length 4 or more.
replen = replen + 1
replen = 4
Rpossible = V(end-replen+1:end)
Rpossible = 1×4
0 0 0 0
loc = strfind(V,Rpossible)
loc = 1×2
18 39
Ok. If there is a repeating sequence, now we know the stride. The string [0 0 0 0] happens only twice. And therefore, we now have a great guess at what the string should be. It should be the final 21 elements of V.
replen = loc(2) - loc(1)
replen = 21
Rpossible = V(end-replen+1:end)
Rpossible = 1×21
1 1 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 0
loc = strfind(V,Rpossible)
loc = 1×2
1 22
We have identified the repeating string. It is of length 21.
Again, this is the approach I would take IF I was looking for a repetend. It is necessary to start at the end, because SOME fractions have a non-repeating part. For example, in the decimal representation of the fraction 1/12,
1/12
ans = 0.0833
the repetend is '3', but there is an initial non-repeating string of digits '08'. So you always need to start looking at the end. Consider the case of identifying the repetend of the fraction 12/35, in base 10. We could grab the digits easily enough. If I was feeling lazy, I might do this:
V = char(vpa(sym(12)/35,30))
V = '0.342857142857142857142857142857'
V = V(3:end) % stripping off the '0.'
V = '342857142857142857142857142857'
We can compute carmichael(35) as 12. So I knw the repeating string is of length 12 at most, but it may be of a length of any divisor of 12 too. So the repetend would always be of one of these lengths 12,6,4,3,2,1.
replen = 12;
Rpossible = V(end-replen+1:end)
Rpossible = '142857142857'
loc = strfind(V,Rpossible)
loc = 1×3
7 13 19
Now, look at the last two locations it found that sequence. If there was a seqence repeat of length 12, that is not consistent with the last two locations found. They differ by 6. And now we know the repeat length is 6, not 12.
replen = 6;
Rpossible = V(end-replen+1:end)
Rpossible = '142857'
loc = strfind(V,Rpossible)
loc = 1×4
7 13 19 25
Finally, I can do a circshift, noting that the repeating sequence actually started at location 2 in V. Or, I can be lazy, and just use my repetend code, to learn:
[repeatstr,nonrepstr] = repetend(12,35,10)
repeatstr =
'428571'
nonrepstr =
'3'
Or, for example, the number 1/9931, written in base 3. It has a repeating string of length 3310 digits in base 3 (but 9930 decimal digits.) And, yes, you can do all of this in double precision arithmetic.
[repeatstr,nonrepstr] = repetend(1,9931,3)
repeatstr =
'0000000012221112120200100001102222102120210012100110112020221110222002001010110112000222221101101201002000112121102011012001020112202121010110002010220210000201121012001200102211202111010012000212122012021222212222201221201201102021021021210200021021212122210011002202210122002120122220011100211001220200022101121100200112210001000112020111120212220200110100122002010202211000000010222000201110020000221222121201112010120022100111121222122101100202022100100122221220221010201100100201221102210100211100211201202022001102121112000111001210101010021220011122202010100120202110112022220222211022011010221111211212012110011212020202212002201211212102101201102221002220112201021110012121001220110100212000200100111100001120221110022020102101102111212200000002122100110222011000122022202011000102101012120100002022202120220111112120020102222021121202110220020111021221212020112220120011011112100221202000100022201012020202012021010002211102020101111122100112221122212212102202122000020020110122010020111111220101211020020121121010221221201221100210211222010201201021022020120100110020022220001001121222012111021120221200020210000001202120022122102200102112211102200021120210201020001112211201121100000201011021222112002011122121011022212022020111100221101002210000120122011100020012210210111111110111202001212221111021000002120100222002220220121211202100011011022102102011100000021021012211011102001202122022011021220112120022102111010211212111101020022011012221000201002022110122212001122010011112000001011201012202121210021200212221210012001112110211001000220011001220000110202212022200101110002201202212220112111022220121220201212000101102102220011010212112022222222100011101020221222211200001201020122101221121102020011120002202212121121102220000011211210212202221101011202112102212021100201012121122202120020122220211012102210221200110201112122102220101002102010000100000210010210211202012012010120222012010101000122112200200121002201021000022111220112210020222001211011220221100122212221102021111020100020221121221002202120200112222222120002220211122022220010001010211102121022001221111010001001211220202001221221000010020012120211221220210011200121220111220110210202002211201011102221112210121212122010022111000202121221020201121102000020000112002112120011110110102101122110102020200102200210110101201210211200012200021100212011122101012210021121220102220221221111222211020011122002021201211201110100222222201001221120002112221002000202112221201212101021222202000201020021111101022021200002011010201120022021112010010102021100021022112111101220010202221222000212102020202102012122200111202021211111001221100011000100101200201002222022021121002122021111110021210112022021011012120010010210011220120110002120210212012002021021221122022000022212211010002101112011020010222020122222210201022001001200221201100111200222011020120212022211100110211011222220212112010001102202111001012112000102002021111220011212200122221021002111222022100120121111111121110202210100011112012222201021220002200020021010110201222112112001201202111222222012012100112111202210201002002112010021101022001201112120110101111212022002112100012220212202001121000102211002122111102222212110212100201010122010220100010122102211101120112212220022112210022221120200102000221211122200210200100021101112000021010020210102221211201200022112120101102'
nonrepstr =
1×0 empty char array
You can find repetend in this submission on the FEX:
That submission also includes the carmichael function (which as I recall, feels a lot like the Euler totient function.)

Categories

Find more on Characters and Strings in Help Center and File Exchange

Products


Release

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!