Results for
In a previous discussion,
we looked at a variety of infallible tests for primality, but all of them were too slow to be viable for large numbers. In fact, all of the methods discussed there will fail miserably for even moderately large numbers, those with just a few dozen decimal digits. That does not even begin to push into the realm of large numbers. In turn, that forces us into a different realm of tests - tests which are usually and even almost always correct, but can sometimes incorrectly predict primality.
In this discussion, I will be trying to convince you that the Fermat test for primality can be a quite good test when the number is sufficiently large. Except of course, when it is really bad. Even so, the Fermat test for primality is both a useful tool as well as a necessary underpinning for several other better tests.
The Fermat test for primality relies on Fermat's little theorem, a perhaps under-appreciated tool. Even the name implies it is of little interest. I'm not taking about the famous last theorem, only proven in recent years, but his little theorem.
If you want to look over some nice ways to prove the little theorem, take a read in this link:
I will readily admit that long ago, when I learned about the little theorem in a nearly forgotten class, I thought it was interesting, but why would I care? Not until I learned more mathematics and saw Fermat’s little theorem appearing in different places did I begin to appreciate it. Fermat tells us that, IF P is a prime, AND w is co-prime with P (so the two are relatively prime, sharing no common factors except 1), then it must be true that
mod(w^(P-1),P) == 1
Try it out. Does it work? Be careful though as too large of an exponent will cause problems in double precision, and that is not difficult to do. As a test case that will not overwhelm doubles, note that 13 is prime, and 3 shares no common factors with 13, so we satisfy the requirements for Fermat's little theorem.
mod(3^12,13)
We can even verify that any co-prime of 13 will yield the same result.
mod((1:12).^12,13)
Indeed it worked, suggesting what we knew all along, that 13 is prime. The little Fermat test for primality of the number P uses a converse form of Fermat's little theorem, thus given a co-prime number w known as the witness, is that if
mod(w^(P-1),P)==1
then we have evidence that P is indeed prime. This is not conclusive evidence, but still it is evidence. It is not conclusive because the converse of a true statement is not always true.
The analogy I like here is if we lived in a universe where all crows are black. (I'll ask you to pretend this is true. In fact, some crows have a mutation, making them essentially albino crows. For the purposes of this thought experiment, pretend this cannot happen.) Now, suppose I show you a picture of a bird which happens to be black. Do you know the bird to be a crow? Of course not, as the bird may be a raven, or a redwing blackbird (until you see the splash of red on the wing), a common grackle, a European starling in summer plumage, a condor, etc. But having seen black plumage, it is now more likely the bird is indeed a crow. I would call this a moderately weak evidentiary test for crow-ness.
Little Fermat may seem to be of little value when testing for primality for two reasons. First, computing the remainder would seem to be highly CPU intensive for large P. In the example above, I had only to compute 3^12=531441, which is not that large. But for numbers with many thousands or millions of digits, directly raising even a number as small as 2 to that power will overwhelm any computer. Secondly, if we do that computation, little Fermat does not conclusively prove P to be prime.
Our savior in one respect is the powermod tool. And that helps greatly, since we can compute the remainder in a reasonable time. A powermod call is quite fast even for huge powers. (I won't get into how powermod works here, since that alone is probably worth a discussion. I could though, if I see some interest because there are some very pretty variations of the powermod algorithm. I hope to show you one of them when I discuss the Fibonacci test for primality in a future post.) Trying the little Fermat test using powermod on a number with 1207 decimal digits, I’ll first do a time check.
P = 4000*sym(2)^3999 - 1;
timeit(@() powermod(2,P-1,P))
As you can see, powermod really is pretty fast. Compared to an isprime test on that number it would show a significant difference.
I have said before that little Fermat is not a conclusive test. In fact, a good trick is to perform a second little Fermat test, using a different witness. If the second test also indicates primality, then we have additional evidence that P is in fact prime.
w = [2 3]; % Two parallel witnesses
powermod(w,P-1,P)
This value for P is indeed prime, and little Fermat suggests it is, doubly suggestive in that test since I actually performed two parallel tests. Here however, we need to understand when it will fail, and how often it will fail.
If we perform a little Fermat test for primality, we will never see false negatives, that is, if any test with any witness ever indicates a number is composite, then it is certainly composite. (The contrapositive of a true statement is always true.) The alternate class of failure is the false positive, where little Fermat indicates a number is prime when it was actually composite.
If P is composite, and w co-prime with P, we call P a Fermat pseudo-prime for the witness w if we see a remainder of 1 when P was in fact composite. When that happens, the witness (w) is called a Fermat liar for the primality of P. (A list of some Fermat pseudo-primes where 2 is a Fermat liar can be found in sequence A001567 of the OEIS.)
In the case of 4000*2^3999-1, I claimed the number to be in fact prime, and it was identified so (as PROBABLY prime by little Fermat. Next, consider another number from that same family. I’ll perform three parallel tests on it, with witnesses 2, 3, and 5. This will suggest the value of doing parallel tests on a number to reduce the failure rate from little Fermat.
P2 = 1024*sym(2)^1023 - 1;
w = [2; 3; 5];
gcd(w,P2)
F2 = powermod(w,P2-1,P2)
logical(F2 == 1)
As you can see, P2 is co-prime with all of 2, 3 and 5, but 2 is a Fermat liar, whilst 3 and 5 are Fermat truth tellers, identifying P2 as certainly composite. So the little Fermat test can definitely fail for SOME witnesses, since we see a disagreement. However, an interesting fact about P2 above is it is also a Mersenne number with prime exponent, thus it can be written as 2^1033-1, where 1033 is prime. I can go into more detail about this case later, but we can show that 2 is always a Fermat liar for composite Mersenne numbers when the exponent is itself prime. I’ll try to leave more detail on this matter in a future discussion, or perhaps a comment.
Next, consider the composite integer 51=3*17. As the product of two primes, it is clearly not itself prime.
P51 = 51;
w0 = 2:P51-2;
w = w0(gcd(w0,P51) == 1)
Note that I did not include 1 or 50 in that set, since 1 raised to any power is 1, and 50 is congruent to -1, mod 51. -1 raised to any even power is also always 1, and 51-1 is an even number. And so when we are working modulo 51, both 1 and 50 are not useful witnesses in terms of the little Fermat test.
R = powermod(w,P-1,P)
w(R == 1)
This teaches us that when 51 is tested for primality using the little Fermat test, there are 2 distinct witnesses w (16 and 35) we could have chosen which would have been Fermat liars, but all 28 other potential co-prime witnesses would have been truth tellers, accurately showing 51 to be composite. Proportionally, little Fermat will have been correct roughly 93% of the time, since only 2 of these 30 possible tests returned a false positive. (I’ll add that for any modulus P, if w<P is not co-prime with the modulus, then the computation mod(w^(P-1),P) will always return a non-unit result, and therefore we can theoretically use any integer w from the set 2:P-2 as a witness. However if w is co-prime with P then P is clearly not prime, and the entire problem becomes a little less interesting. As such, I will only consider co-prime witnesses for this discussion.) Regardless, that would make the little Fermat test for P=51 even more often correct, since it returns the correct result of composite for 46 out of the 48 possible witnesses 2:49. Does this mean Little Fermat is indeed the basis for a good test to rely on to learn if a number is prime? Well, yes. And no.
Little Fermat forms a very good test most of the time, but reliance is a strong word. This means we need to explore the little Fermat test in more depth, focusing on Fermat liars and the case of false positives. To offer some appreciation of the false positive rate, offline, I have tested all composite integers between 4 and 10000, for all their possible co-prime witnesses.
load FermatLiarsData
In that .mat file, I've saved three vectors, X, witnessCount, and liarCount. X is there just to use for plotting purposes and is NaN for all non-composite entries.
whos X witnessCount liarCount
The vector witnessCount is the number of valid witnesses for the corresponding number in X. Corresponding to that is the vector liarCount, which is the number of Fermat liars I found for each composite in X.
How many Fermat test useful witnesses are there for any integer X? This is just 2 less than the number of coprimes of X. The number of coprimes is given by the Euler totient function, commonly called phi(X). (I’ll be going into more depth on the totient in the next chapter of this series, because the Euler totient is a crucial part of understanding how all of this works.)
The witness count is phi(X)-2. Why subtract 2? 1 can never be a witness, but 1 is technically coprime to everything. The same applies to X-1 (which is congruent to -1 mod X.) As such, there are phi(X)-2 coprimes to consider. (I've posted a function called totient on the FEX, but it is easily computed if you know the factorization of X. Or for small numbers, you can just use GCD to identify all co-primes, and count them.)
plot(X,witnessCount,'.')
From that plot, you can learn a few interesting things. (As a mathematician, this is what I love the most, thus to look at whay may be the simplest, most boring plot, and try to find something of value, something I had never thought of before.) For example, we know that when X is prime, then everything from the set 2:X-2 is a valid witness. So the upper boundary on that plot will be the line y==x. As well, there are a few numbers where the order of the set of witnesses will be close to the maximum possible. For example, 961=31*31, has 928 valid witnesses. That makes some sense, as 961 is the square of a prime (31), so we know 961 is divisible only by 31. Only multiples of 31 will not be coprime with 961.
But how about the lower boundary? The least number of valid witnesses will always come from highly composite numbers, because they will share common factors with almost everything. For example 30 = 2*3*5, or 210=2*3*5*7.
witnessCount([30 210 420])
A good discussion about the lower bound for that plot can be found here:
What really matters to us though, is the fraction of the useful witnesses for a little Fermat test that yield a false positive.
plot(X,liarCount./witnessCount,'b.')
Cind = liarCount == witnessCount;
hold on
plot(X(Cind),1,'ro')
ylabel('Liar fraction')
xlabel('X')
title('Fermat pseudo-prime strength')
hold off
A look at this plot shows seven circles in red, corresponding to X from the list {561, 1105, 1729, 2465, 2821, 6601, 8911} which are collectively known as Carmichael numbers. These are numbers where all witnesses return a false positive. Carmichael numbers are themselves fairly rare. You can find a list of them as sequence A002997 in the OEIS. And for those of you who have never wandered around the OEIS, please take this opportunity to do so now. The OEIS stands for Online Encyclopedia of Integer Sequences. It contains a wealth of interesting knowledge about integers and integer sequences.)
There are a few other interesting numbers we can find in that plot, like 91 and 703, where roughly 50% of the valid witnesses yield false positives. Of the complete set, which numbers did return at least a 25% false positive rate for primality? These numbers would be known as strong pseudo-primes for the little Fermat test, because they are pseudo-primes for at least 25% of the potential witnesses. These strong pseudo-primes have some interesting similarities to the Carmichael numbers. (My next post will go into more depth on Carmichael numbers and strong pseudo-primes. At the moment, I am merely interested in looking at the how often the little Fermat test fails overall.)
find(liarCount./witnessCount> 0.25)
You should notice the spacing between successive strong Fermat pseudo-primes is growing slowly, with a spacing of roughly 800 on average in the vicinity of 10000. If I step out beyond by a factor of 10, the next strong Fermat pseudo-primes after 1e5 are {101101, 104653, 107185, 109061, 111361, 114589, 115921 126217, 126673}, so that average spacing is definitely growing.
Given that set, now we can look at the prime factorizations of each of those strong pseudo-primes. Can we learn something about them?
arrayfun(@factor,find(liarCount./witnessCount> 0.25),'UniformOutput',false)
Perhaps the most glaring thing I see in that set of factors is almost all of those strong Fermat pseudo-primes are square free. That is, in that list, only 45=3*3*5 had any replicated factor at all. That property of being square free is something we will see is necessary to be a Carmichael number, but it also suggests that a simple roughness test applied in advance would have eliminated almost all of those strong pseudo-primes as obviously not prime, even at a very low level of roughness.
In fact, for most composite integers, most witnesses do indeed return a negative, indicating the number is not prime, and therefore composite. Little Fermat does not commonly tell falsehoods, even though it can do so.
semilogy(X,movmedian(liarCount./witnessCount,20,'omitnan'),'b-')
title('False positive fraction for composites')
yline(0.0003,'r')
We can learn from this last plot that as the number to be tested grows large, the median false positive rate for little Fermat, even for X as low as only 10000, is roughly 0.0003. (It continues to decrease for larger X too. In fact, I’ve read that when X is on the order of 2^256, the relative fraction of Fermat liars is on the order of 1 in 1e12, and it continues to decrease as X grows in magnitude. In my eyes, that seems pretty good for an imperfect test. Not perfect, but not bad when paired with roughness and perhaps a second little Fermat test using a different witness, and we will start to see tests which bear a higher degree of strength.)
I’ll stop at this point in this post because the post is getting lengthy. In my next post, I’d like to visit some questions about what are Carmichael numbers, about whether some witnesses are better than others, and if there are any numbers which lack any Fermat liars. However, in order to dive more deeply, I will need to explain how/why/when the little Fermat test works, and what causes Fermat liars. Stay tuned, because this starts to get interesting.
An emirp is a prime that is prime when viewed in in both directions. They are not too difficult to find at a lower level. For example...
isprime([199 991])
Gosh, that was easy. But what happens if the number is a bit larger? The problem is, primes themselves tend to be rare on the number line when you get into thousands or tens of thousands of decimal digits. And recently, I read that a world record size prime had been found in this form. You have probably all heard of Matt Parker and numberphile.
And so, I decided that MATLAB would be capable of doing better. Why not? After all, at the time, the record size emirp had only 10002 decimal digits.
How would I solve this problem? First, we can very simply write a potential emirp as
10^n + a
then we can form the flipped version as
ahat*10^(n-d) + 1
where ahat is the decimally flipped version of a, and d is chosen based on the number of decimal digits in the number a itself. Not all emirps will be of that form of course, but using all of those powers of 10 makes it easy to construct a large number and its reversed form. And that is a huge benefit in this. For example,
Pfor = sym(10)^101 + 943
Prev = 349*sym(10)^99 + 1
It is easier to view these numbers using a little code I wrote, one that redacts most of those boring zeros.
emirpdisplay(Pfor)
emirpdisplay(Prev)
And yes, they are both prime, and they both have 102 decimal digits.
isprime([Pfor,Prev])
Sadly, even numbers that large are very small potatoes, at least in the world of large primes. So how do we solve for a much larger prime pair using MATLAB?
The first thing I want to do is to employ roughness at a high level. If a number is prime, then it is maximally rough. (I posted a few discussions about roughness some time ago.)
https://www.mathworks.com/matlabcentral/discussions/tips/879745-primes-and-rough-numbers-basic-ideas
In this case, I'm going to look for serious roughness, thus 2e9-rough numbers. Again, a number is k-rough if its smallest prime factor is greater than k. There are roughly 98 million primes below 2e9.
The general idea is to compute the remainders of 10^12345, modulo every prime in that set of primes below 2e9. This MUST be done using int64 or uint64 arithmetic, as doubles will start to fail you above
format short g
sqrt(flintmax)
The sqrt is in there because we will be multiplying numbers together here, and we need always to stay below intmax for the integer format you are working with. However, if we work in an integer format, we can get as high as 2e9 easily enough, by going to int64 or uint64.
sqrt(double(intmax('int64')))
And, yes, this means I could have gone as high as primes(3e9), however, I stopped at 2e9 due to the amount of RAM on my computer. 98 million primes seemed enough for this task. And even then, I found myself working with all of the cores on my computer. (Note that I found int64 arithmetic will only fire up the performance cores on your Mac via automatic multi-threading. Mine has 12 performance cores, even though it has 16 total cores.)
I computed the remainders of 10^12345 with respect to each prime in that set using a variation of the powermod algorithm. (Not powermod itself, which was itself not sufficiently fast for my purposes.) Once I had those 98 millin remainders in a vector, then it became easy to use a variation of the sieve of Eratosthenes to identify 2e9-rough numbers.
For example, working at 101 decimal digits, if I search for primes of the form 10^101+a, with a in the interval [1,10000], there are 256 numbers of that form which are 2e9-rough. Roughness is a HUGE benefit, since as you can see here, I would not want to test for primality all 10000 possible integers from that interval.
Next, I flip those 256 rough numbers into their mirror image form. Which members of that set are also rough in the mirror image form? We would then see this further reduces the set to only 34 candidates we need test for primality which were rough in both directions. With now only a few direct tests for primality, we would find that pair of 102 digit primes shown above.
Of course, I'm still needing to work with primes in the regime of 10000 plus decimal digits, and that means I need to be smarter about how I test a number to be prime. The isprime test given by sym/isprime only survives out to around 1000 decimal digits before it starts to get too slow. That means I need to perform Fermat tests to screen numbers for primality. If that indicates potential primality, I currently use a Miller-Rabin code to verify that result, one based on the tool Java.Math.BigInteger.isProbablePrime.
And since Wikipedia tells me the current world record known emirp was
117,954,861 * 10^11111 + 1 discovered by Mykola Kamenyuk
that tells me I need to look further out yet. I chose an exponent of 12345, so starting at 10^12345. Last night I set my Mac to work, with all cores a-fumbling, a-rumbling at the task as I slept. Around 4 am this morning, it found this number:
emirp = @(N,a) sym(10)^N + a;
Pfor = emirp(12345,10519197);
Prev = sym(flip(char(Pfor)));
emirpdisplay(Pfor)
emirpdisplay(Prev)
isProbablePrimeFLT([Pfor,Prev],210)
I'm afraid you will need to take my word for it that both also satisfy a more robust test of primality, as even a Miller-Rabin test that will take more time than the MATLAB version we get for use in a discussion will allow. As far as a better test in the form of the MATLAB isprime utility to verify true primality, that test is still running on my computer. I'll check back in a few hours to see if it fininshed.
Anyway, the above numbers now form the new world record known emirp pair, at 12346 decimal digits. Yes, I do recognize this is still what I would call low hanging fruit, that having announced a largest prime of this form, someone else willl find one yet larger in a few weeks or months. But even so, for the moment, MATLAB owns the world record!
If anyone else wants a version of the codes I used for the search, I've attached a version (emirpsearchpar.m) that employs the parallel processing toolbox. I do have as well a serial version which is of course, much, much slower. It would be fun to crowd source a larger world record yet from the MATLAB community.
The GCD approach to identify rough numbers is a terribly useful one, well worth remembering. But at some point, I expect someone to notice that all work done with these massively large symbolic numbers uses only one of the cores on your computer. And, having spent so much money on those extra cores in your CPU, surely we can find a way to use them all? The problem is, computations done on symbolic integers never use more than 1 core. (Sad, unhappy face.)
In order to use all of the power available to your computer using MATLAB, you need to work in double precision, or perhaps int64 or uint64. To do that, I'll next search for primes among the family 3^n+4. In fact, they seem pretty common, at least if we look at the first few such examples.
F = @(n) sym(3).^n + 4;
F(0:16)
ans =
[5, 7, 13, 31, 85, 247, 733, 2191, 6565, 19687, 59053, 177151, 531445, 1594327, 4782973, 14348911, 43046725]
isprime(F(0:16))
ans =
1×17 logical array
1 1 1 1 0 0 1 0 0 1 1 0 0 0 0 0 0
Of the first 11 members of that sequence, 7 of them were prime. Naturally, primes will become less frequent in this sequence as we look further out. The members of this family grow rapidly in size. F(10000) has 4771 decimal digits, and F(100000) has 47712 decimal digits. We certainly don't want to directly test every member of that sequence for primality. However, what I will call a partial or incomplete sieve can greatly decrease the work needed.
Consider there are roughly 5.7 million primes less than 1e8.
numel(primes(1e8))
ans =
5761455
F(17) is the first member of our sequence that exceeds 1e8. So we can start there, since we already know the small-ish primes in this sequence.
roughlim = 1e8;
primes1e8 = primes(roughlim);
primes1e8([1 2]) = []; % F(n) is never divisible by 2 or 3
F_17 = double(F(17));
Fremainders = mod(F_17,primes1e8);
nmax = 100000;
FnIsRough = false(1,nmax);
for n = 17:nmax
if all(Fremainders)
FnIsRough(n) = true;
end
% update the remainders for the next term in the sequence
% This uses the recursion: F(n+1) = 3*F(n) - 8
Fremainders = mod(Fremainders*3 - 8,primes1e8);
end
sum(FnIsRough)
ans =
6876
These will be effectively trial divides, even though we use mod for the purpose. The result is 6876 1e8-rough numbers, far less than that total set of 99984 values for n. One thing of great importance is to recognize this sequence of tests will use an approximately constant time per test regardless of the size of the numbers because each test works off the remainders from the previous one. And that works as long as we can update those remainders in some simple, direct, and efficient fashion. All that matters is the size of the set of primes to test against. Remember, the beauty of this scheme is that while I did what are implicitly trial divides against 5.76 million primes at each step, ALL of the work was done in double precision. That means I used all 8 of the cores on my computer, pushing them as hard as I could. I never had to go into the realm of big integer arithmetic to identify the rough members in that sequence, and by staying in the realm of doubles, MATLAB will automatically use all the cores you have available.
The first 10 values of n (where n is at least 17), such that F(n) is 1e8-rough were
FnIsRough = find(FnIsRough);
FnIsRough(1:10)
ans =
22 30 42 57 87 94 166 174 195 198
How well does the roughness test do to eliminate composite members of this sequence?
isprime(F(FnIsRough(1:10)))
ans =
1×10 logical array
1 1 1 1 1 0 0 1 1 1
As you can see, 8 of those first few 1e8-rough members were actually prime, so only 2 of those eventual isprime tests were effectively wasted. That means the roughness test was quite useful indeed as an efficient but relatively weak pre-test for possible primality. More importantly it is a way to quickly eliminate those values which can be known to be composite.
You can apply a similar set of tests on many families of numbers. For example, repunit primes are a great case. A rep-digit number is any number composed of a sequence of only a single digit, like 11, 777, and 9999999999999.
However, you should understand that only rep-digit numbers composed of entirely ones can ever be prime. Naturally, any number composed entirely of the digit D, will always be divisible by the single digit number D, and so only rep-unit numbers can be prime. Repunit numbers are a subset of the rep-digit family, so numbers composed only of a string of ones. 11 is the first such repunit prime. We can write them in MATLAB as a simple expression:
RU = @(N) (sym(10).^N - 1)/9;
RU(N) is a number composed only of the digit 1, with N decimal digits. This family also follows a recurrence relation, and so we could use a similar scheme as was used to find rough members of the set 3^N-4.
RU(N+1) == 10*RU(N) + 1
However, repunit numbers are rarely prime. Looking out as far as 500 digit repunit numbers, we would see primes are pretty scarce in this specific family.
find(isprime(RU(1:500)))
ans =
2 19 23 317
There are of course good reasons why repunit numbers are rarely prime. One of them is they can only ever be prime when the number of digits is also prime. This is easy to show, as you can always factor any repunit number with a composite number of digits in a simple way:
1111 (4 digits) = 11*101
111111111 (9 digits) = 111*1001001
Finally, I'll mention that Mersenne primes are indeed another example of repunit primes, when expressed in base 2. A fun fact: a Mersenne number of the form 2^n-1, when n is prime, can only have prime factors of the form 1+2*k*n. Even the Mersenne number itself will be of the same general form. And remember that a Mersenne number M(n) can only ever be prime when n is itself prime. Try it! For example, 11 is prime.
Mn = @(n) sym(2).^n - 1;
Mn(11)
ans =
2047
Note that 2047 = 1 + 186*11. But M(11) is not itself prime.
factor(Mn(11))
ans =
[23, 89]
Looking carefully at both of those factors, we see that 23 == 1+2*11, and 89 = 1+8*11.
How does this help us? Perhaps you may see where this is going. The largest known Mersenne prime at this date is Mn(136279841). This is one seriously massive prime, containing 41,024,320 decimal digits. I have no plans to directly test numbers of that size for primality here, at least not with my current computing capacity. Regardless, even at that realm of immensity, we can still do something.
If the largest known Mersenne prime comes from n=136279841, then the next such prime must have a larger prime exponent. What are the next few primes that exceed 136279841?
np = NaN(1,11); np(1) = 136279841;
for i = 1:10
np(i+1) = nextprime(np(i)+1);
end
np(1) = [];
np
np =
Columns 1 through 8
136279879 136279901 136279919 136279933 136279967 136279981 136279987 136280003
Columns 9 through 10
136280009 136280051
The next 10 candidates for Mersenne primality lie in the set Mn(np), though it is unlikely that any of those Mersenne numbers will be prime. But ... is it possible that any of them may form the next Mersenne prime? At the very least, we can exclude a few of them.
for i = 1:10
2*find(powermod(sym(2),np(i),1+2*(1:50000)*np(i))==1)
end
ans =
18 40 64
ans =
1×0 empty double row vector
ans =
2
ans =
1×0 empty double row vector
ans =
1×0 empty double row vector
ans =
1×0 empty double row vector
ans =
1×0 empty double row vector
ans =
1×0 empty double row vector
ans =
1×0 empty double row vector
ans =
2
Even with this quick test which took only a few seconds to run on my computer, we see that 3 of those Mersenne numbers are clearly not prime. In fact, we already know three of the factors of M(136279879), as 1+[18,40,64]*136279879.
You might ask, when is the MOD style test, using a large scale test for roughness against many thousands or millions of small primes, when is it better than the use of GCD? The answer here is clear. Use the large scale mod test when you can easily move from one member of the family to the next, typically using a linear recurrence. Simple such examples of this are:
1. Repunit numbers
General form: R(n) = (10^n-1)/9
Recurrence: R(n+1) = 10*R(n) + 1, R(0) = 1, R(1) = 11
2. Fibonacci numbers.
Recurrence: F(n+1) = F(n) + F(n-1), F(0) = 0, F(1) = 1
3. Mersenne numbers.
General form: M(n) = 2^n - 1
Recurrence: M(n+1) = 2*M(n) + 1
4. Cullen numbers, https://en.wikipedia.org/wiki/Cullen_number
General form: C(n) = n*2^n + 1
Recurrence: C(n+1) = 4*C(n) + 4*C(n-1) + 1
5. Hampshire numbers: (My own choice of name)
General form: H(n,b) = (n+1)*b^n - 1
Recurrence: H(n+1,b) = 2*b*H(n-1,b) - b^2*H(n-2,b) - (b-1)^2, H(0,b) = 0, H(1,b) = 2*b-1
6. Tin numbers, so named because Sn is the atomic symbol for tin.
General form: S(n) = 2*n*F(n) + 1, where F(n) is the nth Fibonacci number.
Recurrence: S(n) = S(n-5) + S(n-4) - 3*S(n-3) - S(n-2) +3*S(n-1);
To wrap thing up, I hope you have enjoyed this beginning of a journey into large primes and non-primes. I've shown a few ways we can use roughness, first in a constructive way to identify numbers which may harbor primes in a greater density than would otherwise be expected. Next, using GCD in a very pretty way, and finally by use of MOD and the full power of MATLAB to test elements of a sequence of numbers for potential primality.
My next post will delve into the world of Fermat and his little theorem, showing how it can be used as a stronger test for primality (though not perfect.)
Yes, some readers might now argue that I used roughness in a crazy way in my last post, in my approach to finding a large twin prime pair. That is, I deliberately constructed a family of integers that were known to be a-priori rough. But, suppose I gave you some large, rather arbitrarily constructed number, and asked you to tell me if it is prime? For example, to pull a number out of my hat, consider
P = sym(2)^122397 + 65;
floor(vpa(log10(P) + 1))
36846 decimal digits is pretty large. And in fact, large enough that sym/isprime in R2024b will literally choke on it. But is it prime? Can we efficiently learn if it is at least not prime?
A nice way to learn the roughness of even a very large number like this is to use GCD.
gcd(P,prod(sym(primes(10000))))
If the greatest common divisor between P and prod(sym(primes(10000))) is 1, then P is NOT divisible by any small prime from that set, since they have no common divisors. And so we can learn that P is indeed fairly rough, 10000-rough in fact. That means P is more likely to be prime than most other large integers in that domain.
gcd(P,prod(sym(primes(100000))))
However, this rather efficiently tells us that in fact, P is not prime, as it has a common factor with some integer greater than 1, and less then 1e5.
I suppose you might think this is nothing different from doing trial divides, or using the mod function. But GCD is a much faster way to solve the problem. As a test, I timed the two.
timeit(@() gcd(P,prod(sym(primes(100000)))))
timeit(@() any(mod(P,primes(100000)) == 0))
Even worse, in the first test, much if not most of that time is spent in merely computing the product of those primes.
pprod = prod(sym(primes(100000)));
timeit(@() gcd(P,pprod))
So even though pprod is itself a huge number, with over 43000 decimal digits, we can use it quite efficiently, especially if you precompute that product if you will do this often.
How might I use roughness, if my goal was to find the next larger prime beyond 2^122397? I'll look fairly deeply, looking only for 1e7-rough numbers, because these numbers are pretty seriously large. Any direct test for primality will take some serious time to perform.
pprod = prod(sym(primes(10000000)));
find(1 == gcd(sym(2)^122397 + (1:2:199),pprod))*2 - 1
2^122397 plus any one of those numbers is known to be 1e7-rough, and therefore very possibly prime. A direct test at this point would surely take hours and I don't want to wait that long. So I'll back off just a little to identify the next prime that follows 2^10000. Even that will take some CPU time.
What is the next prime that follows 2^10000? In this case, the number has a little over 3000 decimal digits. But, even with pprod set at the product of primes less than 1e7, only a few seconds were needed to identify many numbers that are 1e7-rough.
P10000 = sym(2)^10000;
k = find(1 == gcd(P10000 + (1:2:1999),pprod))*2 - 1
k =
Columns 1 through 8
15 51 63 85 165 171 177 183
Columns 9 through 16
253 267 273 295 315 421 427 451
Columns 17 through 24
511 531 567 601 603 675 687 717
Columns 25 through 32
723 735 763 771 783 793 795 823
Columns 33 through 40
837 853 865 885 925 955 997 1005
Columns 41 through 48
1017 1023 1045 1051 1071 1075 1095 1107
Columns 49 through 56
1261 1285 1287 1305 1371 1387 1417 1497
Columns 57 through 64
1507 1581 1591 1593 1681 1683 1705 1771
Columns 65 through 69
1773 1831 1837 1911 1917
Among the 1000 odd numbers immediately following 2^10000, there are exactly 69 that are 1e7-rough. Every other odd number in that sequence is now known to be composite, and even though we don't know the full factorization of those 931 composite numbers, we don't care in the context as they are not prime. I would next apply a stronger test for primality to only those few candidates which are known to be rough. Eventually after an extensive search, we would learn the next prime succeeding 2^10000 is 2^10000+13425.
In my next post, I show how to use MOD, and all the cores in your CPU to test for roughness.
How can we use roughness in an effective context to identify large primes? I can quickly think of quite a few examples where we might do so. Again, remember I will be looking for primes with not just hundreds of decimal digits, or even only a few thousand digits. The eventual target is higher than that. Forget about targets for now though, as this is a journey, and what matters in this journey is what we may learn along the way.
I think the most obvious way to employ roughness is in a search for twin primes. Though not yet proven, the twin prime conjecture:
If it is true, it tells us there are infinitely many twin prime pairs. A twin prime pair is two integers with a separation of 2, such that both of them are prime. We can find quite a few of them at first, as we have {3,5}, {5,7}, {11,13}, etc. But there is only ONE pair of integers with a spacing of 1, such that both of them are prime. That is the pair {2,3}. And since primes are less and less common as we go further out, possibly there are only a finite number of twins with a spacing of exactly 2? Anyway, while I'm fairly sure the twin prime conjecture will one day be shown to be true, it can still be interesting to search for larger and larger twin prime pairs. The largest such known pair at the moment is
2996863034895*2^1290000 +/- 1
This is a pair with 388342 decimal digits. And while seriously large, it is still in range of large integers we can work with in MATLAB, though certainly not in double precision. In my own personal work on my own computer, I've done prime testing on integers (in MATLAB) with considerably more than 100,000 decimal digits.
But, again you may ask, just how does roughness help us here? In fact, this application of roughness is not new with me. You might want to read about tools like NewPGen {https://t5k.org/programs/NewPGen/} which sieves out numbers known to be composite, before any direct tests for primality are performed.
Before we even try to talk about numbers with thousands or hundreds of thousands of decimal digits, look at 6=2*3. You might observe
isprime([-1,1] + 6)
shows that both 5 and 7 are prime. This should not be a surprise, but think about what happens, about why it generated a twin prime pair. 6 is divisible by both 2 and 3, so neither 5 or 7 can possibly be divisible by either small prime as they are one more or one less than a multiple of both 2 and 3. We can try this again, pushing the limits just a bit.
isprime([-1,1] + 2*3*5)
That is again interesting. 30=2*3*5 is evenly divisible by 2, 3, and 5. The result is both 29 and 31 are prime, because adding 1 or subtracting 1 from a multiple of 2, 3, or 5 will always result in a number that is not divisible by any of those small primes. The next larger prime after 5 is 7, but it cannot be a factor of 29 or 31, since it is greater than both sqrt(29) and sqrt(31).
We have quite efficiently found another twin prime pair. Can we take this a step further? 210=2*3*5*7 is the smallest such highly composite number that is divisible by all primes up to 7. Can we use the same trick once more?
isprime([-1,1] + 2*3*5*7)
And here the trick fails, because 209=11*19 is not in fact prime. However, can we use the large twin prime trick we saw before? Consider numbers of the form [-1,1]+a*210, where a is itself some small integer?
a = 2;
isprime([-1,1] + a*2*3*5*7)
I did not need to look far, only out to a=2, because both 419 and 421 are prime. You might argue we have formed a twin prime "factory", of sorts. Next, I'll go out as far as the product of all primes not exceeding 60. This is a number with 22 decimal digits, already too large to represent as a double, or even as uint64.
prod(sym(primes(60)))
a = find(all(isprime([-1;1] + prod(sym(primes(60)))*(1:100)),1))
That easily identifies 3 such twin prime pairs, each of which has roughly 23 decimal digits, each of which have the form a*1922760350154212639070+/-1. The twin prime factory is still working well. Going further out to integers with 37 decimal digits, we can easily find two more such pairs that employ the product of all primes not exceeding 100.
prod(sym(primes(100)))
a = find(all(isprime([-1;1] + prod(sym(primes(100)))*(1:100)),1))
This is in fact an efficient way of identifying large twin prime pairs, because it chooses a massively composite number as the product of many distinct small primes. Adding or subtracting 1 from such a number will result always in a rough number, not divisible by any of the primes employed. With a little more CPU time expended, now working with numbers with over 1000 decimal digits, I will claim this next pair forms a twin prime pair, and is the smallest such pair we can generate in this way from the product of the primes not exceeding 2500.
isprime(7826*prod(sym(primes(2500))) + [-1 1])
ans =
logical
1
Unfortunately, 1000 decimal digits is at or near the limit of what the sym/isprime tool can do for us. It does beg the question, asking if there are alternatives to the sym/isprime tool, as an isProbablePrime test, usually based on Miller-Rabin is often employed. But this is gist for yet another set of posts.
Anyway, I've done a search for primes of the form
a*prod(sym(primes(10000))) +/- 1
having gone out as far as a = 600000, with no success as of yet. (My estimate is I will find a pair by the time I get near 5e6 for a.) Anyway, if others can find a better way to search for large twin primes in MATLAB, or if you know of a larger twin prime pair of this extended form, feel free to chime in.
My next post shows how to use GCD in a very nice way to identify roughness, on a large scale.
What is a rough number? What can they be used for? Today I'll take you down a journey into the land of prime numbers (in MATLAB). But remember that a journey is not always about your destination, but about what you learn along the way. And so, while this will be all about primes, and specifically large primes, before we get there we need some background. That will start with rough numbers.
Rough numbers are what I would describe as wannabe primes. Almost primes, and even sometimes prime, but often not prime. They could've been prime, but may not quite make it to the top. (If you are thinking of Marlon Brando here, telling us he "could've been a contender", you are on the right track.)
Mathematically, we could call a number k-rough if it is evenly divisible by no prime smaller than k. (Some authors will use the term k-rough to denote a number where the smallest prime factor is GREATER than k. The difference here is a minor one, and inconsequential for my purposes.) And there are also smooth numbers, numerical antagonists to the rough ones, those numbers with only small prime factors. They are not relevant to the topic today, even though smooth numbers are terribly valuable tools in mathematics. Please forward my apologies to the smooth numbers.
Have you seen rough numbers in use before? Probably so, at least if you ever learned about the sieve of Eratosthenes for prime numbers, though probably the concept of roughness was never explicitly discussed at the time. The sieve is simple. Suppose you wanted a list of all primes less than 100? (Without using the primes function itself.)
% simple sieve of Eratosthenes
Nmax = 100;
N = true(1,Nmax); % A boolean vector which when done, will indicate primes
N(1) = false; % 1 is not a prime by definition
nextP = find(N,1,'first'); % the first prime is 2
while nextP <= sqrt(Nmax)
% flag multiples of nextP as not prime
N(nextP*nextP:nextP:end) = false;
% find the first element after nextP that remains true
nextP = nextP + find(N(nextP+1:end),1,'first');
end
primeList = find(N)
Indeed, that is the set of all 25 primes not exceeding 100. If you think about how the sieve worked, it first found 2 is prime. Then it discarded all integer multiples of 2. The first element after 2 that remains as true is 3. 3 is of course the second prime. At each pass through the loop, the true elements that remain correspond to numbers which are becoming more and more rough. By the time we have eliminated all multiples of 2, 3, 5, and finally 7, everything else that remains below 100 must be prime! The next prime on the list we would find is 11, but we have already removed all multiples of 11 that do not exceed 100, since 11^2=121. For example, 77 is 11*7, but we already removed it, because 77 is a multiple of 7.
Such a simple sieve to find primes is great for small primes. However is not remotely useful in terms of finding primes with many thousands or even millions of decimal digits. And that is where I want to go, eventually. So how might we use roughness in a useful way? You can think of roughness as a way to increase the relative density of primes. That is, all primes are rough numbers. In fact, they are maximally rough. But not all rough numbers are primes. We might think of roughness as a necessary, but not sufficient condition to be prime.
How many primes lie in the interval [1e6,2e6]?
numel(primes(2e6)) - numel(primes(1e6))
There are 70435 primes greater than 1e6, but less than 2e6. Given there are 1 million natural numbers in that set, roughly 7% of those numbers were prime. Next, how many 100-rough numbers lie in that same interval?
N = (1e6:2e6)';
roughInd = all(mod(N,primes(100)) > 0,2);
sum(roughInd)
That is, there are 120571 100-rough numbers in that interval, but all those 70435 primes form a subset of the 100-rough numbers. What does this tell us? Of the 1 million numbers in that interval, approximately 12% of them were 100-rough, but 58% of the rough set were prime.
The point being, if we can efficiently identify a number as being rough, then we can substantially increase the chance it is also prime. Roughness in this sense is a prime densifier. (Is that even a word? It is now.) If we can reduce the number of times we need to perform an explicit isprime test, that will gain greatly because a direct test for primality is often quite costly in CPU time, at least on really large numbers.
In my next post, I'll show some ways we can employ rough numbers to look for some large primes.


