# Declaring variables with high precision

2 views (last 30 days)
Niles Martinsen on 25 Apr 2015
Commented: Walter Roberson on 7 Jun 2015
Hi
I have the following piece of code that I have also implemented in Fortran. According to Fortran the variable Ax(3,6,4) should yield a value of 1e-24, however, in the Matlab-code below the variable Ax(3,6,4) gets a value of 0. I believe it is because I lose precision "along the way" when declaring variables and performing calculations.
Question: Is there a way to define the relevant variables such that they retain their precision?
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%CONSTANTS
clc
clear all
sym_weight = [4/9, 1/9,1/9,1/9,1/9, 1/36,1/36,1/36,1/36];
dir_x = [ 0, 1, 0, -1, 0, 1, -1, -1, 1];
dir_y = [ 0, 0, 1, 0, -1, 1, 1, -1, -1];
ly = 11; lx = ly;
xC = 5; yC=xC;
density_high = 1.0;
density_low = 0.1;
interface_w = 1;
sigma_st = 0.0001;
beta = 12*sigma_st/(interface_w*(density_high-density_low)^4);
kappa = 1.5*sigma_st*interface_w/(density_high-density_low)^2;
saturated_density = 0.5*(density_high+density_low);
for x=1:lx
for y=1:ly
for i=1:9
fIn(i, x, y) = sym_weight(i)*density_high;
gIn(i, x, y) = 3*sym_weight(i);
fIn(i, x, y) = sym_weight(i)*( saturated_density - 0.5*(density_high-density_low)*tanh(2*(radius-sqrt((x-xC)^2 + (y-yC)^2))/interface_w) );
end
end
end
end
density_2d = ones(lx)*saturated_density;
for i=1:lx
density_aux(:,:,i) = abs(density_2d(:, i)');
end
density_local = sum(fIn);
L_density_local = (+1.0*(circshift(density_local(1,:,:), [0, +1, +1]) + circshift(density_local(1,:,:), [0, -1, +1]) + circshift(density_local(1,:,:), [0, +1, -1]) + circshift(density_local(1,:,:), [0, -1, -1])) + ...
+4.0*(circshift(density_local(1,:,:), [0, +1, +0]) + circshift(density_local(1,:,:), [0, -1, +0]) + circshift(density_local(1,:,:), [0, +0, +1]) + circshift(density_local(1,:,:), [0, +0, -1])) + ...
-20.0*density_local(1,:,:));
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
chem_pot = 4*beta*(density_local-density_low).*(density_local-density_high).*(density_local-density_aux) - kappa*L_density_local/6;
i=3;
Ax(i,:,:) = (+circshift(chem_pot(1,:,:), [0,-2*dir_x(i),-2*dir_y(i)]) - chem_pot(1,:,:));
Ax(3,6,4)
James Tursa on 27 Apr 2015
Edited: James Tursa on 27 Apr 2015
@Niles: Can you post the exact results you are comparing? I get this in MATLAB:
>> format long
>> tanh1 = tanh(1)
tanh1 =
0.761594155955765
>> digits 100
>> tanh(vpa(1))
ans =
0.7615941559557648881194582826047935904127685972579365515968105001219532445766384834589475216736767144
>> num2strexact(tanh1)
ans =
0.761594155955764851029243800439871847629547119140625
>> num2strexact(tanh1+eps(tanh1))
ans =
0.76159415595576496205154626295552588999271392822265625
So the MATLAB answer is a bit on the "low" side compared to 100 digit precision result. But when you look at the next available number in IEEE double format, it is on the "high" side by more error than the "low" side number. So MATLAB gave you the closest number to tanh(1) that is in the IEEE double set.
What is this Fortran number result that is different in the 19th place? The 17th place I might believe, but the 19th place??? How did you determine that? How are you printing out the results for comparison? Do you have some special compiler flags set?
And, again, this begs the question that if differences in the 19th digit are bothering you and generating "incorrect" result, how accurate is the algorithm you are using and can you really trust your "correct" result?
Walter Roberson on 7 Jun 2015
The two languages might set the rounding flags differently. I think MATLAB is using round to even. When I glanced a couple of years ago one of the common Fortran implementations used round to zero. Round to nearest is an additional option.
You can use feature() to affect the rounding mode if you are working on Windows I seem to recall.

Jan on 26 Apr 2015
It is expected that e.g. tanh replies slightly different values when it is implemented in different programming languages, running on different processors, with different algorithms or different compilers. This is simply an effect of the limited precision of numerical calculations. In numerics these effects must be controlled and are a fundamental effect on non-symbolic calculations. A famous minimal example is the sum:
a = 1; b = 1e-17; c = -1;
S = sum([a, b, c])
What do you expect as result? What do you get? A compiler can resort the terms in "a + b + c" for an optimization. Therefore 0 and 1e-17 are correct results considering the limited precision.
You see that tanh replies slightly different solutions. Do you see how hard it will be to find out, which of the values is "better"?