float_params2

Version 1.0.1 (2.66 KB) by Marco Cococcioni

MATLAB Code for Parameters of Floating-Point Arithmetics

Updated 10 Jun 2021

`float_params2` is a MATLAB function for obtaining the parameters of several floating-point arithmetics. The parameters are hard-coded and are not computed at run time.

The parameters are

- the unit roundoff,
- the smallest positive (subnormal) floating-point number,
- the smallest positive normalized floating-point number,
- the largest floating-point number,
- the number of binary digits in the significand (including the implicit leading bit),

and the arithmetics supported are

- bfloat8,
- bfloat16,
- IEEE half precision (fp16),
- IEEE single precision (fp32),
- IEEE double precision (fp64),
- IEEE quadruple precision (fp128).
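For readers who want to check the hard-coded values, the standard model of a binary floating-point system gives the listed parameters as functions of three integers: the precision $t$ and the normalized exponent range $[e_{\min}, e_{\max}]$. These formulas assume round to nearest and an IEEE-style encoding. They are standard textbook formulas and are not taken from the source of `float_params2.m`:

```latex
\begin{aligned}
u &= 2^{-t} && \text{(unit roundoff)} \\
x_{\min}^{s} &= 2^{e_{\min}-t+1} && \text{(smallest positive subnormal)} \\
x_{\min} &= 2^{e_{\min}} && \text{(smallest positive normalized)} \\
x_{\max} &= 2^{e_{\max}}\left(2 - 2^{1-t}\right) && \text{(largest finite number)}
\end{aligned}
```

For example, fp16 has $t = 11$, $e_{\min} = -14$, and $e_{\max} = 15$. This gives $u = 2^{-11}$, $x_{\min}^{s} = 2^{-24}$, $x_{\min} = 2^{-14}$, and $x_{\max} = 2^{15}(2 - 2^{-10}) = 65504$.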

The code was developed in MATLAB R2020a and works with versions at least back to R2016b.

This is a small extension of Nick Higham's float_params. It adds support for the 8-bit Brain Float (bfloat8), as proposed at Intel by Naveen K. Mellempudi. More details can be found at https://arxiv.org/abs/1905.12334.

I also renamed NVIDIA's tf32 to tf19, to reflect that it is a 19-bit format (1 sign bit, 8 exponent bits, and 10 fraction bits).
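The following Python sketch is not part of the package, which is MATLAB and hard-codes its values. It derives the same kind of parameters from each format's bit layout, which also shows where the name tf19 comes from. The names `FORMATS` and `params` are hypothetical. Three assumptions apply:

- the largest exponent field value is reserved for Inf/NaN, as in IEEE 754, so that $e_{\max} = 2^{w-1} - 1$;
- the bfloat8 layout is 1 sign, 5 exponent, and 2 fraction bits, taken from the cited paper rather than read from `float_params2.m`;
- the values are exact rationals, because fp128's extremes do not fit in a Python float.

```python
from fractions import Fraction

# Hypothetical table: (exponent bits w, stored fraction bits f) per format.
# ASSUMPTION: bfloat8 uses the 1-5-2 layout of arXiv:1905.12334;
# this is not read from float_params2.m.
FORMATS = {
    "bfloat8": (5, 2),
    "tf19": (8, 10),      # NVIDIA TensorFloat-32, renamed tf19
    "bfloat16": (8, 7),
    "fp16": (5, 10),
    "fp32": (8, 23),
    "fp64": (11, 52),
    "fp128": (15, 112),
}


def params(w, f):
    """Exact parameters of an IEEE-style binary format with w exponent
    bits and f stored fraction bits (hypothetical helper)."""
    t = f + 1                      # precision, including the implicit bit
    emax = 2 ** (w - 1) - 1        # ASSUMPTION: top exponent reserved for Inf/NaN
    emin = 1 - emax                # smallest normalized exponent
    two = Fraction(2)
    return {
        "u": two ** -t,                               # unit roundoff
        "xmins": two ** (emin - t + 1),               # smallest subnormal
        "xmin": two ** emin,                          # smallest normalized
        "xmax": two ** emax * (2 - two ** (1 - t)),   # largest finite
        "t": t,
        "emin": emin,
        "emax": emax,
    }


for name, (w, f) in FORMATS.items():
    p = params(w, f)
    # total width = sign bit + exponent bits + fraction bits
    print(f"{name:8s} {1 + w + f:3d} bits  t={p['t']:3d}  "
          f"emax={p['emax']:5d}  u={float(p['u']):.2e}")
```

For instance, `params(5, 10)["xmax"]` is exactly 65504, the familiar fp16 maximum. The tf19 entry has 1 + 8 + 10 = 19 bits: it shares fp32's exponent range and fp16's precision.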

Cite As

Marco Cococcioni (2025). float_params2 (https://www.mathworks.com/matlabcentral/fileexchange/93835-float_params2), MATLAB Central File Exchange. Retrieved March 2, 2025.

MATLAB Release Compatibility

Created with R2021a

Compatible with any release

Platform Compatibility

Windows macOS Linux

Acknowledgements

Inspired by: float_params

float_params2.m

| Version | Published | Release Notes |
|---------|-------------|-------------------|
| 1.0.1 | 10 Jun 2021 | very small update |
| 1.0.0 | 10 Jun 2021 | |