[PLUG-TALK] Integer Overflow and Other Computer Number Storing Issues

Keith Lofstrom keithl at gate.kl-ic.com
Thu May 7 02:01:41 UTC 2015


On Wed, May 06, 2015 at 06:10:32AM -0700, Rich Shepard wrote:
>   Here's a good explanation of number storage in computers and what can
> happen when the numbers grow too large. Good for the non-technical
> decision-maker.
> 
> <http://www.bbc.com/future/story/20150505-the-numbers-that-lead-to-disaster>

A BIGINT (Big Integer) rant:

While integers will continue to be processed in 32 or 64 bit 
chunks, it is not that difficult to design a processor that
automagically concatenates the chunks into BIGINTs in hardware,
and handles their allocation in memory, and does the bounds
checking in hardware as well. 
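
As a rough software sketch of what such hardware would do (the
64-bit chunk size and the function names are my assumptions for
illustration, not any real instruction set), chained addition
carries from one chunk into the next and grows the result
instead of wrapping around:

```python
# Sketch: add two non-negative big integers stored as lists of
# 64-bit chunks ("limbs"), least significant limb first.
# LIMB_BITS and all function names are illustrative assumptions.

LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1

def to_limbs(n):
    """Split a non-negative int into 64-bit limbs, low limb first."""
    limbs = []
    while True:
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
        if n == 0:
            return limbs

def from_limbs(limbs):
    """Reassemble an int from its limbs."""
    n = 0
    for limb in reversed(limbs):
        n = (n << LIMB_BITS) | limb
    return n

def add_limbs(a, b):
    """Add two limb lists, extending the result rather than wrapping."""
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else 0
        y = b[i] if i < len(b) else 0
        total = x + y + carry
        result.append(total & LIMB_MASK)   # low 64 bits stay in this limb
        carry = total >> LIMB_BITS         # overflow moves to the next limb
    if carry:
        result.append(carry)               # allocate one more limb
    return result

a = to_limbs(2**64 - 1)   # largest value that fits in one limb
b = to_limbs(1)
print(add_limbs(a, b))                       # [0, 1]
print(from_limbs(add_limbs(a, b)) == 2**64)  # True
```

A plain 64-bit add of the same operands would silently return 0;
here the carry becomes a second chunk.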

In a pipelined machine, bounds errors might be detected a
few dozen clock cycles later.  Backtracing the cause of an
error would be vexing, but better that than depending on
all programmers to always implement these tests correctly. 

Hardware designers also make mistakes (e.g., the FDIV bug), but
the most careful programmers will find these behavior bugs,
and all other programmers will benefit from the correction.
Outside the open source community, code isn't collectively
tested this way, and few of us get a chance to look at
avionics code.

   Note that division (and similar iterative operations) is
   also a minefield when implemented for software BIGINT;
   that Intel FDIV hardware design mistake was created with
   software methods and "passed" inadequate software-based
   regression testing.  Presumably Intel has tools and tests
   in place to avoid those mistakes in the future.

Doing a task in transistors (of which Intel has a surplus) and
microcode is almost always faster and lower power than doing it
with fetched instructions.  For example, the latest processors
intended for battery-powered devices have dedicated hardware for
video rendering, which is turned on and exercised for screen
updates, then shut off a few microseconds later to save power.

The examples in the article involve time, and a "time" integer
will not exceed 256 bits, 0.1% of an Intel i5 L1 cache.  A
millennium in nanoseconds fits in 65 bits.  The age of the universe
expressed in Planck units (5.39E-44 secs) is 203 bits.  Floating
point numbers are convenient and compact but not essential.
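
These bit counts are easy to check.  A minimal sketch, assuming
a Julian year of 365.25 days, a 13.8 billion year old universe,
and a 32 KiB L1 data cache (all three are my assumptions):

```python
# Sketch: how many bits does a "time" integer need?
# The year length, universe age, and cache size are assumptions.

SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year
PLANCK_TIME = 5.39e-44                  # seconds, as quoted above
UNIVERSE_AGE_YEARS = 13.8e9
L1_CACHE_BITS = 32 * 1024 * 8           # 32 KiB data cache

def bits_needed(count):
    """Bits required to hold a non-negative tick count."""
    return int(count).bit_length()

millennium_s = 1000 * SECONDS_PER_YEAR
universe_ticks = UNIVERSE_AGE_YEARS * SECONDS_PER_YEAR / PLANCK_TIME

print(bits_needed(millennium_s * 1e6))   # microseconds: 55
print(bits_needed(millennium_s * 1e9))   # nanoseconds: 65
print(bits_needed(universe_ticks))       # Planck units: 203
print(256 / L1_CACHE_BITS)               # 0.0009765625, about 0.1%
```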

4096-bit crypto might generate a few 1 KiB double-width integers.
One advantage of doing crypto BIGINTs with hardware is that the
same computation can be performed simultaneously on smaller
modulo numbers (say 64 bits), useful for detecting computation
errors and independently verifying accuracy.
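
A minimal sketch of that kind of residue check, done in software
for illustration.  The modulus and the function names are my
assumptions, and the "faulty" multiplier is a made-up stand-in
for a computation error:

```python
# Sketch: verify a big multiplication against a 64-bit residue.
# CHECK_MODULUS and both function names are illustrative assumptions.

CHECK_MODULUS = 2**64 - 59   # an odd 64-bit modulus, chosen arbitrarily

def checked_multiply(a, b, multiply=lambda x, y: x * y):
    """Multiply a and b, then compare residues modulo CHECK_MODULUS."""
    product = multiply(a, b)
    # The same computation, repeated on small (64-bit) residues.
    expected = ((a % CHECK_MODULUS) * (b % CHECK_MODULUS)) % CHECK_MODULUS
    if product % CHECK_MODULUS != expected:
        raise ArithmeticError("residue check failed: product is wrong")
    return product

def faulty_multiply(x, y):
    """Hypothetical buggy multiplier: flips bit 100 of the result."""
    return (x * y) ^ (1 << 100)

a = 2**4095 + 12345
b = 2**4095 + 67890

print(checked_multiply(a, b) == a * b)   # True

try:
    checked_multiply(a, b, multiply=faulty_multiply)
except ArithmeticError as err:
    print("caught:", err)
```

The check misses only errors that happen to be an exact multiple
of the modulus, so it is a cheap sanity test and not a proof.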

Doing BIGINTs in ultrafast hardware avoids the temptation
to write "speed optimized" and inadequately validated BIGINT
math libraries.  Sloppy programmers, like the ones who blew
up the Ariane 5 and may bring down some 787s, can keep
being lazy with less risk to life and infrastructure.  No
guarantee that programmers will use the BIGINT instructions,
of course, but their lack would be much easier to spot in a
regulatory code review.  Such errors should be rewarded with
8 bit monthly incomes for making fast food, not fast code.

I hope somebody from Intel reads this.  Adding a few million
more transistors for BIGINT might increase processor core size
a little, but a few days of Moore's law improvements will shrink
the cores back down again.  If this results in quicker debug and
shorter time-to-market for Intel's customers, that means quicker
ramp-up to full production for new Intel processes and fabs. 
Chip users who aren't exploding their product and scattering it
over Guiana beaches will have more money to spend on avionics.

Keith

P.S.  The 787 control unit failure after 248 days that the article
mentions is reminiscent of Nevil Shute's novel "No Highway", 
and the excellent movie "No Highway in the Sky" starring Jimmy
Stewart and Marlene Dietrich.  In one scene, Stewart performs
the best ever cinematic rendition of "nerd".  The hero is a
materials engineer whose calculations show that a new alloy 
will fail after a precise number of hours in the air, silly
metallurgy but plausible in software.  Budding scriptwriters
are encouraged to plagiarize this story with software substituted
for hard metal, and be ready to pitch a made-for-TV movie in
Hollywood immediately after a software-caused 787 disaster. :-)



-- 
Keith Lofstrom          keithl at keithl.com




