## Safe Computing: Integer Arithmetic

A few quick notes to set the scene.  Firstly, the need for safe integer arithmetic was pointed out in the CWE list and there’s an excellent example here.   I’m going to use C-like syntax for my examples because it’s the language I’m most familiar with, although the same principles apply to other languages.  Finally, I’ll be using 16-bit arithmetic most of the time because it makes it easier to demonstrate overflow without using inconveniently large numbers.  Where appropriate it will be necessary to imagine that the code is running on a 16-bit system so that the 16-bit values are not automatically widened to 32 bits or more.

I’m going to use new keywords for integer variable types: int<n> to indicate a signed integer n bits wide, e.g., int32 for a 32-bit signed integer; and nat<n> to indicate an unsigned integer, e.g., nat32 for a 32-bit unsigned integer.  In mathematics, natural numbers are the positive integers, sometimes (as in this case) including zero, and I find “unsigned int” unnecessarily long-winded and abbreviations like “uint” ugly.

Making integer arithmetic safe is simple in principle and can be summed up briefly: all integer arithmetic must either produce the right answer or cause execution to stop.  I’ll leave vague exactly what “stop” means because it depends on context, but the same sort of mechanisms already used to deal with a divide-by-zero are probably suitable.

In the following example, execution will stop on the second line:

```nat16 a = 65535;
nat16 b = a + 1;```

Execution will also stop on the second line in this example:

```nat16 a = 32767;
int16 b = a + 1;```

More subtly, this also needs to cause execution to stop:

```nat16 a = 32768;
int16 b = a;```

Technically an assignment isn’t arithmetic, but it still needs to be safe. 🙂

We need to detect underflow as well as overflow:

```nat16 a = 5;
nat16 b = a - 10;```

This is a bit trickier:

```nat16 a = 32768;
nat16 b = 32768;
nat16 c = (a + b) / 2;```

The correct answer is 32768, which will fit into the nat16 type, but the intermediate value of 65536 won’t.  Whether you will get the right answer depends on what type the compiler uses when working out the expression.  In some cases (such as C in a 32-bit environment) the language specification will guarantee that the code succeeds; in other cases, I would suggest that the compiler be permitted to either succeed or to stop at its own convenience.  The only thing it shouldn’t be allowed to do is decide that the answer is zero.  (Unfortunately, that’s exactly what will happen with most real-world compilers!)

On occasion the programmer needs to allow an overflow to occur, so the language  should define a way to do that.  There are a number of options.  The simplest and most elegant would be to introduce a keyword:

```nat16 a = 65535;
nat16 b = lenient a + 1; // b will be set to zero```

One problem: this code then wouldn’t compile if the compiler didn’t know about the keyword.  Perhaps a pragma, then:

```nat16 a = 65535;
#pragma lenient
nat16 b = a + 1; // b will be set to zero
#pragma strict```

Yet another alternative would be to require the compiler to ensure that an overflow in an intermediate calculation won’t stop execution if the final result is still correct:

```nat16 a = 65535;
nat16 b = (a + 1) & 65535;```

However, I’m not sure this approach is feasible in general, i.e., for more complicated expressions.

On a side note, it might also be desirable to simplify the rules determining the types used in expressions, including a straightforward mechanism for the programmer to override the defaults.  I don’t think I should discuss that now, but I’ll try to write about it later in a separate post.

Now it is of course possible for a compiler to do all the above using the standard x86 instruction set – it just needs to check the carry or overflow flag as appropriate after each step.  Why doesn’t this happen?  Nowadays, backwards compatibility would be a problem, but presumably the original issue was performance, and I suspect this is still a concern, particularly where branch instructions would be necessary.  (The INTO instruction is presumably quite efficient, but there is no corresponding INTC instruction for unsigned arithmetic.)

It shouldn’t be too hard for the CPU to come to our aid here.  For unsigned arithmetic, I think the best approach would be introduce a separate overflow bit for each register, rather than (or as well as) having carry and overflow flags.  Writing the register to memory would trigger a software interrupt (or equivalent) if the overflow bit is set, as would using the register in any other inappropriate way.

Also, instructions for writing only part of a register to memory (the low byte, for example) should generate a software interrupt if the remainder of the register is not zero.

For example, on a 16-bit processor and using a hopefully self-explanatory syntax:

```move 65535 -> a0     # register a0 is set to 65535 and the a0 overflow bit is cleared.
move a0 -> nat8 x    # write a0 to 8-bit memory location x;
# this causes a software exception because a0 doesn't fit in 8 bits.
add 1 -> a0          # a0 is now zero and the a0 overflow bit is set.
move a0 -> nat16 y   # this causes a software exception, because the overflow bit is set.
add a0 -> a1         # add the value of a0 to a1;
# this causes a software exception because the a0 overflow bit is set.

move 65535 -> a0    # a0 is set to 65535 and the a0 overflow bit is cleared.
move 65535 -> a1    # a1 is set to 65535 and the a1 overflow bit is cleared.
add 1 -> a0         # a0 is set to zero and the a0 overflow bit is set.
addc a0, 0 -> a1    # the a0 overflow bit is added to a1;
# the a0 overflow bit is now clear;
# a1 is now zero and the a1 overflow bit is set.
move a0 -> nat16 x  # zero is written to the sixteen-bit memory location.
move a1 -> nat16 y  # this causes a software exception.
clc a1              # but, if we were to clear the a1 overflow bit first ...
move a1 -> nat16 y  # then this would succeed.```

Hopefully, this is clear.  There are details that need filling in, but I think nothing difficult.

Signed arithmetic is much the same, except for one edge case.  There are at least three different ways of distinguishing between signed and unsigned arithmetic; the simplest is distinct arithmetic instructions:

```move 255 -> a0       # register a0 is set to 255 and the a0 overflow bit is cleared.
move a0 -> nat8 x    # this will write 255 to the 8-bit memory location x.
move a0 -> int8 x    # this causes a software exception because 255 doesn't fit in
# an 8-bit signed variable.  Note that this is a different
# instruction code to the above instruction.
adds 32767 -> a0     # a0 is now 33022 (meaning -32514) and the a0 overflow bit is set.
move a0 -> int16 y   # this causes a software exception because the a0 overflow bit is set.
move a0 -> nat16 y   # so does this.

move 65535 -> a0   # a0 is set to 65535 and the a0 overflow bit is cleared.
move 32767 -> a1   # a1 is set to 32767 and the a1 overflow bit is cleared.
add 1 -> a0        # a0 is set to zero and the a0 overflow bit is set.
addcs a0, 0 -> a1  # the a0 overflow bit is added to a1 using signed arithmetic;
# the a0 overflow bit is now clear;
# a1 is now 32768 (meaning -32768) and the a1 overflow bit is set.
move a0 -> nat16 x  # zero is written to the sixteen-bit memory location.
move a1 -> int16 y  # this causes a software exception.
clc a1              # but, if we were to clear the a1 overflow bit first ...
move a1 -> int16 y  # then this would succeed.```

I mentioned an edge case.  This is adding a signed number to an unsigned number.  Consider this code:

```nat32 total;
void add_to_total(int32 x)
{
total = total + x;
}```

Simple enough, but … do you do the addition with signed or unsigned arithmetic?  You can’t use unsigned arithmetic because x may be negative.  But you can’t always convert total to int32, either; it may be too large.

One solution would be to convert to int64, but this is inefficient if the processor is 32 bits.  Another is to rewrite the code:

```void add_to_total(int32 x)
{
if (x > 0) total = total + (nat32)x;
if (x < 0) total = total - (nat32)(-x);
}```

But now we’re introducing branches, which is also inefficient.

My suggestion is to introduce a new type, int33 (as well as int9, int17, and int65) which is that extra bit longer.  These types would only be valid as intermediate values in expressions, not as variables.  The code then becomes:

```void add_to_total(int32 x)
{
total = (int33)total + x;
}```

The compiler is then free to deal with this in the best way available to it in the given instruction set.  Note that by requiring the programmer to explicitly request this special treatment, we avoid inefficiency in cases where the programmer expects total to always be small enough to convert to int32.  In these cases it might be wise for the compiler to generate a warning unless the programmer explicitly says this is OK:

```void add_to_total(int32 x)
{
total = (int32)total + x;
}```

Going back to the CPU, there are several ways a new design could help the compiler out in implementing int33.  The simplest would be to give the arithmetic registers a dedicated sign bit, so that register values are always int33 (or int65, or int129).  This would require separate instructions for loading signed and unsigned values from memory (so the CPU knows whether to sign-extend) but since this is already required when loading values smaller than the register size this probably isn’t a big issue.

I hope this is all clear enough.  Please email me or leave a comment if you have any questions.

Harry.

Advertisements

Tags: