It wouldn't produce an exception, it would not compile. The nice thing is that you can avoid range checking at runtime.
Exactly, Ada's modular types would be a good option in this case, if that is what you want (my feeling is, most likely not unless you are doing some low level stuff). An alternative would be to rewrite the for loop in a functional or range based style.
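For the record, here is a minimal Ada sketch of the difference (the procedure and type names are mine, purely for illustration): a modular type wraps silently, while a plain ranged type fails its check instead.

    procedure Wrap_Demo is
       type Wrap_Counter is mod 2**8;           -- modular: arithmetic wraps silently
       type Checked_Counter is range 0 .. 255;  -- ranged: out-of-range values raise

       W : Wrap_Counter    := Wrap_Counter'Last;
       C : Checked_Counter := Checked_Counter'Last;
    begin
       W := W + 1;  -- wraps around to 0, no error
       C := C + 1;  -- raises Constraint_Error at run time
    end Wrap_Demo;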
In algorithmic code, you almost never want overflow. If you have a little function to calculate something, you want the intermediate variables to be big enough to perform the calculation, and in the end you cast it down to the size needed (maybe the compiler can do it, but maybe you know from some mathematical principles that the number is in a certain range and do it manually). In any case, I would want to be warned by the compiler if I am:
1. losing precision
2. performing a wrong calculation (overflowing)
3. accidentally losing performance (using bignums when avoidable)
1 and 2 can happen in C if you are not careful. 3 could theoretically happen in Python I guess, but it handles the int <-> bignum transition transparently enough that it was never an issue for me.
Of course, with well-defined overflow you can still end up with a nonsense result that just happens to be in your valid range of values. If you don't plan to harden your code against integer overflow manually, you need a language that actively checks it for you or uses a variable-sized integer to be safe.
If Python handled overflow with an exception instead of a bignum, I'd bet it still wouldn't cause trouble either. The actual values generally aren't reaching bignums.
That's a runtime check which does part of what Ada will do. Ada will also perform compile time checks related to uses of these restricted range variables. SPARK can go even further and detect potential overflow/underflow errors (and other things) forcing you to add preconditions (one resolution) to functions that declare that the values being added (or otherwise operated on) are small enough that overflow won't happen.
Ada will also use a reasonable storage size. If Age is ranged [0, 150] then it can be placed in a single byte (you can also influence the storage size and increase it if you want, for some reason), so there is less memory overhead than an object. Since it's a range, it brings in (automatically) all the arithmetic operations you'd expect.
EDIT: Regarding storage. That's technically implementation defined. GNAT, at least, will use a reasonable storage size by default, you have to override it and ask for something bigger. I had occasion to use Green Hills years ago and it did the same as I recall. I'd expect any other commercial implementation to use an appropriate size and not something absurdly large like 64 bits for a range that easily fits in 8 bits, it would not fit their general market (safety-critical, performance-critical, real-time, and embedded systems). Poor use of memory could cripple the utility of a compiler in these kinds of systems (especially performance-critical, real-time, and embedded).
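Roughly what that looks like in source, as a sketch (the names, the 8-bit Size clause, and the precondition are made-up examples, not taken from any real codebase):

    type Age is range 0 .. 150;
    for Age'Size use 8;  -- ask for 8-bit storage; GNAT would pick a small size anyway

    -- SPARK-style precondition: callers must show the sum stays in range,
    -- so the addition in the body cannot overflow.
    function Older (A : Age; Years : Age) return Age
      with Pre => Integer (A) + Integer (Years) <= 150;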
Python does just that for integers (since it automatically casts up to bigints instead of overflowing), as does Swift, and that's arguably fairly low-level/"compiled"! It doesn't strike me as an absurd idea.
Compilers could optimize for common cases where overflows are guaranteed to not happen, or perform the "isn't b100..000" check at the entrance to an inner loop and defer to the hardware's native overflow detection capabilities (if any).
Too many years ago, I wrote something in-house, now lost, titled "Type integer considered harmful". I was arguing for the idea that everything should have an explicit range, written
foo: 0..100;
This was around the time Ada was being designed. The way I wanted this to work is that the size of intermediate variables was determined by the compiler, with the following rules:
- Unless the result assigned to a named, typed variable would overflow, no unnamed intermediate value may overflow. The compiler must size intermediate temporaries accordingly.
- If this requires a larger intermediate variable than the machine can provide, a compile error is reported.
The implication is that you often need longer integers than you'd expect. Expressions such as
x = a + b - c
x = (a + b) / c
with signed values, all, say, 32-bit integers, can overflow in a+b without overflowing x.
So such expressions have to be computed in 64 bits, then checked for overflow. This eliminates hidden overflow in intermediates. An expression with only one operator never overflows in a recoverable way, so it just has to be checked, not expanded.
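A sketch of what that rule amounts to in today's terms, written here in Ada and assuming Long_Long_Integer is 64 bits (as it is with GNAT): do the arithmetic in the wide intermediate, then let the conversion back to the narrow type perform the single check.

    type Small is range -2**31 .. 2**31 - 1;

    function Combine (A, B, C : Small) return Small is
       -- 64-bit intermediate: A + B cannot overflow here even though it
       -- could exceed Small's range.
       Wide : constant Long_Long_Integer :=
         Long_Long_Integer (A) + Long_Long_Integer (B) - Long_Long_Integer (C);
    begin
       return Small (Wide);  -- checked only against the final result's range
    end Combine;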
That was written in an era when there were computers in use with word sizes of 8, 12, 16, 18, 24, 32, 36, 48, 60, and 64 bits. So word size was more of a practical issue than it is now, when non power of 2 machines have died off. Also, machines of that era tended to be slower on longer values. Much slower on some machines which had to do double-length arithmetic in software. Thus, there was a performance hit for this approach, which was the main objection.
Excellent! I've wanted compilers to optimize overflow checks and subscript checks since my proof of correctness work decades ago. This was done in the 1980s for Pascal, but the technology was forgotten. It's finally going mainstream.
The next optimization, which the parent article doesn't cover, is hoisting checks out of loops. This is a huge win for subscript checks. Most newer compilers optimize FOR loops, at least.
Having detected an error, what do you do? An exception would be nice, but few languages other than Ada handle machine level exceptions well. Aborting the whole program is drastic.
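In Ada the recovery can at least stay local; a minimal sketch, with Total, Item, and Log_Overflow invented for illustration:

    begin
       Total := Total + Item;  -- may raise Constraint_Error on overflow
    exception
       when Constraint_Error =>
          Log_Overflow (Item);     -- hypothetical recovery hook
          Total := Integer'Last;   -- e.g. saturate instead of aborting the program
    end;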
You could actually have runtime checks by default (in a hypothetical language), and have a smart compiler elide them whenever possible.
int32 a, b = ...;
a+b; // there is a check
// implicitly (sketch for non-negative a and b; negative values need the
// symmetric MIN_INT32 check):
// if (b > MAX_INT32 - a) throw OverflowException;
// a+b;
but
int32 a = ...;
// after here: typeof(a) = int32
int32 b = rand_int(0, 1000);
// after here: typeof(b) = int32[x|where x >=0 and x < 1000]
if (a < 500) {
    // in here: typeof(a) = int32[x|where x < 500]
    a+b; // the overflow check can be elided
}
So the compiler would be able to narrow down the type of a variable and know which operations are safe to perform. This is probably impossible in the general case (halting problem and so on), but I believe it is very doable if you restrict yourself to a limited number of subranges. This is like a kind of dependent typing, but completely internal (you could expose it if you wanted, though).
The compiler can not only use this extra information to remove overflow checks; you could also have a language that guarantees there is no overflow - add two int32s and the result is an int64, and so on. And if it can infer that the result fits in an int16, then it can put it there. But most of the time you would just use int, which means: an integer variable long enough to store my data - int8, int16, int32, maybe a BigNum. Kind of like Python does it, but with the ability to pick optimized native types if needed.
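Ada's subtypes already give a limited, explicit version of this; a small sketch (whether the check is actually dropped is up to the compiler, but GNAT routinely removes checks it can prove away):

    subtype Small_Pos is Integer range 0 .. 999;

    function Sum_If_Small (A : Integer; B : Small_Pos) return Integer is
    begin
       if A < 500 then
          -- Here A < 500 and B <= 999, so A + B always fits in Integer;
          -- the compiler is free to drop the overflow check.
          return A + B;
       else
          return A;
       end if;
    end Sum_If_Small;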
Somewhat related: there's apparently a defect in the Ada language spec which permits reordering of arithmetic operations such that out-of-range behaviour may hinge on the compiler's whim: [0]
> non-parenthesized arithmetic operations could be re-ordered by the compiler, which may result in a failing computation (due to overflow checking) becoming a successful one, and vice-versa. By default, GNATprove evaluates all expressions left-to-right, like GNAT.
Presumably one solution would be to use a three-address-code style [2] to completely tie the compiler's hands regarding ordering, but this seems painfully restrictive even by the standards of SPARK.
Also, an Ada compiler's internal choice of base type (with which to implement a range-based integer type) can impact overflow behaviour: [1]
> The choice of base types influences in which cases intermediate overflows may be raised during computation. The choice made in GNATprove is the strictest one among existing compilers, as far as we know, which ensures that GNATprove’s analysis detects a superset of the overflows that may occur at run time.
I don't know if there's a fully portable robust answer to this second problem. (I believe you can generally use hints to force the Ada compiler to use a particular sized type, along the lines of C's uint32_t, but that this isn't portable.)
edit On second thought, I imagine using a three-address-code style would solve this too. Either the result falls within the permitted range of the destination variable, or it doesn't.
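Concretely, the three-address-code workaround might look like this in Ada (a sketch; Temp, X, A, B, and C are placeholder names):

    declare
       -- Instead of  X := A + B - C;  where the spec allows re-ordering:
       Temp : constant Integer := A + B;  -- this addition happens first, checked here
    begin
       X := Temp - C;                     -- then the subtraction, checked separately
    end;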
I don't see how this is any different from overflowing an int32 in any other programming language. Sure, with an int, x != x + 1 even after an overflow, but your program is still going to crash when it tries to bill you for negative two billion widgets.
If you're dancing on the edge of the limits of numerical representation then you need to write code to protect against bad things. If you don't write said code, your program is going to fail to work correctly, no matter what language you use.
That is generally not possible in a strongly-typed language unless you just use a bignum type to begin with.
Most numbers are small, and you aren't worried about overflow.
With big numbers, std::int64_t is hard to overflow. If you have numbers from an untrusted source, you can and should range-check them immediately on input.
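The same idea expressed in Ada terms, as a sketch (the names and the upper bound are invented):

    subtype Widget_Count is Integer range 0 .. 1_000_000;

    -- Raw is untrusted text; the declaration of N raises Constraint_Error
    -- if the parsed value falls outside 0 .. 1_000_000 (or isn't a number).
    function Parse_Count (Raw : String) return Widget_Count is
       N : constant Widget_Count := Widget_Count'Value (Raw);
    begin
       return N;
    end Parse_Count;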
I would argue that the bug is not in the algorithm -- the bug is in languages that don't detect integer overflow by default. Arbitrary-precision integers, as are the default in Lisp and Python, are ideal, but at the very least the implementation ought to throw an exception. Yes, occasionally one specifically wants arithmetic modulo 2^32 or some other word size, and languages should provide that as an option, but it shouldn't be the default.
It's a small thing, but it's nice to do. Or at least detect overflows and move to bignum in the default numerical implementation. It's not end-of-the-world if you don't, but it helps avoid a lot of bugs...
Completely agreed. The statement "that function is wrong because it could overflow" is putting one potential edge case on a pedestal while ignoring all other considerations for why you might prefer one alternative over another. The vast majority of the code in most business applications can perform straight arithmetic on 64-bit ints without worrying too much about overflow -- you just never encounter numbers like 2^63, and only rarely encounter numbers that wouldn't fit in a 32-bit signed integer.
When you write a bit of code, you naturally have in mind the realistic range of values you'll be working with. Even if it's just within 4 orders of magnitude. You know whether you're dealing with thousands or quadrillions. In the extremely rare case it's the latter, then you start worrying about this. You just can't worry about being 10 orders of magnitude off in your assumptions all the time -- that's what Exceptions are for. Holy crap, this software is dealing with values ten orders of magnitude different than you programmed for!? Absolutely throw an exception in that case, because it requires a complete rewrite of that area of the code.
Yes, if you're writing midpoint in a language-standard math library, it should work for all possible inputs. But the point of looking at toy problems in software engineering blogs is to inform us how to write our much more complicated, much more domain-specific code, and these lessons just don't cleanly transfer into that world.
WUFFS calls that type refinement, although you are free to refine much larger types, e.g. you can say base.u32[0 ..= 15] and this is a 32-bit unsigned integer... that can't be more than 15.
WUFFS not only rejects overflow (you get a compiler diagnostic), it also infers these refinements from array sizes so as to ensure (at compile time) that bounds errors never occur. If you try to index into an array of 426 integers with k, WUFFS will go ahead and refine k to the 0..425 range.
Of course Ada (and presumably your proposal) are for general purpose software, whereas WUFFS is specifically targeted at software for well, Wrangling Untrusted File Formats Safely. So it's fine for some things to be completely impossible in WUFFS (e.g. you cannot write "Hello, World" in WUFFS, because WUFFS intentionally has no idea what a "string" is, or what "console output" is).
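For comparison, the nearest everyday Ada analogue of that refinement might be something like this (ordinary Ada, not WUFFS; the automatic inference from array sizes has no direct equivalent, you write the constraint yourself):

    type U32 is mod 2**32;
    subtype Small_U32 is U32 range 0 .. 15;  -- 32-bit unsigned that can't exceed 15

    Table : array (0 .. 425) of Integer := (others => 0);
    K     : Integer range Table'Range := 0;  -- index statically confined to 0 .. 425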
This is possible, but unless it's processor-accelerated it comes at a great cost. You'd need a branch after every math operation, unless the compiler could prove the operation wouldn't overflow.
The ARM mode described elsewhere in the thread where there's an overflow flag that persists across operations would help; then you could do a check less frequently.
A mode where you get a processor exception would be great, if adding that doesn't add significant cost to operations that don't overflow. Assuming the response to such an exception is expected to be a core dump, the cost of generating it can be high; of course, if someone builds their bignum library around the exception, that won't be great.