
Unsigned types have an absolute advantage for unsigned values: they have twice the range. This turns out to have useful performance advantages e.g. in large data infrastructure software. In typical C++ metaprogramming for this type of software, the integer types for processing code paths are often generated using the most compact primitive type because it is expected that billions of instances of these types may be in memory and trillions in storage, and there are multiple integer types where the required range of values for each are interdependent. A considerable amount of design in database engines, for example, goes into maximizing scalability while minimizing the total footprint of integer type instances. This is a function of the expressible non-negative range of the types, and cutting it in half will cause some of the types to be promoted, doubling their footprint. It is difficult enough to squeeze them into unsigned types without a step change in integer size as it is.
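
To make that concrete, here is a minimal sketch (my own, not from any particular engine) of the kind of metaprogramming involved: a hypothetical uint_fit_t alias that selects the narrowest unsigned type able to hold a compile-time maximum value. Halving the expressible range, as signed types do, pushes some thresholds across a size boundary:

    #include <cstdint>
    #include <type_traits>

    // Hypothetical helper: narrowest unsigned type that can hold MaxValue.
    template <std::uint64_t MaxValue>
    using uint_fit_t =
        std::conditional_t<MaxValue <= UINT8_MAX,  std::uint8_t,
        std::conditional_t<MaxValue <= UINT16_MAX, std::uint16_t,
        std::conditional_t<MaxValue <= UINT32_MAX, std::uint32_t,
                                                   std::uint64_t>>>;

    // 40000 fits in 16 unsigned bits, but a signed 16-bit type tops out
    // at 32767, so the signed equivalent would be promoted to 32 bits.
    static_assert(std::is_same_v<uint_fit_t<40000>, std::uint16_t>);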

That overflow is well-defined, in contrast to UB, also has value when bugs occur: the consequences of the bug are predictable and less subject to clever compiler optimizations.




Underflows are much more common than overflows (most numbers are small), and people again and again fall under the impression that unsigned integers are a good way to represent positive numbers, which is not at all what they do. E.g. the C++ committee is very unhappy with the mistake of representing sizes with unsigned types [1].

Unsigned types are also primarily useful when the types have very few bits -- like in bitfields -- but Java doesn't have those, and the uses of unsigned types in Java would be mostly restricted to interaction with native code and hardware, which not many people do and for which we have other solutions. So: some very dangerous disadvantages and not many advantages for a language like Java.

[1]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p14...


I agree with this almost entirely, and I use almost exclusively unsigned ints in C.

However, it's not true that unsigned is faster; if anything, the compiler can optimize signed arithmetic more aggressively. Consider:

x = (x * 2) / 2;

If x is unsigned and overflows, x will not be the same after this operation, so the compiler cannot optimize away this line. If x is signed and overflow is undefined, the compiler can assume it won't overflow and optimize the line away. UB affords the compiler a lot of optimizations.
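
To illustrate (a sketch; exact output depends on the compiler and optimization level):

    // Signed: overflow is UB, so the compiler may assume x * 2 never
    // overflows and fold the whole expression down to just x.
    int f_signed(int x) { return (x * 2) / 2; }

    // Unsigned: wrap-around is defined, so for x > UINT_MAX / 2 the result
    // differs from x (it becomes x modulo 2^31 for 32-bit unsigned), and
    // the multiply/divide cannot simply be removed.
    unsigned f_unsigned(unsigned x) { return (x * 2u) / 2u; }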


> (Some may disagree. For instance, the Google C++ style guide [1] specifically says not to "use unsigned types to say a number will never be negative", because they want the undefined overflow behavior of signed types, in order to allow the compiler to diagnose bugs and to avoid "imped[ing] optimization". I think this is mostly nonsense; the drawbacks far outweigh the benefits, and tools for detecting overflow like UBSan can be told to check unsigned overflow as well.)

Yes, I really disagree. Unsigned integers mean one thing: modular arithmetic. Unless you are in the very uncommon case of actually needing modular arithmetic, for instance when implementing a crypto or hash algorithm, you want normal integers. As soon as anything has any chance of introducing a subtraction somewhere, unsigned will cause bugs.

I don't know how many times I had to debug broken code such as

    for(int i = 0; i < some_size - 1; i++) { ... }
because some_size was unsigned.

If you really want a "number that cannot be negative", you don't want some_size - 1 to silently give you UINT_MAX; you want a type that will give you a compile-time or, at worst, a run-time error.
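
If you are stuck with an unsigned some_size, one defensive rewrite (a sketch, not the only fix) is to move the subtraction to the other side of the comparison so nothing can wrap:

    // some_size - 1 wraps to a huge value when some_size == 0;
    // i + 1 cannot wrap here, because i stays below some_size.
    for (size_t i = 0; i + 1 < some_size; i++) { /* ... */ }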


The benefit of using signed integers these days is that compilers can turn signed integer overflow into a trap, which is helpful in debugging; they are free to do so because it's undefined behavior, whereas unsigned overflow is well defined.

You'll be comparing against a lot of other things, like offsets, that aren't naturally unsigned, and having everything be the same type just tends to reduce the number of potential corner cases.

And signed will be a bit faster on certain weird architectures, since the overflow behavior of unsigned is prescribed but signed overflow is undefined. But I really doubt anybody here cares about that.


The advantage of using signed types is that you can reliably find overflow bugs using UBSan and protect against exploiting such errors by trapping at run time. For unsigned types, wrap-around bugs are much harder to find and your program will silently misbehave.
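
For example (a sketch using Clang's UBSan flag spellings):

    // Compile with:
    //   clang++ -O1 -fsanitize=signed-integer-overflow overflow.cpp
    // Add -fsanitize=unsigned-integer-overflow to also flag unsigned
    // wrap-around (off by default, since it is not UB).
    #include <climits>

    int main() {
        volatile int i = INT_MAX;
        i += 1;                  // UB: reported by the signed-overflow check
        volatile unsigned u = 0;
        u -= 1;                  // defined wrap: reported only if the
                                 // unsigned check is explicitly enabled
        return 0;
    }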

"There are other advantages to using unsigned types. For instance, it gives an explicit hint to the person reading the code about the range of the value."

This is, without doubt, the worst reason for using unsigned types, and it's the primary reason (IMHO) for the flaws in the C API that force you to use unsigned types unnecessarily. Unsigned types are not a documentation feature, and they are not merely an advert for an invariant; they opt you in to a subtly different arithmetic that most people are surprised by. It would be better to have range-checked types, like Pascal's, than to infect the program with unsigned arithmetic.

I find that most of the integer values a program deals with have an absolute value under 1000; about the only excuse for using an unsigned type, IMO, is when you must have access to that highest bit in a defined way (for safe shifting and bit-twiddling).


Unsigned types don't overflow; they reduce modulo a power of two. This is predictable, reproducible behavior (albeit, unfortunately, not entirely portable, since the width of the type, and hence that power of two, is implementation-defined).

> Using signed arithmetic means that you can actually trap on overflow and catch these bugs (using fuzzing for instance).

The reason you can't do this for unsigned types is not simply that their modulo-reducing behavior is well-defined, but that it is actually exploited in correct code, which then leads to false positives if the behavior is trapped.

But the overflow behavior of signed types is also exploited.

Either one could be usefully caught with tools, if the tools can simply be told where in the program to ignore false positives. If I'm using unsigned types in foo.c, with no intention of relying on the modulo wrapping, I should be able to tell the tool to report all unsigned wrapping occurrences in just foo.c without caring what is done with unsigned types elsewhere in the program or its dependent libraries.
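
Clang's UBSan gets part of the way there: the unsigned check can be enabled globally and suppressed per function (or via a suppression list), so deliberate modular arithmetic does not drown out the report. A sketch:

    #include <cstdint>

    // Wraps on purpose (e.g. a hash mix step); exempt it from the check.
    __attribute__((no_sanitize("unsigned-integer-overflow")))
    std::uint32_t mix(std::uint32_t h, std::uint32_t x) {
        return h * 2654435761u + x;   // intentional modular arithmetic
    }

    // The rest of the file is still checked when compiled with
    //   clang++ -fsanitize=unsigned-integer-overflow foo.cpp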

All that said, I believe unsigned types should be eschewed as much as possible because they have an unacceptable discontinuity immediately to the left of zero. Suppose A, B and C are small positive integers close to zero, say all less than a hundred or so. Then given

   A + B > C
and knowing elementary school algebra, I can rewrite that as:

   A > C - B
But I can do that in the actual code only if the type is signed. I cannot do it if the type is unsigned, because B might be greater than C, making C - B some huge number. This happens even though I'm working with harmless little numbers less than around a hundred.
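
A quick sketch of that failure mode:

    unsigned a = 5, b = 10, c = 3;
    bool before = a + b > c;   // 15 > 3: true, as expected
    bool after  = a > c - b;   // c - b wraps to a huge value, so: false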

We should prefer to work with integers that obey basic algebraic laws, at least when their magnitudes are small, so that we don't have to think about "is that test phrased in the right way, or do we have a bug here?"

In any language higher-level than C, we should have multi-precision integers. I no longer consider any non-C language viable if it doesn't have them. If I'm reading the manual for some new supposedly high-level language and come across a statement like "integers are 64 bits wide in Fart, but a clumsy library called Burp provides bolted-on bignums", I stop reading, hit the back button, and don't come back.


I view extra primitives as very expensive because they add a whole bunch of extra cases to the language itself; I like small languages where as much as possible can be moved into libraries. I just don't see the use cases for unsigned ints as being widespread enough to justify having them in the language; I've worked across a number of industries and I think I've seen them used once (when fixing a bug in ffmpeg), whereas for-in and lambda are used absolutely everywhere. I'm not especially anti-unsigned; if I were designing a JVM-like bytecode I'd remove short, and perhaps even (single-precision) float and int as well.

Why would anyone do a silly thing like work with integers smaller than the register size of the machine?

Yet, even C gets this sort-of right, in an obsolescent way. When integer types narrower than int are manipulated in expressions, they promote to either int or unsigned int (whichever can represent the original type's value range).

So for instance, this doesn't overflow, unless sizeof(char) == sizeof(int).

   char mid, high, low;

   ...

    mid = (high + low) / 2;
Reason being that this is actually doing something like:

   mid = (char) (((int) high + (int) low) / 2);
(When sizeof(char) == sizeof(int), we are almost certainly on some DSP chip, where char is wider than 8 bits, possibly as much as 32. Portable code using unsigned char for array indexing would not assume that it goes beyond 255.)

It would be useful, in a C-like language, if the declaration of a custom integer type could separately specify its promotion semantics (possibly that it has none).

I would make it the default behavior that all integer values promote to the widest available type in their signedness. (E.g. the thing that corresponds to intmax_t or uintmax_t in C).

If someone wanted to optimize around that (due to the widest type not being the fastest, say), they could declare a custom type that promotes to something narrower, or doesn't promote.
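
In today's C++ you can approximate the widest-promotion default with a wrapper whose operators promote explicitly; a rough sketch (names hypothetical):

    #include <cstdint>

    // Hypothetical wrapper: all arithmetic happens in intmax_t, so a
    // narrow stored value never wraps during a computation.
    struct WidePromoting {
        std::int8_t v;
        friend std::intmax_t operator+(WidePromoting a, WidePromoting b) {
            return std::intmax_t{a.v} + std::intmax_t{b.v};
        }
    };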


Compilers can optimize signed integers better. Overflow/underflow on signed integers is undefined behavior, which gives compilers room to optimize. Unsigned ints are defined for all cases, so you can get less optimal code.

Also, you have problems whenever you compare against signed ints.


C already offers 'unsigned' type variants that offer defined overflow (together with a non-negative range).

It would be useful to have another type variant offering defined overflow (as with unsigned) together with a signed range, for such cases. But it still makes sense for basic integers to treat overflow as UB, as in most cases it is not expected behavior.

Note that in current C, if one needs defined overflow on signed integers, one can cast them to unsigned, do the operation, and cast the result back to int. That makes it implementation-defined instead of undefined.
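
A sketch of that idiom:

    /* Two's-complement wrapping add via unsigned arithmetic: the unsigned
       addition is always defined. Converting an out-of-range result back
       to int is implementation-defined in C (defined to wrap since C++20),
       and wraps on mainstream ABIs. */
    int wrapping_add(int a, int b) {
        return (int)((unsigned)a + (unsigned)b);
    }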


In C++, underflow of unsigned types is specified in the Standard: -1 converts to the largest value your unsigned type can hold. This means that you can rely on this behaviour.
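
For instance:

    unsigned int x = 0;
    x -= 1;               // well-defined wrap: x == UINT_MAX
    unsigned int y = -1;  // conversion: y == UINT_MAX as well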

I'm not seeing where an unsigned type would help, other than expanding your range. Can you give an example?

On #1: Signed underflow and overflow in C and C++ are undefined behaviour and can do weird things; in combination with compiler optimizations especially, the compiler can simply assume they never happen. Using unsigned types means that when someone passes you a negative int, at least the result is well-defined and can be debugged when your program explodes.

Use unsigned types for unsigned values, and ensure all (new at least) code compiles cleanly with -Wconversion and -Wsign-conversion enabled, which should really be the default warning level these days.
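
A sketch of what those flags catch (GCC/Clang spellings; names in the example are hypothetical):

    // clang++ -Wconversion -Wsign-conversion conv.cpp
    #include <cstddef>

    void take_size(std::size_t n);

    void caller(int n) {
        take_size(n);   // -Wsign-conversion: int -> size_t may change sign
        short s = n;    // -Wconversion: int -> short may lose data
        (void)s;
    }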


> Having signed and unsigned variants of every integer type essentially doubles the number of options to choose from. This adds to the mental burden, yet has little payoff because signed types can do almost everything that unsigned ones can.

Unsigned types are quite useful when doing bit twiddling, because their wrap-around is well-defined and no bit is taken up by the sign.
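
For example, right shifts (a sketch): on signed types, shifting a negative value right smears the sign bit in (implementation-defined before C++20, arithmetic shift in practice), while unsigned shifts always bring in zeros:

    #include <cstdint>

    std::int32_t  s  = INT32_MIN;    // bit pattern 0x80000000
    std::uint32_t u  = 0x80000000u;

    std::int32_t  hs = s >> 24;      // 0xFFFFFF80 on typical targets
    std::uint32_t hu = u >> 24;      // always 0x00000080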


This is why I implemented two's complement with unsigned types, which don't have UB on overflow.

It's pretty sad that C/C++ finally accepted that arithmetic on contemporary computers is two's complement but kept UB on signed overflow, to ensure that compilers can continue to optimize loops with signed indices. If everyone used unsigned indices as the default for loops, then the UB could be eliminated with no real performance consequences.

The article acknowledges this:

> There are some optimizations compilers can make assuming signed integers cannot underflow or overflow that unsigned does not get to participate in.

