Isn't uint8 exactly what your 'integer[modulo 256]' is? And for unbounded values you do need bignums and dynamic allocation, so I'm not sure I see any benefit to explicitly fine-graining the range instead of using a machine word at all times and bignums when needed.
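
For what it's worth, uint8 arithmetic in C really does behave as arithmetic modulo 256 (small sketch; assumes the usual 8-bit byte):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t x = 250;
        x += 10;                /* result is reduced modulo 256 on assignment */
        printf("%d\n", x);      /* prints 4 */
        return 0;
    }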


Good catch! I mostly used __uint128_t out of haste. Ideally, I would like to remove dependencies on large integer types. For now, this remains future work.

The problem isn't that signed or unsigned is the accepted default. The problem is that bounded integers are the default, and worse yet, that their behavior at the edges is counter-intuitive from a common sense perspective.

This was perfectly reasonable back when RAM was measured in kilobytes, and clock speed in megahertz. But these days, we write apps in HTML and JS, and package them with a browser to run. And, conversely, security is much more of a problem than it was decades ago - and integer overflow is a very common cause of security issues.
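
A hypothetical sketch of the kind of overflow bug meant here (names made up): the size computation wraps silently, so the bounds check passes and the allocation ends up far smaller than the caller assumes.

    #include <stdint.h>
    #include <stdlib.h>

    void *alloc_items(uint32_t count) {
        uint32_t bytes = count * 8u;          /* wraps silently for count > 536870911 */
        if (bytes > (1u << 20))               /* "sanity check" defeated by the wrap */
            return NULL;
        return malloc(bytes);
    }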

So, why aren't unbounded integers becoming the norm? I'm not saying we should throw int32 etc out, but rather treat them as low-level optimization tools - much like we treat, say, raw pointers in C# or Rust. Surely the default should be safety over performance?


I would have assumed that you don't want to use uint_fast16_t to store anything that doesn't fit in 16 bits.

If you cast to a signed type that can't represent the complete range, that still wouldn't be called "overflow" AFAIK, and no matter what you call it, it is not "undefined". And assuming 32-bit ints, there is no loss of information when converting from a 16-bit value.

You can also just cast to unsigned or whatever type you think is enough (you should know). The point is, use a conversion, cast to a simple type, make your code compatible.
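
For example (assuming 32-bit int, as above):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint_fast16_t raw = 40000;   /* needs all 16 bits */
        int v = (int)raw;            /* value preserved exactly in a 32-bit int */
        printf("%d\n", v);           /* prints 40000 */
        return 0;
    }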


Easier to sanity check if you use uint8 for the indices

If you are using integers as, well, integers, then there is relatively little benefit other than compatibility with data structures from other languages (e.g. wire protocols). However, unsigned integers are also the building blocks of large bit string structures, i.e. bit structures that are not interpreted as integers per se. For these it is a bigger problem. It is not so much the range issue as the fact that you have a bit that behaves differently from the others, e.g. for operations like comparisons.
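
A minimal sketch of that asymmetry (the signed results shown are what you get on typical two's complement targets):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint64_t u = 0x8000000000000000ull;   /* same 64-bit pattern... */
        int64_t  s = (int64_t)u;              /* ...reinterpreted as signed */

        /* Unsigned: the top bit is just another bit. */
        printf("%d\n", u > 1);                             /* 1 */
        printf("%llx\n", (unsigned long long)(u >> 1));    /* 4000000000000000 */

        /* Signed: the same bit flips the ordering and >> sign-extends,
           so bit-string code needs extra masking or branches. */
        printf("%d\n", s > 1);                             /* 0 */
        printf("%llx\n", (unsigned long long)(s >> 1));    /* c000000000000000 */
        return 0;
    }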

For kernels that do a lot of manipulation on largish bit strings there are two ways you can deal with it: use some additional conditional logic that treats the sign bit as special or design your code so that it can accept "gaps" in the bit string.

The reason you might want to do direct unsigned 64-bit bit-twiddling on packed bit string structures is blinding speed. It fully utilizes your CPU's ALUs. The equivalent code when limited to signed integers injects branches, extra operations, and/or requires working in 32-bit chunks, all of which slow down the code significantly and makes it uglier to boot.

For most apps it doesn't matter. For some algorithms that work on large bit strings it makes implementation in Java more painful than it needs to be.


This mindset leads to a multitude of bugs and brittle, non-portable code. Almost as bad as sticking pointers into a uint32_t and breaking portability beyond 32-bit platforms. int is 16 bits minimum. If 2^15-1 isn't enough range, switch to long and stop writing brittle code.

You can just expand your example to use 16-bit values or switch to uint8_t. Bitfields with signed integers are also a minefield so it's best to never attempt it.
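
The minefield in a nutshell (sketch; the exact behaviour of the signed case is implementation-defined):

    #include <stdio.h>

    struct flags {
        int      s : 3;   /* whether a plain int bit-field is even signed is
                             implementation-defined; as signed its range is -4..3 */
        unsigned u : 3;   /* unsigned bit-field: range 0..7 */
    };

    int main(void) {
        struct flags f = {0, 0};
        f.s = 7;          /* out of range if signed: commonly stores -1 */
        f.u = 7;          /* well-defined: stores 7 */
        printf("%d %d\n", f.s, f.u);
        return 0;
    }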

> Is the uint8_t just "no point in using something bigger" or does it likely help the compiler? Does/can the signedness matter as well as the size?

In an ideal world you could just use uint_fast8_t and the compiler would answer this question for you. In the real world I don't think compilers are smart enough, or there are too many other constraints limiting them :(
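
You can see the ABI making the call rather than the optimizer; e.g. on glibc x86-64 the "fast" types resolve like this (sketch; output is platform-dependent):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        printf("uint8_t        %zu\n", sizeof(uint8_t));        /* 1 */
        printf("uint_fast8_t   %zu\n", sizeof(uint_fast8_t));   /* 1 on glibc x86-64 */
        printf("uint_fast16_t  %zu\n", sizeof(uint_fast16_t));  /* 8 on glibc x86-64 */
        printf("uint_fast32_t  %zu\n", sizeof(uint_fast32_t));  /* 8 on glibc x86-64 */
        return 0;
    }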


The above assumes unsigned 32-bit integers. It abuses integer overflow. Perhaps I should have stated that more explicitly.
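
For anyone following along: unsigned 32-bit arithmetic is defined to wrap modulo 2^32, so leaning on the overflow is at least predictable. A hypothetical step in that style:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t h = 0xdeadbeefu;
        h += 0x9e3779b9u;                    /* wraps modulo 2^32; no UB */
        printf("%08" PRIx32 "\n", h);        /* 7ce538a8 */
        return 0;
    }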

In a strict typing environment, the other major issue is that int is cross-platform and forward compatible, whereas uint32_t, uint64_t, uint8_t, uint16_t, etc. will always be unsigned within a specified bound. So whenever we have 128-bit or 256-bit registers, we'll have to go back and update all this code that effectively "optimizes" 1 bit of information (never mind the fact that int is usually better optimized than uint these days).

Furthermore, casting uintx_t to int and back again while using shared libraries is a huge pain in the ass and can waste a lot of programmer time that would be better spent elsewhere, especially when working with ints and uints together (casting errors, usually in the form of a misplaced parenthesis, are pretty small and can take a very long time to find).
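
The classic version of that hard-to-find error is a signed/unsigned comparison going through the usual arithmetic conversions (sketch):

    #include <stdio.h>

    int main(void) {
        unsigned int size = 0u;
        int offset = -1;

        /* offset is converted to UINT_MAX for the comparison, so the
           "obviously true" test is false. */
        if (offset < size)
            printf("in range\n");
        else
            printf("surprise: -1 compared as a huge unsigned value\n");
        return 0;
    }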


Yes, that's the point. Signed ints are a potential footgun, which can be partially mitigated by limiting the range to 0 to 2^31-1.

Good specs pre-emptively mitigate implementation bugs.


No, it has fewer-than-32-bit fixnums. Integers are practically unbounded. Also, on cmucl and its derivatives you can declare something to be (unsigned-byte 32) and when possible it will use untagged integers for performance.

This makes a difference because if you declare something "fixnum" you know you're being non-portable, but if you declare something (unsigned-byte 32) or just integer then you expect it to work across all implementations.


I feel like doing arithmetic with unsigned integers is like doing all your coding 2 feet from a cliff, in return for more land about 2^63 miles away. It's fine as long as you never make the tiniest step wrong, and it's very unlikely you'll ever need that other land (and it's usually easy to design code not to need it).
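
The cliff, concretely (assuming 32-bit unsigned int):

    #include <stdio.h>

    int main(void) {
        unsigned int a = 3u, b = 5u;
        unsigned int d = a - b;    /* one step past zero: wraps instead of -2 */
        printf("%u\n", d);         /* 4294967294 */
        return 0;
    }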

This doesn't apply if you are using unsigned types for bit twiddling, but then you shouldn't be using subtraction anyway.


For one, if you write uint_least16_t, you're implicitly warranting that the code does the right thing if it isn't 16 bits, which means you have to think about it. And C's integer conversion rules and overflow restrictions are typically quite a lot to think about already... Not the strongest argument, but I think there is a case for applying YAGNI.

Also, it's less typing. :)


Personally, I struggle to see what's good about unspecified-but-fixed-width integer types as in C/C++. A lot of real-world code I see uses uint32_t (sometimes typedef'd to u32) and the like to get predictable data layout and to avoid cute gotchas like long being 64-bit on x86_64 Linux but 32-bit on Windows.
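
That gotcha in one line of output (sketch; the sizes are what LP64 vs LLP64 give you):

    #include <stdio.h>

    int main(void) {
        /* 8 under LP64 (x86_64 Linux, macOS), 4 under LLP64 (64-bit Windows);
           uint32_t and friends never move. */
        printf("sizeof(long) = %zu\n", sizeof(long));
        return 0;
    }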

I also like how Rust makes both signed and unsigned types equally easy to type, because I feel that a lot of people use signed integers where they should be using unsigned simply because it's easier to type "int" compared to "unsigned (int)". And if you absolutely need a machine word size dependent type in Rust, you do have usize, which is the equivalent of size_t.


Unsigned types have an absolute advantage for unsigned values: they have twice the range. This turns out to have useful performance advantages, e.g. in large data infrastructure software.

In typical C++ metaprogramming for this type of software, the integer types for processing code paths are often generated using the most compact primitive type, because it is expected that billions of instances of these types may be in memory and trillions in storage, and there are multiple integer types whose required ranges of values are interdependent. A considerable amount of design in database engines, for example, goes into maximizing scalability while minimizing the total footprint of integer type instances. This is a function of the expressible non-negative range of the types, and cutting it in half will cause some of the types to be promoted, doubling their footprint. It is difficult enough to squeeze them into unsigned types without a step change in integer size as it is.
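
A sketch of the promotion effect (hypothetical field; the point is just the step change in size on any common ABI):

    #include <stdint.h>

    /* A value that must cover 0..40000 fits in two bytes if unsigned,
       but the signed equivalent (int16_t tops out at 32767) forces a
       promotion to four bytes, i.e. doubled footprint across billions
       of instances. */
    struct col_unsigned { uint16_t doc_len; };
    struct col_signed   { int32_t  doc_len; };

    _Static_assert(sizeof(struct col_unsigned) == 2, "2 bytes per value");
    _Static_assert(sizeof(struct col_signed)   == 4, "4 bytes per value");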

The fact that overflow is well-defined, in contrast to UB, also has value when bugs do occur: the consequences are predictable and less subject to clever compiler optimizations.


Why oh why do people want a value to have different bounds depending on which system it is used on? This is a source of huge confusion, and it is why people stick to uint8 and other precise types.

Exactly. In fact, I would go so far as to make this the _only_ integer type. u16, u32, etc. can just be typedefs.

The ability to be precise with integer ranges could prevent many types of arithmetic and OOB errors.


I use long when I need an int that can always represent the range -(2^31-1) to 2^31-1, no assumptions necessary.

Same goes for unsigned versions and long long.
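
In other words, relying only on what <limits.h> promises:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        /* Guaranteed by the standard: LONG_MIN <= -(2^31 - 1),
           LONG_MAX >= 2^31 - 1, ULONG_MAX >= 2^32 - 1. */
        printf("LONG_MIN  = %ld\n", LONG_MIN);
        printf("LONG_MAX  = %ld\n", LONG_MAX);
        printf("ULONG_MAX = %lu\n", ULONG_MAX);
        return 0;
    }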
