The efficient hardware types are handled by int_fast*_t. The legacy types can't be redefined outside their established ranges because that would break things that depend on them fitting into a known amount of memory.
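A minimal sketch of that difference; it just prints the widths, and the result for int_fast16_t varies by platform:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* int16_t must be exactly 16 bits; int_fast16_t is whatever width the
         * implementation considers fastest (often 32 or 64 bits on modern CPUs). */
        printf("int16_t:      %zu bytes\n", sizeof(int16_t));
        printf("int_fast16_t: %zu bytes\n", sizeof(int_fast16_t));
        return 0;
    }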
Due to all that legacy, the instruction decoder has far fewer opportunities for optimization.
There may be a physical limit to the size of transistors, but the limit of computing performance is not ‘what Intel is doing’. There’s more to performance than instructions per second or clock frequencies. In fact, these don’t translate to the same performance at all, since ARM is RISC and x64 is CISC.
Fat pointer cost can be mitigated by doing pointer pre-unpacking in hardware. That kind of support fell out of favor when RISC started beating CISC architectures, but if dynamic dispatch became a significant performance issue, I could see it going into certain processors. We have happily returned to an era of processor experimentation.
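For anyone unfamiliar with the term, here is a rough sketch of what a fat pointer for dynamic dispatch can look like in C (the struct and function names are made up for the example, not from any real library):

    #include <stdio.h>

    /* One common shape of "fat pointer": the data pointer travels together with
     * a second word of metadata, here a vtable pointer for dynamic dispatch. */
    struct vtable {
        void (*print)(const void *self);
    };

    struct fat_ptr {
        const void          *data; /* the object itself */
        const struct vtable *vt;   /* metadata carried alongside it */
    };

    static void print_int(const void *self) { printf("%d\n", *(const int *)self); }
    static const struct vtable int_vt = { print_int };

    int main(void) {
        int x = 42;
        struct fat_ptr p = { &x, &int_vt };
        /* Dynamic dispatch: an extra indirection per call, which is the cost
         * the hardware support above would be trying to hide. */
        p.vt->print(p.data);
        return 0;
    }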
Let the hardware do what it's best at: being simple and running fast. Let the interpreter/compiler layer do what it's best at: flexibility.
Yeah, this is pretty much the opposite of what actually works in practice for general-purpose processors though – otherwise we'd all be using VLIW processors.
Lots of speed-sensitive programs also ship multiple implementations that they can choose between at run time, so they can more fully utilize a CPU without recompiling.
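For example, with GCC or Clang on x86 you can pick an implementation once at startup; the function names below are made up for the sketch, and the "AVX2" version is just a placeholder body:

    #include <stddef.h>
    #include <stdio.h>

    static float sum_scalar(const float *a, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++) s += a[i];
        return s;
    }
    static float sum_avx2(const float *a, size_t n) {
        return sum_scalar(a, n); /* stand-in for a real AVX2 implementation */
    }

    typedef float (*sum_fn)(const float *, size_t);

    /* Choose once at run time based on what the CPU reports.
     * __builtin_cpu_supports is a GCC/Clang extension on x86. */
    static sum_fn pick_sum(void) {
        return __builtin_cpu_supports("avx2") ? sum_avx2 : sum_scalar;
    }

    int main(void) {
        float data[4] = {1, 2, 3, 4};
        printf("%f\n", pick_sum()(data, 4));
        return 0;
    }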
I haven't looked into it myself, but it could be that due to all this massaging you'd lose more on throughput than you gain on memory use. It's similar to doing 8 bit quantized stuff on general purpose CPUs. It's very hard to make it any faster than float32 due to all the futzing that has to be done before and after the (rather small) computation. In comparison, no futzing at all is needed for float32: load 4/8/16 things at a time (depending on the ISA), do your stuff, store 4/8/16 things at a time. Simple.
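To illustrate the float32 case, a sketch with AVX intrinsics (x86-only, compile with -mavx; n is assumed to be a multiple of 8 to keep it short):

    #include <stddef.h>
    #include <immintrin.h>

    /* The "no futzing" path: load 8 floats, do the work, store 8 floats. */
    static void scale_f32(float *dst, const float *src, size_t n, float k) {
        __m256 vk = _mm256_set1_ps(k);
        for (size_t i = 0; i < n; i += 8) {
            __m256 v = _mm256_loadu_ps(src + i);   /* load 8 at a time */
            v = _mm256_mul_ps(v, vk);              /* do your stuff */
            _mm256_storeu_ps(dst + i, v);          /* store 8 at a time */
        }
    }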
Backward compatibility. They could microcode all the lesser-used instructions, but the surface area of existing code is very large, and Intel and AMD care more about running existing code faster than about new code.
There is a reason that even the obsolete x87 floating-point stack still runs at near-optimal speed.
Also I don't think it is very expensive to maintain most rare instructions. The cost is primarily in encoding space, but until they support a different ISA (possibly as an alternate mode), they don't have an option.
There is also the "small" advantage that a very complex architecture is hard to implement, validate, and/or emulate, which gives an advantage over the competition.
I once did a stupid test using either an int or an unsigned as the for-loop variable; the performance hit was about 1%. Problem is, modern processors can walk, chew gum, and juggle all at the same time, which tends to negate a lot of simplistic optimizations.
Compiler writers tend to assume the processor is a dumb machine. But modern ones aren't; they do a lot of resource allocation and optimization on the fly, and they do it in hardware, in real time.
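The kind of micro-test described above looks roughly like this (a sketch, not the original benchmark); on an out-of-order core both loops tend to run at essentially the same speed:

    #include <stddef.h>

    long sum_signed(const long *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)         /* signed loop counter */
            s += a[i];
        return s;
    }

    long sum_unsigned(const long *a, unsigned n) {
        long s = 0;
        for (unsigned i = 0; i < n; i++)    /* unsigned loop counter */
            s += a[i];
        return s;
    }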
I mean hard to implement efficiently; that's still very possibly true on non-tagged hardware without custom microcode. But maybe not so much that we'd really notice outside of micro-benchmarks and extreme situations.
Unfortunately, fast array bounds checks and fast integer overflow checks are anathema to speculative execution, so they aren't happening any time soon.
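For concreteness, this is roughly what such checks look like in software today; the function is just an illustration, using the GCC/Clang __builtin_add_overflow extension:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Reads a[i], adds x with an overflow check, and writes the sum to *out.
     * Each check adds a branch the CPU has to predict around. */
    bool checked_add_at(const int32_t *a, size_t len, size_t i,
                        int32_t x, int32_t *out) {
        if (i >= len)                               /* array bounds check */
            return false;
        if (__builtin_add_overflow(a[i], x, out))   /* integer overflow check */
            return false;
        return true;
    }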
I'm becoming convinced that we really need to just go back to the Alpha 21164 architecture and stamp out multiple copies with really fast interconnect.
For examples of CPUs that behave differently, look at RISC CPUs. 32-bit PowerPC, for example, would translate a long immediate load into an immediate 'load short into the high word and zero out the low word' plus a signed immediate addition (it would load $DEAE first, then add -$4111 to get $DEADBEEF).
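Roughly the same split written out in C, assuming the usual two's-complement narrowing when forming the low half:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void) {
        /* lis-equivalent: put 0xDEAE into the high half-word */
        uint32_t hi = (uint32_t)0xDEAE << 16;          /* 0xDEAE0000 */
        /* addi-equivalent: add the sign-extended low half (0xBEEF == -0x4111) */
        int16_t  lo = (int16_t)0xBEEF;                 /* -0x4111 */
        uint32_t value = hi + (uint32_t)(int32_t)lo;   /* 0xDEADBEEF */
        printf("0x%08" PRIX32 "\n", value);
        return 0;
    }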
The list of problems is way longer, by the way. This code makes assumptions about pointer size (I don't think it will run on x64 with common ABIs).
There also is no guarantee that function pointers point to the memory where the function's code can be found (there could be a trampoline in-between, or a devious compiler writer could encrypt function addresses).
Neither is there a guarantee that functions defined sequentially get addresses that are laid out sequentially (there is no portable way to figure out the size of a function in bytes).
Finally, I don't think there is a guarantee that one can read a function's code (its memory could be marked 'execute only').
I guess those more familiar with the C standard will find more portability issues.
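A sketch of the kind of assumption being criticized (not the code under discussion); every line of it "works" on some platforms and is guaranteed by nothing in the standard:

    #include <stdint.h>
    #include <stdio.h>

    static void f(void) { /* ... */ }
    static void g(void) { /* ... */ }

    int main(void) {
        /* Assumes g is laid out immediately after f, that a function pointer
         * points at the code itself, and that converting a function pointer
         * to an integer is meaningful at all. */
        uintptr_t size_guess = (uintptr_t)&g - (uintptr_t)&f;
        printf("guessed size of f: %lu bytes\n", (unsigned long)size_guess);

        /* Also assumes the code bytes are readable, which they may not be
         * (execute-only memory). */
        const unsigned char *code = (const unsigned char *)(uintptr_t)&f;
        printf("first byte of f: 0x%02X\n", code[0]);
        return 0;
    }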
The way it's achieved may not matter much with a 4 GHz multi-core CPU running a multitasking OS, but having to deal with 16-bit pointers and segmented memory in a 4.77 MHz 8086/8 was a huge pain I felt in the flesh.
I feel like there's an underlying assumption/reality here that helps us: Hardware is designed for programs that are maximally efficient but also safe and correct. No one ever designs a new CPU instruction that's like "this multiplies two numbers super fast, but only if you're willing to accept undefined behavior 0.1% of the time." They only design hardware instructions that are possible to use in a safe, correct program. And so it's possible for a programming language to make progress on safety and correctness without necessarily compromising performance. The "laws of physics" as expressed in the way we build hardware allow for such a thing. That said, I wonder if other folks know stories about hardware designs that were fundamentally incorrect, and what happened to them?
That might be another reason why it is slow -- it is an old opcode that has to be supported for compatibility, but isn't prioritized in any of the pipelines in newer chip designs.
They have many opcodes that bloat the instruction set and need to be broken apart to fit into a superscalar design. This is overhead that takes up silicon space. It is especially painful with multi-core designs, since each core needs this.
They may have memory models that make guarantees which add little real security or which impede optimisation.
There is something about condition codes vs explicit checks that made speculative execution difficult. I don't recall the details about this one.
Unfortunately, you lose that advantage. In days of yore, people knew the implementation details of their hardware and would optimize around them. With microcode and the sheer size of certain instruction sets, it is very difficult to do the same thing, particularly now that the philosophy of programming/computing has shifted to accommodate the industry as an omnipresent actor in your execution environment, and furthermore one with even more rights to observable machine state than you, the owner of the damn thing.
Your point on inefficient code stands, but there is far more at play than mere "programmers aren't skilled enough" and "imagine the possibilities!"
I'll wager a true Mel would not be an easy thing to come by nowadays.