Yes, I ran the tests too and noticed that printing the double in snprintf() is the bottleneck. Any idea how we could do that faster and still get the format we need in `nnn`?
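For what it's worth, one way around the snprintf double bottleneck is to do the fixed-precision conversion by hand. This is only a minimal sketch, not nnn's actual code: it assumes a non-negative value that fits in a 64-bit integer after scaling, a small precision, and simple half-up rounding.

```
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper (not nnn's actual code): format a non-negative double
// with a fixed number of fractional digits, skipping snprintf's format-string
// parsing and locale machinery. Assumes the scaled value fits in uint64_t and
// prec is small (<= 9); rounds half-up on the last digit.
static std::size_t fmt_fixed(double v, int prec, char *buf, std::size_t cap) {
    uint64_t scale = 1;
    for (int i = 0; i < prec; ++i) scale *= 10;

    uint64_t scaled = (uint64_t)(v * (double)scale + 0.5);
    uint64_t ip = scaled / scale;   // integer part
    uint64_t fp = scaled % scale;   // fractional part

    // Emit digits backwards into a temporary, then reverse into buf.
    char tmp[40];
    std::size_t n = 0;
    for (int i = 0; i < prec; ++i) { tmp[n++] = (char)('0' + fp % 10); fp /= 10; }
    if (prec > 0) tmp[n++] = '.';
    do { tmp[n++] = (char)('0' + ip % 10); ip /= 10; } while (ip);

    if (n + 1 > cap) return 0;      // not enough room
    for (std::size_t i = 0; i < n; ++i) buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}

int main() {
    char buf[32];
    fmt_fixed(3.5, 2, buf, sizeof buf);
    std::printf("%s\n", buf);       // prints 3.50
}
```

Whether this beats the libc implementation would of course have to be measured in the real hot path.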
Even then, printf and scanf are typically faster than C++ iostreams formatted output (not by a little bit, by a lot), even though iostreams gets all the formatting information at compile time, while printf has to parse the format string at runtime.
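Here is roughly the shape of comparison I mean; this is an assumed micro-benchmark sketch, not anyone's published numbers, and the usual caveats about measuring optimized builds apply.

```
#include <chrono>
#include <cstdio>
#include <iomanip>
#include <sstream>

// Format the same int + double pair N times through snprintf and through
// std::ostringstream, and compare wall-clock time.
int main() {
    constexpr int N = 1'000'000;
    char buf[64];

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i)
        std::snprintf(buf, sizeof buf, "%d %.3f\n", i, i * 0.5);
    auto t1 = std::chrono::steady_clock::now();

    std::ostringstream oss;
    oss << std::fixed << std::setprecision(3);   // match the %.3f above
    for (int i = 0; i < N; ++i) {
        oss.str("");                             // reuse the stream buffer
        oss << i << ' ' << i * 0.5 << '\n';
    }
    auto t2 = std::chrono::steady_clock::now();

    auto ms = [](auto a, auto b) {
        return (long long)std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
    };
    std::printf("snprintf:  %lld ms\n", ms(t0, t1));
    std::printf("iostreams: %lld ms\n", ms(t1, t2));
}
```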
On the other hand, if people start to use snprintf in that particular form as a safe way of string copying, compilers could pattern-match this and substitute a direct implementation.
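The "particular form" I have in mind is presumably the classic idiom below; the memcpy version is one plausible direct substitution a compiler (or a programmer) could make.

```
#include <cstddef>
#include <cstdio>
#include <cstring>

int main() {
    char dst[16];
    const char *src = "a possibly very long source string";

    // The idiom: always NUL-terminated, never overflows dst, but routes a
    // plain copy through the whole printf machinery at runtime.
    std::snprintf(dst, sizeof dst, "%s", src);
    std::puts(dst);

    // A direct substitution with the same truncation behaviour:
    std::size_t n = std::strlen(src);
    if (n >= sizeof dst) n = sizeof dst - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    std::puts(dst);
}
```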
Excellent response! I have a program that spends 10% of its time in vfprintf for string processing, and I really think the program should not be spending that much time (or any time) there. I looked at the libc6 vfprintf implementation and it's pretty esoteric-looking stuff. It might be worth my time to swap it out for your library or something similar.
The biggest performance trap they had was copying all their strings in a really hot loop to a vector of characters. I'm not sure what we could do to steer people away from that...
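I don't know exactly what their code looked like, but the general shape of the trap and a non-copying alternative might look like this (hypothetical functions, just to illustrate):

```
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical reconstruction of the trap: a per-iteration copy into a char vector.
std::size_t count_spaces_copying(const std::vector<std::string> &lines) {
    std::size_t total = 0;
    for (const auto &line : lines) {
        std::vector<char> buf(line.begin(), line.end());  // heap allocation + copy every pass
        for (char c : buf)
            if (c == ' ') ++total;
    }
    return total;
}

// Same work without copying: read through a non-owning view.
std::size_t count_spaces_view(const std::vector<std::string> &lines) {
    std::size_t total = 0;
    for (std::string_view line : lines)
        for (char c : line)
            if (c == ' ') ++total;
    return total;
}
```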
1. nothing generic is as fast as making your own custom solution
2. a lot of libc is lowest-common-denominator code with tons of bloat. printf/sprintf probably does extra locale, multibyte charset, and thread-locking shit you don't want (see the sketch below).
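As a concrete (hypothetical) example of "custom beats generic": a bare unsigned-to-decimal conversion that does none of the locale, wide-char, or locking work a general printf has to at least check for.

```
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Minimal unsigned-to-decimal conversion: no locale lookup, no multibyte
// handling, no stream locking, just digits. buf must have room for
// 20 digits + NUL; returns the length written.
static std::size_t u64_to_dec(uint64_t v, char *buf) {
    char tmp[20];
    std::size_t n = 0;
    do { tmp[n++] = (char)('0' + v % 10); v /= 10; } while (v);
    for (std::size_t i = 0; i < n; ++i) buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}

int main() {
    char buf[21];
    u64_to_dec(1234567890ULL, buf);
    std::printf("%s\n", buf);   // prints 1234567890
}
```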
That surprises me. If you had asked me, I'd have said that with normal optimisations switched on, both programs would boil down to exactly the same single function call that copies a byte array to the stdout stream. I would have to wonder whether there are optimisations someone has missed if there's even a tiny difference.
In the C world, I could imagine there's a loop at runtime scanning the printf format string for % characters, but I would equally imagine that the compiler people have made a special case for printf: a call with a single string-literal argument containing no % signs gets silently replaced with a call to puts. (Which itself gets optimised to a byte-array copy.)
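That special case does exist: GCC and Clang will typically rewrite a printf call whose literal contains no conversion specifiers and ends in a newline into a puts call. A minimal before/after sketch:

```
#include <cstdio>

int main() {
    // What the source says:
    std::printf("hello, world\n");

    // What the compiler typically emits instead when the literal contains no
    // conversion specifiers and ends in '\n' (puts appends the newline itself):
    std::puts("hello, world");
}
```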
I just did a test using std:: and old-fashioned char[]. The fancy std:: version is fifteen times slower than strcat(). In a loop with intensive string manipulation, that could mean the program gets back to you in 15 seconds instead of 1. You don't mind waiting?
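The actual test isn't shown, but a plausible shape of such a comparison is building a short record many times: once with a stack char[] plus strcpy/strcat, once with std::string concatenation that heap-allocates temporaries. This is only an assumed sketch; with aggressive optimization, constant folding can distort the numbers, hence the runtime-varying input.

```
#include <chrono>
#include <cstdio>
#include <cstring>
#include <string>

int main() {
    constexpr int N = 5'000'000;
    const char *vals[2] = {"value", "other"};   // runtime-varying input to defeat constant folding
    unsigned long long sink = 0;                // consume results so the loops aren't optimized away

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) {
        char buf[64];
        std::strcpy(buf, "name=");
        std::strcat(buf, vals[i & 1]);
        std::strcat(buf, ";");
        sink += (unsigned char)buf[6];
    }
    auto t1 = std::chrono::steady_clock::now();

    for (int i = 0; i < N; ++i) {
        std::string s = std::string("name=") + vals[i & 1] + ";";
        sink += (unsigned char)s[6];
    }
    auto t2 = std::chrono::steady_clock::now();

    auto ms = [](auto a, auto b) {
        return (long long)std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
    };
    std::printf("char[]/strcat: %lld ms\n", ms(t0, t1));
    std::printf("std::string:   %lld ms\n", ms(t1, t2));
    std::printf("(sink=%llu)\n", sink);
}
```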
fast_io gives a 15% perf improvement by replacing a safe format_int API from https://github.com/fmtlib/fmt with a similar but unsafe one plus 50 kB of extra data. Adding safety back will likely bring perf down, which the last line seems to confirm:
This shows that fast_io is slightly slower than the equivalent {fmt} code. Again, this is from fast_io's own benchmark results, which I haven't been able to reproduce.
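For readers who haven't used it, fmt::format_int is the API referred to here; it converts an integer into an internal stack buffer without any format-string parsing. The usage below is my own example (it requires the {fmt} library):

```
#include <fmt/format.h>
#include <cstdio>

int main() {
    fmt::format_int f(42);

    // data() is not NUL-terminated; size() gives the length.
    std::fwrite(f.data(), 1, f.size(), stdout);
    std::fputc('\n', stdout);

    // c_str() NUL-terminates the internal buffer; str() copies into a std::string.
    std::printf("%s\n", f.c_str());
}
```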
CSPRNGs are not much slower than the alternatives, and really, how often is the hot path for data creation bound by the processor rather than by memory access or I/O?
Nope, looks like a really bad choice for generic libraries.
Even the "optimized" C version is far from what an experienced C programmer would write if performance were paramount: general-purpose memory allocation, hash tables with inherently bad spatial and temporal locality, buffered I/O instead of mapping the file into memory.
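On the last point, a POSIX-only sketch of what "mapping the file into memory" looks like, with a trivial stand-in workload (counting newlines) rather than whatever the benchmark actually computes:

```
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map the whole file and scan it as one contiguous buffer, instead of looping
// over fread/getline on a stdio stream.
int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return 1; }

    void *p = mmap(nullptr, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return 1; }

    // Example workload: no copies and no read() calls inside the loop.
    const char *data = (const char *)p;
    size_t lines = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        if (data[i] == '\n') ++lines;
    std::printf("%zu lines\n", lines);

    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}
```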
I forget the exact reasoning now, but I remember it being about 10x slower than memcpy or strncpy. I think the main reason was the need to parse the format string.