
snprintf has very similar performance pitfalls.



snprintf is the same trash, just slower. See e.g. http://blog.infosectcbr.com.au/2018/11/memory-bugs-in-multip... discussing the need for an improved scnprintf in the Linux kernel.

Why is snprintf slow? I am surprised that it would be slow, especially compared to functions like asprintf that allocate the buffer.

Yeah, but as soon as you use snprintf you're throwing performance to the wind. The idiomatic C++ solution should beat it hands down. (gulp)

If you really need a fast strcpy then probably not, but in most situations snprintf will do the job just fine. And will prevent heartache.

Yes, I ran the tests too and noticed that printing the double in snprintf() is the bottleneck. Any idea how we can do that faster while still getting the format we need in `nnn`?

Even then, printf and scanf are typically faster (and not even by a little bit, by a lot) than C++ iostreams formatted output, even though iostreams gets all the formatting information at compile-time, while printf has to parse the format string.

On the other hand, if people start to use snprintf in that particular form as a safe way of string copying, compilers could pattern-match this and substitute a direct implementation.


Reason: performance.

If you just do string manipulation, memcpy is faster.

If you need to convert data type, like int to string, then use snprintf.


printf can be slow, but its performance varies by implementation, and in many cases it has no meaningful impact.

Excellent response! I have a program that spends 10% of its time in vfprintf for string processing, and I really think the program should not be spending that much time (or any time) there. I looked at the libc6 vfprintf implementation and it's pretty esoteric-looking stuff. It might be worth my time to swap it out for your library or something similar.

The biggest performance trap they had was copying all their strings in a really hot loop to a vector of characters. I'm not sure what we could do to steer people away from that...

1. nothing generic is as fast as making your own custom solution

2. a lot of libc is lowest common denominator / tons of bloat. printf/sprintf probably does extra locale, multibyte charset, and thread locking shit you don't want.


That surprises me. If you had asked me, I'd have said that with normal optimizations switched on, both programs would boil down to exactly the same single function call that copies a byte array to the stdout stream. I would have to wonder whether there are optimisations someone has missed if there's even a tiny difference.

In the C world, I could imagine there's a loop at runtime scanning the printf string for % characters, but I would equally imagine that the compiler people have made a special case for printf: a call whose only argument is a string literal containing no % gets silently replaced with a puts call. (Which itself gets optimised to a byte-array copy.)


I just did a test using std:: and old-fashioned char[]. The fancy std:: is fifteen times slower than strcat(). In a loop with intensive string manipulation, this could mean the program gets back to you in 15 seconds instead of 1 second. You don't mind waiting?

    Here's the code (Visual Studio 2010):

    #include <stdio.h>
    #include <string.h>
    #include <string>
    #include <time.h>
    #include <windows.h>

    char buf[64], buf2[64];
    int i;
    clock_t start = clock();
    for(i = 0; i < 100000; i++)
    {    _snprintf(buf, sizeof(buf), "%d", i);
        _snprintf(buf2, sizeof(buf2), "%d", i * i);
    #if defined FAST
        strcat(buf, buf2);
    #else
        std::string s1 = buf;
        std::string s2 = buf2;
        std::string s3 = s1 + s2;
    #endif
    }
    _snprintf(buf, sizeof(buf), "%.4f\n", (float)(clock() - start) / (float)CLK_TCK);
    OutputDebugString(buf);

Unfortunately performance claims appear to be bogus.

1. ospan, which the performance claims seem to be based on, doesn't do any bounds checks, so you can easily get a buffer overflow.

2. fast_io generates a whopping 50kB of static data just to format an integer.

So if these benchmark results are correct (I was not able to verify because the author hasn't provided the benchmark source):

> format_int 7867424 ns 7866027 ns 89 items_per_second=127.129M/s

> fast_io_ospan_res 6871917 ns 6870708 ns 102 items_per_second=145.545M/s

fast_io gives a 15% perf improvement by replacing a safe format_int API from https://github.com/fmtlib/fmt with a similar but unsafe one + 50kB of extra data. Adding safety will likely bring perf down, which the last line seems to confirm:

> fast_io_concat 7967591 ns 7966162 ns 88 items_per_second=125.531M/s

This shows that fast_io is slightly slower than the equivalent {fmt} code. Again, this is from fast_io's own benchmark results, which I haven't been able to reproduce.

50kB may not seem like much but for comparison, after a recent binary size optimization, the whole {fmt} library is around 57kB when compiled with `-Os -flto`: http://www.zverovich.net/2020/05/21/reducing-library-size.ht...

The floating-point benchmark results are even less meaningful. They appear to be based on a benchmark that I wrote to test the worst case Grisu (https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/p...) performance on unrealistic random data with maximum digit count. fast_io compares it to Ryu (https://dl.acm.org/doi/pdf/10.1145/3192366.3192369) where maximum digit count is actually the best case and the performance degrades as the number of digits goes down. A meaningful thing to do would be to use Milo Yip's benchmark instead: https://github.com/miloyip/dtoa-benchmark


Sure, but how often is string formatting the bottleneck on performance?

CSPRNGs are not much slower than the alternatives, and really, how often does the hot path for data creation fall on the processor rather than on memory access or I/O?

Nope, looks like a really bad choice for generic libraries.


Even the "optimized" C version is far from what an experienced C programmer would write if performance was paramount. General-purpose memory allocation, using hash tables with inherently bad spatial and temporal locality, using buffered I/O instead of mapping the file to memory.

FWIW, in a file with 1,000,000 lines, the best-of-3 time for "less filename > /dev/null" is:

  2.508u 0.152s 0:02.66 99.6%	0+0k 0+0io 0pf+0w
and the best-of-3 time for "less -N filename > /dev/null" is:

  2.568u 0.159s 0:02.73 99.2%	0+0k 0+0io 0pf+0w
That is, it doesn't seem like printing sequential line numbers is the limiting factor in performance.

This is with "less 458", "Copyright (C) 1984-2012". I downloaded and compiled stock 487 and the best-of-3 times went up to 0:02.94 for both cases.

Checking the source code, it does not appear to use knowledge of the previous output index in order to save time. The relevant code is:

  static int
  iprint_linenum(num)
        LINENUM num;
  {
        char buf[INT_STRLEN_BOUND(num)];

        linenumtoa(num, buf);
        putstr(buf);
        return ((int) strlen(buf));
  }
where

  #define TYPE_TO_A_FUNC(funcname, type) \
  void funcname(num, buf) \
          type num; \
          char *buf; \
  { \
          int neg = (num < 0); \
          char tbuf[INT_STRLEN_BOUND(num)+2]; \
          register char *s = tbuf + sizeof(tbuf); \
          if (neg) num = -num; \
          *--s = '\0'; \
          do { \
                  *--s = (num % 10) + '0'; \
          } while ((num /= 10) != 0); \
          if (neg) *--s = '-'; \
          strcpy(buf, s); \
  }
  
  TYPE_TO_A_FUNC(linenumtoa, LINENUM)

I forget the exact reasoning now, but I remember it being about 10x slower than memcpy or strncpy. I think the main reason was because of the need to parse the format string.
