Hacker Read

thaumasiotes | karma 22020 | avg karma 1.35 · 2022-11-21 11:24:11

>>>>> Why are people still using parsers for untrusted input in C?

No matter what the parser itself is written in, if you're writing in C you'll be using the parser in C.

nicoburns | karma 22847 | avg karma 3.29 · 2022-11-21 11:26:58

1. Well don’t write in C then if your program is security critical or going to be exposed over a network. Sure, there are some targets that require C, but that’s not the case for the vast majority of platforms running OpenSSL.

2. That’s still less of a problem as the C will then be handling trusted data validated by the safe language.


jstimpfle | karma 3971 | avg karma 1.31 · 2022-11-21 11:40:48

If you make argument 2) could you explain how writing a parser is more security critical than any other code that has a (direct or indirect) interaction with the network? At least recursive descent parsers are close to trivial. I usually start by writing a "next_byte" function and then "next_token". You'll have to look very hard to find any pointer code there. It's close to impossible to get this wrong and I don't see how the fact that it's a parser would make it any more dangerous.
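A minimal sketch of the "next_byte" then "next_token" layering described above, assuming a toy grammar of decimal numbers and `+` signs. The `Reader` and `Token` types, `peek_byte`, and the token kinds are hypothetical names chosen for illustration. The point it tries to show is that only one function ever indexes the buffer:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical types for illustration; not taken from any real library. */
typedef struct {
    const unsigned char *data; /* input bytes, not assumed NUL-terminated */
    size_t len;                /* number of bytes in data */
    size_t pos;                /* index of the next unread byte */
} Reader;

typedef enum { TOK_NUMBER, TOK_PLUS, TOK_EOF, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    unsigned long value; /* only meaningful when kind == TOK_NUMBER */
} Token;

/* The only place that indexes the buffer. Returns -1 at end of input. */
static int peek_byte(const Reader *r) {
    return r->pos < r->len ? r->data[r->pos] : -1;
}

/* Consumes and returns the next byte, or -1 at end of input. */
static int next_byte(Reader *r) {
    int c = peek_byte(r);
    if (c >= 0) r->pos++;
    return c;
}

/* Built purely on peek_byte/next_byte; no pointer arithmetic here. */
static Token next_token(Reader *r) {
    Token t = { TOK_EOF, 0 };
    while (peek_byte(r) == ' ') next_byte(r); /* skip spaces */

    int c = next_byte(r);
    if (c < 0) return t;
    if (c == '+') { t.kind = TOK_PLUS; return t; }
    if (c >= '0' && c <= '9') {
        t.kind = TOK_NUMBER;
        t.value = (unsigned long)(c - '0');
        for (;;) {
            int d = peek_byte(r);
            if (d < '0' || d > '9') break;
            next_byte(r);
            unsigned long digit = (unsigned long)(d - '0');
            /* Reject numbers that would overflow instead of wrapping. */
            if (t.value > (ULONG_MAX - digit) / 10) {
                t.kind = TOK_ERROR;
                return t;
            }
            t.value = t.value * 10 + digit;
        }
        return t;
    }
    t.kind = TOK_ERROR;
    return t;
}
```

Even in a sketch this small, the overflow check on the number is the kind of detail that is easy to leave out, and the bounds check is only as good as the `len` passed in by the caller.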

thaumasiotes | karma 22020 | avg karma 1.35 · 2022-11-21 11:46:54

> It's close to impossible to get this wrong and I don't see how the fact that it's a parser would make it any more dangerous.

I can answer that one. The parser is more dangerous because a parser, essentially by definition, takes untrusted input.

Nothing the parser does is any more dangerous than the rest of the code; it's all about the parser's position in the data flow.


nicoburns | karma 22847 | avg karma 3.29 · 2022-11-21 12:23:58

Well, if you're dealing with a struct then the compiler will provide type safety if, say, you try to access a field that doesn't exist. You don't get the same safeguards when dealing with raw bytes. Admittedly in C you can also run into these hazards with arrays and strings, which is why I suggest using non-standard array and string types which actually store the length if you insist on using C.
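As a rough illustration of what a length-carrying type could look like in C, here is a sketch of a non-owning byte view with checked accessors. `ByteSlice` and the `slice_*` functions are hypothetical names, not part of any standard or existing library:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a non-owning view of bytes that carries its length. */
typedef struct {
    const unsigned char *ptr;
    size_t len;
} ByteSlice;

static ByteSlice slice_from(const void *ptr, size_t len) {
    ByteSlice s = { (const unsigned char *)ptr, len };
    return s;
}

/* Checked read: stores the byte in *out and returns true, or returns
   false when i is out of range. */
static bool slice_get(ByteSlice s, size_t i, unsigned char *out) {
    if (i >= s.len) return false;
    *out = s.ptr[i];
    return true;
}

/* Checked sub-view covering [start, start + count). The comparison is
   written as a subtraction so that start + count cannot overflow. */
static bool slice_sub(ByteSlice s, size_t start, size_t count, ByteSlice *out) {
    if (start > s.len || count > s.len - start) return false;
    out->ptr = s.ptr + start;
    out->len = count;
    return true;
}

/* Length-aware comparison; never reads past either view. */
static bool slice_eq(ByteSlice a, ByteSlice b) {
    return a.len == b.len && (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}
```

The compiler will not stop anyone from touching `ptr` directly, so this only helps if the rest of the code goes through the accessors by convention.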

jstimpfle | karma 3971 | avg karma 1.31 · 2022-11-21 17:45:41

When a C program is factored well there needn't be all that much access by pointer + index. I'm not saying it can't be frequent in certain kinds of code, but for many things it's easy to just put a simple abstraction (API consisting of a few functions) that you have to get right once, then can reuse dozens of times.

Plain pointer access in high-level code (say when parsing a particular syntactic element by hand in a recursive descent parser) is a violation of the principle of separation of concerns IMO.

In any case I still don't see what's special about parsers. Most vulnerabilities I suspect to be in the higher levels, like validating parsed numbers and references, for a trivial example. In general, those are checks that are likely to be implemented much closer at the core of the application.


nicoburns | karma 22847 | avg karma 3.29 · 2022-11-21 20:53:00

> Most vulnerabilities I suspect to be in the higher levels, like validating parsed numbers and references, for a trivial example. In general, those are checks that are likely to be implemented much closer at the core of the application.

What I see (especially in libraries like OpenSSL) is the core logic often receives a lot of scrutiny and testing, and thus it is silly mistakes with offsets and bounds checks that make up the majority of bugs.
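To make the "offsets and bounds checks" class of mistake concrete, here is a sketch using a made-up wire format (1 byte of type, a 2-byte big-endian payload length, then the payload). The format, `Record`, and `parse_record` are assumptions for illustration and are not OpenSSL code. The single comparison in the middle is the check that tends to be forgotten or written incorrectly:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: type (1 byte), payload length (2 bytes,
   big-endian), then the payload itself. */
typedef struct {
    uint8_t type;
    const uint8_t *payload; /* points into the caller's buffer */
    size_t payload_len;
} Record;

enum { HEADER_LEN = 3 };

/* Returns true and fills *out only if the whole record fits in buf. */
static bool parse_record(const uint8_t *buf, size_t buf_len, Record *out) {
    if (buf_len < HEADER_LEN) return false;

    /* This length is chosen by the sender and cannot be trusted. */
    size_t claimed = ((size_t)buf[1] << 8) | buf[2];

    /* Compare the claimed length with what actually arrived. Written as
       a subtraction so the check itself cannot overflow. */
    if (claimed > buf_len - HEADER_LEN) return false;

    out->type = buf[0];
    out->payload = buf + HEADER_LEN;
    out->payload_len = claimed;
    return true;
}
```

Dropping the `claimed > buf_len - HEADER_LEN` line would still compile and pass tests on well-formed input, and a later copy of `payload_len` bytes would then read past the end of the buffer.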

It’s also worth considering the severity of different kinds of bug. A bug in high level logic might allow an attacker to do something they shouldn’t be able to do, but it doesn’t give them code execution.

The worst bit is, an attacker can often gain code execution through a part of the code that otherwise wouldn’t be security critical (where a logic mistake would be low impact). So writing code in a language that allows for these vulnerabilities greatly increases your attack surface.


harshreality | karma 5620 | avg karma 3.79 · 2022-11-21 11:36:37

Even if application constraints mean you can't write a parser in another language that's linkable to C, why couldn't you use a parser generator that outputs C?

dllthomas | karma 14749 | avg karma 1.38 · 2022-11-21 11:40:44

I agree that the original statement encourages that interpretation, but I think it admits the interpretation that the parser itself is in C and I think that is what was intended.

wongarsu | karma 24397 | avg karma 4.14 · 2022-11-21 11:49:24

If you have the input in a buffer of known length in C, hand it off to a (dynamic or static) library written in a safe language, and get back trusted parsed output, then there's much less attack surface in your C code.

lazide | karma 9657 | avg karma 1.51 · 2022-11-21 14:45:34

The issue in many of these cases is there appears to be no canonical safe way to know the length of the input in C, and people apparently screw up keeping track of the lengths of the buffers all the time.

saagarjha | karma 56017 | avg karma 2.29 · 2022-11-22 06:11:01

This is why you reduce the amount of C code that has to keep track of it to as little as possible.