
I assume given the rest of the context you were not implementing a FIX parser - but you’d find the same escape code layering violations if you were!



I looked at the code... writing my own parser would be faster than fixing it. But that's beside the point. I decided to use a 3rd party library because I did not want to invest my time into that. The moment I had to even look at the source code, that was broken.

Your comment doesn't apply for this particular case, because the submission goes into great detail that the parser in question was written with Ragel, a parser generator. The code written by them in Ragel contained a bug, which lay uncaught and dormant for years, and manifested only when calling/wrapping code was altered.

Good to see someone avoiding horrible parser generators, even if the code is ugly, poorly styled, and bug prone.

"In short, there are a few reasons that parsing is a mess, and none of those reasons are actually resolvable by parser generators."

I'm pretty sure this is untrue /and/ part of the problem. Build quality on these tools is appalling...


I don't see how writing a parser from scratch would mitigate bugs vs using a regex parser. Parsers are just a hot spot for security bugs that should get extra scrutiny.

Awesome detective work there! Yes, you are correct: the basic idea of the parser is that it just substitutes values for x, and while I did check that 10x becomes 10*x, I did not check that x10 turns into x*10.

Just pushed the bug fix, should work now! Hacker News is awesome, thanks :)
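For anyone curious what a fix along these lines might look like, here is a minimal sketch. It assumes the intended behaviour is implicit multiplication on both sides of the variable (`10x` means `10*x`, `x10` means `x*10`). The function name `insert_implicit_multiplication` is made up for illustration and is not the project's actual code.

```python
import re


def insert_implicit_multiplication(expr: str, var: str = "x") -> str:
    """Make multiplication explicit on both sides of a variable.

    Hypothetical helper: assumes the variable name never appears inside
    a longer identifier (e.g. "max2" would be rewritten incorrectly).
    """
    v = re.escape(var)
    # Digit directly followed by the variable: 10x -> 10*x
    expr = re.sub(rf"(\d)({v})", r"\1*\2", expr)
    # Variable directly followed by a digit: x10 -> x*10
    expr = re.sub(rf"({v})(\d)", r"\1*\2", expr)
    return expr
```

Handling both directions in one place makes the asymmetry described above harder to reintroduce.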


> Just write code to parse the language in a straightforward way.

This approach is why many consider parsing to be a solved problem, so it's certainly a valid approach. However, it's not the only valid approach.

For example, "straightforward" parsers often give terrible error messages: when the intended branch (e.g. if/then/else) fails, the parser will backtrack and try a more general alternative (e.g. a function call). Not only does this give an incorrect error (e.g. "no such function 'esle'"), but it might actually succeed! In that case the parser will be in the wrong state to parse the following text, and will give a nonsensical message (e.g. "unexpected '('" several lines later).

This is an important problem, since these messages can only be deciphered by those who know enough about the syntax to avoid hitting them very often! Inexperienced users will see error messages over and over again, and have no idea that they're being asked to fix non-existent errors in incorrect positions.
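A toy sketch of the effect, under heavy assumptions: the grammar here is invented (`if NAME then NAME else NAME` or `NAME ( )`), tokens are whitespace-separated, and every name (`parse_statement`, `describe_error`, the `commit` flag) is hypothetical. With `commit=False` the parser backtracks out of the failed `if` branch and reports the fallback branch's error; with `commit=True` it sticks with the branch once the `if` keyword has been seen.

```python
class ParseError(Exception):
    def __init__(self, message, pos, committed=False):
        super().__init__(message)
        self.pos = pos              # index of the offending token
        self.committed = committed  # True: do not try other alternatives


def expect(tokens, pos, want):
    """Raise ParseError unless tokens[pos] is exactly `want`."""
    if pos >= len(tokens) or tokens[pos] != want:
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise ParseError(f"expected '{want}' but found '{found}'", pos)


def parse_if(tokens, commit):
    expect(tokens, 0, "if")  # failing here is a normal, uncommitted failure
    try:
        expect(tokens, 2, "then")
        expect(tokens, 4, "else")
        if len(tokens) != 6:
            raise ParseError("expected exactly one name after 'else'", len(tokens))
    except ParseError as err:
        # After seeing 'if', optionally refuse to backtrack.
        err.committed = commit
        raise
    return ("if", tokens[1], tokens[3], tokens[5])


def parse_call(tokens):
    expect(tokens, 1, "(")
    expect(tokens, 2, ")")
    return ("call", tokens[0])


def parse_statement(source, commit=True):
    tokens = source.split()
    if not tokens:
        raise ParseError("empty input", 0)
    try:
        return parse_if(tokens, commit)
    except ParseError as err:
        if err.committed:
            raise
    # Backtrack: fall through to the more general alternative.
    return parse_call(tokens)


def describe_error(source, commit):
    """Return (message, token index) for a failing input, or None on success."""
    try:
        parse_statement(source, commit)
    except ParseError as err:
        return (str(err), err.pos)
    return None
```

For the input `if a then b esle c`, the backtracking version complains about a missing `(` at token 1, while the committed version points at `esle` at token 4.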


I'm confused and very far from an expert here. What is wrong with parsers, and what is the alternative?

Well, then, you can rest knowing that other parser writers came to the same workaround as you did!

It's being worked on. I am rather excited for the upcoming work that makes the parser replaceable and allows us to actually give good syntax errors! There is some discussion about making error printing more configurable so that one can skip stack-frames that are unlikely to be the cause (albeit that's a double-edged sword).

Well, anything that takes untrusted input that might need to be validated with parsing is a massive security hole. Parser generators don't fix this class of bug; they just change how it manifests.

Not if you're actually parsing, with an explicit parse step and a well-defined language. Exploits tend to happen when parsing is done by the so-called "shotgun parser", i.e. various checks and conditionals randomly scattered throughout your code that implicitly define an input language different from the one you think you have.
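To make the contrast concrete, here is a minimal sketch of the "explicit parse step" side. The input language (`transfer <amount> to <account>`), the `Transfer` type, and `parse_transfer` are all invented for illustration. The point is only that the whole input is recognized against one definition up front, and the rest of the program receives a typed value rather than the raw string.

```python
import re
from dataclasses import dataclass

# Hypothetical input language, defined in exactly one place.
_COMMAND = re.compile(
    r"transfer (?P<amount>[1-9][0-9]{0,6}) to (?P<account>[a-z]{3,12})"
)


@dataclass(frozen=True)
class Transfer:
    amount: int
    account: str


def parse_transfer(raw: str) -> Transfer:
    """Accept the entire input or reject it; never act on a partial match."""
    match = _COMMAND.fullmatch(raw)
    if match is None:
        raise ValueError(f"not a valid transfer command: {raw!r}")
    return Transfer(int(match["amount"]), match["account"])
```

A shotgun version would instead check `raw.startswith("transfer")` in one function, call `int()` on a slice somewhere else, and look up the account in a third place, with no single point at which trailing garbage is rejected.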

Is it a bug though? Sounds like a conscious design limitation in the parser. A bad limitation, but not a bug. Sounds like it's behaving "as expected".

Without having investigated it, I would guess that the parser isn't abstract and modular enough so they end up with a mess of code trying to handle all the different possible combinations of syntax.

Why are there so many parsing-related exploits?

My parser seems to be broken.

Maybe he could have adapted an existing one, but I think the main issue was that parsers traditionally just quit when they find an error, whereas language.js has an error recovery feature ("naughty or") that lets it proceed with parsing.
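As a rough illustration of "keep going after an error", here is a sketch of classic panic-mode recovery. This is not how language.js's "naughty or" works; it is a different, simpler technique. The toy grammar (`name = number ;` statements) and the names `parse_program` and `parse_assignment` are assumptions made up for the example.

```python
import re

# (description, pattern) for each token of a `name = number ;` statement
_EXPECTED = [
    ("a name", r"[A-Za-z_]\w*"),
    ("'='", r"="),
    ("a number", r"\d+"),
    ("';'", r";"),
]


def parse_assignment(tokens, pos):
    """Parse one statement starting at `pos`; return (name, value, next_pos)."""
    matched = []
    for offset, (label, pattern) in enumerate(_EXPECTED):
        index = pos + offset
        token = tokens[index] if index < len(tokens) else None
        if token is None or not re.fullmatch(pattern, token):
            found = "end of input" if token is None else repr(token)
            raise SyntaxError(f"token {index}: expected {label}, found {found}")
        matched.append(token)
    return matched[0], int(matched[2]), pos + len(_EXPECTED)


def parse_program(source):
    """Parse every statement, collecting errors instead of stopping at the first."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    assignments, errors = {}, []
    pos = 0
    while pos < len(tokens):
        start = pos
        try:
            name, value, pos = parse_assignment(tokens, pos)
            assignments[name] = value
        except SyntaxError as err:
            errors.append(str(err))
            # Panic mode: skip ahead to just past the next ';' and resume.
            pos = start
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1
    return assignments, errors
```

With the input `a = 1; b = ; c = 3;` this reports one error for the `b` statement and still parses `a` and `c`.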

That’s not like what was said above. They said that a strict parser would choke on unrecognized tags, thus making experimentation non-viable.

Sloppy programming is not about enabling new syntax at all. That simile is not useful.


> I haven't found a parser generator that makes it painless to provide good error messages.

This. And letting the user add their own infix function with configurable precedence and associativity is easy using Pratt too.
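A small sketch of what that can look like: a Pratt-style (precedence-climbing) evaluator where operators are registered at runtime. The class name `PrattParser`, the `add_infix` method, and the precedence numbers are all made up for illustration. It only handles non-negative integers, parentheses, and single-character operators.

```python
import operator
import re


class PrattParser:
    def __init__(self):
        # symbol -> (precedence, right_assoc, function)
        self.infix = {}

    def add_infix(self, symbol, precedence, func, right_assoc=False):
        """Register a user-defined single-character infix operator."""
        self.infix[symbol] = (precedence, right_assoc, func)

    def evaluate(self, source):
        self.tokens = re.findall(r"\d+|\S", source)
        self.pos = 0
        value = self._expression(0)
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unexpected token {self.tokens[self.pos]!r}")
        return value

    def _expression(self, min_precedence):
        left = self._atom()
        while self.pos < len(self.tokens):
            symbol = self.tokens[self.pos]
            if symbol not in self.infix:
                break
            precedence, right_assoc, func = self.infix[symbol]
            if precedence < min_precedence:
                break
            self.pos += 1
            # Left-assoc: the right side must bind strictly tighter.
            # Right-assoc: the same precedence may recurse on the right.
            next_min = precedence if right_assoc else precedence + 1
            left = func(left, self._expression(next_min))
        return left

    def _atom(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        token = self.tokens[self.pos]
        self.pos += 1
        if token == "(":
            value = self._expression(0)
            if self.pos >= len(self.tokens) or self.tokens[self.pos] != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
            return value
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


parser = PrattParser()
parser.add_infix("+", 10, operator.add)
parser.add_infix("-", 10, operator.sub)
parser.add_infix("*", 20, operator.mul)
parser.add_infix("^", 30, operator.pow, right_assoc=True)
```

Adding an operator is one dictionary entry, e.g. `parser.add_infix("%", 20, operator.mod)`, with no grammar to regenerate.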


A parser that is 97% correct is broken.