I looked at the code... writing my own parser would be faster than fixing it. But that's beside the point. I decided to use a 3rd party library because I did not want to invest my time into that. The moment I had to even look at the source code, that was broken.
Your comment doesn't apply to this particular case, because the submission goes into great detail about how the parser in question was written with Ragel, a parser generator. The code they wrote in Ragel contained a bug, which lay uncaught and dormant for years and manifested only when the calling/wrapping code was altered.
I don't see how writing a parser from scratch would mitigate bugs vs using a regex parser. Parsers are just a hot spot for security bugs that should get extra scrutiny.
Awesome detective work there! Yes, you are correct: the basics of the parser are that it just replaces values for x, and while I did check for 10x to be 10x, I did not check for x10 to turn into x10.
Just pushed the bug fix, should work now! Hacker News is awesome, thanks :)
> Just write code to parse the language in a straightforward way.
This approach is why many consider parsing to be a solved problem, so it's certainly a valid approach. However, it's not the only valid approach.
For example, "straightforward" parsers often give terrible error messages: when the intended branch (e.g. if/then/else) fails, the parser will backtrack and try a more general alternative (e.g. a function call). Not only does this give an incorrect error (e.g. "no such function 'esle'"), but it might actually succeed! In which case, the parser will be in the wrong state to parse the following text, and will give a nonsensical message (e.g. "unexpected '('" several lines later).
This is an important problem, since these messages can only be deciphered by those who know enough about the syntax to avoid hitting them very often! Inexperienced users will see error messages over and over again, and have no idea that they're being asked to fix non-existent errors in incorrect positions.
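To make that concrete, here is a rough Python sketch of the effect. The toy grammar, the `KNOWN_FUNCTIONS` set, and all function names are made up for illustration; it's not taken from any real parser. `parse_backtracking` throws away the specific error from the if-branch and falls through to the call branch, while `parse_committed` sticks with the if-branch once it has seen the `if` keyword.

```python
class ParseError(Exception):
    pass

# Hypothetical set of functions the toy language knows about.
KNOWN_FUNCTIONS = {"print"}

def parse_if(tokens):
    # Toy grammar: if <name> then <name> else <name>
    if len(tokens) != 6 or tokens[0] != "if":
        raise ParseError("not an if statement")
    if tokens[2] != "then":
        raise ParseError(f"expected 'then', got {tokens[2]!r}")
    if tokens[4] != "else":
        raise ParseError(f"expected 'else', got {tokens[4]!r}")
    return ("if", tokens[1], tokens[3], tokens[5])

def parse_call(tokens):
    # Toy grammar: <function> <arg>...
    if not tokens or tokens[0] not in KNOWN_FUNCTIONS:
        name = tokens[0] if tokens else ""
        raise ParseError(f"no such function {name!r}")
    return ("call", tokens[0], tokens[1:])

def parse_backtracking(tokens):
    # Ordered choice: the useful error from parse_if is discarded,
    # and the user only sees whatever the last alternative complains about.
    try:
        return parse_if(tokens)
    except ParseError:
        return parse_call(tokens)

def parse_committed(tokens):
    # Once we see 'if', commit to that branch and report its errors.
    if tokens and tokens[0] == "if":
        return parse_if(tokens)
    return parse_call(tokens)

def error_message(parser, text):
    try:
        parser(text.split())
    except ParseError as exc:
        return str(exc)
    return None
```

With the input `if a then b esle c`, `error_message(parse_backtracking, ...)` reports a missing function, whereas `error_message(parse_committed, ...)` points at the misspelled `else`.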
It's being worked on. I am rather excited for the upcoming work that makes the parser replaceable and allows us to actually give good syntax errors! There is some discussion about making error printing more configurable so that one can skip stack-frames that are unlikely to be the cause (albeit that's a double-edged sword).
Well, anything that takes untrusted input that might need to be validated with parsing is a massive security hole. Parser generators don't fix this class of bug; they just change how it manifests.
Not if you're actually parsing, with an explicit parse step and a well-defined language. Exploits tend to happen when parsing is done by the so-called "shotgun parser", a.k.a. various checks and conditionals randomly scattered throughout your code, that implicitly define an input language that's different from what you think it is.
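A small sketch of what I mean by an explicit parse step, in Python. The `WIDTHxHEIGHT` input format, the `Size` type, and the function names are all hypothetical, just picked to keep the example short. The whole input language lives in one pattern, and everything downstream only ever sees the parsed type, never the raw string.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Size:
    width: int
    height: int

# The entire accepted language, in one place: 1-4 ASCII digits, 'x', 1-4 ASCII digits.
_SIZE_RE = re.compile(r"([0-9]{1,4})x([0-9]{1,4})")

def parse_size(text: str) -> Size:
    # fullmatch rejects anything with leading or trailing junk,
    # including a trailing newline.
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    return Size(int(match.group(1)), int(match.group(2)))

def pixel_count(size: Size) -> int:
    # Downstream code takes the parsed value and needs no further checks.
    return size.width * size.height
```

The shotgun version of this would pass the raw string around and have `pixel_count` and friends each do their own `split("x")` and `int(...)` with slightly different ideas of what is valid.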
Without having investigated it, I would guess that the parser isn't abstract and modular enough so they end up with a mess of code trying to handle all the different possible combinations of syntax.
Maybe he could have adapted an existing one, but I think the main issue was that parsers traditionally just quit when they find an error, whereas language.js has an error recovery feature (naughty or) to proceed with parsing.
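For anyone unfamiliar with the general idea of error recovery, here is a very simplified Python sketch. To be clear, this is not how language.js does it; it's a toy version of resynchronizing at the next `;`, with a made-up `<name> = <integer>` grammar and hypothetical function names. The point is just that the parser records the error and keeps going instead of quitting at the first one.

```python
def parse_statement(stmt):
    # Toy grammar (hypothetical): <name> = <integer>
    parts = stmt.split("=")
    if len(parts) != 2:
        raise ValueError(f"expected '<name> = <integer>', got {stmt.strip()!r}")
    name, value = parts[0].strip(), parts[1].strip()
    if not name.isidentifier():
        raise ValueError(f"bad name {name!r}")
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"bad integer {value!r}")
    return (name, int(value))

def parse_program(source):
    # On an error, record it and resume at the next ';' instead of stopping.
    results, errors = [], []
    for index, stmt in enumerate(source.split(";")):
        if not stmt.strip():
            continue  # ignore empty statements such as a trailing ';'
        try:
            results.append(parse_statement(stmt))
        except ValueError as exc:
            errors.append((index, str(exc)))
    return results, errors
```

Calling `parse_program("a = 1; b = ; c = 3")` returns both the statements that parsed and a list of `(statement index, message)` pairs for the ones that didn't.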