
A parser that is 97% correct is broken.



My parser seems to be broken.

Now that's a pretty big fail for a parser generator, as it should be at its best ESPECIALLY when the parse gets complicated.

The parser really needs to be rewritten.

Well, that is just choosing the semantically correct parse from multiple syntactically correct parses. This parser isn't even finding syntactically correct parses.

Nevertheless, the parser failing at only 100 levels of nesting is shockingly bad.
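One common reason a parser gives up at a fixed nesting depth is that it recurses once per level and runs into the call-stack limit. The thread does not say that is what happened here, so treat the following as a minimal sketch of the general workaround, not a diagnosis: keep the open groups on an explicit stack instead of the call stack. The toy grammar (parentheses around single characters) and the names `parse_nested` and `depth` are made up for illustration.

```python
def parse_nested(text):
    """Parse nested parentheses into nested lists without recursion."""
    root = []
    stack = [root]  # stack[-1] is the group currently being filled
    for pos, ch in enumerate(text):
        if ch == "(":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError(f"unmatched ')' at offset {pos}")
            stack.pop()
        else:
            stack[-1].append(ch)
    if len(stack) != 1:
        raise ValueError("unclosed '('")
    return root


def depth(tree):
    """Measure the nesting depth, also iteratively."""
    deepest = 0
    todo = [(tree, 0)]
    while todo:
        node, level = todo.pop()
        deepest = max(deepest, level)
        for item in node:
            if isinstance(item, list):
                todo.append((item, level + 1))
    return deepest


# 10,000 levels of nesting, far beyond 100.
tree = parse_nested("(" * 10000 + "x" + ")" * 10000)
assert depth(tree) == 10000
```

With this shape the nesting limit is set by available memory rather than by the interpreter's recursion limit. Printing or comparing such a deep structure can still recurse, so downstream code needs the same care.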

I looked at the code... writing my own parser would be faster than fixing it. But that's beside the point. I decided to use a third-party library because I did not want to invest my time into that. The moment I had to even look at the source code, that premise was broken.

Yeah, natural languages don't have a specification or canonical parser implementation, so they cannot be reliably parsed.

I don't think you actually disagree with the author. I think they would basically agree with everything you wrote and just add on, "therefore write your parser so it actually does recover correctly." Which is what most of the post boils down to.
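For readers unsure what "recover correctly" can look like in practice, here is a minimal sketch of one simple strategy, often called panic-mode recovery: on a bad statement, record the error, skip to the next synchronization token, and keep parsing. The `name = number;` grammar, the `STATEMENT` pattern, and `parse_statements` are assumptions invented for this example; they are not taken from the post being discussed.

```python
import re

# Hypothetical statement form: an identifier, '=', and an integer.
STATEMENT = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*$")


def parse_statements(source):
    """Parse `name = number;` statements, resuming at each ';' after an error."""
    values, errors = {}, []
    for index, chunk in enumerate(source.split(";")):
        if not chunk.strip():
            continue  # empty statement, or nothing after the final ';'
        match = STATEMENT.match(chunk)
        if match is None:
            # Synchronize: report this statement, then carry on with the next.
            errors.append((index, chunk.strip()))
            continue
        values[match.group(1)] = int(match.group(2))
    return values, errors


values, errors = parse_statements("a = 1; b = ; c = 3;")
assert values == {"a": 1, "c": 3}
assert errors == [(1, "b =")]
```

The point of the design is that one malformed statement costs one error report instead of the whole parse. Real grammars need more careful synchronization points than a plain split on `;`.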

I'm confused and very far from an expert here. What is wrong with parsers, and what is the alternative?

Tell me of some parsers that do not deal with deterministic inputs and have 100% accuracy, then.

The parsers available are good enough for English. Sadly, that's absolutely not true for other languages.

Good to see someone avoiding horrible parser generators, even if the code is ugly, poorly styled, and bug-prone.

"In short, there are a few reasons that parsing is a mess, and none of those reasons are actually resolvable by parser generators."

I'm pretty sure this is untrue /and/ part of the problem. Build quality on these tools is appalling...


If four popular parsers all had serious bugs, 6 years seems not too shabby.

It's a shame parsers are such a PITA to write. So many problems could be trivially solved if writing a grammar and generating a parser for it were in any way a pleasant process.

Without having investigated it, I would guess that the parser isn't abstract and modular enough so they end up with a mess of code trying to handle all the different possible combinations of syntax.

They have a parser, not an entire compiler.

The parser is usually really good at spotting those errors, and once they've been pointed out they are trivial to correct.

Just fuzz every parser you write. And the problem is solved.
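A minimal sketch of what that can mean, assuming the parser's contract is "return a result or raise `ValueError`": throw random strings at it and flag any other exception as a crash. Both `toy_parse` and `fuzz` are hypothetical names written for this example; a real setup would more likely use a coverage-guided fuzzer than plain random strings.

```python
import random


def toy_parse(text):
    """Hypothetical parser under test: returns the maximum bracket depth."""
    depth = deepest = 0
    for pos, ch in enumerate(text):
        if ch == "[":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched ']' at offset {pos}")
    if depth:
        raise ValueError("unclosed '['")
    return deepest


def fuzz(parse, runs=2000, seed=0, alphabet="[]a ", max_len=40):
    """Feed random strings to `parse`; return inputs that crash it."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    crashes = []
    for _ in range(runs):
        length = rng.randrange(max_len)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            parse(text)
        except ValueError:
            pass  # documented rejection of bad input, not a crash
        except Exception as exc:
            crashes.append((text, exc))
    return crashes


assert fuzz(toy_parse) == []
```

This only catches crashes and contract violations. It says nothing about a parser that quietly returns the wrong tree, so "the problem is solved" holds only for that narrower class of bug.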

So what you're saying is that, except for all the times you come back to work on the code because something broke, it works reliably? Nothing about the parser could be improved so that it doesn't break on data format changes? Nothing could be improved such that instead of alerting you to failures, it could be proactively adjusted to accept new formats before something fails?