This is more an expression of confusion than an evaluation of type-system quality, but
why does the JSON parser's implementation return its own types like JsonNode? Why doesn't it just return core types like:
My takeaway from this is that you can submit a long string of brackets to REST apps built with Haskell and Ruby/Oj and crash the process and/or machine by consuming all available RAM. Whereas the smarter JSON processors near the top of the list will abort processing pathological inputs before serious harm occurs.
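As an illustration of the "smarter" behavior, CPython's built-in json module tracks recursion depth and aborts pathological nesting instead of consuming all available RAM (a sketch; the exact depth limit depends on the interpreter and its recursion limit):

```python
import json

# A pathological input: 100,000 unmatched open brackets.
pathological = "[" * 100_000

try:
    json.loads(pathological)
    print("parsed (unexpectedly)")
except RecursionError:
    # CPython's decoder bails out once nesting exceeds its recursion
    # limit, rather than recursing until the process runs out of memory.
    print("aborted: nesting too deep")
```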
Wait, which JSON parsers are supposed to be "bad"?
Most (all?) the complaints here appear to be that specific libraries fail to implement the JSON spec in the way that the author has interpreted it. Some libraries try to 'help' by parsing things that they shouldn't, and some fail to parse things they probably should.
This is why we end up with so many JSON parsing libraries I guess, but it's not really a problem with the format itself, beyond the fact that clearer specs might disambiguate things and lead to less deviation.
Do you mean non-json types? Because the supported types seem pretty straightforward. (besides perhaps supporting null bytes in strings in things like postgresql)
> the syntax too quirky
Care to explain? This has always seemed like one of JSON's strengths. The syntax for what is valid is pretty straightforward.
The JSON config is strange, the keys contain type information. But any JSON parser worth its salt should not require that since JSON is natively typed, no?
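As a sketch of what "natively typed" buys you, Python's json module (like most decoders) recovers each value's type from the syntax alone, so keys never need to carry type tags:

```python
import json

doc = json.loads('{"s": "1", "n": 1, "f": 1.0, "b": true, "x": null}')

# The decoder distinguishes every JSON type without any hints in the keys.
assert type(doc["s"]) is str
assert type(doc["n"]) is int
assert type(doc["f"]) is float
assert doc["b"] is True
assert doc["x"] is None
```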
The post [1] (though old) hints at there being no good libs for JSON (it mentions a workaround for the one he picks):
> For me, the most important quality I need in a JSON library is an unambiguous, one-to-one mapping of types. For example: some libraries will deserialize JSON arrays as Lisp lists, and JSON true/false as t/nil. But this means [] and false both deserialize to nil, so you can't reliably round trip anything!
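The round-trip problem is easy to demonstrate: a decoder that collapses two JSON values into one host value (as nil-for-both-[]-and-false does) cannot be inverted. A minimal Python sketch of such a lossy mapping (the `lossy_decode` helper is invented for illustration):

```python
import json

# Python's own decoder keeps the two values distinct...
assert json.loads("false") is False
assert json.loads("[]") == []

# ...but a hypothetical Lisp-style decoder mapping both [] and false
# to a single nil value destroys the distinction:
NIL = object()

def lossy_decode(text):
    v = json.loads(text)
    return NIL if v is False or v == [] else v

# Both inputs now decode to the same thing, so no encoder can
# round-trip them back to the original JSON.
assert lossy_decode("false") is lossy_decode("[]")
```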
Although this post is from 2018, wondering if it has improved since then.
> There is a very big difference - "with JSON one still must check the expected types anyway" is not really true, I can deserialize an arbitrary json and I will know the difference between 123 and "123" even if I don't know what's expected or, alternatively, mixed-type values are expected.
You will still need, in your code, to handle both 123 & "123" (or handle one, and error on the other). That's really no different from, in your code, parsing "123" as an integer, or throwing an error.
In JSON one must check that every value is the type one expects, or throw an error. With canonical S-expressions, one must parse that every value is the type one expects, or throw an error. There's really no difference.
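The symmetry is easy to see in code. In this hedged Python sketch (the `port` field is made up), the JSON branch *checks* a type and the S-expression-style branch *parses* a byte string, and both end up at the same place:

```python
import json

def port_from_json(doc: str) -> int:
    v = json.loads(doc)["port"]
    if not isinstance(v, int):          # check the expected type...
        raise TypeError("port must be an integer")
    return v

def port_from_bytes(raw: bytes) -> int:
    return int(raw)                     # ...or parse it; int() raises on junk

assert port_from_json('{"port": 8080}') == 8080
assert port_from_bytes(b"8080") == 8080
```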
If one is willing to use a Scheme or Common Lisp reader, of course, then numbers &c. are natively supported, at the expense of more quoting of strings (unless one chooses to use symbols …).
There are no objects — it's all lists (or arrays, if you prefer). This is a good thing, because ordering is always the same.
> What's a keyword and what's a literal?
Everything's a byte string: any interpretation above that is a matter for the protocol. This is a good thing, because it means you neatly sidestep issues like JSON's numbers actually being floats rather than integers, as well as issues like numeric precision.
'But I want types!' Of course, they are good. But the type of a number isn't just integer or float: it's integer-which-is-a-known-version or float-which-is-between-zero-and-one-exclusive. You need to check those types at ingestion anyway; it's not a big leap from integer-which-is-a-known-version to string-which-parses-as-an-integer-which-is-a-known-version.
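A small Python sketch of the ingestion check being described (the set of known versions is invented for the example): whether the value arrives as a native integer or as a byte string, the real validation happens in the same place, and the extra parse is one line:

```python
KNOWN_VERSIONS = {1, 2, 3}  # invented for the example

def version_from_int(v: int) -> int:
    # integer-which-is-a-known-version: the check no wire format does for you
    if v not in KNOWN_VERSIONS:
        raise ValueError(f"unknown version {v}")
    return v

def version_from_bytes(raw: bytes) -> int:
    # string-which-parses-as-an-integer-which-is-a-known-version:
    # one extra int() call, then the very same check
    return version_from_int(int(raw))

assert version_from_int(2) == 2
assert version_from_bytes(b"2") == 2
```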
> I can barely even tell the hierarchy where it's very visually obvious with the JSON.
They are both indicated with indentation. Probably I should have used two spaces for the JSON indentation, but I was lazy.
> You receive some JSON over the wire, it's not the right format, and everything blows up in your face.
If I were using a statically typed JSON parser, like, say, Rust's or Crystal's, then it would blow up right there. It'd throw an exception, or pass back a failure value that I have to handle.
JS doesn't do that. It just keeps going, and due to the way it implicitly handles types, getting a list instead of an object can mean you end up with a string later on.
> I'm sure one reason so many people have particular trouble
Those two sentences don't really fit together. Elm isn't an academic experiment, which means people need to use it.
If the vast majority of people find it overly difficult, then it probably is.
> come from a dynamically-typed language where you'd just go `user = JSON.parse(string)`
Not just dynamic languages.
Haskell, one of Elm's inspirations, can handle it a hell of a lot easier.
Define an interface, then use it on a string:
data Coord = Coord { x :: Double, y :: Double }
  deriving (Show, Generic)

instance FromJSON Coord  -- aeson derives the decoder from Generic

let req = decode "{\"x\":3.0,\"y\":-1.0}" :: Maybe Coord
I am cheating a bit, as that's the aeson package... But decoding the JSON is a lot simpler than the Elm equivalent, and Haskell doesn't even target web browsers, where JSON is so prevalent.
But, here's another static example, this time, C++.
auto j3 = json::parse("{ \"happy\": true, \"pi\": 3.141 }");
You can use auto to avoid writing a massive type definition, or you can write the type out to ensure it's correct.
Static typing has nothing to do with not being able to write JSON.parse(string), because it doesn't prevent that. At all.
>> Here's an example jsonschema (a bit handwritten so maybe some errors but it should be clear enough).
That'd be nice, but it's not how this tool works. If you look at the repo, there's an example of following a json schema or pydantic model. It's clear that if you wanted a "carrot dagger" in your json, you'd need to define it beforehand:
But perhaps I'm underestimating the tool's capabilities. If so, hopefully remilouf can correct me (and give an example of how the tool can be made to work as you want it).
>> I really do not think this is the case. Parsing and understanding arbitrary requests about something like this?
Not arbitrary. See my casting-to-type analogy. The point I'm trying really hard to get across is that generating free-form text is all nice and cool, but if you want to give it structure, you need to have the entire structure defined before-hand, otherwise the text that can't be made to conform to it simply won't.
So if you haven't got anthropomorphic vegetables in your JSON schema, your LLM may generate them, but they'll never end up in your JSON.
Yeah, that's the problem.
I mean, hey, why json? We could just use unstructured plaintext for everything and now we are free to do everything. But obviously that has its own drawbacks.
Having built-in support for sum types means better and more ergonomic support from libraries; it means there is one standard rather than many different ways to encode things, and it also means better performance and tooling.
Blink. I beg your pardon, but JSON is strongly typed per the RFC 8259 standard:
> JSON can represent four primitive types (strings, numbers, booleans, and null) and two structured types (objects and arrays).
The type system does not align well with any other type system out there (float/int ambiguity, no timestamps, etc.) but it's still better than any coercion.
"My general opinion is that it's extremely hard to reliably use JSON as an interchange format when multiple systems and/or parser implementations are involved."
I suspect one of the reasons that JSON has been so successful is precisely this fuzziness, though. Every language can do something a little slightly different and it'll work at first when you send it to somebody else. You get up and off the ground really quickly, and can fix up issues as you go.
If you try to specify something with a stronger schema right off the bat, I find a number of problems immediately emerge that tend to slow the process down. It may be foreign to programmers on HN who have embraced a strong static-type mindset, or to dynamic programmers who have learned the hard way that sometimes you need to be more precise about your types, but there are still a lot of programmers out there who will wonder why it's even relevant whether this is an int or a float.

I came in to work this morning to an alert system telling me that a field a particular system has been sending as an integer for a couple of months now, over many thousands of pushes, "number of bytes transferred", is apparently capable of being a float once every several thousand times for some reason. There are a lot of programmers who will send a string, or a null, or maybe a float, or maybe it's always an integer, and deeply don't understand why you care what it's getting serialized as.
And that's just an example of some of the issues, not a complete list. Trying to specify with some stronger system moves a lot of these issues up front.
(If your organization has internalized that's just how it has to be done, great! I bet you encountered a lot of these bumps on the way, though.)
This isn't a celebration of JSON per se... this is really a rather cynical take. I don't know that we need to type everything to the n'th degree in the first meeting, but "why can't we just let our dynamically-typed language send this number as a string sometimes?" is definitely something I've had to discuss. (Now, I don't get a lot of resistance per se, but it's something I have to bring up.) I'm not presenting this as a good thing, but as a theory that JSON's success is actually in large part because of its loosey-gooseyness, and not despite it, regardless of how we may feel about it.
> Given that we can't depend on people to do simple things like handling escape sequences correctly, that seems pretty important. (Also, the 'jq' command is nice.)
jq stores all numbers as doubles, which means it can't handle 64 bit ints (or larger integer numeric types).
It's otherwise a great tool and I wouldn't point out a negative such as this, but you did explicitly mention that you know JSON won't be fed to a bad parser.
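The 64-bit problem is concrete: doubles have a 53-bit mantissa, so integers above 2^53 collapse into their neighbors. Python's json module keeps integers exact, which makes the loss easy to show:

```python
import json

big = 9007199254740993  # 2**53 + 1: not representable as a double

# Python round-trips the integer exactly...
assert json.loads(json.dumps(big)) == big

# ...but any tool that stores every number as a double (as jq does)
# collapses it into the adjacent representable value:
assert float(big) == float(big - 1) == 9007199254740992.0
```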
This is a big problem for people writing general JSON processors/parsers.
But it's not too bad an issue for specific applications/systems using JSON...
They need their JSON to be in the correct form to represent their "business objects" (or whatever you want to call your application or system-specific data types), which is already a very restricted subset of JSON that a standard can't help with, and only rarely need to bump up against the oddness JSON has around the edges.
(Not that people won't bump up against these issues more than they really need to -- e.g., I recently saw someone trying to rely on multiple keys to mean something specific, which is a fun/interesting idea but is crazy to want to put into production... but good specs won't stop people from wanting to do crazy things.)
* JSON String -> Nim String
* JSON Object -> Nim Hash
* and so on...
I've also had the same question when looking at https://crystal-lang.org.
PS: I do not claim Nim's type system to be bad as I do not have an exact counterexample of a "better" type system.