Fq: Jq for Binary Formats (github.com)
1 point by philosopher1234 | 2021-12-22 18:04:52 | 84 comments




I am sure it is awesome but the name of this utility is somewhat unfortunate

This related project, on the other hand, embraced it (for better or for worse):

https://github.com/jzelinskie/faq


I’d have gone with bq myself

I know that jokes don't go over well here, but...

I'd make an improved one, one which is better. And I'd name it bbq.


And I will write a fuzzer for bbq called omgwtf.

D'oh! Big ol BigQuery CLI would like to have a word

then bbq.

Don’t see the issue. I naturally pronounce this “eff-queue”. You’ve really got to work hard to make it vulgar.

Yeah, two syllables is too long. Let's just sound it out...

... 'feek' ?

If it were 'fk' sure, but the Q on the end makes me think of all the English words that come from French and end in 'ique', like technique. 'fq' looks like 'feek' to me.


Only in puritan cultures

Or passive aggressive…

:) I didn't choose it to be provocative, apologies if it comes across that way. I've always pronounced jq "yay-queue", so fq is "eff-queue" to me. Also, f and q can be typed with one hand on QWERTY, which is nice and quick

Well fq too!

Relatedly, check out GNU Poke: http://www.jemarch.net/poke

Also Kaitai Struct:

https://kaitai.io/

And the other things mentioned in the fq README:

https://github.com/HexFiend/HexFiend https://github.com/binspector/binspector


It’s interesting how people work. Seeing that section of the readme with a laundry list of alternatives made me want to try fq even more. It tells you that the author actually cares about the problem space.

Hi! Yes, I'm very interested in binary analysis and decoders in general, and fq was not built with the intention of competing with or replacing anything. I usually use fq together with lots of other tools; they all serve different purposes. The more the merrier!

The listing of alternatives in the README really should be standard practice for open source projects. OTOH, some maintainers don't like to do that when they haven't evaluated the projects. Perhaps they could still add them with a disclaimer though.

Nice! Some other tools and parsers: https://github.com/dloss/binary-parsing

Lots of tools I didn't know about, thanks

You may want to rename that awesome-binary-parsing, having awesome in the URL helps in some circumstances.

For something that is supposed to be an analog of jq, there is a notable omission from the list of formats: ASN.1.

note that the tool linked is for binary files. ASN.1 is text, isn't it?

I assume they meant the binary encodings of asn.1 like BER/DER.

And tangentially PEM, which is Base64-encoded DER.

That’s what I thought until recently, but it turns out that PEM refers to just base64 wrapped with -----BEGIN----- and -----END----- lines, and the encapsulated data does not have to be DER.

https://datatracker.ietf.org/doc/html/rfc7468
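To illustrate that point, stripping the PEM framing down to its payload takes only a few lines; this sketch (the function name and example message are my own) treats the base64 body as opaque bytes, per the RFC 7468 framing, rather than assuming DER inside:

```python
import base64

def pem_to_der(pem_text: str) -> bytes:
    """Extract and decode the base64 body between the BEGIN/END lines.

    Per RFC 7468 the encapsulated payload is opaque: it is often DER,
    but nothing in the PEM framing itself requires that.
    """
    lines = pem_text.strip().splitlines()
    if not (lines[0].startswith("-----BEGIN") and lines[-1].startswith("-----END")):
        raise ValueError("not PEM framed")
    return base64.b64decode("".join(lines[1:-1]))

# A deliberately non-DER payload, underscoring the point above:
pem = "-----BEGIN MESSAGE-----\naGVsbG8=\n-----END MESSAGE-----"
print(pem_to_der(pem))  # b'hello'
```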


Can't blame him. ASN.1 is one of the most complicated binary formats, with so many encoding rules that there's no free decoder that can process them all.

Except for interoperability with existing systems, is there any reason why anyone would use this ASN.1 protocol/format?

What's it good for?


In my opinion, avoid ASN.1 if you can.

There's a reason why all the cool companies invented their own serialization formats: Google's Protobuf, Facebook's Thrift, etc.. even when ASN.1 had been an international standard for years: It's too complicated.


A big part of the reason is a combination of NIH and a bad reputation mostly tied to X.509 and the like, rather than anything else. It's hard to advocate for it when the main library you can point to is OpenSSL, and the most commonly known encoding is DER, which has real implementation complexity: it is effectively canonically sorted BER, a property that matters in cryptography.

Both Protobuf and Thrift evolved from RPC systems that possibly started out too simple for ASN.1, combined with the above issue where good tools were probably commercial and expensive. (FWIW, my experience also suggests that Thrift is a shitty RPC system, compared even to Sun/ONC RPC, but maybe things have changed.)


ASN.1 is incredibly good for one use case: As a cautionary tale against design by committee.

No. It mostly exists so that people who haven't tried to use it can tell other people that they should have used ASN.1.

The BER/DER/PEM encodings are mostly quite simple and have very few subtle details.

Yeah, those are the easy ones. I worked in telecom, and dealing with unaligned PER is a PITA.

Shameless plug, but you may be interested in my library (which is MIT/Apache-2.0) that offers decoding from BER/DER/CER all from a single model in code. There's no UPER/APER support at the moment, but it's coming in the next few months. :)

https://github.com/XAMPPRocky/rasn


You mean DER, not ASN.1. ASN.1 is just notation for describing data. There are many encoding rules for ASN.1, including ones based on XML and JSON.

Hi, here is an issue related to this where I explain a bit of what would be required https://github.com/wader/fq/issues/20 and how the protobuf support currently works.

This is quite an interesting project! Combining Kaitai Struct or similar with the command line.

However, I am a little disappointed that the jq syntax was chosen. jq has a very non-intuitive syntax, and there are more intuitive query syntaxes out there. (LINQ or even basic SQL come to mind.)


LINQ is a bit more wordy, and SQL is sadly not very composable.

Yes, I can empathize with finding jq hard to understand; it's quite different and took a while to grasp. The reason I chose it anyway was that after prototyping some common types of queries I'd like to do (basic value access in deep structures, multiple recursive traversals with filtering, transforming objects and arrays) in various languages, jq was more or less the only one that felt terse enough. Also, I think it's quite nice that you can output to JSON and then load it into whatever language or environment you want. Maybe there are some alternatives I should look at?

> This project would not have been possible without itchyny's jq implementation gojq.

Another approach is to take the binary-to-object conversion part of your code, output that as JSON on stdout and feed that into jq.

Basically, a binary front end + jq = fq


I wrote a small script to convert CSVs to JSON strictly to use jq on the output. Querying things like your GCP bill with jq is quite enjoyable.
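A minimal version of such a CSV-to-JSON shim is only a few lines of Python; this sketch is my own illustration (the field names and sample input are invented), not the commenter's actual script:

```python
import csv
import io
import json

def csv_to_json(text: str) -> str:
    """Read CSV with a header row and emit a JSON array of objects,
    ready to pipe into jq. All values stay as strings."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows)

billing = "service,cost\ncompute,12.5\nstorage,3.2\n"
print(csv_to_json(billing))
# [{"service": "compute", "cost": "12.5"}, {"service": "storage", "cost": "3.2"}]
```

From there a query like `jq '[.[].cost | tonumber] | add'` sums the cost column.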

gojq is also nice. I work with a lot of structured logs and wrapped jq with a little bit of format-understanding and output sugar to make looking at and analyzing such logs an enjoyable experience: https://github.com/jrockway/json-logs


> I wrote a small script to convert CSVs to JSON strictly to use jq on the output

Note that you can use jq to consume simple CSVs (and produce them) without anything else. There’s an entry in the cookbook wiki https://github.com/stedolan/jq/wiki/Cookbook#convert-a-csv-f... - I posted some usage examples a few months back https://news.ycombinator.com/item?id=27379423


Miller can process CSV and JSON in record format and has a much saner DSL, in my experience.

https://github.com/johnkerl/miller


I wish this was only the binary front end so I could pick my parser (e.g. PowerShell). I see fq seems to support sending the whole JSON to stdout; I wonder if there's a way to make this the default behavior:

    # JSON for whole file
    fq tovalue file

It would be hard to get the full fq functionality that way. How would you encode the data in a way so that you can do both:

    .frames[100].header.sample_rate
for the individual field and

    .frames[100].header|tobytes[:0x10]
for the first few bytes of the entire header structure?

Or decode a binary slice as a particular format:

    tobytes[0x234:0x325]|avc_sps.max_num_ref_frames

Hi, i wrote a bit about this in my reply above https://news.ycombinator.com/item?id=29661575

why not bq?

So is this meant for any binary formats?

Of course not. It can only work on formats that the team has written parsers for.

I ask because I’d be interested in helping write an EMF+ filter

Supported formats:

https://github.com/wader/fq/blob/master/doc/formats.md

They should probably make this a bit more prominent. It's an impressive list for a new project.


Permanent link (press 'y' on any Github link): https://github.com/wader/fq/blob/eb4a6fdbd6ef3a09fc59802e96e...

Thanks! Any suggestions on how to make it more prominent?

It's interesting to see how they introduce a new binary format into their catalogue. I was expecting to find a domain-specific language to define the grammar of binary bitstreams, maybe as a context-free grammar. Instead, they built a nice library of routines that helps them design custom parsers by hand for each new format.

I wonder if it would also support non-binary formats. The tool could evolve to handle JSON, YAML, XML, INI, etc.

Hi, fq actually does support JSON, so other similar text formats could work the same way. But it's currently implemented in a hacky way: it's just a big blob that happens to work as normal JSON. I've made some attempts at implementing it as a normal fq decoder, but it's hard to figure out a way to represent the whitespace between values etc.; it ends up very clunky or not very user friendly. Any suggestions are very welcome.

Interesting project. Unfortunate that its name conflicts with one of nq’s executables (https://github.com/leahneukirchen/nq), but I’m not sure anything can be done about it.

IMO projects that have only one non-prefixed executable take precedence over ones with several, even when the project with multiple non-prefixed executables is older.

Not to be confused with https://github.com/circonus-labs/fq, the message queue.

This looks incredible. I'm on my phone so I haven't tried this, but it looks like this supports slicing into MP3 bitstreams? That would have saved me a month of research and tons of development back in 2013.

Hi, it depends a bit: if the MP3 stream uses the bit reservoir, it might be tricky to do "pure" remuxing with any tool. fq's mp3_frame decoder does try to track which bits are part of the current frame and which belong to a future frame, but I'm not sure how much that helps. If the stream does not use the reservoir, you should be able to slice using fq '.frames[100:200][]' file.mp3 > sliced.mp3 or something similar.

It says it supports protobuf. Is there a protobuf file format, i.e. for multiple records, or do they mean a file containing a single protobuf record?

Typically people separate protobuf messages with a length prefix before each message. Perhaps that's what they did.
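One common convention for such streams (used by e.g. Java protobuf's delimited helpers) is a base-128 varint length prefix before each message. This sketch is my own illustration of reading such a stream, treating each payload as opaque bytes rather than decoding it as protobuf:

```python
import io

def read_varint(stream) -> int:
    """Decode a protobuf-style base-128 varint: 7 payload bits per byte,
    high bit set on all but the last byte, little-endian groups."""
    result = shift = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7

def iter_delimited(data: bytes):
    """Yield each length-prefixed record as raw bytes."""
    stream = io.BytesIO(data)
    while stream.tell() < len(data):
        length = read_varint(stream)
        yield stream.read(length)

# Two records, each prefixed with its (single-byte) varint length:
framed = b"\x03abc\x02de"
print(list(iter_delimited(framed)))  # [b'abc', b'de']
```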

Also protobuf can contain embedded messages or even just the binary representation of a list of embedded messages.


Hi, currently the protobuf support can either decode the wire format, or in some cases a format decoder uses protobuf as a subformat and passes it a "schema" so it can do some fancier decoding. But yes, it would be interesting to add support for reading protobuf schemas somehow.

I wrote a protobuf parser in Ragel for work. It's still used to replace reflection, as the C++ protobuf implementation explodes our binaries to huge sizes.

I am alternating between WOW and wtf. Pretty cool stuff.

I just wonder how on earth you intend to support all the binary formats out there. I mean, jq supports JSON, not all structured text data (JSON, XML, CSV, INI, ...).


apparently they went ahead and implemented a bunch of them in Go: https://github.com/wader/fq/tree/master/format

Hi, I can give some background on how I ended up with Go instead of something more declarative. Maybe 1.5 years ago I started prototyping different approaches for which query language to use (SQL, JSONPath, my own basic jq version and a few more) and which language to implement decoders in (Lisp, Kaitai, Tcl, "scripted" Go, normal Go and some more). What I found was that for my use cases, detailed parsing of big media files, anything scripted was just too slow. I did look into translating Kaitai etc. into something compiled, which would probably be fast, but next on my list was that I wanted to be able to select and decode subformats in quite complicated ways (like mp4 samples), with flexible ways of demuxing and joining blobs to decode, calculating checksums, counting samples in various ways. All of that felt clunky or hard to fit into a purely declarative description.

But I was also biased towards Go, as I had good experience using it and knew that it would probably be fast enough (it turns out smart memory usage is probably the main speed factor for fq when you keep track of lots of things). It also provides good tooling like IDE support and refactoring (gopls, gofmt -r, rf), and it's a reasonably strongly typed language, I think. Last but not least, the quick build times really fit my way of working; I usually use lots of watchexec etc.

For the query language I didn't prototype much; I knew I really wanted jq, as I had already used it extensively and knew it was very powerful and had a terse syntax for working with structured data. I had some ideas of maybe using the C version of jq via bindings, or somehow letting fq be a tool that you use like 'fq file | jq ... | fq', but it just felt strange and not very user friendly.
Then I found gojq and I just felt that I had to make it work somehow, even if it would require lots of hard work and changes to it (see https://github.com/wader/gojq/commits/fq; the JQValue change is probably the most interesting, plus the support for custom iterators/functions that has been merged). It turned out much better than I would have expected, in large part because gojq's code is very nice and the author has been very helpful. There are more things I would like to talk about, but I think this is long enough for now :)

But all that said, I think you could use Kaitai or something similar together with fq's decode API if you want. I also have some ideas and plans for supporting writing decoders in jq; hopefully I will get some time for that next year.


Thanks for the extensive reply. I also had some good experience with Go so far, so I can understand how you came to that point ;-)

Honestly, it all makes sense: The plugin system and open source nature makes it really easy to write a definition for the file format you want to work on, which will not just leverage the whole ecosystem, but benefit everyone.

This is one of those seriously great ideas where I'm thinking: how did no one come up with this before?


https://github.com/tyleradams/json-toolkit

Convert json <-> xml, csv, yaml, logfmt

So to support all formats, you write a binary <-> json converter.
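As a toy example of that approach, here is a converter for an imaginary fixed-layout header; the format, field names and function are all invented for illustration, but the pattern (parse bytes, emit JSON, pipe to jq) is the one described above:

```python
import json
import struct

def header_to_json(blob: bytes) -> str:
    """Parse an invented 8-byte header (4-byte magic, 1-byte version,
    2-byte big-endian payload length, 1-byte flags) and emit JSON
    suitable for piping into jq."""
    magic, version, length, flags = struct.unpack(">4sBHB", blob[:8])
    return json.dumps({
        "magic": magic.decode("ascii"),
        "version": version,
        "length": length,
        "flags": flags,
    })

print(header_to_json(b"DEMO\x01\x00\x10\x00"))
# {"magic": "DEMO", "version": 1, "length": 16, "flags": 0}
```

A pipeline like `./tool file.bin | jq .length` then gives jq-style querying over the binary, at the cost of losing fq's byte-range and re-decoding features discussed above.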


I’d like to see support for FIT files as emitted by Garmin fitness devices. It’s a clever binary format that defines the layout of records in-stream; the records then contain the actual measurements, which may be scaled for a more compact representation. These multiple layers make the format non-obvious to parse, but the tool already supports an impressive list of formats that probably use similar techniques.

Does this support out-of-tree format decoders? From an initial glance it looks like all decoders are in-tree and written in golang. We have a lot of internal binary formats at $WORK that I would like to use this on...

Seems unlikely, since the decoders are defined not just in-tree but in "host" code; the definitions are neither data-driven nor a DSL.

So it would require some sort of native (Go) plugin system, which I understand is about as bad as in Rust, owing to there being no standard ABI (or plugin system, for that matter).

Therefore the way to have bespoke/internal formats would be to maintain an internal fork of the tool.


"(...) some sort of native (Go) plugins system (...)"

See: https://pkg.go.dev/plugin


Hi! Yes, it's kind of supported, but in a very Go-ish way at the moment. You can use fq as a module, import/register your own format decoders and then run cli.Main; more or less what https://github.com/wader/fq/blob/master/fq.go does. I have a private version of fq for work with some proprietary formats that does this, and it works great. One issue is that the decoder and format API might change; I'm not sure I can give any stability guarantees atm, and I want to evolve it a bit more. It would also be great to be able to hook into existing formats more in some way.

In the future I hope to support writing decoders in jq and/or some declarative format like Kaitai.


Hopefully json will be superseded someday. Cool tool.

Interestingly, fq works by being kind of a superset of JSON/jq. It has types that can behave as jq values when needed, but with special functions or key accessors they can be something else.

This looks like wireshark's panel for inspecting packets
