They are both technical, but 'order by bytes desc' has got to be more expressive than 'sort -nr'. The former reads almost like natural English, whereas the latter says nothing unless you already know the flags.
That said, I don't know how much time it would genuinely save. As with most of these tools, you shouldn't be installing them on production servers, so you still have to know Bash anyway.
Yes, I know this is the correct way of doing it in bash. I posted it because someone might time the two scripts and conclude "bash is faster" when what they actually measured was probably the speed of "sort".
I would also add: if it’s a one-time thing, I would just do it in Visual Studio Code or any other editor that doesn’t choke on a 1 GB file. And I have 12+ years of experience with bash and Unix tools, so it’s not about a lack of knowledge or experience.
There isn’t anything magical about “sort” versus another tool; if there’s no need for automation, they are equivalent.
This is very memory-intensive, which may not matter if the data volume is small enough. It is also a bit hard to understand, or at least not obvious at first sight. For most use cases `sort -u` would be ideal and far simpler to understand, provided you don't mind the output being sorted.
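The comment doesn't show the technique it is replying to. Assuming it is the common order-preserving dedup (remember every line seen so far in a hash set, as awk's `!seen[$0]++` idiom does), here is a minimal Python sketch contrasting it with the `sort -u` approach. Both function names are hypothetical, chosen for illustration.

```python
def dedup_preserving_order(lines):
    """Keep the first occurrence of each line, in input order.

    Every distinct line stays in the `seen` set until the end, so memory
    grows with the number of unique lines; this is the cost being discussed.
    """
    seen = set()
    out = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out


def dedup_sorted(lines):
    """Rough stand-in for `sort -u`: unique lines, returned in sorted order."""
    return sorted(set(lines))


if __name__ == "__main__":
    data = ["b", "a", "b", "c", "a"]
    print(dedup_preserving_order(data))  # ['b', 'a', 'c']
    print(dedup_sorted(data))            # ['a', 'b', 'c']
```

The Python stand-in for `sort -u` also holds everything in memory and orders by code point (closer to `LC_ALL=C sort -u` than to a locale-aware sort); the real `sort` can spill to temporary files when the input is large.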
I guess you wouldn't; you'd have a sort function afterwards.
...for Unix's insistence on composability, the shell tools are often unnecessarily monolithic, probably because that's the only sane way when the only type you have for interconnect is `string`.
In my experience there's a performance hit in filesystem-heavy work like opening and closing a lot of small files.
Still, it's more than offset by the convenience of running bash commands on Windows; remembering `du -csh ./* | sort -h` (sort directories by size) is easier than whatever PowerShell would have me type.
I almost always prefer the GNU utils to the BSD ones. The GNU ones usually allow arguments in nicer orders. GNU `sort -h` can sort the human-sized output of GNU `du -h`. I've noticed tons of niceties from these sorts of tools on Ubuntu are completely missing from BSD-based OS X.
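To make the `du -h | sort -h` trick concrete, here is a minimal Python sketch that sorts `du -h`-style lines by size. It assumes a tab between size and path (as GNU `du` prints), a `.` decimal point, and 1024-based suffixes; `human_to_bytes` and `sort_du_lines` are hypothetical names. GNU `sort -h` actually compares the suffix first and the number second rather than converting to bytes; for output from a single `du -h` run the two approaches should give the same order.

```python
import re

# Multipliers for the suffixes `du -h` prints (powers of 1024).
_SUFFIXES = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3,
             "T": 1024**4, "P": 1024**5}


def human_to_bytes(size):
    """Convert a `du -h`-style size such as '1.5G' or '640K' to bytes."""
    m = re.fullmatch(r"([0-9]+(?:\.[0-9]+)?)([KMGTP]?)", size.strip(),
                     flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognised size: {size!r}")
    number, suffix = m.groups()
    return float(number) * _SUFFIXES[suffix.upper()]


def sort_du_lines(lines):
    """Sort 'SIZE<TAB>PATH' lines by size, smallest first."""
    return sorted(lines, key=lambda line: human_to_bytes(line.split("\t", 1)[0]))


if __name__ == "__main__":
    du_output = ["1.5G\t./videos", "640K\t./notes", "12M\t./src", "4.0K\t./README"]
    for line in sort_du_lines(du_output):
        print(line)
```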
I do a lot of bash scripting so I know `/usr/bin/sort` decently well. Using this technique I can use my bash scripting abilities to enhance my vim abilities.
I guess shell scripting and the standard commands are convenient interfaces to a lot more functionality. The interactive use (and pipes) tend to make the linear processing super terse and easy (but lots of other things become annoying).
In a type-safe language, it’s typically impossible to specify a function which can do as many things as (eg) sort can, and which doesn’t require specifying a huge number of default arguments on every invocation. Therefore you end up with a limited function, a huge annoying function, or many small functions. In any case, changing from one kind of sort to another (eg how to sort things, what to sort, how to order them, whether to keep unique elements, whether to be stable, whether to take multiple inputs, whether to assume the inputs are sorted (so just merge), and so on) becomes difficult, whereas with a tweak-in-many-ways command like sort it is easy. In a language like CL, with optional arguments and dynamic typing, this problem is less severe, but whereas sort will likely work fine on a large dataset, your favourite library function may not. (In particular, it almost definitely won’t use on-disk storage when needed or support parallelism. Note also that storing objects from your favourite language on disk is probably hard, whereas if you only work on lines of text it’s easy.)
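As an illustration of the optional-arguments middle ground, here is a minimal Python sketch (Python, like CL, has keyword arguments with defaults) of a hypothetical `unix_sort` helper that maps a handful of `sort` flags onto parameters. It is only a rough analogue: it assumes an explicit separator when a field is given, assumes numeric keys parse as floats, and does not cover merging (`-m`) or spilling to disk.

```python
def unix_sort(lines, *, sep=None, field=None, numeric=False,
              reverse=False, unique=False):
    """Hypothetical helper loosely mimicking a few `sort` flags.

    sep     ~ -t SEP  (None splits on runs of whitespace, which is close
                       to, but not exactly, sort's default field splitting)
    field   ~ -k N,N  (1-based; None compares the whole line)
    numeric ~ -n      (assumes the key parses as a float)
    reverse ~ -r
    unique  ~ -u      (keep the first line of each run of equal keys)
    """
    def key(line):
        k = line if field is None else line.split(sep)[field - 1]
        return float(k) if numeric else k

    result = []
    previous = object()  # sentinel that compares unequal to every real key
    # sorted() is stable, even with reverse=True, so lines with equal keys
    # keep their input order and "first of each run" is well defined.
    for line in sorted(lines, key=key, reverse=reverse):
        k = key(line)
        if unique and k == previous:
            continue
        result.append(line)
        previous = k
    return result


if __name__ == "__main__":
    rows = ["alice,30", "bob,25", "carol,30", "dave,40"]
    # Roughly: sort -t, -k2,2n -r -u
    print(unix_sort(rows, sep=",", field=2, numeric=True,
                    reverse=True, unique=True))  # ['dave,40', 'alice,30', 'bob,25']
```

Each new behaviour means another keyword argument, which is the "huge annoying function" trade-off the comment describes.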
The other big difference is the data types. The data type of shell programs is basically a sequence of lines, each of which is sometimes broken up into one or more fields by some other separator. In many languages the sequence operations are all about getting the nth element, or extending them, or changing elements, or deleting them, or splitting the sequence up into other sequences. And these operations typically identify the elements by their position. In shell scripts, many of these operations are unavailable or not used. One basically only iterates forwards, processing each element, and one almost never thinks about the position of the element in the sequence. split and csplit exist but aren’t commonly used.
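The data model described above (a stream of lines, each split into fields, processed strictly forwards) can be imitated with Python generators. This minimal sketch does roughly what `cut -d: -f1,7` does on passwd-style records; the sample records and the helper names `fields` and `login_shells` are made up for illustration. Nothing holds the whole sequence or indexes into it by line position.

```python
import io


def fields(stream, sep=":"):
    """Yield each line of `stream` split into fields, one line at a time."""
    for line in stream:
        yield line.rstrip("\n").split(sep)


def login_shells(stream):
    """Roughly `cut -d: -f1,7`: user name and shell from passwd-style lines."""
    for record in fields(stream):
        yield f"{record[0]}:{record[6]}"


if __name__ == "__main__":
    sample = io.StringIO(
        "root:x:0:0:root:/root:/bin/bash\n"
        "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
    )
    for line in login_shells(sample):
        print(line)  # root:/bin/bash, then daemon:/usr/sbin/nologin
```

Passing `sys.stdin` instead of the `StringIO` sample would make it usable in a pipe.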
Lots of work in a typical language is converting one data type to another, or extracting bits of it, or doing data type-specific things. In shell these don’t really exist (eg you don’t sort as dates, you write your dates like 2019-09-28 and sort them like strings), and so data is simple, and because extraction is flexible (typically a field number or byte positions) lots of the bureaucracy of changing data types is omitted.
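Sorting dates as strings works because ISO 8601 dates (YYYY-MM-DD) are fixed-width, zero-padded, and ordered from the most to the least significant part, so string comparison agrees with date comparison; a day-first format loses that property. A small Python check, with made-up sample dates and a hypothetical helper name:

```python
from datetime import datetime


def string_order_is_chronological(dates, fmt):
    """True if sorting `dates` as plain strings gives chronological order."""
    chronological = sorted(dates, key=lambda d: datetime.strptime(d, fmt))
    return sorted(dates) == chronological


iso = ["2019-09-28", "2018-12-01", "2019-01-15"]
dmy = ["28/09/2019", "01/12/2019", "15/01/2018"]

print(string_order_is_chronological(iso, "%Y-%m-%d"))  # True
print(string_order_is_chronological(dmy, "%d/%m/%Y"))  # False
```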
These things all often lead to shell scripts being unreliable. But also resilient, flexible and “sufficient”. By “sufficient” I mean that they can do things well enough that the cost for a long term, thorough, or “proper” solution isn’t justified.
Right, but sort is apparently clever enough to do a disk-backed mergesort on a file (according to timr's comment), but it doesn't get a chance to recognize that you're sorting a file if you just pipe in the data.
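A disk-backed merge sort of the kind mentioned here works in two phases: sort fixed-size chunks in memory and write each sorted run to a temporary file, then stream a k-way merge over the runs. Below is a minimal Python sketch of that general technique, not of GNU sort's actual implementation (its buffer sizing and merge strategy differ). The `external_sort` and `_read_run` names and the default `chunk_size` are arbitrary choices for illustration, and ordering is by code point, roughly like `LC_ALL=C sort`.

```python
import heapq
import itertools
import tempfile


def _read_run(f):
    """Yield the lines of a sorted run file without their trailing newline."""
    for line in f:
        yield line.rstrip("\n")


def external_sort(lines, chunk_size=100_000):
    """Yield the items of `lines` in sorted order, spilling runs to disk.

    At most `chunk_size` lines are held in memory while the sorted runs are
    built; the merge phase keeps about one line per run in memory.
    Each input item is treated as one line (trailing newlines are dropped).
    """
    runs = []
    it = iter(lines)
    try:
        while True:
            chunk = [line.rstrip("\n") for line in itertools.islice(it, chunk_size)]
            if not chunk:
                break
            chunk.sort()
            # newline="\n" avoids newline translation when writing/reading.
            run = tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="\n")
            runs.append(run)
            run.writelines(line + "\n" for line in chunk)
            run.seek(0)
        # Lazy k-way merge of the already-sorted runs (the idea behind `sort -m`).
        yield from heapq.merge(*(_read_run(run) for run in runs))
    finally:
        for run in runs:
            run.close()


if __name__ == "__main__":
    data = ["pear\n", "apple\n", "fig\n"]
    print(list(external_sort(data, chunk_size=2)))  # ['apple', 'fig', 'pear']
```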