> What I'm interested to know is whether there is any code already out there in the wild with this exploit in it?
It's possible, but I doubt it. The paper mentions that Vim isn't vulnerable to the bidirectional attack. Not mentioned in the paper: neither is `less`, the pager, which is used by default for `git diff` and other Git commands. Nor is either of the first two terminals I tried, when `cat`ing the file without a pager.
All of the aforementioned programs display the direction markers either as escape sequences highlighted in bright colors or as garbage characters, both of which stick out like a sore thumb. Now, that's more a sign of poor Unicode support in those programs than it is anything to their credit. But it does mean that this kind of attack is incredibly brittle, at least in any codebase where some of the people working on it are likely to be using Unix tools. There's a high chance the aberrant characters will be spotted at some point or other.
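For codebases that can't count on someone eventually viewing the diff in a tool that garbles these characters, the check is easy to automate. A minimal sketch in Python, assuming you only care about the explicit directional formatting characters and marks from Unicode's bidi algorithm (UAX #9); `find_bidi_controls` is a hypothetical helper, not an existing tool:

```python
import unicodedata

# Explicit directional formatting characters (embeddings, overrides,
# isolates and their terminators) plus the implicit directional marks.
BIDI_CONTROLS = {
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # POP DIRECTIONAL FORMATTING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
    "\u200e",  # LEFT-TO-RIGHT MARK
    "\u200f",  # RIGHT-TO-LEFT MARK
    "\u061c",  # ARABIC LETTER MARK
}


def find_bidi_controls(text):
    """Return (line, column, code point, name) for every bidi control in text.

    Lines and columns are 1-based and counted in code points.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits


# A line in the style of the paper's examples: the override and isolates
# make the comment appear to sit outside the string literal.
sample = 'if access_level != "user\u202e \u2066// Check if admin\u2069 \u2066":\n'
for hit in find_bidi_controls(sample):
    print(hit)
```

Run over every changed file in a pre-commit hook or CI step, a non-empty result could fail the build; legitimate right-to-left text in strings or comments would then need an explicit allowlist.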
And once spotted, it's self-evident that it's an attack. I suspect real attacks would try to be more subtle, introducing bugs that could pass as genuine mistakes, at least at first glance.
It's sad that large-scale exploitation of this is stopped only because many applications still have really poor Unicode support and would therefore make the changes human-visible.
Code editors also often show this kind of thing intentionally, as those characters are meaningful for interpretation purposes. Many of them are very Unicode-friendly, but they still show zero-width spaces as e.g. "<zwsp>" on purpose.
They've also often shown non-printable ASCII control characters for basically forever. Null bytes and \bel and whatnot are very important despite being "invisible", and they've been around for decades.
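Roughly, the transformation involved looks like the sketch below. It assumes caret notation (`^@` for NUL, `^G` for BEL) for ASCII control characters and a `<U+XXXX>` placeholder for other invisible control and format characters such as the zero-width space; real editors pick their own placeholders, like the `<zwsp>` above, and `make_visible` is a hypothetical name:

```python
import unicodedata


def make_visible(text):
    """Replace control (Cc) and format (Cf) characters with visible placeholders.

    Newlines and tabs are left alone so the text keeps its layout.
    """
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if ch in "\n\t":
            out.append(ch)
        elif ord(ch) < 0x20:
            # C0 controls in caret notation: NUL -> ^@, BEL -> ^G, ESC -> ^[
            out.append("^" + chr(ord(ch) + 0x40))
        elif ord(ch) == 0x7F:
            out.append("^?")  # DEL
        elif cat in ("Cc", "Cf"):
            # Remaining controls and invisible format characters,
            # e.g. ZERO WIDTH SPACE -> <U+200B>, RLO -> <U+202E>
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


print(make_visible("null\x00bell\x07zero\u200bwidth"))
```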
I've been bitten by things like this from an entirely unexpected angle: messengers like Teams and Skype sometimes <helpfully> replace characters like "-" and " " with all manner of more readable Unicode characters. More readable, until the YAML parser choked.
Since then, I almost always run some variant of the Gremlins plugin, which highlights nearly all Unicode spaces, dashes, and other weird control symbols.
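A minimal sketch of what such a plugin does, using a small hand-picked table of look-alikes that chat apps tend to substitute. The table, `find_gremlins`, and `degremlin` are all assumptions for illustration; the actual Gremlins plugin has its own list:

```python
# Hypothetical table: look-alike characters mapped to the ASCII they
# most likely replaced.
GREMLINS = {
    "\u00a0": " ",  # NO-BREAK SPACE
    "\u2009": " ",  # THIN SPACE
    "\u202f": " ",  # NARROW NO-BREAK SPACE
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2013": "-",  # EN DASH
    "\u2212": "-",  # MINUS SIGN
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
}


def find_gremlins(text):
    """Yield (line, column, code point, suggested ASCII) for each gremlin."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in GREMLINS:
                yield lineno, col, f"U+{ord(ch):04X}", GREMLINS[ch]


def degremlin(text):
    """Replace every gremlin in the table with its ASCII look-alike."""
    return text.translate(str.maketrans(GREMLINS))


# YAML pasted back out of a chat window: an en dash and a no-break space
# where "-" and " " used to be.
pasted = "items:\n\u2013 apple\n-\u00a0banana\n"
for hit in find_gremlins(pasted):
    print(hit)
print(degremlin(pasted))
```

Flagging is the safer default; `degremlin` is deliberately lossy (every dash-like character becomes `-`), so it's better reserved for files like YAML configs where only the ASCII form is meaningful.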
Chat apps replacing ™ with a horrifically large, poorly-rendered and off-colored "TM" and ruining The Joke™ is a major pet peeve of mine, yeah :| And even worse, it seems to be spreading, as each one blindly copies the horrible decisions of the others. I would disable all of those auto-replacements everywhere if only I could disable all of those auto-replacements everywhere.
I think making these chars human-visible is a feature. Most code editors have features like showing invisible characters, displaying some representation of whitespace characters, or highlighting control sequences.
Because the editor is supposed to edit plain text, which means all characters must be editable. And a character can only be edited if it is visible.
> Now, that's more a sign of poor Unicode support in those programs than it is anything to their credit.
But that behavior is intentional. If you want, you could do "alias less='less -r'", and then it would behave the way you want, and you'd become vulnerable to this attack.
-r makes it pass all control characters to the terminal. To quote less's man page:
> Warning: when the -r option is used, less cannot keep track of the actual appearance of the screen (since this depends on how the screen responds to each type of control character).
This is not the same as actually supporting (i.e. being able to keep track of the screen state for) bidirectional text that may legitimately use those characters.
For that matter, the terminal may not support it either, as I mentioned.
Though, today I learned there has been some effort in recent years to improve bidirectional text handling in terminals and terminal applications, generally: