
In Emacs those characters are displayed by default as one-pixel-wide spaces. To make them more apparent, eval (update-glyphless-char-display 'glyphless-char-display-control '((format-control . empty-box) (no-font . hex-code))).



Coding editors also often show this kind of thing intentionally, as those characters are meaningful for interpretation purposes. Many of them are very UTF friendly, but they still show zero-width spaces as e.g. "<zwsp>" on purpose.

They've also often shown non-printable ASCII control characters for basically forever. Null bytes and \bel and whatnot are very important despite being "invisible", and they've been around for decades.


It looks like each character is taking up twice the width of an ASCII symbol, but half of that is empty space. Why is that? Is that space completely unusable?

From the article, it's likely you wouldn't even notice, unless you pasted it into an ASCII-only editor that doesn't allow anything other than plain old text.

It's actually unicode instead of ascii. It's full of special characters for boxes, and more for the function library.

But I get your point, you probably meant "fixed width text characters" instead of ascii. I'm just being pedantic :)


Let's not be so pedantic. I meant visible ASCII symbols, character codes 32-126 plus newline.
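That definition is easy to check mechanically. A minimal sketch in Python, where the function name `is_visible_ascii` is made up for illustration:

```python
def is_visible_ascii(text: str) -> bool:
    """True if every character is a printable ASCII code (32-126) or a newline."""
    return all(ch == "\n" or 32 <= ord(ch) <= 126 for ch in text)
```

Tabs, box-drawing characters, and zero-width spaces all fail this check; ordinary text with newlines passes.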

I'll trade editor awareness of control chars for eliminating the problem in 99.999% of cases.

Also, some editors do show the ASCII codes, like Notepad++.


Emacs has handled literal ASCII control characters correctly I believe since around the time I was born - probably somewhat earlier, if we count back further than GNU.

Unicode works fine there too, so it makes no nevermind to me which flavor people use. I just think it's funny how "everything old is new again".


Have you ever seen the ASCII separator characters used as they were intended? I don't think I have. It's obvious the problem they were trying to solve, but it was too little too late. It doesn't help that they're control characters that aren't meant to be displayed so they're practically invisible.

What a bizarre choice. If they're going to commit to weird ASCII control chars you'd think they could just use 0x1C to 0x1F, which are explicitly intended as delimiters/Separators... sigh. (I've always wondered why more people don't use the various Separators, but I admit human-readability is a big advantage)

For displaying 0-31 in programs and scripts of my own, I add 64 and display that in inverse video.

The resulting glyph is the letter from the matching CTRL or ^ notation for that byte, but in a single character cell, and still distinct from a byte containing that letter.

So for instance, a NUL is value 0, which is CTRL+@ or ^@

But displaying ^@ screws up formatting, and displaying @ collides with byte value 64. Inverse-video @ solves both and doesn't need any special font. I do the same for DEL (127), displayed as inverse ?. There the ? is meaningful and follows the same rule, because it's literally ^?, not a placeholder for "no glyph" or "non-printing control byte".

This doesn't help you configure an editor; I'm just describing a way to display those undisplayable bytes that is meaningful and unambiguous, regardless of the font or even the terminal type (i.e., it works the same in BASIC on a TRS-80 Model 100, in bash on an xterm, in C on Windows, etc.).
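A rough Python sketch of the scheme described above, assuming a terminal that honors the ANSI reverse-video escape codes (SGR 7 and 27). The names `render_byte` and `render` are made up for illustration, and the hex fallback for bytes 128-255 is my own assumption, since the description only covers 7-bit values:

```python
INVERSE_ON = "\x1b[7m"    # ANSI SGR 7: reverse video on
INVERSE_OFF = "\x1b[27m"  # ANSI SGR 27: reverse video off


def render_byte(b: int) -> str:
    """Render one byte value (0-255) in a single character cell where possible."""
    if b < 32 or b == 127:
        # Caret notation flips bit 6: 0 -> '@', 7 -> 'G', 127 -> '?'.
        # For 0-31 this is the same as adding 64.
        return f"{INVERSE_ON}{chr(b ^ 64)}{INVERSE_OFF}"
    if b < 127:
        return chr(b)  # printable ASCII is shown as-is
    # Assumption: high bytes are outside the described scheme, so show hex.
    return f"\\x{b:02x}"


def render(data: bytes) -> str:
    """Render a byte string with control bytes shown as inverse-video letters."""
    return "".join(render_byte(b) for b in data)
```

For example, `print(render(b"one\x00two\x7f"))` should show an inverse @ between the words and an inverse ? at the end, while a literal @ or ? in the data stays in normal video.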


In some languages which allow non-ASCII but aren't Unicode-aware (PHP, for instance), you can add significant, invisible zero-width spaces to identifiers.

ASCII with ClearType seems to work fine for me; I don't see the issue.

Why not just use ASCII 31/30 instead of comma/newline? Just give it a new name like ASV (ASCII-separated values). What would it take to give glyphs to those characters? Just add them to some font?
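To make the idea concrete, here is a small Python sketch of such a format. "ASV" is not an established standard as far as I know, so the function names `dumps` and `loads` and the choice to terminate every record with the record separator are assumptions for illustration only:

```python
UNIT_SEP = "\x1f"    # ASCII 31 (US): separates fields within a record
RECORD_SEP = "\x1e"  # ASCII 30 (RS): terminates each record


def dumps(rows: list[list[str]]) -> str:
    """Serialize rows of string fields; no quoting or escaping is needed."""
    for row in rows:
        for field in row:
            if UNIT_SEP in field or RECORD_SEP in field:
                raise ValueError("field contains a separator character")
    return "".join(UNIT_SEP.join(row) + RECORD_SEP for row in rows)


def loads(text: str) -> list[list[str]]:
    """Parse text produced by dumps back into rows of fields."""
    records = text.split(RECORD_SEP)
    if records and records[-1] == "":
        records.pop()  # drop the empty piece after the final terminator
    return [record.split(UNIT_SEP) for record in records]
```

Commas, quotes, and newlines inside a field need no escaping here, which is the appeal. The cost is the one raised above: the separators have no glyph, so the file is hard to read or edit by hand.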

You are right, it is just ASCII text. Probably a brain fart there, sorry. I may have said that (hex editor) out of sheer habit of using one to inspect various formats. Or I may have used a text editor, if so TextPad, IIRC, since the project was on Windows (at least dev env was). It was years ago, so not sure.

They stick to plain ASCII and the same font everywhere to reduce metadata leakage.

Zero-width unicode chars have been used in exploit kits for a while now; just use hd (or something similar) when debugging.
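If a hex dump is too noisy, a few lines of Python can list the suspicious code points directly. This is only a sketch: the function name `find_invisible` is made up, and it assumes that flagging everything in Unicode general category Cf (format characters, which includes the zero-width space, the zero-width joiner, and the BOM) is a good enough filter:

```python
import unicodedata


def find_invisible(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for each format (Cf) character in text."""
    hits = []
    for index, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits
```

Running it on a pasted identifier or a line of source shows the position and name of each hidden character, or an empty list if the text is clean.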

I use a DOS text editor for this, where no Unicode support is an advantage. The majority of the time I'm dealing with plain ASCII anyway.

Even ASCII-only forums and media have come up with informal markups for features like *bold*, _underline_, SPEAKING LOUDLY, etc. You might be so accustomed to these that you don't even see the codes anymore, but they are codes, and quite ugly too.

But he's not modifying the code, he's just changing one ASCII char for another. And if you take a look at the hex editor screenshot, you'll see that it's not even inside the code, but rather surrounded by the menu entry names.