Hacker Read

westal · 2018-04-04 19:45:56+00:00

In Emacs, those characters are displayed by default as one-pixel-wide spaces. To make them more apparent, eval (update-glyphless-char-display 'glyphless-char-display-control '((format-control . empty-box) (no-font . hex-code))).

Groxx | karma 17784 | avg karma 2.5 · | 2021-11-01 01:42:43

Coding editors also often show this kind of thing intentionally, as those characters are meaningful for interpretation purposes. Many of them are very Unicode-friendly, but they still show zero-width spaces as e.g. "<zwsp>" on purpose.

They've also often shown non-printable ASCII control characters for basically forever. Null bytes and \bel and whatnot are very important despite being "invisible", and they've been around for decades.
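
As a rough illustration of that idea, here is a minimal Python sketch that replaces invisible characters with visible tags. The `LABELS` table and the `reveal` function are hypothetical names made up for this example, not any editor's actual implementation, and the choice to tag every other Unicode control (Cc) or format (Cf) character with its code point is an assumption.

```python
import unicodedata

# Hypothetical labels for a few common invisible code points.
# This is an illustrative table, not the one any real editor uses.
LABELS = {
    "\u200b": "<zwsp>",  # ZERO WIDTH SPACE
    "\u200c": "<zwnj>",  # ZERO WIDTH NON-JOINER
    "\u200d": "<zwj>",   # ZERO WIDTH JOINER
    "\ufeff": "<bom>",   # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def reveal(text: str) -> str:
    """Return text with invisible characters replaced by visible tags."""
    out = []
    for ch in text:
        if ch in LABELS:
            out.append(LABELS[ch])
        elif ch in "\n\t":
            # Keep ordinary layout whitespace as-is.
            out.append(ch)
        elif unicodedata.category(ch) in ("Cc", "Cf"):
            # Any other control or format character: show its code point.
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(reveal("zero\u200bwidth and a bell\x07"))
```

Null bytes and BEL fall into the generic branch here, so they show up as `<U+0000>` and `<U+0007>` rather than vanishing.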

yorwba | karma 12391 | avg karma 2.31 · | 2024-03-14 10:17:55

It looks like each character is taking up twice the width of an ASCII symbol, but half of that is empty space. Why is that? Is that space completely unusable?

captaincrunch | karma 871 | avg karma 2.39 · | 2021-11-01 09:18:38

From the article, it's likely you wouldn't even notice, unless you pasted it into an ASCII-only editor that doesn't allow anything other than plain old text.

Aardwolf | karma 6728 | avg karma 2.93 · | 2017-06-06 15:41:03+00:00

It's actually Unicode instead of ASCII. It's full of special characters for boxes, and more for the function library.

But I get your point, you probably meant "fixed width text characters" instead of ASCII. I'm just being pedantic :)

eterevsky | karma 1273 | avg karma 2.91 · | 2021-06-27 11:42:27+00:00

Let's not be so pedantic. I meant visible ASCII symbols, character codes 32-126 plus newline.
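
That definition is small enough to write down as a check. A minimal Python sketch, with `is_plain_ascii` as a made-up name for this example:

```python
def is_plain_ascii(text: str) -> bool:
    """True if every character is code 32-126 or a newline."""
    return all(ch == "\n" or 32 <= ord(ch) <= 126 for ch in text)


if __name__ == "__main__":
    for sample in ("hello\nworld", "tab\there", "caf\u00e9"):
        print(repr(sample), is_plain_ascii(sample))
```

Note that under this definition a tab or a carriage return already counts as "not visible ASCII".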

chapium | karma 1497 | avg karma 1.66 · | 2020-11-07 18:24:47+00:00

I'll trade editor awareness of control chars for eliminating the problem in 99.999% of cases.

Also, some editors do show the ASCII codes, like Notepad++.

throwanem | karma 18321 | avg karma 2.56 · | 2024-03-13 04:37:10

Emacs has handled literal ASCII control characters correctly I believe since around the time I was born - probably somewhat earlier, if we count back further than GNU.

Unicode works fine there too, so it makes no nevermind to me which flavor people use. I just think it's funny how "everything old is new again".

mark-r | karma 5759 | avg karma 1.62 · | 2020-05-15 15:59:48+00:00

Have you ever seen the ASCII separator characters used as they were intended? I don't think I have. It's obvious what problem they were trying to solve, but it was too little, too late. It doesn't help that they're control characters that aren't meant to be displayed, so they're practically invisible.

ipdashc | karma 258 | avg karma 3.79 · | 2021-12-16 12:18:22

What a bizarre choice. If they're going to commit to weird ASCII control chars you'd think they could just use 0x1C to 0x1F, which are explicitly intended as delimiters/Separators... sigh. (I've always wondered why more people don't use the various Separators, but I admit human-readability is a big advantage)

Brian_K_White | karma 6250 | avg karma 1.82 · | 2024-01-02 14:26:10

For displaying 0-31 in programs and scripts of my own, I add 64 and display that in inverse video.

The resulting glyph is the letter from the matching CTRL or ^ notation for that byte, but in a single character cell, and still distinct from a byte containing that letter.

So for instance, a NUL is value 0, which is CTRL+@ or ^@

But displaying ^@ screws up formatting, and displaying @ collides with byte value 64. Inverse video @ solves both, and doesn't need any special font. I do the same for DEL, which is 127, displayed as inverse ?. The ? is meaningful and adheres to the same rule because it's literally ^?, not a placeholder for "no glyph" or "non-printing control byte".

This doesn't help you with configuring an editor; I'm just describing a way to display those undisplayable bytes that is actually meaningful and unambiguous, without caring what the font or even terminal type is (i.e., it works the same in BASIC on a TRS-80 Model 100, in bash on an xterm, in C on Windows, etc.).
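
A minimal Python sketch of the scheme described above, under a few assumptions: the function names are made up for this example, inverse video is produced with the ANSI escape `ESC[7m` (so this particular version does assume an ANSI-capable terminal), and bytes of 128 and above, which the scheme does not cover, are shown as a `\xNN` escape.

```python
INVERSE = "\x1b[7m"  # ANSI SGR 7: reverse video (assumes an ANSI terminal)
RESET = "\x1b[0m"    # ANSI SGR 0: reset attributes


def caret_letter(byte: int) -> str:
    """Letter from ^ notation: 0 -> '@', 1 -> 'A', ..., 31 -> '_', 127 -> '?'."""
    if byte < 32 or byte == 127:
        # Adding 64 to 0-31 is the same as flipping bit 6, and flipping
        # bit 6 of 127 gives 63 ('?'), so one XOR covers both cases.
        return chr(byte ^ 0x40)
    raise ValueError(f"not a control byte: {byte}")


def render(data: bytes) -> str:
    """Render bytes so each control byte occupies one inverse-video cell."""
    out = []
    for b in data:
        if b < 32 or b == 127:
            out.append(f"{INVERSE}{caret_letter(b)}{RESET}")
        elif b < 127:
            out.append(chr(b))
        else:
            # Outside the scheme described above; an assumption for this sketch.
            out.append(f"\\x{b:02x}")
    return "".join(out)


if __name__ == "__main__":
    print(render(b"NUL:\x00 BEL:\x07 DEL:\x7f plain @ and ?"))
```

On a terminal that honors the escape, the NUL prints as an inverse @ next to the ordinary @, so the two stay distinguishable.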

TazeTSchnitzel | karma 26116 | avg karma 2.81 · | 2015-10-23 16:28:51+00:00

In some languages which allow non-ASCII but aren't Unicode-aware (PHP, for instance), you can add significant, invisible zero-width spaces to identifiers.

iqanq | karma 228 | avg karma 0.67 · | 2022-02-14 04:32:54

ASCII using ClearType seems to work fine for me; I don't see the issue.

supergreg | karma 525 | avg karma 2.64 · | 2016-07-06 12:26:31

Why not just use ASCII 31/30 instead of comma/newline? Just give it a new name like ASV (ASCII separated values). What would it take to give glyphs to those characters? Just add them to some font?
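
For what it's worth, a reader and writer for such a format fits in a few lines. The Python sketch below is only one way it could look: "ASV" is not an established format, the `dumps`/`loads` names are made up, and two design choices are assumptions of this sketch (every record is terminated by ASCII 30, and fields containing either separator are rejected rather than escaped).

```python
US = "\x1f"  # ASCII 31, unit separator: goes between fields
RS = "\x1e"  # ASCII 30, record separator: ends each record


def dumps(rows):
    """Serialize rows (lists of strings, at least one field each)."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                # No escaping in this sketch: separators may not appear in data.
                raise ValueError("field contains a separator character")
    return "".join(US.join(row) + RS for row in rows)


def loads(text):
    """Parse text produced by dumps back into a list of rows."""
    records = text.split(RS)
    if records[-1] == "":
        # The final RS terminates the last record; drop the empty tail.
        records.pop()
    return [record.split(US) for record in records]


if __name__ == "__main__":
    rows = [["name", "note"], ["Ann", "likes commas, quotes \" and\nnewlines"]]
    encoded = dumps(rows)
    print(repr(encoded))
    print(loads(encoded) == rows)
```

Commas, quotes, and newlines need no quoting rules here, which is the appeal; the cost is that the file is hard to read or edit by hand without glyphs for the separators.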

vram22 | karma 2461 | avg karma 0.61 · | 2020-11-01 17:04:52+00:00

You are right, it is just ASCII text. Probably a brain fart there, sorry. I may have said that (hex editor) out of sheer habit of using one to inspect various formats. Or I may have used a text editor, if so TextPad, IIRC, since the project was on Windows (at least dev env was). It was years ago, so not sure.

ycmbntrthrwaway | karma 2002 | avg karma 4.34 · | 2017-01-27 16:58:20

They stick to plain ASCII and the same font everywhere to reduce metadata leakage.

jlg23 | karma 3464 | avg karma 2.92 · | 2016-05-20 23:44:15+00:00

Zero-width Unicode chars have been used in exploit kits for a while now; just use hd (or something similar) when debugging.
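
If hd isn't at hand, a small stand-in is easy to write. This is a Python sketch loosely modeled on a canonical hex dump layout, not a reimplementation of hd; `hexdump` is a made-up name and the exact column format is an assumption.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return offset, hex bytes, and printable-ASCII columns for data."""
    lines = []
    pad = width * 3 - 1  # width of a full row of "xx " groups, minus last space
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(pad)
        # Anything outside visible ASCII is shown as a dot.
        ascii_part = "".join(chr(b) if 32 <= b <= 126 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  |{ascii_part}|")
    return "\n".join(lines)


if __name__ == "__main__":
    # A zero-width space (U+200B) hidden between two letters.
    print(hexdump("a\u200bb".encode("utf-8")))
```

The zero-width space is invisible in the text but shows up in the dump as the three UTF-8 bytes e2 80 8b.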

userbinator | karma 78987 | avg karma 4.37 · | 2022-12-04 19:16:25

I use a DOS text editor for this, where no Unicode support is an advantage. The majority of the time I'm dealing with plain ASCII anyway.

teddyh | karma 23902 | avg karma 2.94 · | 2024-03-22 17:42:50

Even ASCII-only forums and mediums have come up with informal markups for features like *bold*, _underline_, SPEAKING LOUDLY, etc. You might be so accustomed to these that you don't even see the codes anymore, but they are codes, and quite ugly too.

dpassens | karma 114 | avg karma 2.24 · | 2024-04-28 16:35:09

But he's not modifying the code, he's just changing one ASCII char for another. And if you take a look at the hex editor screenshot, you'll see that it's not even inside the code, but rather surrounded by the menu entry names.