
I am not a fan of C's strings either. Null termination rather than communicating the length as a preamble has been a fount of bugs for decades. On the other hand, ASCII is merely a way of interpreting a stream of bits that chunks the bits into bytes and maps the bytes to characters. Unicode is another way of interpreting a stream of bits which also chunks the bits into bytes, but may then chunk several bytes together before mapping to characters.
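A minimal sketch of the contrast (the struct and its field names are illustrative, not from any standard library): a null-terminated string must be scanned to find its length, while a length-prefixed representation carries the length as a preamble and tolerates embedded NUL bytes.

  #include <stdio.h>
  #include <string.h>

  /* Length-prefixed ("Pascal-style") string: the length travels with the
   * data, so no scan for a terminator is needed. */
  struct lpstring {
      size_t len;
      const char *data;
  };

  int main(void) {
      const char *cstr = "hello";          /* '\0'-terminated: length found by scanning */
      struct lpstring lp = { 5, "hello" }; /* length stored as a preamble */

      printf("strlen scan: %zu\n", strlen(cstr)); /* O(n) walk to the terminator */
      printf("stored len:  %zu\n", lp.len);       /* O(1) lookup */
      return 0;
  }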

Both Unicode and ASCII are abstractions built on top of streams of bits, intended largely to communicate text primarily (both also carry control characters such as <BEL>) and strings in the computational sense secondarily (for example, as commands to a REST endpoint). For example, C has had a wide character type (wchar_t) for about 25 years [1], available as an abstraction built on top of strings. Like much of C, how wide is wide is implementation-dependent; explicit 16-bit and 32-bit wide character types were standardized more recently.
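A hedged sketch of those types: wchar_t's width is implementation-defined, while the explicit-width char16_t and char32_t arrived with C11's <uchar.h>. The printed sizes will vary by platform.

  #include <stdio.h>
  #include <wchar.h>
  #include <uchar.h>

  int main(void) {
      wchar_t  wide  = L'A';  /* width depends on the implementation (often 16 or 32 bits) */
      char16_t utf16 = u'A';  /* the width of a UTF-16 code unit */
      char32_t utf32 = U'A';  /* the width of a UTF-32 code unit */

      printf("sizeof(wchar_t)  = %zu\n", sizeof wide);
      printf("sizeof(char16_t) = %zu\n", sizeof utf16);
      printf("sizeof(char32_t) = %zu\n", sizeof utf32);
      return 0;
  }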

[1]:


