Hacker Read

kevin_thibedeau | karma 19088 | avg karma 2.16 · 2017-08-24 03:35:59

It's not really Nim's fault that cmd.exe has crap Unicode support.

e12e | karma 13838 | avg karma 1.49 · 2017-08-24 04:08:29+00:00

If you have an "echo" primitive/function and "out of the box unicode support", then respecting the host OS codepage/output encoding by default is the sane thing to do (as indicated in the bug linked by the sibling comment: "just use the win32 api").

It's not that nim can't output wide characters from a utf8 source, it's just that it's not obvious how to do it in a standard way - one might think that a utf8 unicode "hello world" should "just work" on windows, and it doesn't.

It doesn't really make much sense to only do the right thing on systems that happen to have a utf8 locale (it's not that windows doesn't handle wide strings, it just doesn't have a utf8 locale by default).

It's not my impression that the nim community doesn't want to be cross-platform and beginner friendly - it's just that they're going through a phase of modernizing the win(32) sub-system.

Correct text handling isn't "just use utf8", as fun as that would be - correct handling is figuring out "what encoding is the source text", "what encoding is the destination file/device" and "how do I put the source into the destination".

I'd expect a latin1, a utf16 and a utf8 string to all be output correctly with "echo" - on all supported platforms. It's kind of why you would want to use a higher level language with "batteries included" in the first place.
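To make the three questions above concrete, here is a minimal sketch in Python (not Nim, and not how any particular runtime implements "echo"). The helper name `transcode` and the sample greeting are made up for illustration, and it assumes the source encoding is already known rather than detected.

```python
def transcode(raw: bytes, source_encoding: str, dest_encoding: str) -> bytes:
    """Re-encode raw bytes from a known source encoding to a destination one."""
    # Step 1: "what encoding is the source text" -> decode to abstract text.
    text = raw.decode(source_encoding)
    # Steps 2 and 3: "what encoding is the destination" and "how do I put the
    # source into it". Characters the destination cannot represent become "?".
    return text.encode(dest_encoding, errors="replace")


greeting = "h\u00e9llo"  # "hello" with an e-acute, written with an escape

# The same text stored in three different source encodings.
samples = {
    "latin-1": greeting.encode("latin-1"),
    "utf-16": greeting.encode("utf-16"),  # Python prepends a BOM here
    "utf-8": greeting.encode("utf-8"),
}

# Assumed destination for this sketch: a device that expects utf-8.
outputs = {name: transcode(raw, name, "utf-8") for name, raw in samples.items()}
```

All three entries in `outputs` end up as the same bytes, which is the behaviour I'd want from "echo". On a real console the destination encoding would have to come from the platform (in Python, something like `sys.stdout.encoding`) instead of being hard-coded.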


kevin_thibedeau | karma 19088 | avg karma 2.16 · 2017-08-24 04:32:47+00:00

Nim is not a Win32 runtime. It outputs strings to stdout. Any sane OS would do the right thing in 2017.

e12e | karma 13838 | avg karma 1.49 · 2017-08-24 06:49:17+00:00

I'm not sure how there can be "one true sane thing". There's more to strings than Unicode, and more to Unicode than utf8. In fact the OS behaves pretty sanely here - not checking for utf16 or utf32 isn't really great on Linux either (haven't checked how/if nim handles locales on Linux). Afaik, e.g. python3 does the right thing on Windows and Linux w/print(). It's just one of the many little things where one may wish for "one, simple, sane way" - but has to deal with the realities.

Same goes for things like handling paths, line endings for text files, resource forks (or not) for files, changing file meta-data...


e12e | karma 13838 | avg karma 1.49 · 2017-08-24 15:51:23

I should perhaps clarify that "just use the win32 api" refers to Windows. Just as one might use syscall 4/write() on Linux, if one doesn't want to (or can't) use libc printf.
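As a rough illustration of the Linux half of that comparison, here is a sketch in Python, where `os.write` is a thin wrapper over the write() call on POSIX systems. The helper name `write_all` is made up for this example. The Windows counterpart would go through the console API instead and is not shown.

```python
import os


def write_all(fd: int, data: bytes) -> None:
    """Write every byte of data to fd, retrying after partial writes."""
    view = memoryview(data)
    while len(view) > 0:
        # os.write may accept fewer bytes than offered; it returns the count.
        written = os.write(fd, view)
        view = view[written:]


if __name__ == "__main__":
    # File descriptor 1 is stdout. The bytes are passed through untouched, so
    # the caller picks the encoding; utf-8 here is an assumption about the
    # terminal, not something the OS checks.
    write_all(1, "h\u00e9llo\n".encode("utf-8"))
```

The point is that at this level there are only bytes: whether they show up as the intended characters depends entirely on what the receiving terminal or file expects.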

See also, for example:

https://stackoverflow.com/questions/15528359/printing-utf-8-...

https://stackoverflow.com/questions/26106647/c11-unicode-sup...

http://www.cprogramming.com/tutorial/unicode.html

Note: I'm not convinced using utf8 internally is a great idea - especially for a "beginner friendly" language. Playing with anything from palindromes to reversing strings and character/grapheme counts and

  "d o u b l e  s p a c i n g"

strings can be fun learning exercises - that might be easier with a 32-bit (or even 64 bit) representation.
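A small sketch of those exercises, in Python since its `str` already indexes by code point rather than by utf8 byte. The function names `spaced`, `reversed_text` and `sizes` are made up for illustration.

```python
import unicodedata


def spaced(text: str) -> str:
    """Put one space between consecutive code points."""
    return " ".join(text)


def reversed_text(text: str) -> str:
    """Reverse by code point (not by grapheme cluster)."""
    return text[::-1]


def sizes(text: str) -> dict:
    """Count the same text in code points, utf8 bytes and utf16 code units."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


# The same visible character, e-acute, in two normalization forms.
composed = unicodedata.normalize("NFC", "e\u0301")   # one code point, U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")  # two: U+0065, U+0301
```

With a fixed-width view, `spaced("double")` gives `"d o u b l e"` with no byte-offset arithmetic. The decomposed form still counts as two code points, and `reversed_text(decomposed)` moves the combining accent in front of the "e", so grapheme-level exercises need extra care even with a 32-bit representation.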

But no matter how you look at it, there's no such thing as "simple" handling of international text.
