Hacker Read

unsigner · 2017-06-25 06:23:06+00:00

Don't ever use the original string as key in the localization table. That will force you to translate "high" difficulty the same as "high" resolution, for example.
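
To make the point concrete, here is a minimal sketch of a catalog keyed by context-specific IDs instead of the English text. The key names, the `tr` helper, and the German strings are all made up for illustration; they are not from any particular library.

```python
# Hypothetical message catalog keyed by context-specific IDs rather than
# by the English source text. All key names here are illustrative.
CATALOG = {
    "en": {
        "settings.difficulty.high": "High",
        "settings.resolution.high": "High",
    },
    "de": {
        "settings.difficulty.high": "Schwer",
        "settings.resolution.high": "Hoch",
    },
}


def tr(key: str, lang: str) -> str:
    """Look up a key for a language, falling back to English, then to the key."""
    return CATALOG.get(lang, {}).get(key) or CATALOG["en"].get(key, key)


print(tr("settings.difficulty.high", "de"))
print(tr("settings.resolution.high", "de"))
```

Both entries read "High" in English, but because they have separate keys the translator is free to pick different words for each.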

fomine3 | karma 4883 | avg karma 0.96 · | 2020-09-30 01:59:02+00:00

It would make localizing a nightmare.

rraval | karma 421 | avg karma 5.07 · | 2019-01-08 15:59:20+00:00

> the solution for which is to use icu

We ran headfirst into this issue at my company and we've actually been recommending the opposite (use the "C" locale on the database, treat collation as a render level concern).

I have a whole write up explaining the technical motivations behind that recommendation: https://gist.github.com/rraval/ef4e4bdc63e68fe3e83c9f98f56af...
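
A rough sketch of what "collation as a render level concern" could look like, assuming the database hands back rows in plain code point order (which is what the "C" locale gives you for UTF-8 text) and the application re-sorts for display. The `display_sort_key` helper is hypothetical and only a crude stand-in for real locale collation; it is not taken from the linked write-up.

```python
import unicodedata


def display_sort_key(s: str) -> str:
    """Crude stand-in for locale collation: strip accents and ignore case."""
    decomposed = unicodedata.normalize("NFKD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


names = ["zebra", "Apple", "éclair", "apple"]

# What a "C" locale ORDER BY would roughly give: raw code point order.
storage_order = sorted(names)

# What the UI would show after re-sorting at render time.
display_order = sorted(names, key=display_sort_key)

print(storage_order)
print(display_order)
```

In a real application the key function would come from a proper collation library chosen per user locale; the point here is only that the ordering decision moves out of the database.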

valenterry | karma 2235 | avg karma 1.44 · | 2020-11-11 09:46:57+00:00

That's what I expect, too. In the end, the biggest problem is always to understand and change complex foreign code.

brudgers | karma 49350 | avg karma 2.35 · | 2015-04-26 15:03:08

Hardcoding 'Fizz' and 'Buzz' inhibits internationalization.

Izkata | karma 8171 | avg karma 1.58 · | 2022-10-17 23:07:18

There's also "translate to English", so you can instead treat it directly as a key and use that, or treat the English text as a key and change/add to the file if it's just something like a typo.

Don't recall about the second, we only had translations on one site and it's been a few years.

bagasme | karma 22 | avg karma 0.79 · | 2023-10-02 09:06:40

The article doesn't mention how to resolve string manipulation problems involving locales.

wkat4242 | karma 10400 | avg karma 2.0 · | 2023-01-13 23:54:52

And then you have the issue of those small parts being so common that they could match even normal English. It's not an easy problem to solve :(

mpessas | karma 1 | avg karma 0.25 · | 2013-05-17 12:36:23+00:00

> The format does have some pretty major drawbacks too, like the msgid can become "fuzzy" which leads to a differing set of issues related to the unique keying between translations.

I am not sure how much of an issue this is in practice. The main problem of the PO format AFAICT is that it is quite outdated. For instance, it has no support for genders and you cannot "mix" plural rules within a phrase.

> It is interesting you call out cultural issues, did you have any specific examples?

The wikipedia entry on l10n[1] has some examples.

The process of localization is not merely about translating some strings, but adapting them to a specific language and culture, which is the hardest part. For instance, your home page is one of the most important pages in your app and is geared to make as many people as possible sign up. Do you think a simple translation would have the same effect on British, French, Arab, Japanese, etc. people?

[1]: https://en.wikipedia.org/wiki/Internationalization_and_local...

je42 | karma 771 | avg karma 1.24 · | 2017-11-05 12:11:15

Don't forget that full internationalization also includes taking care of layout issues, e.g. words that are longer in other languages, characters that are taller, and right-to-left text.

int_19h | karma 21203 | avg karma 1.69 · | 2017-03-03 03:14:27

"Bad" characters in this context are control characters. So no, it would not affect internationalization at all.

banana_giraffe | karma 4049 | avg karma 4.56 · | 2021-09-09 11:03:29

Reminds me of the Pseudo-Locales Windows Vista added that "translate" English strings to things that look like English, but use unusual characters and end up with longer strings in an attempt to catch UI issues before having full localization versions ready.

https://docs.microsoft.com/en-us/windows/win32/intl/pseudo-l...
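
For anyone who wants to try the idea outside Windows, a minimal sketch of a pseudo-localizer might look like this. The character mapping, the bracket markers, and the 30% padding are arbitrary choices for illustration, not the table or rules the Windows pseudo-locales actually use.

```python
# Illustrative mapping only; not the actual table used by Windows pseudo-locales.
_LOOKALIKES = str.maketrans(
    "aceinouyACEINOUY",
    "àçéîñöûýÀÇÉÎÑÖÛÝ",
)


def pseudo_localize(text: str, expansion: float = 0.3) -> str:
    """Swap some ASCII letters for accented lookalikes and pad the result.

    The brackets make truncation visible; the padding simulates languages
    whose strings run longer than English.
    """
    padding = "~" * max(1, round(len(text) * expansion))
    return "[" + text.translate(_LOOKALIKES) + padding + "]"


print(pseudo_localize("Save file"))
```

This naive version would also mangle format placeholders such as `%s` or `{name}`, so a real tool needs to skip those.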

supermatt | karma 4996 | avg karma 3.41 · | 2023-06-08 01:43:29

I don't think you understand how localization works. They have a localization file and they send it off to a translation service. The translation service goes through the file and translates the individual strings (or string fragments).

Either the translators made a mistake, and thought it was referring to a ZIP (regardless of capitalization) and translated accordingly, or a developer used the wrong key when assembling the string references - i.e. he used the equivalent of (this is pseudocode as I don't know how they handle localizations):

  localize("CompressToArchive", localize("Zip"), localize("File")) - i.e. with a reference to localization of "Zip" (or ZIP, or zip - the dev likely just searched for a string that matched what he wanted to localize)

instead of

  localize("CompressToArchive", "Zip", localize("File")) - i.e. with a string of "Zip"

where the strings are defined as:

  CompressToArchive: Compress to %1.%2 (same for us and uk)
  Zip: ZIP (us) or postcode (uk)
  File: File (same for us and uk)
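
The same scenario as a runnable sketch. The `STRINGS` table mirrors the entries above, with `{0}`/`{1}` standing in for `%1`/`%2`; the `localize` signature and the `buggy_label`/`fixed_label` helpers are hypothetical, since the real localization API is not known.

```python
# Hypothetical string table mirroring the entries in the comment above;
# {0}/{1} stand in for the %1/%2 placeholders.
STRINGS = {
    "us": {"CompressToArchive": "Compress to {0}.{1}", "Zip": "ZIP", "File": "File"},
    "uk": {"CompressToArchive": "Compress to {0}.{1}", "Zip": "postcode", "File": "File"},
}


def localize(locale: str, key: str, *args: str) -> str:
    """Look up a template by key and fill in its positional arguments."""
    return STRINGS[locale][key].format(*args)


def buggy_label(locale: str) -> str:
    # Wrong: "Zip" is looked up as a localizable key, so it gets translated.
    return localize(locale, "CompressToArchive",
                    localize(locale, "Zip"), localize(locale, "File"))


def fixed_label(locale: str) -> str:
    # Right: the format name is passed through as a literal string.
    return localize(locale, "CompressToArchive", "Zip", localize(locale, "File"))


print(buggy_label("uk"))
print(fixed_label("uk"))
```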

gus_massa | karma 17133 | avg karma 1.44 · | 2022-09-25 08:13:46

Nice.

I tried a small change

  "key": "Where is your key?"

==> de

  "key": "Wo ist dein Schlüssel?"

which shows that it correctly translates "key" in the string but not "key" in the field name.

==> es

  "key": "¿Dónde está tu llave?"

which shows that it adds a "¿" at the beginning. That is correct and should be obvious, but I'm too used to bad translators (perhaps I'm too old).

Which languages are supported?

Freak_NL | karma 12342 | avg karma 4.04 · | 2017-07-04 11:05:51+00:00

> Probably not something international software is aware of.

Collation rules that vary by locale exist for this reason, and all major programming languages and OSes support them. Of course, whether the software you use does this or not depends on the developers writing it.

idatum | karma 455 | avg karma 2.65 · | 2021-11-11 13:51:10

Somewhat related to injecting unusual characters, in my experience in localization efforts:

Inject a Turkish 'I'. I don't know how to type or paste it here, but picture an English lower case 'i' that is upper case. It is a splendid way among many to shake out some loc bugs.
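
A small sketch of the kind of bug this shakes out, assuming the character meant is the dotted capital İ (U+0130), together with its counterpart, the dotless small ı (U+0131). Python's `str.lower()` and `str.upper()` are locale-independent, so the example only shows how a naive case-insensitive comparison trips over these letters; `naive_equal` is a made-up helper.

```python
DOTTED_CAPITAL_I = "\u0130"  # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # ı, LATIN SMALL LETTER DOTLESS I


def naive_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison the way it is often (wrongly) written."""
    return a.lower() == b.lower()


# Lowercasing İ yields two code points: "i" plus a combining dot above.
lowered = DOTTED_CAPITAL_I.lower()
print(len(lowered), [hex(ord(ch)) for ch in lowered])

# Uppercasing the dotless ı gives a plain ASCII "I", so the round trip is lossy.
print(DOTLESS_SMALL_I.upper() == "I")

# A Turkish user typing "İstanbul" will not match a stored "istanbul".
print(naive_equal(DOTTED_CAPITAL_I + "stanbul", "istanbul"))
```

Code that assumes case conversion preserves string length, or that upper-then-lower round-trips, tends to break on exactly these inputs.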

simonh | karma 32703 | avg karma 2.95 · | 2013-03-15 09:37:30+00:00

Spot on. It never occurred to me this was essentially a localisation issue. Blasted annoying one though.

xyzelement | karma 7483 | avg karma 3.9 · | 2023-12-16 17:45:17

Can’t localize that!

numpad0 | karma 6056 | avg karma 1.35 · | 2020-12-29 17:50:13+00:00

“The shorter/simpler/more obvious the sentence, the easier it must be” isn’t actually the case with translations.

UI strings being short usually means heavy hidden context lies in the visual elements, so brevity just amplifies the hilarity of mistakes like “Name: SQL Server, Province/Prefecture: Running” (because, you know, the equivalents of provinces in a region are called “State” in American English...).

“Province” is more or less harmless, but “(has/is/is in/to/like to) Start(ed/ing)”-type errors due to missing context can make a UI unusable. Oh, and they’re un-spottable by non-speakers because they make sense when translated back to the original language.

zozbot234 | karma 19616 | avg karma 2.16 · | 2019-12-21 14:41:36+00:00

In practice, even the hardware arrangements of keys on stenotype machines (disregarding the actual assignment of mnemonic labels to keys) are quite region-specific. A big obstacle to i18n in this field.