Ticket #182 (new task)

Opened 2 months ago

Last modified 1 month ago

UTF-8

Reported by: ken Assigned to: clfs-commits@lists.cross-lfs.org.
Priority: minor Milestone: CLFS Standard 1.2.0
Component: BOOK Version: CLFS Standard SVN
Keywords: Cc:

Description

Couldn't find a ticket for this, so starting a new one as an aide-memoire.

If people want to use UTF-8 (and so far, there seems to be a lack of consensus), the assumption is that it should be optional. I've been using it for a couple of years or so, and I'm aware of at least the following additions (there are probably others):

1. for glibc add libidn. Now that glibc no longer gets releases, I'm going to try this with upstream libidn (v1.9), but I haven't yet.

2. for ncurses, --enable-widec so that we build the ...w versions and remove/replace the non-wide versions, similar to what LFS does (ISTR the detail of how to do this is slightly different on multilib).

3. perhaps a note that if procps fails to compile in a UTF-8 system, check what you did to ncurses.

4. for groff, optionally sed characters U+2010,2018,2019,2212 to ascii characters more likely to be found in common screen fonts, as in LFS.

5. for man, convert the message files from various legacy encodings to UTF-8, and similarly the supplied non-English man pages (apropos, makewhatis, etc). I don't know if any other core packages need this. The problem for each package is to find a message that has been translated, and work out how to generate that error so it can be tested to see if the translation appears or if a legacy encoding appears.

6. follow man by groff-utf8 and sed man.conf to use it.

7. alter vim to put UTF-8 pages (fr, it, pl, ru) into the language directory instead of fr.UTF-8 etc. My notes say that Russian otherwise goes into ru.KOI8-R, but apparently I don't do any recoding, so that needs to be checked again - certainly, with vim-7.1 I've got UTF-8 pages installed.

8. At the moment, I don't think there are any UTF-8 pages shipped in any of the core packages. Shadow used to have loads, but those seem to have been dropped when Debian rescued it. Perhaps we should have something a bit like what is in LFS explaining how to recode pages, but with the presumption that anyone doing this will be recoding to UTF-8. Maybe also a note that support for non-alphabetic languages in groff-utf8 is not perfect - sometimes there are error messages about fitting the text to the line, e.g. <standard input>:51: warning [p 1, 2.3i]: cannot adjust line - this applies particularly to Japanese, but maybe also to Chinese or Korean (I can only trigger it for Japanese). Doing the recoding of the man files apparently means that 'man' cannot use legacy encodings (e.g. latin2, koi8r) - even latin1 might have oddities.
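
The recoding mentioned in items 5 and 8 could look roughly like the sketch below. This is only an illustration, not what LFS or any package actually ships: the function name recode_to_utf8 is made up for this example, and it assumes you already know which legacy encoding a given page or message file uses.

```python
def recode_to_utf8(data: bytes, legacy_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode as UTF-8.

    Hypothetical helper for illustration; the caller must supply the
    correct legacy encoding (e.g. "koi8_r" or "iso8859_2").
    """
    # Strict decoding raises UnicodeDecodeError rather than silently
    # producing garbage if a byte is not valid in that encoding.
    text = data.decode(legacy_encoding)
    return text.encode("utf-8")


# Example: a Russian word as it would be stored in a KOI8-R page.
koi8_bytes = "файл".encode("koi8_r")
utf8_bytes = recode_to_utf8(koi8_bytes, "koi8_r")
print(utf8_bytes.decode("utf-8"))
```

In practice the same thing would be applied file by file, with the encoding chosen from the language directory the page lives in.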

Note that man pages in UTF-8 alphabetic languages work in the console, provided you have a suitable font. For Chinese, Japanese and Korean you need a graphical display - rxvt-unicode works, and I assume gnome-terminal does too.
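
To illustrate what item 4 is after, here is a small sketch of mapping the four code points to ASCII characters that nearly every screen font has. The table name, the function name and the exact ASCII replacements are assumptions chosen for this example; the real change would be a sed on groff's font description, not Python.

```python
# Assumed ASCII fallbacks for the four code points named in item 4.
ASCII_FALLBACKS = str.maketrans({
    "\u2010": "-",   # HYPHEN
    "\u2018": "`",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u2212": "-",   # MINUS SIGN
})


def to_ascii_punctuation(text: str) -> str:
    """Replace the four 'fancy' punctuation characters with ASCII ones."""
    return text.translate(ASCII_FALLBACKS)


print(to_ascii_punctuation("\u2018ls \u2212l\u2019"))
```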

We would also need some explanation of why to use this (easy - supports multiple languages on screen at the same time, rather than just a number of neighbouring languages, and handles "fancy quotes" sometimes found in english pages, e.g. from smartmontools), and alternatively why to not use it (perhaps, for people who have a large amount of text in legacy encodings, or who need to use legacy encodings).

Discussion about the "should we do this" part on -dev, please.

Change History

09/30/08 14:39:56 changed by ken

Re item 5, man: apparently, current versions of man-db expect man pages to be in UTF-8, and will convert them if they detect a legacy encoding. Also, man is now pretty much out on its own in using the obsolete catgets, so most packages will not need the message files to be recoded. It could be worth looking at current Ubuntu and Debian - apparently the non-English man pages there are converted to UTF-8.
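
A "detect and convert" step of the kind described above can be sketched as follows. This is not man-db's actual algorithm, just one simple heuristic: treat the data as UTF-8 if it decodes cleanly, otherwise assume a single fallback legacy encoding. The name ensure_utf8 and the latin_1 default are assumptions for the example.

```python
def ensure_utf8(data: bytes, fallback_encoding: str = "latin_1") -> bytes:
    """Return data unchanged if it is valid UTF-8, else recode it.

    Hypothetical helper: the fallback encoding is a guess the caller
    must get right for the language in question.
    """
    try:
        # Valid UTF-8 (including plain ASCII) passes through untouched.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Not UTF-8: assume the legacy encoding and convert.
        return data.decode(fallback_encoding).encode("utf-8")


print(ensure_utf8("é".encode("latin_1")).decode("utf-8"))
```

The weak point is that a page in the wrong legacy encoding still "converts" without error, so the result needs to be checked by eye.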

09/30/08 22:09:38 changed by jciccone

Re item 5, I'd much rather see man and a UTF-8 groff than man-db.

10/01/08 14:06:39 changed by ken

Replying to ken:

1. for glibc add libidn. Now that glibc no longer gets releases, I'm going to try this with upstream libidn (v1.9), but I haven't yet.

in fact, this isn't a formally-released version of glibc, so the libidn part is already there, no need for an add-on.

10/01/08 14:19:02 changed by jciccone

  • version set to CLFS Standard SVN.
  • milestone set to CLFS Standard 1.2.0.

Replying to ken:

Replying to ken:

1. for glibc add libidn. Now that glibc no longer gets releases, I'm going to try this with upstream libidn (v1.9), but I haven't yet.

in fact, this isn't a formally-released version of glibc, so the libidn part is already there, no need for an add-on.

When I created the 2.8 tarball, I checked out the glibc_2_8 tag from CVS and created a tarball (after touching a few files to ensure proper timestamps). It may be a good thing that I left libidn in, then.

11/04/08 13:44:04 changed by jciccone

For #2, ncurses: I was looking at the different ways of doing this. From what I can tell, the way that causes the fewest compatibility issues is to compile two sets of ncurses libraries, one without widec and one with. I *might* be able to tackle this in the near future. Time will tell.