May 07, 2008

Unicode Now Most Common Text Format

Unicode is now the most common text encoding for pages found on the web. (Google reported this recently on their blog.) Take note: the writing is now on the wall, perhaps in a room that has already been remodeled without your realizing it:

  • If you are a scholar whose writing includes Greek, Hebrew, or another language and you do not know how to use unicode in your documents, be aware that unicode is not a fad that is going away.
  • If you are a publisher and are still considering when you should shift to unicode standards for your manuscript submissions, stop considering.
  • If you are a software developer and you have no unicode implementation or only partial/patchwork unicode implementation, the train is pulling away from the station.
  • If you haven't a clue what unicode is, you would benefit from this quick read.
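
For the last group, here is a minimal sketch of the basic idea, using Python's standard unicodedata module: every character, whether Greek, Hebrew, or Latin, gets one standard number (a code point) and an official name, regardless of which font displays it. The two sample words and the helper name describe are my own illustration and are not taken from Google's report.

```python
import unicodedata

def describe(text):
    """List each character with its code point and official Unicode name."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Sample words chosen for illustration (an assumption, not from the post),
# written as escapes so the exact code points are unambiguous.
greek = "\u03bb\u03cc\u03b3\u03bf\u03c2"   # Greek "logos"
hebrew = "\u05e9\u05dc\u05d5\u05dd"        # Hebrew "shalom"

for word in (greek, hebrew):
    for ch, code_point, name in describe(word):
        # Print only the code point and name so the output is plain ASCII.
        print(code_point, name)
    # UTF-8 is one common way to store these code points as bytes.
    print(len(word), "characters,", len(word.encode("utf-8")), "bytes in UTF-8")
```

A legacy font, by contrast, typically reuses the codes for ordinary Latin letters and relies on the font to draw Greek or Hebrew shapes, so the text only reads correctly when that particular font is installed.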

4 comments:

Rod Decker said...

> If you are a software developer and you have no unicode implementation or only partial/patchwork unicode implementation...
- - -

To be more blunt, only Logos/Libronix is fully Unicode; both Accordance and BibleWorks still use legacy fonts internally, though both can export Unicode. And both have font problems. Adjusting font sizes in BW is a pain, and it cannot adjust them "on the fly" or "per pane." Accordance has serious font display issues, and PDFs created from within Accordance will not print on many (any?) Windows computers/printers. Not all of these issues are Unicode-related, but they do suggest that the use of fonts needs some serious work on both platforms.

Danny Zacharias said...

Personally, it doesn't matter to me whether a program uses unicode throughout, like Logos, as long as it delivers what I want in unicode, which Accordance does. You can copy Hebrew and Greek from anywhere in Accordance and paste it as Unicode if you have your preferences set. What it looks like 'on the inside' doesn't really matter. Just my opinion.

Rod,
I don't understand your PDF issue with Accordance. I think what you are saying is that you are choosing Print and then saving as PDF, correct? You're right that this would not put unicode in the PDF; the text would stay in Helena. The Windows user would need the equivalent font on their system.
But, try saving as PDF-X in the save as PDF drop-down menu. I think that will retain the font and look even if the other person doesn't have the font on their computer.
If you want the actual text to be unicode, you'll have to cut n' paste into textedit or Word, then make the PDF.
This brings up a good request that you should take to Accordance: they have a menu item called "Print Settings", and the user should be able to choose 'Print in unicode' when making PDFs in the printer services.

Danny Zacharias said...

One other thing Joe,

I remember making this noise on deinde about publishers and unicode and ended up talking to both Jim Eisenbraun and Bob Buller. Both would love to go full unicode; the issue is the publishing software that they use (can't recall the name). Whatever it is, there is a more expensive, less feature-rich option that will support right-to-left unicode, but they don't want to make the switch.

As far as I can tell, Adobe InDesign fully supports unicode; I'm not sure why they don't use that.

Rod Decker said...

> try saving as PDF-X in the save as PDF drop-down menu. I think that will retain the font and look even if the other person doesn't have the font on their computer.
- - -

Live and learn. (Or: open mouth, discover ignorance!)

I had no idea what pdf-x was. That's the solution: always save in pdf-x instead of plain pdf. With your pointer, I've done some poking around and discovered there is more to pdf than I'd suspected. I didn't know there were different formats. I've been so used to creating pdf with the full version of Acrobat (on Windows), where one always specifies whether the font is to be embedded or not. I didn't understand how the Mac did that from the Print dialog.

I'm not sure where that leaves us with other save as pdf situations (doesn't affect Accordance pdf), but since "standard PDF features like forms, signatures, comments and embedded sounds and movies are not allowed in PDF/X" [http://en.wikipedia.org/wiki/PDF/X] (assuming that is accurate), I don't know how you could both include "active content" *and* embed fonts.