Frequently Asked Questions
This page answers common questions about internationalization of the Java 2 platform, Standard Edition, version 1.4.2, and of Sun's Java 2 Runtime Environments, Standard Edition, version 1.4.2. For more information, see the Internationalization home page.
Internationalization allows software to be adapted to any language and cultural convention. During the internationalization process, the programmer isolates the parts of a program that are dependent on language and culture. For example, the programmer will isolate error messages because they must be translated during localization.
Localization is the process of adapting a program for use in a specific locale. A locale is a geographic or political region that shares the same language and customs. Localization includes the translation of text such as GUI labels, error messages, and online help. It also includes the culture-specific formatting of data items such as monetary values, times, dates, and numbers.
See the steps outlined in the Checklist section of The Java Tutorial.
A locale is a geographic or political region that shares the same language and customs. In the Java programming language, a locale is represented by a Locale object. Locale-sensitive operations, such as collation and date formatting, vary according to locale.
For information on creating Locale objects, see the Setting the Locale section of The Java Tutorial.
The supported locales vary between different implementations of the Java 2 platform and between areas of functionality. Information about the supported locales in Sun's Java 2 Runtime Environments is provided by the Supported Locales document.
Yes. This capability allows you to create multilingual applications.
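The following minimal sketch shows two locales used in one program. It is written for a current JDK rather than 1.4.2, and the class and method names are hypothetical. Each formatter is created for an explicit Locale, so the default locale never has to change:

```java
import java.text.NumberFormat;
import java.util.Locale;

public class MultiLocaleDemo {

    // Formats a number with the grouping and decimal separators of the given locale.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234.5;
        // Both locales are used side by side in the same run.
        System.out.println("US:      " + format(value, Locale.US));      // 1,234.5
        System.out.println("Germany: " + format(value, Locale.GERMANY)); // 1.234,5
    }
}
```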
This depends on the implementation of the Java 2 platform you're using. The initial default locale is normally determined from the host operating system's locale. Version 1.4.2 of Sun's Java 2 Runtime Environments lets you override this by setting the user.language, user.country, and user.variant system properties from the command line. For example, to select Locale("th", "TH", "TH") as the initial default locale, you would use:
java -Duser.language=th -Duser.country=TH -Duser.variant=TH MainClass
Since not all runtime environments provide this feature, it should only be used for testing.
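Inside a test, you can get a similar effect programmatically with Locale.setDefault. The sketch below is written for a current JDK, and the helper name withDefaultLocale is hypothetical. It swaps in the Thai locale from the example above and always restores the original afterwards, because the default locale is shared by the whole JVM. On JDK 19 and later the Locale constructors are deprecated in favor of Locale.of, so the compiler may warn:

```java
import java.util.Locale;

public class DefaultLocaleDemo {

    // Runs an action with a temporary default locale, then restores the
    // previous default even if the action throws. Intended for tests only.
    static void withDefaultLocale(Locale temporary, Runnable action) {
        Locale original = Locale.getDefault();
        Locale.setDefault(temporary);
        try {
            action.run();
        } finally {
            Locale.setDefault(original);
        }
    }

    public static void main(String[] args) {
        System.out.println("Initial default: " + Locale.getDefault());
        // Same language, country, and variant as the command-line example.
        withDefaultLocale(new Locale("th", "TH", "TH"), () ->
            System.out.println("Temporary default: "
                + Locale.getDefault().getLanguage() + "_" + Locale.getDefault().getCountry()));
        System.out.println("Restored default: " + Locale.getDefault());
    }
}
```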
A ResourceBundle object allows you to isolate localizable elements from the rest of the application. With all resources separated into a bundle, the application simply loads the appropriate bundle for the active locale. If the user switches locales, the application just loads a different bundle.
For information on creating ResourceBundle objects, see the Isolating Locale-Specific Data section of The Java Tutorial.
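The following sketch is written for a current JDK, and the Labels bundle family and its keys are hypothetical. It defines a base bundle and a French bundle as ListResourceBundle subclasses and lets ResourceBundle.getBundle choose one for a locale. Keys missing from the French bundle are looked up in its parent, the base bundle:

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleDemo {

    // Base (root) bundle, used when no more specific bundle matches.
    public static class Labels extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"ok", "OK"},
                {"cancel", "Cancel"},
            };
        }
    }

    // French bundle: its name is the base name plus "_fr".
    public static class Labels_fr extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"cancel", "Annuler"},
            };
        }
    }

    // Loads the bundle for a locale and looks up one key.
    static String label(String key, Locale locale) {
        // Nested classes have binary names such as "BundleDemo$Labels",
        // so the base name is taken from the class itself.
        ResourceBundle bundle = ResourceBundle.getBundle(Labels.class.getName(), locale);
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        System.out.println(label("cancel", Locale.FRENCH)); // Annuler
        System.out.println(label("ok", Locale.FRENCH));     // OK, inherited from the base bundle
        System.out.println(label("cancel", Locale.ROOT));   // Cancel
    }
}
```

Nesting the bundle classes keeps the example in one file. In an application, they would more typically be top-level classes such as Labels and Labels_fr, or properties files.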
You can specify any Unicode character with the \uXXXX notation. (The XXXX denotes the 4 hexadecimal digits that comprise the Unicode value of a character.) For example, a properties file might have the following entries:
s1=hello there
s2=\uff2d\uff33\u30b4
If you have edited and saved the file in a non-ASCII encoding, you can convert it to ASCII with the native2ascii tool. For example, you might want to do this when editing a properties file in Shift-JIS, a popular Japanese encoding.
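To check how such escapes are decoded, the sketch below loads the two example entries with java.util.Properties, which interprets the escape sequences as it reads. It is written for a current JDK, and the class name is hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesEscapeDemo {

    // The two example entries, one per line. A doubled backslash in Java
    // source yields a single backslash, so the escape sequences reach the
    // properties parser undecoded.
    static final String FILE_TEXT =
        "s1=hello there\n" +
        "s2=\\uff2d\\uff33\\u30b4\n";

    // Parses the text the same way a properties file would be read.
    static Properties load() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(FILE_TEXT));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load();
        System.out.println(props.getProperty("s1")); // hello there
        String s2 = props.getProperty("s2");
        // Print code points, since the console may lack a suitable font.
        for (int i = 0; i < s2.length(); i++) {
            System.out.printf("U+%04X%n", (int) s2.charAt(i));
        }
    }
}
```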
For a ListResourceBundle whose source file is in a non-ASCII encoding, you can direct the compiler to convert it into Unicode. For example, you would compile a Japanese resource bundle written in the Shift-JIS encoding as follows:
javac -encoding SJIS LabelsResource_ja.java
You can use the SimpleDateFormat class to format and parse dates in a locale-sensitive manner. See the section on formatting Dates and Times in The Java Tutorial.
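The following minimal sketch formats and parses with the same pattern in two locales. It is written for a current JDK, and the class name is hypothetical. The time zone is fixed to UTC so the result does not depend on the host:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateFormatDemo {

    // Builds a formatter whose month names come from the given locale.
    static SimpleDateFormat formatter(Locale locale) {
        SimpleDateFormat fmt = new SimpleDateFormat("d MMMM yyyy", locale);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt;
    }

    static String format(Date date, Locale locale) {
        return formatter(locale).format(date);
    }

    // Parses text written with the given locale's month names.
    static Date parse(String text, Locale locale) {
        try {
            return formatter(locale).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // 1 January 1970, 00:00 UTC
        System.out.println(format(epoch, Locale.US));     // 1 January 1970
        System.out.println(format(epoch, Locale.FRENCH)); // 1 janvier 1970
        System.out.println(parse("1 janvier 1970", Locale.FRENCH).getTime()); // 0
    }
}
```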
Instances of java.text.Format and its subclasses are generally not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
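One common way to follow this advice is a ThreadLocal that gives each thread its own instance. The sketch below requires Java 8 or later, because ThreadLocal.withInitial and lambdas do not exist in 1.4.2. Its class name is hypothetical:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class PerThreadFormat {

    // Each thread lazily gets its own SimpleDateFormat on first use,
    // so no instance is ever shared between threads.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            return f;
        });

    static SimpleDateFormat formatterForCurrentThread() {
        return FORMAT.get();
    }

    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
            Thread.currentThread().getName() + " " + format(new Date(0L)));
        Thread a = new Thread(task, "worker-a");
        Thread b = new Thread(task, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

The external synchronization the text mentions would instead wrap every call on one shared instance in a synchronized block. That approach is simpler, but it makes threads wait for each other.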
The Collator class and its subclasses are used for building sorting routines. These classes are locale-sensitive; a Collator obtained by calling Collator.getInstance() with no arguments uses the collating sequence of the default locale.
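The following minimal sketch contrasts String's own ordering, which compares UTF-16 code units, with a Collator for Locale.US. It is written for a current JDK, and the class name is hypothetical:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {

    // Sample words; U+00E9 is "e" with an acute accent.
    static final List<String> WORDS =
        Arrays.asList("zebra", "\u00e9toile", "apple", "Banana");

    // Sorts by String.compareTo, that is, by the numeric value of each char.
    static List<String> sortByCodeUnits(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted);
        return sorted;
    }

    // Sorts by the collating sequence of the given locale.
    static List<String> sortWithCollator(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByCodeUnits(WORDS));
        System.out.println(sortWithCollator(WORDS, Locale.US));
    }
}
```

The code-unit sort puts "Banana" before "apple" and the accented word last, because U+00E9 is numerically greater than every ASCII letter. The collator instead yields apple, Banana, the accented word, zebra. Collator implements Comparator, so it can be passed directly to List.sort.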
Since decomposing takes time, turning decomposition off makes comparisons go faster. However, for Latin languages the NO_DECOMPOSITION mode is not useful if the text contains accents. You should use the default decomposition unless you really know what you're doing.
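The sketch below illustrates what decomposition provides. It is written for a current JDK, and the class name is hypothetical. A precomposed é (U+00E9) and an e followed by a combining acute accent (U+0301) are canonically equivalent, and with CANONICAL_DECOMPOSITION the collator treats them as equal. Whether they also compare equal with NO_DECOMPOSITION depends on the implementation's collation rules, so the sketch only prints that value:

```java
import java.text.Collator;
import java.util.Locale;

public class DecompositionDemo {

    // U+00E9 (precomposed e with acute accent) versus "e" followed by
    // U+0301 (combining acute accent): canonically equivalent text.
    static final String PRECOMPOSED = "\u00e9";
    static final String DECOMPOSED = "e\u0301";

    // Compares the two forms using the given decomposition mode.
    static int compare(int decompositionMode) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(decompositionMode);
        return collator.compare(PRECOMPOSED, DECOMPOSED);
    }

    public static void main(String[] args) {
        System.out.println("CANONICAL_DECOMPOSITION: "
            + compare(Collator.CANONICAL_DECOMPOSITION)); // 0
        // Implementation dependent: the rules may or may not cover this pair.
        System.out.println("NO_DECOMPOSITION:        "
            + compare(Collator.NO_DECOMPOSITION));
    }
}
```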
The strength property you choose depends on what your application is trying to accomplish. For example, when performing a text search you may allow a "weak" match, in which accents and differences in case (upper vs. lower) are ignored. This type of search employs the PRIMARY strength. If you are sorting a list of words, you might want to use the TERTIARY strength. In this mode the properties that must match are the base character, accent, and case.
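The following minimal sketch shows how strength changes equality for a Locale.US collator. It is written for a current JDK, and the class name is hypothetical. Decomposition is set explicitly so the accented character is handled the same way on any implementation:

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {

    // True if the two strings are equal at the given collation strength.
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        collator.setStrength(strength);
        return collator.equals(a, b);
    }

    public static void main(String[] args) {
        String plain = "role";
        String accented = "r\u00f4le"; // o with circumflex
        String capitalized = "Role";

        // PRIMARY: only base letters matter, so accents are ignored.
        System.out.println(equalAt(Collator.PRIMARY, plain, accented));      // true
        // SECONDARY: accents matter, case does not.
        System.out.println(equalAt(Collator.SECONDARY, plain, accented));    // false
        System.out.println(equalAt(Collator.SECONDARY, plain, capitalized)); // true
        // TERTIARY: base letter, accent, and case must all match.
        System.out.println(equalAt(Collator.TERTIARY, plain, capitalized));  // false
    }
}
```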
A character encoding is a mapping between characters and code values. In the Java programming language, char values are 16-bit values that represent Unicode characters. Unicode is a character encoding that supports the world's major languages. You can learn more about the Unicode standard at the Unicode Consortium web site.
The Converting Non-Unicode Text section of The Java Tutorial explains how to perform the conversions within an application using high-level APIs. If you need more direct access to character conversion, see the java.nio.charset.Charset class. To convert data files, use the native2ascii tool.
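For the direct route, the sketch below uses java.nio.charset.Charset to decode ISO 8859-1 bytes into a String and to encode a String into UTF-8. It is written for a current JDK, and the class name is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class CharsetConversionDemo {

    // Decodes bytes in the named encoding into a String.
    static String decode(byte[] bytes, String charsetName) {
        return Charset.forName(charsetName).decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Encodes a String into a byte array using the named encoding.
    static byte[] encode(String text, String charsetName) {
        ByteBuffer buffer = Charset.forName(charsetName).encode(text);
        byte[] result = new byte[buffer.remaining()];
        buffer.get(result);
        return result;
    }

    public static void main(String[] args) {
        // "cafe" with an accented final e, as stored in ISO 8859-1:
        // one byte per character, 0xE9 for the accented e.
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};
        String text = decode(latin1, "ISO-8859-1");
        System.out.println(text.length());                // 4
        System.out.println(encode(text, "UTF-8").length); // 5
    }
}
```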
See the Supported Encodings web page.
The java.nio.charset.spi.CharsetProvider class lets developers create their own character converters.
The default encoding is selected by the Java runtime based on the host operating system and its locale. For example, in the US locale on Windows, Cp1252 is used. In the Simplified Chinese locale on Solaris, either EUC_CN or GBK can be the default encoding, depending on the selection made when logging into Solaris.
The default encoding is significant because the Java programming language uses Unicode to represent characters, but the file system of the host operating system usually uses some other encoding. The default encoding has to match the encoding used by the host operating system to ensure correct interaction.
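To see which default encoding a runtime has chosen, a short sketch can print it. It is written for a current JDK, and the class name is hypothetical. Since JDK 18, the default charset is UTF-8 on all platforms unless overridden, so a current JDK may report a different value than 1.4.2 did on the same host:

```java
import java.nio.charset.Charset;

public class DefaultEncodingDemo {

    // Name of the charset used when an API call does not name one.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding property:   " + System.getProperty("file.encoding"));
        System.out.println("Charset.defaultCharset(): " + defaultCharsetName());
    }
}
```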
There are many character encodings that don't support all European characters (such as "ß" or "é"), but we get this question particularly often from users of the Solaris C locale. On Solaris and Linux, the Java 2 Runtime Environment version 1.2 and higher determines the default encoding by calling the nl_langinfo function. On Solaris 7 and higher, this function returns "646" when run in the C locale, indicating ISO 646 or ASCII as the default encoding. ASCII only includes half the characters of ISO 8859-1, so many commonly used European characters are missing.
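You can check the gap directly. This sketch asks an encoder whether it can represent ß and é. It is written for a current JDK, its class name is hypothetical, and it uses the standard name US-ASCII for the ASCII encoding that Solaris reports as "646":

```java
import java.nio.charset.Charset;

public class CanEncodeDemo {

    // True if the named encoding can represent the character.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char[] samples = {'\u00df', '\u00e9'}; // sharp s, e with acute accent
        for (String name : new String[] {"US-ASCII", "ISO-8859-1"}) {
            for (char c : samples) {
                System.out.printf("%s can encode U+%04X: %b%n", name, (int) c, canEncode(name, c));
            }
        }
    }
}
```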
An easy workaround is to use the Solaris en_US locale, which uses ISO 8859-1 as its character encoding. You can set the Solaris locale from the login screen or by setting the LC_ALL environment variable. Another solution is to explicitly specify the desired character encoding in your calls to String, java.io, and java.nio APIs that perform encoding conversion.
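The following sketch takes the second approach and names the encoding explicitly on the java.io writer and reader, so the result is the same whatever the default encoding is. It is written for a current JDK, and the class name is hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingDemo {

    // Writes text as ISO 8859-1 bytes, independent of the default encoding.
    static byte[] writeLatin1(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.ISO_8859_1)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads ISO 8859-1 bytes back into a String.
    static String readLatin1(byte[] data) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String word = "Stra\u00dfe"; // contains a sharp s
        byte[] data = writeLatin1(word);
        System.out.println(data.length);                   // 6
        System.out.println(readLatin1(data).equals(word)); // true
    }
}
```

The same principle applies to String: String.getBytes and the String constructors both accept a Charset argument.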
UTF-8 stands for Unicode (or UCS) Transformation Format, 8-bit encoding form. It is a transmission format for Unicode that is suitable for use with many network protocols and UNIX file systems.
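UTF-8 uses a variable number of bytes per character, and ASCII characters keep their single-byte ASCII values. That is part of why it works well with byte-oriented protocols and file systems. The following short sketch measures encoded lengths. It is written for a current JDK, and the class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));      // 1 (same byte as in ASCII)
        System.out.println(utf8Length("\u00e9")); // 2 (e with acute accent)
        System.out.println(utf8Length("\u20ac")); // 3 (Euro sign)
    }
}
```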
No. Cp1252 contains some additional characters in the range from 0x80 to 0x9F. See the Microsoft documentation for more information.
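You can see the difference by decoding a single byte in that range. This sketch is written for a current JDK, and its class name is hypothetical. It uses the JDK's standard name windows-1252 for Cp1252, which the JDK also accepts as an alias:

```java
import java.nio.charset.Charset;

public class Cp1252Demo {

    // Decodes one byte with the named encoding and returns the resulting char.
    static char decodeByte(int value, String charsetName) {
        byte[] single = {(byte) value};
        return new String(single, Charset.forName(charsetName)).charAt(0);
    }

    public static void main(String[] args) {
        // 0x80 is the Euro sign in windows-1252 but a C1 control code in ISO 8859-1.
        System.out.printf("windows-1252: U+%04X%n", (int) decodeByte(0x80, "windows-1252")); // U+20AC
        System.out.printf("ISO-8859-1:   U+%04X%n", (int) decodeByte(0x80, "ISO-8859-1"));   // U+0080
    }
}
```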
The input method framework enables all text editing components to receive Japanese, Chinese, or Korean text input through input methods. An input method lets users enter thousands of different characters using keyboards with far fewer keys. Typically a sequence of several characters needs to be typed and then converted to create one or more characters. For specifications and examples see the web page, Input Method Framework.
A user may have multiple input methods available. For example, the user may have input methods for different languages or input methods that accept various types of input. Such a user must be able to select the input method used for a particular language or the input method that provides the fastest input.
An application can request an input method that supports a specific locale using the InputContext.selectInputMethod method, but it cannot select a specific input method; that selection is up to the user.
An application can activate an input method using the InputContext.setCompositionEnabled method.
See the Input Methods section of the Java 2 SDK Internationalization Overview.
An application using lightweight components can select fonts in four different ways:
Font.createFont method.
An application using peered AWT components can only use logical font names.
Here's a brief summary:
The answer depends on how your application selects fonts - see above.
The font.properties files are used in Sun's Java 2 Runtime Environments to map logical font names to physical fonts. There are several files to support different mappings depending on host operating system version and locale. The files are located in the lib directory within the J2RE installation.
Note that font.properties files are implementation dependent. Not all implementations of the Java 2 platform use them, and the format and content vary between different runtime environments as well as between releases.
Since the mapping from logical fonts to physical fonts is implementation dependent, the answer varies. For Sun's Java 2 Runtime Environments, you need to create or modify a font.properties file - see the web page The font.properties Files. Note however that this is a modification of the J2RE, and Sun does not support modified J2REs. For other implementations, see their respective documentation.
Swing user interface components use a different mechanism to render text than peered AWT components. The Swing components use the Graphics.drawString method, typically specifying a logical font name. The logical font name is then mapped to a set of physical fonts to cover a large range of characters. AWT components, on the other hand, are implemented using host operating system components. These host operating system components often do not support Unicode, so the text gets converted to some other character encoding, depending on the host operating system and locale. These encodings often cover a smaller range of characters than the physical fonts used to implement logical font names. For example, on a Japanese Windows 98 system, many European accented characters are mapped to the Arial font for Swing components, but get lost when the text is converted to the Shift-JIS encoding for peered AWT components.
As in the Chinese/Japanese/Korean case above, this may be because text is not rendered using the Unicode font at all or only for some characters. If your application selects the Unicode font using its physical font name, and it still cannot render all characters, it could be that the Unicode font doesn't in fact cover the entire Unicode character set - sometimes a font is called a Unicode font if it just provides the tables that support the Unicode character encoding.
See the Supported Fonts document.
The short answer is yes. The long answer needs to look at which languages you want to display at the same time, and how your application selects fonts.
Among the South and South-East Asian scripts, version 1.4.2 of Sun's Java 2 Runtime Environments supports Thai and Devanagari. For a complete list of all supported writing systems, see the Supported Locales document. Support for other writing systems may be added in future releases.
See the Supported Locales document.
Yes, Sun's Java 2 Runtime Environments let you type the Euro character, render it, convert it from and to numerous character encodings, and use it when formatting numeric values as currency. For text input and rendering, you need the appropriate support in the host operating system; see the documentation for Windows and Solaris. For formatting with a currency symbol, Sun's Java 2 Runtime Environments v. 1.4.2 use the Euro as the default currency for the member countries of the European Monetary Union.
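For the currency-formatting part, the following minimal sketch asks java.util.Currency and NumberFormat about a euro-zone locale. It is written for a current JDK, and the class name is hypothetical. The exact spacing and symbol placement depend on the locale data, so the sketch relies only on the currency code and the presence of the Euro sign:

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

public class EuroDemo {

    // ISO 4217 code of the currency used in the locale's country.
    static String currencyCode(Locale locale) {
        return Currency.getInstance(locale).getCurrencyCode();
    }

    // Formats an amount using the locale's currency format.
    static String formatCurrency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        System.out.println(currencyCode(Locale.GERMANY));            // EUR
        System.out.println(formatCurrency(1234.56, Locale.GERMANY)); // amount with a Euro sign
    }
}
```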
Copyright © 2003 Sun Microsystems, Inc. All Rights Reserved. Please send comments to: java-intl@java.sun.com