Internationalization and localization

12 December 2013

The article by Svetlana Pravdina

CEO

Today the world is a global village: products are developed in one part of the world, go through a globalization process, are launched in multiple markets, and are used in many different parts of the world. As a consequence, the need for internationalization and localization, and for testing internationalized products, has increased considerably.

Software internationalization is the process of preparing a product so that it operates properly when modified for use in different languages and locales. It covers designing, developing, and engineering a product that can be adapted to various locales and regions without further engineering changes.

Once an application has been internationalized, it is ready to be adapted for specific locales, or localized. Localization is the process of customizing the software product for each language that is to be supported. It includes translating the program, choosing appropriate icons and graphics, and addressing other cultural considerations. It may also include translating help files and documentation.

Releasing a product on every continent means that it needs to comply with the local user culture. To succeed in capturing the global market, software product developers need to step out of their native locale and develop world-ready products. In most cases the following elements are localized: GUI content, help, error messages, dialog boxes, and documents such as the user manual, installation guide, release notes, tutorials, and readme files.

Testing is always an integral part of such projects and is specifically broken down into internationalization and localization testing. These cover the following testing sub-types:

Compatibility testing: checks how the product identifies and initializes from its language environment, and whether it can adapt itself to that environment.
Functionality testing: runs the full functional regression suite in different language environments and exercises the interface with native-language strings. It also verifies culture-specific information such as date and time display.
User interface validation: checks for visual problems such as text truncation or overlap and graphics issues.
Interoperability testing: ensures that the software interacts properly with the targeted platforms, operating systems, applications (and their versions), and so on.
Usability testing: evaluates the ease of use of the system.
Installation testing: ensures that the product’s installation messages are displayed in the corresponding language when the application is installed on a dedicated server.
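
As an illustration of the user interface validation item, here is a minimal sketch of an automated pre-check for text truncation. Everything in it is an assumption made for the example: the resource keys, the translations, and the idea of a per-control character budget (a real check would measure rendered pixel width in the target font).

```python
# Hypothetical budgets: the maximum number of characters assumed to fit
# in each control. Real projects would measure rendered width instead.
LABEL_BUDGET = {"btn_save": 8, "btn_cancel": 8}

# Hypothetical translated resources, keyed by locale and then by control.
TRANSLATIONS = {
    "en": {"btn_save": "Save", "btn_cancel": "Cancel"},
    "de": {"btn_save": "Speichern", "btn_cancel": "Abbrechen"},
}


def find_overflows(translations, budget):
    """Return (locale, key, text) for every string longer than its budget."""
    overflows = []
    for locale, strings in translations.items():
        for key, text in strings.items():
            if len(text) > budget[key]:
                overflows.append((locale, key, text))
    return overflows


for locale, key, text in find_overflows(TRANSLATIONS, LABEL_BUDGET):
    print(f"[{locale}] {key}: {text!r} may be truncated")
```

A check like this only narrows down where to look; the flagged controls still need to be inspected visually in the localized build.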

Difference in character-encoding (ASCII and MBCS)

One of the most challenging aspects of developing software for international use is testing that speakers of other languages can enter information into the application in an intuitive and natural way. Testers should also ensure that the program properly handles the wide range of characters and character combinations that people will use. Since languages don’t share the same written symbols, a variety of means have been established for programs to store and display them, primarily different character sets, and this is where testers should focus.

Strictly speaking, the ASCII character set defines 128 characters in seven bits; its 8-bit extensions, often loosely called extended ASCII, consist of 256 ordered symbols with a numerical index, each of which can be represented by one byte. That is a sufficient number of characters for English and most European languages. However, Far Eastern languages require more symbols than one such set can contain, and some, like Chinese and Japanese, use thousands. These languages employ character sets that use one or more bytes per symbol, known as multi-byte character sets (MBCS).
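
The difference in storage size can be seen with a short sketch. It uses Python’s built-in codecs; the choice of cp1252, Shift_JIS, and GBK as example encodings and the sample characters are assumptions made for illustration, not something the text above prescribes.

```python
def encoded_length(char: str, encoding: str) -> int:
    """Number of bytes one character occupies in the given encoding."""
    return len(char.encode(encoding))


def fits_in_ascii(char: str) -> bool:
    """True if the character belongs to the 7-bit ASCII set."""
    try:
        char.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# (character, encoding) pairs chosen purely as examples.
SAMPLES = [
    ("A", "cp1252"),      # plain ASCII letter in a single-byte code page
    ("é", "cp1252"),      # upper-half character, still one byte
    ("A", "shift_jis"),   # ASCII letter inside a multi-byte character set
    ("あ", "shift_jis"),  # Japanese hiragana
    ("中", "gbk"),        # Chinese ideograph
]

for char, encoding in SAMPLES:
    size = encoded_length(char, encoding)
    print(f"{char!r} in {encoding}: {size} byte(s), ASCII: {fits_in_ascii(char)}")
```

Note that the same multi-byte encoding stores some characters in one byte and others in two, which is exactly what breaks code that assumes a fixed size per character.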

Many problems arise when working with MBCS, so testers should pay careful attention to how these characters are handled. The biggest problems come from programs that assume each character is a single byte. For instance, when the user presses the backspace key to clear a character, many programs that aren’t multi-byte enabled delete only the second byte of the character, which changes the symbol that is displayed instead of deleting the entire character. In addition, word selection, line breaking, and simple text editing all suffer if the proper measures aren’t taken to enable an application to work with MBCS systems.
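
The backspace problem can be reproduced in a few lines. This is a sketch under stated assumptions: the text buffer is held as Shift_JIS bytes, and both function names are hypothetical. It contrasts a byte-oriented backspace with one that works on whole characters.

```python
ENCODING = "shift_jis"  # assumed encoding of the text buffer


def naive_backspace(buf: bytes) -> bytes:
    """Buggy: assumes one character is always one byte."""
    return buf[:-1]


def char_aware_backspace(buf: bytes, encoding: str = ENCODING) -> bytes:
    """Removes the last whole character, however many bytes it uses."""
    text = buf.decode(encoding)
    return text[:-1].encode(encoding)


def is_valid(buf: bytes, encoding: str = ENCODING) -> bool:
    """True if the buffer still decodes cleanly in the given encoding."""
    try:
        buf.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


buffer = "日本".encode(ENCODING)  # two characters, two bytes each

broken = naive_backspace(buffer)
fixed = char_aware_backspace(buffer)

print("naive result valid:", is_valid(broken))
print("naive result shown as:", broken.decode(ENCODING, errors="replace"))
print("char-aware result:", fixed.decode(ENCODING))
```

The naive version leaves a dangling lead byte behind, so the buffer no longer decodes cleanly and the display shows a wrong or placeholder symbol. A test case for this is simply: type a double-byte character, press backspace once, and confirm that nothing is left of it.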

To test that a product correctly handles a complex array of characters, testers should include combinations of various character sets in their test data. Any place where text can be entered and displayed should be tested with upper ASCII characters (those in the upper half of the 8-bit set, which require the eighth bit to be set), double-byte characters in pure double-byte systems, and single- and double-byte characters in multi-byte systems. The areas that need this kind of focus include control labels, input fields, and filenames and paths. Including character set testing in every test plan will prevent many embarrassing and costly errors later on. Next time, I’ll tell you about the most typical localization issues.
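
The character-set test data described above could be organized along the following lines. This is only a sketch: the sample strings, the category names, and the round-trip check are assumptions chosen for illustration, and a real test would push each string through the application’s own input fields, labels, and file dialogs.

```python
# Hypothetical test strings, one per category named in the text.
TEST_STRINGS = {
    "plain ASCII": "Report",
    "upper (eighth-bit) characters": "Übergröße café",
    "double-byte only": "日本語テスト",
    "mixed single- and double-byte": "File_日本_01.txt",
}


def round_trips(text: str, encoding: str) -> bool:
    """True if the text survives encoding and decoding unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        return False


def failures(encoding: str) -> list:
    """Categories whose sample string cannot be stored in the encoding."""
    return [
        category
        for category, text in TEST_STRINGS.items()
        if not round_trips(text, encoding)
    ]


# Example target encodings; the system under test defines the real ones.
for encoding in ("cp1252", "shift_jis"):
    print(encoding, "cannot store:", failures(encoding))
```

A failure here is not necessarily a product defect, since no single legacy code page is expected to hold every string. The point is to know in advance which strings a given target system must accept and which it must reject gracefully.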