opentag.com: a place for localization tools and technologies

XML and Localization FAQ: Localization
Here you will find answers to some of the frequently asked questions about localizing XML documents and related technologies. If you find any mistakes or have suggestions for additional useful information, please send an email.

How do I translate XML documents?

There are various general translation tools that support XML. For example (in alphabetical order):
There are several globalization suites that also support XML. For example (in alphabetical order):
Note that not all tools and suites handle XML perfectly. Some have problems that, depending on how your XML documents are structured and which XML features they use, may make the localization process difficult. You can also use filters to convert the XML documents into color-coded RTF files that can be easily translated with or without translation tools. Rainbow offers such a filter. Once in RTF, you can translate XML documents with other tools such as WordFast. In all cases, to correctly translate XML documents the localizers need to know the following:
Note that not all of this information is explicit in a DTD or a schema. This information is needed to create the "parameter file" (its name varies with each tool) that will allow the tool to process the documents correctly for translation. How to create this parameter file depends on each tool.

What are the guidelines to create XML document types that will be easier to localize?

The W3C ITS Working Group is working to produce such guidelines. Some of the main aspects of the guidelines are the following:
Read also the excellent paper on Localisation Considerations in DTD Design by Richard Ishida.

What are the guidelines to author XML documents that will be easier to localize?

The W3C ITS Working Group will produce such guidelines. Meanwhile you can start with the following:

What are the features a translation tool for XML should have?

Some of the XML-specific features a translation tool that supports XML documents should have are the following:

How can I specify that elements with complex rules are translatable?

Currently very few translation tools allow you to specify complex rules to define which elements are translatable (or not). For example, in the following document only the content of the second <item> element (the one with type="text") should be translated:

    <?xml version="1.0" ?>
    <repository>
      <item type="type">Type definition</item>
      <item type="text">Text to translate</item>
    </repository>

One way to solve this issue is to pre-process the document to add a temporary element that the translation tools can use directly. For instance, a document such as the following, where the added element <tbt> delimits the text to translate:

    <?xml version="1.0" ?>
    <repository>
      <item type="type">Type definition</item>
      <item type="text"><tbt>Text to translate</tbt></item>
    </repository>

The translation tools can use the added elements to easily recognize the content to localize. Such temporary elements are simply removed after the localization.

The pre-processing can be done using a simple XSL template. For example, the following one transforms our first listing into the second:

    <?xml version="1.0" ?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <!-- Identity template: copy every node and attribute unchanged -->
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>
      <!-- Wrap the content of the translatable items in a temporary <tbt> element -->
      <xsl:template match="//item[@type='text']">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <tbt><xsl:apply-templates/></tbt>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

More complex templates can be used by simply adapting the XPath expression to match a specific set of nodes in the original document. See, for instance, this other example.

See also the Internationalization Tag Set (ITS). It offers rules to define translatable and non-translatable parts of an XML document.

How do CDATA sections fare during translation?

Some translation tools do not support CDATA sections very well, but a good tool should handle them correctly. However, it might be difficult for the tool to retain the CDATA notation, especially if such a section occurs within a content (rather than enclosing the whole content). For example:

    <p>The codes <![CDATA[<tag1> and <tag2> ]]>indicate new items.</p>

once translated, will most likely end up as:

    <p>Les codes &lt;tag1&gt; and &lt;tag2&gt; indiquent des nouveaux articles.</p>

instead of:

    <p>Les codes <![CDATA[<tag1> and <tag2> ]]>indiquent des nouveaux articles.</p>

This is perfectly correct from the XML syntax viewpoint. CDATA sections are just a way to escape a block of text.
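
The equivalence between a CDATA section and escaped text can be checked with any XML parser. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree module (Python is not used elsewhere in this FAQ; it is chosen here only for illustration). It parses the paragraph in both notations and compares the resulting character data:

```python
import xml.etree.ElementTree as ET

# The same paragraph written with a CDATA section and with escaped characters.
with_cdata = "<p>The codes <![CDATA[<tag1> and <tag2> ]]>indicate new items.</p>"
escaped = "<p>The codes &lt;tag1&gt; and &lt;tag2&gt; indicate new items.</p>"

text_from_cdata = ET.fromstring(with_cdata).text
text_from_escaped = ET.fromstring(escaped).text

# Both notations yield the same character data once parsed.
same_content = text_from_cdata == text_from_escaped

# Writing the CDATA version back out produces the escaped notation,
# because ElementTree does not record how the text was originally written.
round_tripped = ET.tostring(ET.fromstring(with_cdata), encoding="unicode")

print(same_content)
print(round_tripped)
```

A translation tool built on a parser of this kind has no record of the original notation, which is one plausible reason why the CDATA markers tend to disappear after translation.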

It is a good idea to stay away from CDATA from a localization viewpoint anyway: NCRs (numeric character references) cannot be used in CDATA sections, and this may lead to problems if you need to convert the encoding of a document.

In some cases, however, you may want to put back the CDATA syntax in some elements. This can be done easily with XSLT. Simply list the elements you want in CDATA in the cdata-section-elements attribute of the <xsl:output> element:

    <?xml version="1.0" ?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <!-- Text content of <prog> and <text> elements is written as CDATA sections -->
      <xsl:output encoding="utf-8" cdata-section-elements="prog text"/>
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

Is there any translatable text in a CSS style-sheet?

Finding translatable text in CSS is usually rare, but definitely
possible: As shown below, the value for the content property may contain text to translate:

    warning-para:before {
      content: "Warning! ";
      border: 2;
      color: red;
    }
    summary-body:after {
      content: "End of Summary";
    }

The choice of characters for the

Any generated content needs to be looked at carefully. This includes numbered list generation, markers, and so forth. See more details on CSS generated content in the CSS specifications.

And obviously, any style defined in a CSS style-sheet may need to be modified for a given language.

Is there a standard set of localization directives?

Yes and no. There is a standard called the Internationalization Tag Set (ITS) that is a W3C Recommendation. While ITS is not exactly a standard for localization directives, some of its features can help you with this. ITS can be used as a namespace in any XML document, as in the following example:

    <?xml version="1.0" encoding="UTF-8" ?>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" xmlns:its="http://www.w3.org/2005/11/its">
      <head><title>Title</title></head>
      <body>
        <h1 id="100">Introduction to <span its:term="yes">Document Management</span></h1>
        <p id="101">Our company, <span its:translate="no">Infinite Wisdom Inc.</span>, provides quality courses on how to manage your documentation.</p>
      </body>
    </html>

Related information: the ITS Specification

Can I see some examples of XML documents causing localization problems?

Here are a few examples:

Example 1

The following document, while well-formed, is not easy to localize because there is, most of the time, no easy way to indicate to the translation tools that only elements starting with "msg" should be translated:

    <?xml version="1.0" ?>
    <error-messages xml:lang="en">
      <msg001>Cannot open file {0}.</msg001>
      <note001>Tip: {0} is the name of a file.</note001>
      <msg002>Invalid parameter.</msg002>
      <msg999>Connection not available. Please try later.</msg999>
    </error-messages>

Instead, use an attribute to identify the changing ID:

    <?xml version="1.0" ?>
    <error-messages xml:lang="en">
      <msg myId="1">
        <text>Cannot open file {0}.</text>
        <note>Tip: {0} is the name of a file.</note>
      </msg>
      <msg myId="2">
        <text>Invalid parameter.</text>
      </msg>
      <msg myId="999">
        <text>Connection not available. Please try later.</text>
      </msg>
    </error-messages>

Example 2

The following multilingual document is difficult to localize because,
most if not all of the translation tools currently available cannot distinguish the elements to translate based on the value of their xml:lang attribute:

    <?xml version="1.0" ?>
    <messages xml:lang="en">
      <prompt id="100">
        <data xml:lang="en">Press Enter to start.</data>
        <data xml:lang="fr">Press Enter to start.</data>
        <data xml:lang="ja">Press Enter to start.</data>
      </prompt>
      <prompt id="101">
        <data xml:lang="en">Press Cancel to stop.</data>
        <data xml:lang="fr">Press Cancel to stop.</data>
        <data xml:lang="ja">Press Cancel to stop.</data>
      </prompt>
    </messages>

Instead, use one document per language, at least for the material you send to the localizer. If needed, after translation, you can group all entries in a single file, but treat that step as a "compilation-like" step to be done after localization.

Example 3

The following document has one small problem: the content of <srcid> is an identifier, not translatable text, yet it is stored as element content:

    <?xml version="1.0" ?>
    <resources>
      <res>
        <srcid>Module123.description::top</srcid>
        <text>Moves to the top of the list.</text>
      </res>
      <res>
        <srcid>Module123.description::bottom</srcid>
        <text>Moves to the end of the list.</text>
      </res>
    </resources>

Many tools would work better if the unique ID was stored in an
attribute of either <res> or <text>:

    <?xml version="1.0" ?>
    <resources>
      <res srcid="Module123.description::top">
        <text>Moves to the top of the list.</text>
      </res>
      <res srcid="Module123.description::bottom">
        <text>Moves to the end of the list.</text>
      </res>
    </resources>

Example 4

The following document offers a challenge when it comes to specifying what is translatable or not. The only way to distinguish the translatable item is the combination of the type attributes of <component> and <data>:

    <?xml version="1.0" ?>
    <dialogue xml:lang="en-gb">
      <rsrc id="123">
        <component id="456" type="image">
          <data type="text">images/cancel.gif</data>
          <data type="coordinates">12,20,50,14</data>
        </component>
        <component id="789" type="caption">
          <data type="text">Cancel</data>
          <data type="coordinates">12,34,50,14</data>
        </component>
      </rsrc>
    </dialogue>

The best way for tools to address this type of conditional translation would be to support XPath expressions. For instance, in our example the translatable nodes can be expressed as:

    //component[@type='caption']/data[@type='text']

If you want to avoid such conditional situations, an alternative way to design this XML format would be to have distinct elements for the data types of a <component>:

    <?xml version="1.0" ?>
    <dialogue xml:lang="en-gb">
      <rsrc id="123">
        <component id="456" type="image">
          <url>images/cancel.gif</url>
          <coordinates>12,20,50,14</coordinates>
        </component>
        <component id="789" type="caption">
          <text>Cancel</text>
          <coordinates>12,34,50,14</coordinates>
        </component>
      </rsrc>
    </dialogue>

If the format cannot be changed, the solution to translate this file is
to have an XSL template that, before translation, adds extra elements delimiting the content to translate, as shown below with the <tbt> element:

    <?xml version="1.0" ?>
    <dialogue xml:lang="en-gb">
      <rsrc id="123">
        <component id="456" type="image">
          <data type="text">images/cancel.gif</data>
          <data type="coordinates">12,20,50,14</data>
        </component>
        <component id="789" type="caption">
          <data type="text"><tbt>Cancel</tbt></data>
          <data type="coordinates">12,34,50,14</data>
        </component>
      </rsrc>
    </dialogue>

The XSL template to get this result is:

    <?xml version="1.0" ?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <!-- Identity template: copy every node and attribute unchanged -->
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>
      <!-- Wrap the text of caption components in a temporary <tbt> element -->
      <xsl:template match="//component[@type='caption']/data[@type='text']">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <tbt><xsl:apply-templates/></tbt>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

Example 5

The following document has several translatable items ("Corporate Policy", "ABC Corporation - Policy Repository", and "List of Available Policies"), but there is no easy way to identify them, even using an XPath expression: the <string> element is used for everything, keys and values, translatable or not:

    <?xml version="1.0" ?>
    <resources>
      <section id="Homepage">
        <arguments>
          <string>standard_page</string>
          <string>childlist</string>
        </arguments>
        <variables>
          <string>POLICY</string>
          <string>Corporate Policy</string>
        </variables>
        <keyvalue_pairs>
          <string>bar_Title</string>
          <string>ABC Corporation - Policy Repository</string>
          <string>bgColor</string>
          <string>NavajoWhite</string>
          <string>title</string>
          <string>List of Available Policies</string>
        </keyvalue_pairs>
      </section>
    </resources>

A better way to store this data would be to have only one element for each key/value pair, and a distinction between translatable values and non-translatable ones, as shown below:

    <?xml version="1.0" ?>
    <resources>
      <section id="Homepage">
        <arguments>
          <string key="standard_page">childlist</string>
        </arguments>
        <variables>
          <textstring key="POLICY">Corporate Policy</textstring>
        </variables>
        <keyvalue_pairs>
          <textstring key="bar_Title">ABC Corporation - Policy Repository</textstring>
          <string key="bgColor">NavajoWhite</string>
          <textstring key="title">List of Available Policies</textstring>
        </keyvalue_pairs>
      </section>
    </resources>

Example 6

In the following document the text to translate is to be processed with
some proprietary tool before being displayed and the XML document is only a storage medium. The problem for localization here is that non-standard markup exists in the document. The [var Destination] and [bold] ... [/bold] codes are not XML, so translation tools cannot recognize them as inline codes:

    <?xml version="1.0" ?>
    <KBase>
      <Section id="a123">
        <Title>ProSentinel prevented access to port [var Destination]</Title>
        <Para id="a123-1">The ProSentinel firewall stopped network traffic from reaching your computer. No security breach has occurred. [bold]Your data are safe[/bold].</Para>
      </Section>
    </KBase>

It would be much better to use XML-type inline codes (if necessary using a different namespace than the container format), and to use an XSL template to transform the content to the proprietary codes. The original XML document would then be much easier to localize and look something like this:

    <?xml version="1.0" ?>
    <KBase>
      <Section id="a123">
        <Title>ProSentinel prevented access to port <var name="Destination"/></Title>
        <Para id="a123-1">The ProSentinel firewall stopped network traffic from reaching your computer. No security breach has occurred. <emphasis>Your data are safe</emphasis>.</Para>
      </Section>
    </KBase>
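
The conversion from XML inline codes back to the proprietary codes is described above as an XSL template. As an alternative illustration, here is a minimal sketch of the same idea in Python with the standard library's xml.etree.ElementTree module. The function name to_proprietary and the mapping of <emphasis> to [bold] are assumptions inferred from the two listings of this example, not part of any real tool:

```python
import xml.etree.ElementTree as ET

def to_proprietary(elem):
    """Return the content of elem with XML inline codes turned into bracket codes.

    Hypothetical helper: the mapping below is inferred from Example 6.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "var":
            # <var name="X"/> becomes [var X]
            parts.append("[var %s]" % child.get("name"))
        elif child.tag == "emphasis":
            # <emphasis>...</emphasis> becomes [bold]...[/bold]
            parts.append("[bold]" + to_proprietary(child) + "[/bold]")
        else:
            # Unknown inline element: keep only its content
            parts.append(to_proprietary(child))
        # Text that follows the inline element
        parts.append(child.tail or "")
    return "".join(parts)

title = ET.fromstring(
    '<Title>ProSentinel prevented access to port <var name="Destination"/></Title>'
)
para = ET.fromstring(
    '<Para id="a123-1">No security breach has occurred. '
    '<emphasis>Your data are safe</emphasis>.</Para>'
)

title_text = to_proprietary(title)
para_text = to_proprietary(para)
print(title_text)
print(para_text)
```

With this kind of post-processing step, the localizers only ever see well-formed XML inline elements, and the bracket codes are generated after translation.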
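
Going back to Example 4, the XPath expression that identifies the translatable nodes can be tried outside of a translation tool. The following is a rough sketch in Python, assuming the limited XPath support of the standard library's xml.etree.ElementTree module (which needs a leading "." in front of "//"). It selects the translatable nodes and wraps their text in the temporary <tbt> element, as the XSL template does:

```python
import xml.etree.ElementTree as ET

SOURCE = """<dialogue xml:lang="en-gb">
 <rsrc id="123">
  <component id="456" type="image">
   <data type="text">images/cancel.gif</data>
   <data type="coordinates">12,20,50,14</data>
  </component>
  <component id="789" type="caption">
   <data type="text">Cancel</data>
   <data type="coordinates">12,34,50,14</data>
  </component>
 </rsrc>
</dialogue>"""

root = ET.fromstring(SOURCE)

# Same condition as //component[@type='caption']/data[@type='text']
translatable = root.findall(".//component[@type='caption']/data[@type='text']")

for node in translatable:
    # Move the text into a temporary <tbt> child element
    tbt = ET.SubElement(node, "tbt")
    tbt.text = node.text
    node.text = None

marked = ET.tostring(root, encoding="unicode")
print(marked)
```

Only the caption text is wrapped; the image path, which uses the same <data type="text"> element, is left untouched.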