opentag.com

The XML Localisation Interchange File Format (XLIFF, pronounced like "ex-leaf") is a format developed by a group of localization customers, localization suppliers, and tools vendors, including Oracle, Novell, IBM/Lotus, Sun Microsystems, Alchemy Software, Berlitz, Moravia-IT, and ENLASO Corporation (formerly the RWS Group). The final draft of XLIFF version 1.0 was ready at the end of May 2001. XLIFF was moved under the aegis of OASIS in December 2001, and version 1.1 became an official OASIS Committee Specification in the spring of 2003. Version 1.2 became an official OASIS Standard in February 2008. XLIFF is maintained by the OASIS XLIFF Technical Committee, where anyone can participate.

A few frequently asked questions:

Purpose of XLIFF

XLIFF is a format to store extracted text and carry the data from one step to another in the localization process.

It follows the same principles as OpenTag, and also borrows a few ideas from TMX. For example, to specify inline codes XLIFF supports both OpenTag's placeholder method and TMX's encapsulation system.

Different Components of an XLIFF Document

For detailed information see the XLIFF specification.

An XLIFF document is composed of one or more <file> elements. Each <file> element corresponds to an original file or source (e.g. a database table) identified by the original attribute. A <file> contains the source of the localizable data and, once translated, the corresponding localized data for one, and only one, locale.

Localizable data are stored in <trans-unit> elements. The <trans-unit> element holds a <source> element to store the source text, and a <target> element to store the latest translated text. The <target> elements are not mandatory. It is also perfectly correct, if desired, to duplicate the original source text in the <target> element at the beginning of the process (with the xml:lang attribute set to the target language). For example:

<trans-unit id="1">
 <source xml:lang="en">Cannot find the file.</source>
 <target xml:lang="ja">Cannot find the file.</target>
</trans-unit>

Any translations of the <source> text that are not the latest (e.g. before proof, before edit, etc.), as well as possible proposed translations provided at some point during the localization process, can be stored in an unlimited number of <alt-trans> elements. The different versions of the translations can be linked to a specific phase of the process by the phase-name attribute. The details on the different phases can be stored in the <phase> elements, grouped in the <phase-group> element of the <header> (one per <file> element). If needed, the <alt-trans> element can also have a corresponding <source> (e.g. for fuzzy matches).

For example:

<phase-group>
 <phase phase-name="p0" process-name="Preprocessing"/>
 <phase phase-name="p1" process-name="Translation"/>
 <phase phase-name="p2" process-name="Edit" contact-name="Pierre">
  <note>The quality was good overall, but it seems that the TM
used to preprocess the file had many typos. Some got carried
into the translation.</note>
 </phase>
</phase-group>
...
<trans-unit id="1">
 <source xml:lang="en">Cannot find the file.</source>
 <target xml:lang="fr-FR">Fichier non trouvé.</target>
 <alt-trans>
  <target xml:lang="fr-FR" phase-name="p1"
  >Fichier pas trouvé.</target>
 </alt-trans>
 <alt-trans match-quality="76">
  <source xml:lang="en">Folder not found.</source>
  <target xml:lang="fr-FR" phase-name="p0"
  >Dossier pas trouvé.</target>
 </alt-trans>
</trans-unit>

An XLIFF <file> element can also contain binary data. This is done through <bin-unit> elements. A <bin-unit> element contains one <bin-source> element which holds the source data, either directly in the document (using an <internal-file> element), or as a reference (using an <external-file> element). An optional <bin-target> contains the corresponding data for the target locale.

Each <bin-unit> can also have a list of <trans-unit> elements corresponding to the text to localize. For example a <bin-unit> could be a bitmap, and the <trans-unit> elements the different sentences to translate in that bitmap. XLIFF does not provide any rules or mechanism on how text is extracted or merged.
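
As a rough illustration of the <bin-unit> structure described above, here is a minimal sketch of a bitmap stored by reference together with the text it contains. The file names (Sample1.rc, splash.bmp, splash_fr.bmp), the resource name IDB_SPLASH, and the strings are hypothetical and are not taken from any real project:

```xml
<?xml version="1.0"?>
<xliff version="1.1">
 <file original="Sample1.rc" source-language="en" target-language="fr"
  datatype="winres">
  <body>
   <bin-unit id="1" mime-type="image/bmp" restype="bitmap"
    resname="IDB_SPLASH">
    <!-- Source bitmap, stored as a reference (hypothetical file name) -->
    <bin-source>
     <external-file href="splash.bmp"/>
    </bin-source>
    <!-- Optional localized bitmap for the target locale -->
    <bin-target>
     <external-file href="splash_fr.bmp"/>
    </bin-target>
    <!-- Text that appears in the bitmap, listed for translation -->
    <trans-unit id="2">
     <source xml:lang="en">Welcome</source>
     <target xml:lang="fr">Bienvenue</target>
    </trans-unit>
   </bin-unit>
  </body>
 </file>
</xliff>
```

How the translated text gets back into the localized bitmap is left to the tools, since XLIFF itself does not define extraction or merging.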

As you can see, XLIFF provides rich semantics and can be used in many different ways, while still providing a consistent structure that is conducive to interoperability between tools.

Examples of XLIFF Documents

Example 1: Text extracted from a Photoshop file (PSD file) and its translation into Japanese:

<?xml version="1.0"?>
<xliff version="1.1">
 <file original="Graphic Example.psd"
  source-language="EN-US" target-language="JA-JP"
  tool="Rainbow" datatype="photoshop">
  <header>
   <skl>
    <external-file uid="3BB236513BB24732" href="Graphic Example.psd.skl"/>
   </skl>
   <phase-group>
    <phase phase-name="extract" process-name="extraction"
     tool="Rainbow" date="20010926T152258Z"
     company-name="NeverLand Inc." job-id="123"
     contact-name="Peter Pan" contact-email="ppan@xyzcorp.com">
     <note>Make sure to use the glossary I sent you yesterday.
      Thanks.</note>
    </phase>
   </phase-group>
  </header>
  <body>
   <trans-unit id="1" maxbytes="14">
    <source xml:lang="EN-US">Quetzal</source>
    <target xml:lang="JA-JP">Quetzal</target>
   </trans-unit>
   <trans-unit id="3" maxbytes="114">
    <source xml:lang="EN-US">An application to manipulate and 
     process XLIFF documents</source>
    <target xml:lang="JA-JP">XLIFF 文書を編集、または処理
     するアプリケーションです。</target>
   </trans-unit>
   <trans-unit id="4" maxbytes="36">
    <source xml:lang="EN-US">XLIFF Data Manager</source>
    <target xml:lang="JA-JP">XLIFF データ・マネージャ</target>
   </trans-unit>
  </body>
 </file>
</xliff>

Example 2: A simple XLIFF file with strings extracted from a Windows RC file. Here the skeleton (the data needed to reconstruct the original file) is stored in a separate file:

<?xml version="1.0" encoding="windows-1252" ?>
<xliff version="1.1" xml:lang='en'>
 <file source-language='en' target-language='fr' datatype="winres"
  original="Sample1.rc">
  <header>
   <skl><external-file href="Sample1.rc.skl"/></skl>
  </header>
  <body>
   <group restype="dialog" resname="IDD_DIALOG1">
    <trans-unit id="1" restype="caption">
     <source>Title</source>
    </trans-unit>
    <trans-unit id="2" restype="label" resname="IDC_STATIC">
     <source>&amp;Path:</source>
    </trans-unit>
    <trans-unit id="3" restype="check" resname="IDC_CHECK1">
     <source>&amp;Validate</source>
    </trans-unit>
    <trans-unit id="4" restype="button" resname="IDOK">
     <source>OK</source>
    </trans-unit>
    <trans-unit id="5" restype="button" resname="IDCANCEL">
     <source>Cancel</source>
    </trans-unit>
   </group>
  </body>
 </file>
</xliff>

Example 3: The same RC data as in example 2, this time with the skeleton embedded inside the XLIFF document, as a CDATA section:

<?xml version="1.0" encoding="windows-1252" ?>
<xliff version="1.1" xml:lang='en'>
 <file source-language='en' target-language='fr' datatype="winres"
  original="Sample1.rc">
  <header>
   <skl>
    <internal-file crc="64a2b9b0"><![CDATA[
<OKFSKL100:RES:964008261>
#include "resource.h"
IDD_DIALOG1 DIALOG DISCARDABLE  0, 0, 186, 57
STYLE DS_MODALFRAME | WS_POPUP | WS_CAPTION | WS_SYSMENU
CAPTION "<xref$1>"
FONT 8, "MS Sans Serif"
BEGIN
    LTEXT           "<xref$2>",IDC_STATIC,8,4,18,8
    EDITTEXT        IDC_EDIT1,8,16,100,14,ES_AUTOHSCROLL
    CONTROL         "<xref$3>",IDC_CHECK1,"Button",
                    BS_AUTOCHECKBOX | WS_GROUP | 
                    WS_TABSTOP,8,40,41,10
    DEFPUSHBUTTON   "<xref$4>",IDOK,129,7,50,14,WS_GROUP
    PUSHBUTTON      "<xref$5>",IDCANCEL,129,24,50,14
END]]></internal-file>
   </skl>
  </header>
  <body>
   <group restype="dialog" resname="IDD_DIALOG1">
    <trans-unit id="1" restype="caption">
     <source>Title</source>
    </trans-unit>
    <trans-unit id="2" restype="label" resname="IDC_STATIC">
     <source>&amp;Path:</source>
    </trans-unit>
    <trans-unit id="3" restype="check" resname="IDC_CHECK1">
     <source>&amp;Validate</source>
    </trans-unit>
    <trans-unit id="4" restype="button" resname="IDOK">
     <source>OK</source>
    </trans-unit>
    <trans-unit id="5" restype="button" resname="IDCANCEL">
     <source>Cancel</source>
    </trans-unit>
   </group>
  </body>
 </file>
</xliff>

Example 4: The same RC data as in example 2, this time with the skeleton embedded inside the XLIFF document and coded in base64 format:

<?xml version="1.0" encoding="windows-1252" ?>
<xliff version="1.1" xml:lang='en'>
 <file source-language='en' target-language='fr' datatype="winres"
  original="Sample1.rc">
  <header>
   <skl>
    <internal-file crc="d341e458" form="base64">
PE9LRlNLTDEwMDpSRVM6OTY0MDA4MjYxPg0KI2luY2x1ZGUgInJlc291cmNlLmgiDQpJRERf
RElBTE9HMSBESUFMT0cgRElTQ0FSREFCTEUgIDAsIDAsIDE4NiwgNTcNClNUWUxFIERTX01P
REFMRlJBTUUgfCBXU19QT1BVUCB8IFdTX0NBUFRJT04gfCBXU19TWVNNRU5VDQpDQVBUSU9O
ICI8eHJlZiQxPiINCkZPTlQgOCwgIk1TIFNhbnMgU2VyaWYiDQpCRUdJTg0KICAgIExURVhU
ICAgICAgICAgICAiPHhyZWYkMj4iLElEQ19TVEFUSUMsOCw0LDE4LDgNCiAgICBFRElUVEVY
VCAgICAgICAgSURDX0VESVQxLDgsMTYsMTAwLDE0LEVTX0FVVE9IU0NST0xMDQogICAgQ09O
VFJPTCAgICAgICAgICI8eHJlZiQzPiIsSURDX0NIRUNLMSwiQnV0dG9uIiwNCiAgICAgICAg
ICAgICAgICAgICAgQlNfQVVUT0NIRUNLQk9YIHwgV1NfR1JPVVAgfCANCiAgICAgICAgICAg
ICAgICAgICAgV1NfVEFCU1RPUCw4LDQwLDQxLDEwDQogICAgREVGUFVTSEJVVFRPTiAgICI8
eHJlZiQ0PiIsSURPSywxMjksNyw1MCwxNCxXU19HUk9VUA0KICAgIFBVU0hCVVRUT04gICAg
ICAiPHhyZWYkNT4iLElEQ0FOQ0VMLDEyOSwyNCw1MCwxNA0KRU5EDQo=
    </internal-file>
   </skl>
  </header>
  <body>
   <group restype="dialog" resname="IDD_DIALOG1">
    <trans-unit id="1" restype="caption">
     <source>Title</source>
    </trans-unit>
    <trans-unit id="2" restype="label" resname="IDC_STATIC">
     <source>&amp;Path:</source>
    </trans-unit>
    <trans-unit id="3" restype="check" resname="IDC_CHECK1">
     <source>&amp;Validate</source>
    </trans-unit>
    <trans-unit id="4" restype="button" resname="IDOK">
     <source>OK</source>
    </trans-unit>
    <trans-unit id="5" restype="button" resname="IDCANCEL">
     <source>Cancel</source>
    </trans-unit>
   </group>
  </body>
 </file>
</xliff>

More Information

The XLIFF Pages at OASIS
http://www.oasis-open.org/committees/xliff (or http://www.xliff.org)

Latest XLIFF Specification
http://www.oasis-open.org/committees/xliff/documents/xliff-specification.htm

"Introduction to XLIFF" PowerPoint Presentation and sample files
http://lists.oasis-open.org/archives/xliff/200210/zip00005.zip

The XML Schema (XSD) for XLIFF 1.1
http://www.oasis-open.org/committees/xliff/documents/xliff-core-1.1.xsd

Public archives of the XLIFF TC mailing list (posting restricted to TC members)
http://lists.oasis-open.org/archives/xliff/

Public archives of the XLIFF-Comments mailing list (posting open to anyone)
http://lists.oasis-open.org/archives/xliff-comment/

XLIFF page on the XML Cover Pages
http://xml.coverpages.org/xliff.html

Article on XLIFF by Milan from Moravia-IT
http://www.moravia-it.com/en/news/xliff.asp

Tools and Resources for XLIFF

Settings files to use XLIFF documents with some of the XML-enabled commercial translation tools.

ENLASO Localization Tools, which provide XLIFF filters for various formats.

XLIFF Translation Editor by Heartsome, to edit XLIFF documents.

What are the differences between XLIFF and OpenTag?

Both formats have the same general goal: storing extracted text for localization purposes. However, there are many differences:

  • OpenTag was developed early, when XML was still in its infancy. XLIFF was started four years after the last published version of OpenTag. OpenTag is no longer maintained.
  • OpenTag is a less structured format than XLIFF, which makes it more adaptable but also less interoperable.
  • OpenTag allows any number of languages in the same document. XLIFF is designed to work with one source and one target language.
  • OpenTag offers a few documentation-specific elements that have no equivalents in XLIFF, which was designed primarily for UI and tagged formats. However, documentation-type data can also be stored in XLIFF.
  • XLIFF offers a rich set of mechanisms for pre-translation, revisions, change tracking, etc. that are not available in OpenTag.
  • In XLIFF you can embed the Skeleton file. This cannot be done in OpenTag, except with tool-specific instructions.
  • XLIFF supports storing binary objects (icons, bitmaps, etc.) along with their text. OpenTag does not.
  • In addition to the "placeholder" method for inline codes, XLIFF provides an "encapsulation" method (similar to TMX's) that OpenTag does not have.

Overall you can see XLIFF as the "next version" of OpenTag. Some tools still use OpenTag and that is perfectly fine. Both formats can usually be easily mapped to each other. If you start the development of a tool and plan on using one of the two formats, use XLIFF.

What are the differences between XLIFF and TMX?

TMX is a format to exchange translation memory data from one tool to another. Its purpose is therefore different from XLIFF's. Both formats have a lot in common, especially regarding the inline markup elements.

  • TMX uses only the encapsulation method for inline codes (where native codes are enclosed within different elements), while XLIFF provides both the encapsulation method (using elements very similar to TMX's) and the placeholder method (where the native codes are removed to the Skeleton file and replaced by a short element that refers to them, using elements very similar to OpenTag's).
  • TMX allows any number of languages in the same document. XLIFF is designed to work with one source and one target language.
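
To make the two inline-code methods more concrete, here is a small sketch that assumes a hypothetical HTML source sentence, "Press the <b>Enter</b> key." The same text is shown once with the placeholder method and once with the encapsulation method. The sentence and the id values are illustrative only, and where a tool keeps the removed native codes is tool-specific:

```xml
<body>
 <!-- Placeholder method: the native <b> and </b> codes are removed
      and replaced by a <g> element that refers to them by id -->
 <trans-unit id="1">
  <source xml:lang="en">Press the <g id="1" ctype="bold">Enter</g> key.</source>
 </trans-unit>
 <!-- Encapsulation method: the native codes stay in the text,
      escaped and enclosed in <bpt> (begin) and <ept> (end) elements -->
 <trans-unit id="2">
  <source xml:lang="en">Press the <bpt id="1" ctype="bold">&lt;b&gt;</bpt>Enter<ept id="1">&lt;/b&gt;</ept> key.</source>
 </trans-unit>
</body>
```

With encapsulation, the original codes can be restored from the XLIFF document alone. With placeholders, the translatable text is cleaner, but the codes must be recovered from wherever the extraction tool stored them.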

Is XLIFF meant to be a container for TMX, TBX, OpenTag, etc. files?

No. An XLIFF document is meant to contain data extracted from an original source (file, database, etc.). It may have references to TMX or TBX or other files. There is even a mechanism (the <internal-file> element) to embed such files in the XLIFF document itself. But this mechanism is there only for convenience: the main data of an XLIFF document are in XLIFF format.

Can XLIFF be used as a resource format?

Probably. If the data you need are simple (e.g. identifier/string pairs), XLIFF can probably be used as a resource format for your application.

However, keep in mind that XLIFF was never designed for such a purpose. Its intent is not to be a common resource format, but a storage format for data extracted during the localization process.
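
If you do decide to use XLIFF this way, simple identifier/string pairs might be sketched as follows. This is only one possible layout, assuming the resname attribute carries the identifier your application looks up. The identifiers (MSG_FILE_NOT_FOUND, MSG_FOLDER_NOT_FOUND) and the original value are hypothetical:

```xml
<?xml version="1.0"?>
<xliff version="1.1">
 <file original="messages" source-language="en" target-language="fr"
  datatype="plaintext">
  <body>
   <!-- resname holds the (hypothetical) identifier used by the application -->
   <trans-unit id="1" resname="MSG_FILE_NOT_FOUND">
    <source xml:lang="en">Cannot find the file.</source>
    <target xml:lang="fr">Fichier non trouvé.</target>
   </trans-unit>
   <trans-unit id="2" resname="MSG_FOLDER_NOT_FOUND">
    <source xml:lang="en">Folder not found.</source>
    <target xml:lang="fr">Dossier non trouvé.</target>
   </trans-unit>
  </body>
 </file>
</xliff>
```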

Why is XLIFF only bilingual and not multilingual?

XLIFF is bilingual (each translation unit offers a <source> and a <target> element) because it makes the overall model much simpler when you consider all the additional features the format provides, such as pre-translation, version tracking, etc.

There are no significant advantages (and actually many drawbacks) in having a file containing all languages together during localization. At some point you will have to split such a file into one source and one target, since each language has to be translated by a different translator.

There are however ways to have multilingual XLIFF documents, if you really want to:

  • Each <file> element can have different source/target pairs.
  • The language of the data in the <alt-trans> element can be different from the main source/target languages. This allows you to have proposed translations coming from a different language. For example: French (fr-FR) proposed translations could be offered when translating into French Canadian (fr-CA), and so forth.
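
The two approaches listed above can be combined in one document. The following is a minimal sketch, assuming a hypothetical Sample1.rc source: the first <file> targets fr-CA and carries a proposed fr-FR translation in an <alt-trans>, while the second <file> holds the same source for a Japanese target:

```xml
<?xml version="1.0"?>
<xliff version="1.1">
 <!-- First pair: English to French Canadian -->
 <file original="Sample1.rc" source-language="en" target-language="fr-CA"
  datatype="winres">
  <body>
   <trans-unit id="1">
    <source xml:lang="en">Cannot find the file.</source>
    <!-- Proposal in a language other than the file's target language -->
    <alt-trans>
     <target xml:lang="fr-FR">Fichier non trouvé.</target>
    </alt-trans>
   </trans-unit>
  </body>
 </file>
 <!-- Second pair: same source, English to Japanese -->
 <file original="Sample1.rc" source-language="en" target-language="ja"
  datatype="winres">
  <body>
   <trans-unit id="1">
    <source xml:lang="en">Cannot find the file.</source>
   </trans-unit>
  </body>
 </file>
</xliff>
```

Each <file> still has exactly one source and one target language, so tools that expect bilingual data can process the <file> elements one at a time.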