TTX Filter

Overview

The TTX Filter is an Okapi component that implements the IFilter interface for Trados TTX documents. TTX is an XML-based bilingual format used by some versions of the Trados tools and supported by several other tools. No official public specification is available.

Processing Details

Input Encoding

The filter decides which encoding to use for the input document using the following logic:

Output Encoding

If the output encoding is UTF-8:

Line-Breaks

The output uses the same type of line-break as the original input.
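
As an illustration of this behavior, here is a minimal sketch of detecting the line-break style of an input and re-applying it to the output. The function names `detect_line_break` and `apply_line_break` are hypothetical and are not part of the Okapi API; the filter's actual implementation may differ.

```python
def detect_line_break(text: str) -> str:
    """Return the first line-break style found in text ('\r\n', '\r' or '\n').

    Falls back to '\n' when the text contains no line-break at all
    (an assumption made for this sketch).
    """
    for i, ch in enumerate(text):
        if ch == "\r":
            # A carriage return followed by a line feed is a Windows line-break.
            return "\r\n" if text[i + 1 : i + 2] == "\n" else "\r"
        if ch == "\n":
            return "\n"
    return "\n"


def apply_line_break(text: str, line_break: str) -> str:
    """Rewrite every line-break in text using the given style."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", line_break)
```

A caller would detect the style once on the input, process the content, then pass the result through `apply_line_break` before writing it out.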

Segmentation

TTX is a format where the target text cannot be represented if the text is not segmented. Thus, the output of the filter includes any new <Tu> and <Tuv> elements needed.

In auto-detection mode, the filter tries to detect whether the file has at least one existing segment. If at least one existing segment is detected, only the text in the existing segments is extracted (mode 1). If no segment is detected, all text is extracted (mode 2).

You can also force the filter to extract only the text in existing segments, or to extract all text whether it is segmented or not.

For example, in mode 2 (extract all), the following content:

...<Raw>
Part 1 <Tu MatchPercent="0">
<Tuv Lang="EN">Part 2</Tuv></Tu> Part 3.
</Raw>...

will be extracted as a single text unit with three segments:

[Part 1 ][Part 2][ Part 3.]

But in mode 1 (extract existing segments only), only the segmented parts are extracted:

[Part 2]

Note: Segmentation is often a cause of interoperability issues. For better compatibility with the tool that created the TTX files, it is recommended to work with pre-segmented documents.
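
The three extraction modes can be sketched as follows. This is a simplified illustration, not the filter's implementation: it assumes a well-formed <Raw> element whose <Tu> elements are direct children, ignores inline TTX tags, and trims the line-breaks around unsegmented text. The function name `extract_segments` and the mode names "auto", "existing" and "all" are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_segments(raw_xml, mode="auto"):
    """Return the text pieces extracted from a <Raw> element.

    mode: "auto" (detect), "existing" (mode 1) or "all" (mode 2).
    """
    raw = ET.fromstring(raw_xml)
    if mode == "auto":
        # Auto-detection: one existing <Tu> is enough to select mode 1.
        mode = "existing" if raw.find("Tu") is not None else "all"

    def unsegmented(text):
        # Keep unsegmented text only in mode 2, without surrounding line-breaks.
        text = (text or "").strip("\r\n")
        return [text] if mode == "all" and text.strip() else []

    segments = unsegmented(raw.text)
    for tu in raw.findall("Tu"):
        tuv = tu.find("Tuv")
        if tuv is not None:
            # Text of the first <Tuv> of an existing segment.
            segments.append("".join(tuv.itertext()))
        # Text following the segment is unsegmented content.
        segments.extend(unsegmented(tu.tail))
    return segments
```

Applied to the sample content above, "all" returns three pieces while "existing" and "auto" return only the piece from the existing segment. For input without any <Tu>, "auto" falls back to extracting all text.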

Existing Translations

Segmented entries may have translations. In this case the text of the target is extracted along with the source.

In both cases, if the translated segment has an Origin attribute, its value is carried over to the annotation.
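
A minimal sketch of reading a segmented entry is shown below. It rests on assumptions that are not stated on this page: that the first <Tuv> holds the source, that an optional second <Tuv> holds the target, and that the Origin attribute sits on the <Tu> element. The function `read_translation_unit` and the dictionary keys are hypothetical, and the returned dictionary only stands in for the annotation the filter creates.

```python
import xml.etree.ElementTree as ET


def read_translation_unit(tu_xml):
    """Return source text, target text (or None) and Origin value (or None)."""
    tu = ET.fromstring(tu_xml)
    tuvs = tu.findall("Tuv")
    # Assumption: first <Tuv> is the source, second (if any) is the target.
    source = "".join(tuvs[0].itertext()) if tuvs else ""
    target = "".join(tuvs[1].itertext()) if len(tuvs) > 1 else None
    # Assumption: Origin is an attribute of <Tu>; None when it is absent.
    return {"source": source, "target": target, "origin": tu.get("Origin")}
```

An entry without a second <Tuv> yields a target of None, which corresponds to a segment that has not been translated yet.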

Parameters

Extraction mode — Select the type of extraction to perform, based on possible existing segments.

Escape the greater-than characters in output — Set this option to have all greater-than characters ('>') escaped as &gt; in the output.
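
The effect of this option can be sketched as follows. The function `escape_text` and its `escape_gt` flag are hypothetical names for illustration; the sketch assumes that '&' and '<' are always escaped, as XML requires, while '>' is escaped only on request.

```python
def escape_text(text: str, escape_gt: bool = False) -> str:
    """Escape text for XML output, optionally escaping '>' as well."""
    # '&' must be replaced first so the entities added below are not re-escaped.
    escaped = text.replace("&", "&amp;").replace("<", "&lt;")
    if escape_gt:
        escaped = escaped.replace(">", "&gt;")
    return escaped
```

With the flag off, a '>' in the text is written unchanged, which is still well-formed XML; with the flag on, it is written as &gt;.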

Limitations

The TTX element <df> may cause problems in some cases, for example when spanning across an external tag. For instance, when the TTX file contains the extraction of the following HTML code:

<p>text in <b>bold</p>
<p>Bold</b> text</p>

The <df> element for bold will go across paragraph boundaries, which are external tags in TTX. Such cases should be rare; please report them if you encounter one.
