The TTX Filter is an Okapi component that implements the IFilter interface for Trados TTX documents. TTX is an XML-based bilingual format used by some versions of the Trados tools and supported by several other tools. No official public specification is available.
The filter decides which encoding to use for the input document using the following logic:
- If the document has an encoding declaration it is used.
- Otherwise, UTF-8 is used as the default encoding (regardless of the default encoding that was specified when opening the document).
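The two rules above can be sketched as follows. This is only an illustration, not the filter's actual code: the function name `choose_input_encoding` and the `caller_default` parameter are hypothetical, and the sketch assumes an ASCII-compatible byte stream (it does not attempt BOM-based detection of UTF-16 input).

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very start of the data.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def choose_input_encoding(head: bytes, caller_default: str = "windows-1252") -> str:
    """Return the declared encoding if there is one, otherwise UTF-8.

    caller_default stands for the default encoding given when opening the
    document; it is deliberately ignored, as the rules above describe.
    """
    # Skip a UTF-8 byte-order mark so the declaration is at offset 0.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    match = _XML_DECL.match(head)
    if match:
        return match.group(1).decode("ascii")
    return "UTF-8"
```

Passing only the first few hundred bytes of the file as `head` is enough, since an XML declaration can only appear at the start of the document.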
If the output encoding is UTF-8:
- If the input encoding was also UTF-8, a Byte-Order-Mark is used for the output document only if one was detected in the input document.
- If the input encoding was not UTF-8, no Byte-Order-Mark is used in the output document.
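As a rough sketch, the Byte-Order-Mark decision can be written as a small function. The name `write_bom` and its parameters are hypothetical. The text above only covers UTF-8 output, so the sketch returns `None` for any other output encoding rather than guessing.

```python
from typing import Optional


def _is_utf8(name: str) -> bool:
    # Accept common spellings such as "UTF-8", "utf8" and "UTF_8".
    return name.replace("_", "-").lower() in ("utf-8", "utf8")


def write_bom(input_encoding: str, output_encoding: str, input_had_bom: bool) -> Optional[bool]:
    """Tell whether the output should start with a Byte-Order-Mark."""
    if not _is_utf8(output_encoding):
        return None  # case not described by the rules above
    if _is_utf8(input_encoding):
        return input_had_bom  # mirror what was detected in the input
    return False  # input was not UTF-8: no BOM in the output
```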
The output uses the same type of line break as the original input.
TTX is a format where the target text cannot be represented if the text is not segmented. Thus, the output of the filter includes any new
In auto-detection mode, the filter tries to detect whether the file has at least one existing segment. If an existing segment is detected, only the text in the existing segments is extracted (mode 1). If no segment is detected, all text is extracted (mode 2).
You can also force the filter to extract only the text in existing segments, or to extract all text, whether it is segmented or not.
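The mode selection can be illustrated with the sketch below. The setting names (`AUTO`, `EXISTING_ONLY`, `EXTRACT_ALL`) and the function `resolve_mode` are hypothetical, and the sketch assumes that an existing segment shows up as a `<Tu ...>` start tag in the file. The real filter may detect segments differently.

```python
import re

# Hypothetical names for the three settings described above.
AUTO, EXISTING_ONLY, EXTRACT_ALL = "auto", "existing-only", "extract-all"

# Assumption: a segment is marked by a <Tu> start tag (this does not match <Tuv>).
_TU_START = re.compile(r"<Tu[\s>]")


def resolve_mode(ttx_text: str, setting: str = AUTO) -> int:
    """Return 1 (existing segments only) or 2 (all text)."""
    if setting == EXISTING_ONLY:
        return 1
    if setting == EXTRACT_ALL:
        return 2
    # Auto-detection: one existing segment is enough to select mode 1.
    return 1 if _TU_START.search(ttx_text) else 2
```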
For example, in mode 2 (extract all), the following content:
...<Raw> Part 1 <Tu MatchPercent="0"> <Tuv Lang="EN">Part 2</Tuv></Tu> Part 3. </Raw>...
will be extracted as a single text unit with three segments:
[Part 1 ][Part 2][ Part 3]
But in mode 1 (extract existing segments only), only the segmented part is extracted:
[Part 2]
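The difference between the two modes can be tried on the sample content with a small sketch. It is a simplification under stated assumptions: the function `extract` is hypothetical, the first `<Tuv>` of a `<Tu>` is taken as the source, and whitespace is kept exactly as found, so the pieces are not trimmed the way the bracketed result above is.

```python
import xml.etree.ElementTree as ET

SAMPLE = '<Raw> Part 1 <Tu MatchPercent="0"> <Tuv Lang="EN">Part 2</Tuv></Tu> Part 3. </Raw>'


def extract(raw_xml: str, mode: int) -> list:
    """Return the extracted pieces of text for mode 1 or mode 2."""
    raw = ET.fromstring(raw_xml)
    parts = []
    # Text before the first segment is only extracted in mode 2.
    if mode == 2 and raw.text and raw.text.strip():
        parts.append(raw.text)
    for tu in raw.findall("Tu"):
        tuv = tu.find("Tuv")  # assumption: the first <Tuv> holds the source text
        if tuv is not None:
            parts.append("".join(tuv.itertext()))
        # Text between or after segments is only extracted in mode 2.
        if mode == 2 and tu.tail and tu.tail.strip():
            parts.append(tu.tail)
    return parts
```

Calling `extract(SAMPLE, 2)` returns three pieces, while `extract(SAMPLE, 1)` returns only the text of the existing segment.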
Segmented entries may have translations. In this case the text of the target is extracted along with the source.
- If the MatchPercent attribute of the TTX segment is above 100 and the Origin attribute is set to xtranslate, the translation is also added as an alternate translation annotation for the extracted segment and its match type set to EXACT_LOCAL_CONTEXT. Those entries correspond to the XU matches in TagEditor.
- Otherwise, if the MatchPercent attribute of the TTX segment is above 99 the translation is also added as an alternate translation annotation for the extracted segment and its match type set to
- Otherwise, if the MatchPercent attribute of the TTX segment is above 1 and below 100 the translation is also added as an alternate translation annotation for the extracted segment and its match type set to
In both cases, if the translated segment has an Origin attribute, its value is carried over to the annotation.
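The order in which these rules apply can be sketched as below. Only `EXACT_LOCAL_CONTEXT` comes from the text above. The function name `classify_match` and the labels `"rule-2"` and `"rule-3"` are hypothetical placeholders for the second and third rules, not Okapi match type names. The sketch also assumes that MatchPercent is an integer and that "above" and "below" are strict comparisons.

```python
from typing import Optional


def classify_match(match_percent: int, origin: Optional[str]) -> Optional[str]:
    """Return a label for the rule that applies, or None if no annotation is added."""
    if match_percent > 100 and origin == "xtranslate":
        return "EXACT_LOCAL_CONTEXT"  # XU match in TagEditor
    if match_percent > 99:
        return "rule-2"  # placeholder label for the second rule
    if 1 < match_percent < 100:
        return "rule-3"  # placeholder label for the third rule
    return None  # no rule applies
```

For example, a segment with MatchPercent 101 but no Origin set to xtranslate falls through to the second rule, since the first rule needs both conditions.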
Extraction mode — Select the type of extraction to perform, based on possible existing segments.
- Auto-detect existing segments: If at least one segment is detected, only existing segments are extracted. If no segment is detected, all text is extracted.
- Extract only existing segments: Only existing segments are extracted. If the file is not pre-segmented, no text is extracted.
- Extract all: Extract all text, whether it is in existing segments or not.
Escape the greater-than characters in output — Set this option to have all greater-than characters ('>') escaped as
&gt; in the output.
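The effect of this option can be illustrated with a minimal sketch. The function `escape_text` and its `escape_gt` flag are hypothetical stand-ins for the filter's writer and this option. The sketch assumes that '&' and '<' are always escaped, as XML requires in text content, and that '>' is escaped as `&gt;` only when the option is set.

```python
def escape_text(text: str, escape_gt: bool = False) -> str:
    """Escape text for XML output, optionally escaping '>' as well."""
    # '&' must be handled first so the entities added below are not re-escaped.
    out = text.replace("&", "&amp;").replace("<", "&lt;")
    if escape_gt:
        out = out.replace(">", "&gt;")
    return out
```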
The TTX element <df> may cause problems in some cases, for example when it spans across an external tag. For instance, when the TTX file contains the extraction of the following HTML code:
<p>text in <b>bold</p> <p>Bold</b> text</p>
the <df> element for bold will go across paragraph boundaries, which are external tags in TTX. Those cases should be rare, and you should report them.