TMX Filter

From OkapiWiki

Overview

The TMX Filter is an Okapi component that implements the IFilter interface for TMX (Translation Memory eXchange) documents. It is implemented in the class net.sf.okapi.filters.tmx.TmxFilter of the library.

TMX is a LISA Standard that defines a file format for transporting translation memory data from one translation tool to another. The TMX 1.4b specification is at http://www.gala-global.org/oscarStandards/tmx/tmx14b.html
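To make the format concrete, the following sketch parses a minimal TMX document with the Python standard library and lists the text of each <tuv> by language. It is only an illustration of the file structure, not the way the Okapi filter is implemented; the sample content and the read_units helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document: one translation unit with two variants.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="unknown" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu tuid="1">
      <tuv xml:lang="en"><seg>Hello world</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour le monde</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_units(tmx_text):
    """Return one {language: segment text} dict per <tu> element."""
    root = ET.fromstring(tmx_text.encode("utf-8"))
    units = []
    for tu in root.iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() also collects text nested in inline elements.
            text = "".join(seg.itertext()) if seg is not None else ""
            variants[tuv.get(XML_LANG)] = text
        units.append(variants)
    return units


for unit in read_units(SAMPLE_TMX):
    print(unit)
```

Each <tu> groups the same text in several languages, which is why the filter parameters described later distinguish between the source, the selected target, and the remaining targets.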

Processing Details

Input Encoding

The filter decides which encoding to use for the input document using the following logic:

Output Encoding

If the output encoding is UTF-8:

Line-Breaks

The output uses the same type of line-break as the original input.
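The line-break behavior can be pictured with a small sketch: detect the line-break type of the input, work on normalized text, then restore the original type on output. This is an assumption about the general technique, not the filter's actual code; the function names are hypothetical, and falling back to "\n" when the input contains no line-break is a choice made for this example.

```python
def detect_line_break(text):
    """Return the line-break sequence used in text ("\r\n", "\r" or "\n")."""
    if "\r\n" in text:
        return "\r\n"
    if "\r" in text:
        return "\r"
    # Assumed fallback when no line-break is present.
    return "\n"


def normalize_line_breaks(text):
    """Convert every line-break to "\n" for internal processing."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def restore_line_breaks(normalized_text, line_break):
    """Re-apply the original line-break type to normalized text."""
    return normalized_text.replace("\n", line_break)


original = "<tu>\r\n</tu>\r\n"
line_break = detect_line_break(original)
processed = normalize_line_breaks(original)
output = restore_line_breaks(processed, line_break)
print(output == original)
```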

Parameters

Read all target entries — Set this option to read all target <tuv> elements into the text unit. Otherwise, only the selected target is read and the remaining ones become part of the skeleton. The default is True. The effect of this setting depends on whether the subsequent pipeline steps can process multiple targets.
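The difference between the two modes of "Read all target entries" can be sketched as follows. The select_targets function and its arguments are hypothetical and only model the decision described above; the real filter stores targets in text units and skeleton objects rather than dictionaries.

```python
def select_targets(variants, source_lang, target_lang, read_all_targets=True):
    """Split the variants of one <tu> into extracted targets and skeleton.

    variants maps a language code to the text of its <tuv>.
    Returns (targets, skeleton): targets would go into the text unit,
    skeleton holds the variants that are kept but not extracted.
    """
    targets = {}
    skeleton = {}
    for lang, text in variants.items():
        if lang == source_lang:
            continue  # the source is handled separately
        if read_all_targets or lang == target_lang:
            targets[lang] = text
        else:
            skeleton[lang] = text
    return targets, skeleton


variants = {"en": "Hello", "fr": "Bonjour", "de": "Hallo"}
print(select_targets(variants, "en", "fr", read_all_targets=True))
print(select_targets(variants, "en", "fr", read_all_targets=False))
```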

Group all document parts skeleton into one — Set this option to consolidate the skeleton parts so that fewer events are sent through the pipeline. The default is True. This is sufficient in most cases, but pipeline developers may sometimes need access to more fine-grained resources in the pipeline.

Exit when encountering invalid <tu>s — By default, invalid <tu> elements are skipped and one or more warning messages are issued. If you keep this default or ignore the warnings, the processed file may not match the input file. Set this option if you want to be notified immediately of invalid content so that you can correct the file before re-running it.
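The two behaviors (skip with a warning, or stop immediately) can be sketched as below. This page does not define what makes a <tu> invalid, so the sketch assumes one possible rule: a <tu> with no <tuv> in the source language. The extract_units function, the InvalidTuError class and the exit_on_invalid argument are hypothetical names for this example.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


class InvalidTuError(ValueError):
    """Raised when an invalid <tu> is found and exit_on_invalid is set."""


def extract_units(tmx_text, source_lang, exit_on_invalid=False):
    """Return (units, warnings) for the <tu> elements of a TMX document.

    Assumption for this sketch: a <tu> is invalid when it has no <tuv>
    for source_lang.
    """
    root = ET.fromstring(tmx_text)
    units, warnings = [], []
    for index, tu in enumerate(root.iter("tu"), start=1):
        variants = {
            tuv.get(XML_LANG): tuv.findtext("seg", default="")
            for tuv in tu.findall("tuv")
        }
        if source_lang not in variants:
            message = f"<tu> #{index} has no <tuv> for source language {source_lang!r}"
            if exit_on_invalid:
                raise InvalidTuError(message)
            warnings.append(message)  # default: skip the unit and warn
            continue
        units.append(variants)
    return units, warnings


SAMPLE = """<tmx version="1.4"><body>
<tu><tuv xml:lang="en"><seg>One</seg></tuv><tuv xml:lang="fr"><seg>Un</seg></tuv></tu>
<tu><tuv xml:lang="fr"><seg>Deux</seg></tuv></tu>
</body></tmx>"""

units, warnings = extract_units(SAMPLE, "en")
print(units)
print(warnings)
```

With the default, the second <tu> is silently absent from the extracted units apart from the warning, which is how the output can end up differing from the input.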

Creates or not a segment for the extracted <tu> — Use this option to specify whether a segment is created for each extracted <tu> entry. The following options are available:

Escape the greater-than characters — Set this option to have all greater-than characters ('>') escaped as "&gt;" in the output.
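As an illustration of this option, the sketch below escapes the characters that must always be escaped in XML text and escapes '>' only on request. The escape_text function and its escape_gt argument are hypothetical and do not reflect the filter's own code.

```python
def escape_text(text, escape_gt=False):
    """Escape text for XML element content.

    '&' and '<' are always escaped; '>' is escaped only when
    escape_gt is True.
    """
    # Escape '&' first so the entities added below are not escaped again.
    escaped = text.replace("&", "&amp;").replace("<", "&lt;")
    if escape_gt:
        escaped = escaped.replace(">", "&gt;")
    return escaped


print(escape_text("a > b & c < d"))
print(escape_text("a > b & c < d", escape_gt=True))
```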

Duplicate property value separator string — This string is used to separate the values of duplicate properties. The default is ", " (a comma followed by a space).
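A sketch of how duplicate values might be joined: when a <tu> contains several <prop> elements of the same type, their values are combined into one string using the separator. The collect_props function and the sample data are hypothetical, and the sketch assumes that "duplicate" means <prop> elements sharing the same type attribute.

```python
import xml.etree.ElementTree as ET


def collect_props(tu_xml, separator=", "):
    """Return {prop type: joined values} for the <prop> children of a <tu>."""
    tu = ET.fromstring(tu_xml)
    grouped = {}
    for prop in tu.findall("prop"):
        grouped.setdefault(prop.get("type"), []).append(prop.text or "")
    return {name: separator.join(values) for name, values in grouped.items()}


SAMPLE_TU = """<tu>
  <prop type="domain">legal</prop>
  <prop type="domain">finance</prop>
  <prop type="client">ACME</prop>
  <tuv xml:lang="en"><seg>Hello</seg></tuv>
</tu>"""

print(collect_props(SAMPLE_TU))
print(collect_props(SAMPLE_TU, separator=" | "))
```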

Limitations

The <sub> element is not supported. When such an element is found, a warning is issued and its content is merged into the content of its parent element.
The filter is not able to reconstruct any DTD declaration.
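The <sub> limitation can be pictured with the following sketch, which merges the text of any <sub> element into the content of its parent and reports whether a <sub> was present. This is only an approximation of the described behavior; the flatten_sub function and the sample inline code are hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten_sub(element_xml):
    """Return (merged_text, had_sub) for one inline element.

    The text of nested <sub> elements is placed in line with the
    text of the parent element, in document order.
    """
    element = ET.fromstring(element_xml)
    had_sub = element.find(".//sub") is not None
    merged_text = "".join(element.itertext())
    return merged_text, had_sub


merged, had_sub = flatten_sub('<ph>&lt;img alt="<sub>a picture</sub>"/&gt;</ph>')
if had_sub:
    print("warning: <sub> is not supported; content merged into parent")
print(merged)
```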
