HTML5-ITS Filter





This filter allows you to process HTML5 documents. Those documents may contain ITS 2.0 markup.

The input documents are expected to be valid HTML5.

Processing Details

Input Encoding

The filter determines the encoding of the input document using the encoding detection mechanism defined by the HTML5 specification.

Output Encoding

The output encoding is the same as the input encoding, unless the calling tool specifies otherwise.

The output uses the same type of line break as the original input.

Quote Mode

Escaping of quote and apostrophe (single quote) characters can be changed by adding these lines to the config file:


Current quote modes:


ITS Support

By default, the filter processes HTML5 documents based on the ITS defaults. That is:

The default behavior can be overridden by ITS markup in the input document, or by specifying a filter parameters file. The parameters file used by the filter is an ITS document.
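As an illustration, a parameters file might look like the sketch below. It is an ordinary ITS 2.0 rules document; the specific rules and selectors are hypothetical choices made for this example, not settings shipped with the filter. The h prefix bound to the XHTML namespace follows the convention the ITS 2.0 specification uses for selecting HTML5 elements, and is assumed here to apply to this filter as well.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical ITS 2.0 rules used as a filter parameters file -->
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:h="http://www.w3.org/1999/xhtml"
           version="2.0">
  <!-- Do not translate the content of code elements -->
  <its:translateRule selector="//h:code" translate="no"/>
  <!-- Treat span elements as inline (within-text) content -->
  <its:withinTextRule selector="//h:span" withinText="yes"/>
  <!-- Mark dfn elements as terms -->
  <its:termRule selector="//h:dfn" term="yes"/>
</its:rules>
```

Each rule pairs an XPath selector with the value of one ITS data category, so a file like this changes only the categories it names and leaves the other defaults in place.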

The Internationalization Tag Set (ITS) is a W3C specification that defines a set of elements and attributes you can use to specify different internationalization- and localization-related aspects of your XML and HTML5 documents. For instance, ITS allows you to define which elements should be treated as nested sub-flows of text, which elements denote terms, how to identify the language of content, and much more.

The filter supports ITS 2.0.

See the "ITS" page for more details on the format.

The filter supports global and local rules and most data categories. See the ITS Components page for a detailed list of the supported data categories and other information on the implementation.

