HTML5-ITS Filter
Overview
This filter allows you to process HTML5 documents. Those documents may contain ITS 2.0 markup.
The input documents are expected to be valid HTML5.
Processing Details
Input Encoding
The filter determines the encoding of the input document using the detection mechanism defined by the HTML5 specification.
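As a rough illustration of what such a detection mechanism involves, the sketch below checks for a byte order mark and then scans the first 1024 bytes for a meta charset declaration. It is a simplified approximation, not the filter's actual code: the function name sniff_encoding, the regular expression, and the windows-1252 fallback are all assumptions made for the example.

```python
import codecs
import re

# Byte order marks are checked first; a BOM takes precedence over markup.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Simplified pattern covering <meta charset="..."> and the http-equiv form.
# A real prescan is stricter; this regex is an assumption for illustration.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def sniff_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Guess the encoding of an HTML byte string (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Only the start of the document is scanned for a declaration.
    match = _META_CHARSET.search(data[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # The fallback is an assumed default, not something the filter documents.
    return fallback
```

For example, calling sniff_encoding on the bytes of a document that starts with a meta charset="ISO-8859-1" element returns the lowercased label iso-8859-1.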
Output Encoding
The output encoding is the same as the input encoding, unless the calling tool specifies otherwise.
Line-Breaks
The output uses the same type of line-break as the original input.
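One way to picture this behavior is as a two-step process: detect the line-break style of the input, then re-apply it when writing the output. The sketch below is an assumption about how that could be done, not the filter's implementation; the helper names detect_line_break and restore_line_breaks are hypothetical.

```python
def detect_line_break(text: str, default: str = "\n") -> str:
    """Return the first line-break style found in text: CRLF, CR, or LF."""
    for i, ch in enumerate(text):
        if ch == "\r":
            # A CR immediately followed by LF is a Windows-style break.
            return "\r\n" if text[i + 1 : i + 2] == "\n" else "\r"
        if ch == "\n":
            return "\n"
    return default


def restore_line_breaks(text: str, line_break: str) -> str:
    """Normalize all breaks to LF, then convert them to the given style."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", line_break)
```

When trying this on a file in Python, open the file with newline="" so that the original line-breaks are not translated before detection.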
Quote Mode
Escaping of quote and apostrophe (single quote) characters can be changed by adding these lines to the config file:
quoteModeDefined=true
quoteMode=3
Current quote modes:
- Do not escape single or double quotes: UNESCAPED = 0
- Escape single and double quotes to a named entity: ALL = 1
- Escape double quotes to a named entity, and single quotes to a numeric entity: NUMERIC_SINGLE_QUOTES = 2
- Escape double quotes only: DOUBLE_QUOTES_ONLY = 3
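The four modes can be sketched as a small function. This is only an illustration of the descriptions above: the exact entities the filter writes are not stated here, so the spellings used below (&quot;, &apos; and &#39;) are assumptions, as is the function name escape_quotes.

```python
from enum import IntEnum


class QuoteMode(IntEnum):
    """Mode names and values as listed for the quoteMode option."""
    UNESCAPED = 0
    ALL = 1
    NUMERIC_SINGLE_QUOTES = 2
    DOUBLE_QUOTES_ONLY = 3


def escape_quotes(text: str, mode: QuoteMode) -> str:
    """Escape quote characters according to mode (entity spellings assumed)."""
    if mode == QuoteMode.UNESCAPED:
        return text
    # Every remaining mode escapes double quotes.
    escaped = text.replace('"', "&quot;")
    if mode == QuoteMode.ALL:
        return escaped.replace("'", "&apos;")
    if mode == QuoteMode.NUMERIC_SINGLE_QUOTES:
        return escaped.replace("'", "&#39;")
    # DOUBLE_QUOTES_ONLY leaves single quotes untouched.
    return escaped
```

With quoteMode=3, as in the configuration lines above, only double quotes would be escaped and apostrophes would be written as-is.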
Parameters
ITS Support
By default, the filter processes HTML5 documents based on the ITS defaults. That is:
- The lang attribute is used as the local markup for the Language Information data category.
- The id attribute is used as the local markup for the Id Value data category.
- Most of the phrasing content elements are interpreted as withinText="yes" for the Element Within Text data category.
- The translate attribute is used as the local markup for the Translate data category, and the behavior for that data category is different from the one in XML. See the HTML5 definition for details.
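To make the lang and translate defaults more concrete, the sketch below walks an HTML5 string with Python's standard html.parser module and records, for each run of text, the inherited language and whether it is translatable. It is a simplified model and not the filter's code: the class name ItsDefaults is hypothetical, the id and withinText defaults are not covered, and every non-void element is assumed to have an explicit end tag.

```python
from html.parser import HTMLParser

# Elements without an end tag; they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItsDefaults(HTMLParser):
    """Collect (text, language, translatable) triples (illustrative only)."""

    def __init__(self):
        super().__init__()
        # One (language, translate) pair per open element; root defaults.
        self.stack = [(None, True)]
        self.items = []

    def handle_starttag(self, tag, attrs):
        lang, translate = self.stack[-1]  # inherit from the parent
        attrs = dict(attrs)
        if attrs.get("lang"):
            lang = attrs["lang"]
        if "translate" in attrs:
            value = (attrs["translate"] or "").strip().lower()
            if value in ("", "yes"):
                translate = True
            elif value == "no":
                translate = False
            # Any other value keeps the inherited state.
        if tag not in VOID:
            self.stack.append((lang, translate))

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            lang, translate = self.stack[-1]
            self.items.append((text, lang, translate))
```

Feeding the parser a paragraph that contains a span with translate="no" yields one triple marked translatable for the surrounding text and one marked not translatable for the span's content, both carrying the language inherited from the nearest lang attribute.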
Default behavior can be overridden when the input document contains ITS markup, or if a filter parameters file is specified. The parameters file used by the filter is an ITS document.
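As a hedged illustration of what such a parameters file might look like, the sketch below embeds a small ITS 2.0 rules document in a Python string and lists its global rules with the standard xml.etree module. The rules themselves (marking code elements as non-translatable and within text) and the h prefix bound to the XHTML namespace are illustrative assumptions, not settings shipped with the filter.

```python
import xml.etree.ElementTree as ET

# Namespace of the ITS elements, as defined by the ITS specification.
ITS_NS = "http://www.w3.org/2005/11/its"

# A hypothetical parameters file: an ITS 2.0 document with two global rules.
RULES = """<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:h="http://www.w3.org/1999/xhtml" version="2.0">
  <its:translateRule selector="//h:code" translate="no"/>
  <its:withinTextRule selector="//h:code" withinText="yes"/>
</its:rules>"""

root = ET.fromstring(RULES)

# Collect (rule name, attributes) pairs, dropping the namespace from the tag.
rules = [(el.tag.split("}")[1], dict(el.attrib)) for el in root]

for name, attributes in rules:
    print(name, attributes)
```

Each global rule pairs a selector with the data category values to apply, which is how a parameters file can override the defaults for a whole set of elements at once.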
The Internationalization Tag Set (ITS) is a W3C specification that defines a set of elements and attributes you can use to specify different internationalization- and localization-related aspects of your XML and HTML5 documents. For instance, ITS allows you to define which elements should be treated as a nested sub-flow of text, which elements denote a term, how to identify the language of the content, and much more.
The filter supports ITS 2.0.
- The ITS 2.0 specification is available at http://www.w3.org/TR/its20/.
See the "ITS" page for more details on the format.
The filter supports global and local rules and most data categories. See the ITS Components page for a detailed list of how the data categories are supported and other information on the implementation.
Limitations
- This filter is BETA.