Difference between revisions of "Correction Modules"

From reBiND Documentation

Revision as of 11:08, 26 July 2012

Modules for the XML Correction Manager

When the user starts the correction of an XML file, a specific correction configuration has to be selected. The web interface of the Correction Manager offers a list of all correction config files in the correction config directory. A correction config file is an XML file that is valid against a specific schema. It specifies which correction modules are called, in what order, and with what parameters. The structure of such a file is relatively simple:

<modules xmlns="http://rebind.bgbm.org/modules" name="configuration-name">
    <module name="module-name" description="module-description">
        <setting name="module-setting-name" value="module-setting-value"/>
    </module>
    <setting name="general-setting-name" value="general-setting-value"/>
</modules>

The root element is modules. Its name attribute is optional; it is the name under which the configuration is displayed in the web interface of the Correction Manager.

There can be several module elements within the element modules.

Each module element must have a name attribute which specifies the name of the module to be loaded. This should either be the complete name of a Java class which implements the Module interface or the name specified within the method getName() of that class.

The description is optional and is used to distinguish different instances of the same module which are run with different settings.

Each module element can have any number of setting elements. Each setting element has mandatory attributes for the setting name and value. Which settings each module uses is specified in the module descriptions below.

The modules element can also have setting elements. These general settings are accessible to every module and are overridden if a setting with the same name is specified in the module element.
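
The override rule for general and module-level settings can be sketched in Java as follows. This is only an illustration under assumptions: the class EffectiveSettings, its methods, and the sample configuration are hypothetical and are not part of reBiND; only the element names, the attribute names, and the namespace are taken from the structure shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of reBiND.
class EffectiveSettings {
    static final String NS = "http://rebind.bgbm.org/modules";

    // Sample configuration: "splitter" is set both generally and in the module.
    static final String SAMPLE =
        "<modules xmlns=\"" + NS + "\" name=\"demo\">"
        + "<module name=\"module-name\">"
        + "<setting name=\"splitter\" value=\",\"/>"
        + "</module>"
        + "<setting name=\"splitter\" value=\";\"/>"
        + "<setting name=\"isRegEx\" value=\"true\"/>"
        + "</modules>";

    // Reads the <setting> elements that are direct children of parent.
    static Map<String, String> directSettings(Element parent) {
        Map<String, String> settings = new LinkedHashMap<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element && "setting".equals(child.getLocalName())) {
                Element e = (Element) child;
                settings.put(e.getAttribute("name"), e.getAttribute("value"));
            }
        }
        return settings;
    }

    // General settings are collected first; the module's own settings then override them.
    static Map<String, String> resolve(String configXml, int moduleIndex) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            Map<String, String> effective = directSettings(root);
            Element module =
                (Element) root.getElementsByTagNameNS(NS, "module").item(moduleIndex);
            effective.putAll(directSettings(module));
            return effective;
        } catch (Exception e) {
            throw new IllegalStateException("could not read configuration", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> settings = resolve(SAMPLE, 0);
        System.out.println(settings.get("splitter")); // prints ","
        System.out.println(settings.get("isRegEx"));  // prints "true"
    }
}
```

In the sample, the module sees its own splitter value and inherits isRegEx from the general settings.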

ElementTextReplacer

What it does: Replaces the text content of specific elements according to specific rules.

Full Name: org.bgbm.rebind.correction.modules.ElementTextReplacer

Settings:

address
The name (including element prefix) of the element whose text should be replaced, or an XPath expression pointing to the element. Whether the value is interpreted as a name or as an XPath expression depends on the setting isXPath.
Mandatory: yes
Example Values: abcd:Sex or //abcd:RecordBasis[matches(.,'^Specimen$')]
isXPath
A flag indicating whether the address setting contains an XPath expression or just the name of an element.
Mandatory: no
Default Value: false
Allowed Values: true or false
key
The part of the content that should be replaced. This can be either plain text or a RegEx, depending on the setting isRegEx. Regardless of whether it is plain text or a RegEx, it can contain several keys to be replaced or just one, depending on the setting isBatch. If the batch mode is used, the character or string that separates the different parts can be specified in the setting splitter.
Mandatory: yes
Example Values: Hello World (plain text), (H[ea]llo World)(\!?) (RegEx), Hello World;Lorem Ipsum (plain text, Batch mode with ';' as splitter), H[ea]llo World\!?;[Ll]orem [iI]psum (RegEx, Batch mode with ';' as splitter)
value
The new content with which the content specified in key will be replaced. This can be either plain text or a RegEx replacement pattern, depending on the setting isRegEx. Regardless of whether it is plain text or a RegEx, it can contain several values or just one, depending on the setting isBatch. If the batch mode is used, the character or string that separates the different parts can be specified in the setting splitter, and the key fragments are replaced by the corresponding value fragments (e.g. the third key fragment is replaced by the third value fragment). Therefore the number of fragments must be the same for key and value; otherwise the replacement stops once the fragments of the shorter list are used up.
Mandatory: yes
Example Values: Hello World again (plain text), $1 again $2 (RegEx), Hello World again;Lorem ipsum dolor sit amet (plain text, Batch mode with ';' as splitter), $1 again $2;$& dolor sit amet (RegEx, Batch mode with ';' as splitter)
isRegEx
A flag indicating whether the key and value settings are regular expressions or just plain text.
Mandatory: no
Default Value: false
Allowed Values: true or false
isBatch
A flag indicating whether the key and value settings each contain just one fragment to be replaced, or several. If it is true, the character or string that separates the fragments of key and value can be specified in the setting splitter.
Mandatory: no
Default Value: false
Allowed Values: true or false
splitter
The character or string at which the key and value settings are broken into their fragments when they are in batch mode. The splitting is done using the function String.split(String), which interprets the parameter string as a regular expression. This can cause errors when the splitter contains characters with syntactical meaning in RegEx, as in <setting name="splitter" value="."/>, which would match any character and therefore leave only empty fragments.
Mandatory: no
Default Value: ;
Example Values: , or \.
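
The interplay of key, value, isRegEx, isBatch and splitter described above can be sketched in Java. This is a minimal illustration, not the module's actual code: the class BatchReplaceSketch and its replace method are hypothetical, and it is an assumption that fragments are applied one after another as substring replacements and that regular expressions follow java.util.regex. Only the use of String.split(String) is taken from the description of splitter.

```java
import java.util.Arrays;

// Hypothetical stand-in for the replacement step, not the reBiND implementation.
class BatchReplaceSketch {
    static String replace(String text, String key, String value,
                          boolean isRegEx, boolean isBatch, String splitter) {
        // In batch mode both settings are split; the splitter is treated as a regex.
        String[] keys = isBatch ? key.split(splitter) : new String[] {key};
        String[] values = isBatch ? value.split(splitter) : new String[] {value};
        // Pair the i-th key with the i-th value; stop at the shorter list.
        int pairs = Math.min(keys.length, values.length);
        String result = text;
        for (int i = 0; i < pairs; i++) {
            result = isRegEx
                ? result.replaceAll(keys[i], values[i])  // regex with $1, $2, ...
                : result.replace(keys[i], values[i]);    // literal text
        }
        return result;
    }

    public static void main(String[] args) {
        // Batch mode, plain text: second key fragment maps to second value fragment.
        System.out.println(replace("male", "female;male;hermaphrodite", "F;M;X",
                                   false, true, ";"));          // prints "M"
        // Single regex replacement with a group reference.
        System.out.println(replace("Specimen", "^(Specimen)$", "Preserved$1",
                                   true, false, ";"));          // prints "PreservedSpecimen"
        // Unequal fragment counts: the third key has no partner and is ignored.
        System.out.println(replace("abc", "a;b;c", "1;2",
                                   false, true, ";"));          // prints "12c"
        // An unescaped "." as splitter matches every character: no fragments remain.
        System.out.println("a.b.c".split(".").length);          // prints "0"
        // Escaped, the dot works as a literal separator.
        System.out.println(Arrays.toString("a.b.c".split("\\."))); // prints "[a, b, c]"
    }
}
```

If the module relies on java.util.regex (an assumption), the whole match is referenced as $0 in a replacement string; $& is rejected there with an IllegalArgumentException.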

Examples:

    <module name="org.bgbm.rebind.correction.modules.ElementTextReplacer" description="replaces 'Specimen'">
        <setting name="address" value="//abcd:RecordBasis[matches(.,'^Specimen$')]"/>
        <setting name="isXPath" value="true"/>
        <setting name="key" value="^(Specimen)$"/>
        <setting name="value" value="Preserved$1"/>
        <setting name="isRegEx" value="true"/>
        <setting name="isBatch" value="false"/>
        <setting name="splitter" value=";"/>
    </module>
    <module name="org.bgbm.rebind.correction.modules.ElementTextReplacer" description="corrects abcd:Sex">
        <setting name="address" value="abcd:Sex"/>
        <setting name="isXPath" value="false"/>
        <setting name="key" value="female;male;hermaphrodite"/>
        <setting name="value" value="F;M;X"/>
        <setting name="isRegEx" value="false"/>
        <setting name="isBatch" value="true"/>
        <setting name="splitter" value=";"/>
    </module>
    <module name="org.bgbm.rebind.correction.modules.ElementTextReplacer" description="corrects abcd:Rank">
        <setting name="address" value="abcd:Rank"/>
        <setting name="isXPath" value="false"/>
        <setting name="key" value="[f.];[subvar.];[var.]"/>
        <setting name="value" value="f.;subvar.;var."/>
        <setting name="isRegEx" value="false"/>
        <setting name="isBatch" value="true"/>
        <setting name="splitter" value=";"/>
    </module>
    <module name="org.bgbm.rebind.correction.modules.ElementTextReplacer" description="corrects abcd:Rank">
        <setting name="address" value="//abcd:Rank[matches(.,'^(f|var)$')]"/>
        <setting name="isXPath" value="true"/>
        <setting name="key" value="^f$;^var$"/>
        <setting name="value" value="f.;var."/>
        <setting name="isRegEx" value="true"/>
        <setting name="isBatch" value="true"/>
        <setting name="splitter" value=";"/>
    </module>
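
The last two examples differ in isRegEx, and the following Java sketch shows why that matters for a key such as [f.]. It assumes the module's regular expressions behave like java.util.regex; the class RankKeyDemo and its methods are hypothetical and exist only for this illustration.

```java
// Hypothetical demonstration class, not part of reBiND.
class RankKeyDemo {
    // isRegEx=false: the key "[f.]" is matched literally, brackets included.
    static String asPlainText(String text) {
        return text.replace("[f.]", "f.");
    }

    // isRegEx=true: the same key is a character class matching "f" or ".".
    static String asRegEx(String text) {
        return text.replaceAll("[f.]", "f.");
    }

    // Anchored regex key as in the last example: only the whole content "f" matches.
    static String anchored(String text) {
        return text.replaceAll("^f$", "f.");
    }

    public static void main(String[] args) {
        System.out.println(asPlainText("[f.]")); // prints "f."
        System.out.println(asRegEx("[f.]"));     // prints "[f.f.]"
        System.out.println(anchored("f"));       // prints "f."
        System.out.println(anchored("forma"));   // prints "forma"
    }
}
```

Read as a regular expression, the bracketed key would replace each f and each dot separately, which is why the third example sets isRegEx to false.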