Creating Matching Mechanisms

Matching rules are used to search for duplicate data by specified attributes and to group the duplicates into clusters. The rules offer flexible configuration of the items to be matched, a choice of information source, and a choice of data comparison algorithm.

Records can be matched by attributes of the first level (simple and code types).

Clusters of duplicate data model objects are formed according to the rule settings.
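As a rough illustration of what a cluster is, the sketch below groups records whose matching attribute has the same normalized value. It is a minimal sketch, not the product's implementation: the record layout, the `id` key, and the `build_clusters` helper are hypothetical, and normalization by trimming and case folding is an assumption.

```python
from collections import defaultdict


def build_clusters(records, attribute):
    """Group record ids that share the same normalized attribute value.

    Hypothetical helper for illustration only; the real system forms
    clusters according to its own rule settings.
    """
    groups = defaultdict(list)
    for record in records:
        value = record.get(attribute)
        if value is None:
            continue  # records without the attribute cannot be matched
        # Assumed normalization: trim whitespace and ignore letter case.
        groups[value.strip().casefold()].append(record["id"])
    # A cluster only exists when at least two records match.
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": "acme "},
    {"id": 3, "name": "Globex"},
]
clusters = build_clusters(records, "name")
```

Here records 1 and 2 end up in one cluster, while record 3 has no duplicate and forms no cluster.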

The list of clusters and their contents can be viewed in the Duplicates section of the data steward interface.

Cluster contents are updated in real time when you save changes to a record or delete it (depending on which pipelines are configured), or when you start a data reindexing operation.

Also see: Searching Duplicates Concept

Note

In the current implementation, matching and merging of hierarchical reference set records is not available.

Configuring Matching Rules

To match records by attributes:

  1. Create a matching table. In the parameters of the matching columns, in the Type field, select the attribute type.

  2. Create a matching rule. Specify the algorithm:

    • "Exact Matching" to match identical values. Enable case-insensitive search if necessary.

    • "Inexact Matching" to match values that are similar but not identical. Select the language, set the similarity percentage, and select the concatenation type.

  3. Create a rule set. In the appropriate fields, select the previously created table and matching rule.

  4. Configure an assignment of rules. Select the previously created matching table and then select the required attribute. Enable autoconsolidation of records if necessary.

  5. Completing these steps creates the required record matching rules in the system. Next, configure the launch of duplicate search (see "Launching Matching Rules").
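The difference between the two algorithms in step 2 can be illustrated with a short sketch. It is not the product's implementation: the function names are hypothetical, the optional flag on exact matching is read here as case-insensitive comparison, and Python's standard `difflib.SequenceMatcher` stands in for whatever similarity measure the system actually uses. The language and concatenation type settings are not modeled.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str, case_insensitive: bool = False) -> bool:
    """Return True when the two values are identical.

    With case_insensitive=True, letter case is ignored (assumed meaning
    of the optional search flag on the exact matching algorithm).
    """
    if case_insensitive:
        a, b = a.casefold(), b.casefold()
    return a == b


def inexact_match(a: str, b: str, min_similarity_pct: float) -> bool:
    """Return True when similarity reaches the configured percentage.

    SequenceMatcher.ratio() returns a value in [0, 1]; it is a stand-in
    for the real similarity measure, which is not documented here.
    """
    ratio = SequenceMatcher(None, a.casefold(), b.casefold()).ratio()
    return ratio * 100 >= min_similarity_pct


same_strict = exact_match("ACME", "acme")
same_relaxed = exact_match("ACME", "acme", case_insensitive=True)
similar = inexact_match("Jon Smith", "John Smith", 80)
different = inexact_match("apple", "orange", 80)
```

A higher similarity percentage yields fewer, more reliable matches; a lower one catches more spelling variants at the cost of more false positives.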

Launching Matching Rules

To start searching for duplicate records, matching rules must already exist in the system (see above), and the duplicate search mechanisms must be configured.

Once everything is configured correctly, duplicate records can be viewed as clusters in the "Duplicates" section.