How transliteration works

Transliteration lets you find attribute values by entering a query in transliterated (Latin) form. It works for simple string attributes and array attributes. Searching for the attribute value in Cyrillic continues to work as well.

How transliteration occurs:

Filtering is performed by the analysis-icu plugin for OpenSearch, which is built on the ICU library (International Components for Unicode, icu.unicode.org).

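The following text transformation rule is used (the rule syntax is described at https://unicode-org.github.io/icu/userguide/transforms/general/): Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC. The rule converts the text to Latin script, decomposes accented letters into a base letter plus combining marks (NFD), removes the combining marks, and recomposes the result (NFC).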
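As an illustration of where such a rule could live, here is a minimal sketch of OpenSearch index settings that pass the rule to the icu_transform token filter provided by analysis-icu. The source does not show the product's actual index configuration, so the filter name translit_filter, the analyzer name translit_analyzer, and the choice of tokenizer and lowercase filter are assumptions made for the example.

```python
import json

# The transformation rule quoted in the text.
TRANSLIT_RULE = "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC"

# Hypothetical names: "translit_filter" and "translit_analyzer" are
# illustrative only and are not taken from the product.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "translit_filter": {
                    "type": "icu_transform",  # token filter from analysis-icu
                    "id": TRANSLIT_RULE,
                }
            },
            "analyzer": {
                "translit_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "translit_filter"],
                }
            },
        }
    }
}

# JSON body that could be sent when creating an index.
print(json.dumps(index_settings, indent=2, ensure_ascii=False))
```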

You can check the transliteration of the text here: https://icu4c-demos.unicode.org/icu-bin/translit

  • Insert the above rule into the Compound 1 field.

  • Enter the text you want to check in the Input field.

  • Click Transform – the transliterated text will be displayed in the Output 1 field.
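The effect of the rule can also be approximated offline. The sketch below uses only the Python standard library: the NFD, mark removal, and NFC steps are done with unicodedata, while the Any-Latin step is replaced by a small hand-written mapping for lowercase Russian letters. That mapping is an assumption (ISO 9 style letters with diacritics) and a stand-in, not ICU itself; ъ and ь are left out, and input is lowercased for brevity. Treat the online demo as the authority.

```python
import unicodedata

# Assumed intermediate result of the Any-Latin step for lowercase Russian
# letters (letters with diacritics). Hand-written stand-in, not ICU.
ANY_LATIN_STAND_IN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "ë",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "ш": "š", "щ": "ŝ", "ы": "y",
    "э": "è", "ю": "û", "я": "â",
}


def strip_marks(text: str) -> str:
    """Apply the 'NFD; [:Nonspacing Mark:] Remove; NFC' part of the rule."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" is Unicode's Nonspacing Mark.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def transliterate(text: str) -> str:
    """Approximate the whole rule; unmapped characters pass through."""
    latin = "".join(ANY_LATIN_STAND_IN.get(ch, ch) for ch in text.lower())
    return strip_marks(latin)


print(transliterate("яблоко"))
print(transliterate("чаша"))
```

Under these assumptions the two calls print abloko and casa, which agrees with the table that follows.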

Cyrillic to Latin transliteration table

Cyrillic  Latin    Cyrillic  Latin    Cyrillic  Latin
а         a        м         m        щ         s
б         b        н         n        ъ         "
в         v        о         o        ы         y
г         g        п         p        ь         '
д         d        р         r        э         e
е         e        с         s        ю         u
ё         e        т         t        я         a
ж         z        у         u
з         z        ф         f
и         i        х         h
й         j        ц         c
к         k        ч         c
л         l        ш         s

Choosing the sorting method

When you sort by an attribute that has transliteration enabled, the original value is used by default. Sorting by the transliterated value can be selected instead in the attribute settings in the data model.
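The difference between the two sorting modes can be seen in a small sketch. It uses a compact lookup built from the final values in the transliteration table (ъ and ь omitted) and Python's default code point ordering; the actual sort order in the product may follow other collation rules, so this only illustrates how the sort key changes the result.

```python
# Final Cyrillic -> Latin values from the transliteration table
# (ъ and ь are omitted to keep the sketch short).
CYRILLIC = "абвгдеёжзийклмнопрстуфхцчшщыэюя"
LATIN = "abvgdeezzijklmnoprstufhccssyeua"
TABLE = str.maketrans(CYRILLIC, LATIN)


def translit(value: str) -> str:
    """Transliterate a lowercase Cyrillic string using the table values."""
    return value.lower().translate(TABLE)


values = ["яблоко", "банан", "вишня"]

# Default behaviour: sort by the original value.
by_original = sorted(values)

# Optional behaviour: sort by the transliterated value.
by_translit = sorted(values, key=translit)

print(by_original)  # ['банан', 'вишня', 'яблоко']
print(by_translit)  # ['яблоко', 'банан', 'вишня']
```

Here яблоко moves to the front when sorting by the transliterated value because я becomes a.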

Impact on matching

Matching uses the original (non-transliterated) attribute values.

Impact on attribute uniqueness

Attribute uniqueness is checked against the original (non-transliterated) attribute values.
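One consequence is that two values can share a transliteration and still count as different values. The sketch below illustrates this with a compact lookup built from the table values; the set-based check is only a stand-in for whatever uniqueness check the product actually performs.

```python
# Final Cyrillic -> Latin values from the transliteration table
# (ъ and ь are omitted to keep the sketch short).
CYRILLIC = "абвгдеёжзийклмнопрстуфхцчшщыэюя"
LATIN = "abvgdeezzijklmnoprstufhccssyeua"
TABLE = str.maketrans(CYRILLIC, LATIN)

values = ["шар", "сар"]

# Both values transliterate to the same Latin string...
transliterated = [v.translate(TABLE) for v in values]
print(transliterated)  # ['sar', 'sar']

# ...but a check on the original values sees two distinct entries.
originals_are_unique = len(set(values)) == len(values)
print(originals_are_unique)  # True
```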

Enabling transliteration support on existing attributes

When this option is changed on an existing attribute, you must reindex the affected registries and directories with data cleaning and mapping updates. Which ones are affected depends on the attribute:

  • A registry attribute: that registry.

  • A directory attribute: that directory.

  • A link attribute: the registries at both ends of the link.

  • An attribute of a nested object: all registries that use that nested object.