Data Migration from MDM 5.x to 6.x (migrateDataJob)¶
Operation for data migration from MDM 5.x to 6.x.
Operation Parameters¶
User account name (text field). The login of the account used for the operation.
Database URL (text field). URL of the source database from which data is transferred.
Migrate all data revisions (checkbox). If enabled, all revisions (origins_vistory) of records and relations are loaded, except for statuses. If disabled, only current data is loaded, i.e. the data that falls within the active range of validity periods.
Block size (integer). The size of the block of data to be loaded. Default is 512.
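An illustrative set of launch parameters (all values are placeholders; the PostgreSQL JDBC URL is an assumption about the source database, not a documented requirement):

    User account name:          admin
    Database URL:               jdbc:postgresql://mdm5-host:5432/unidata
    Migrate all data revisions: enabled
    Block size:                 10000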
Notes:
The operation transfers only DB objects.
Records are not saved to the search index, so after the transfer you must run the reindexing operation (reindexDataJob).
The operation takes all the data on each start, so it cannot be run a second time: a repeated run will produce errors.
If the 6.x database already contains keys that conflict with the inserted data, errors may occur.
After migration, make sure that the system data sources in 5.x and 6.x match. A match is required for successful reindexing of the migrated data.
"Block Size" (blockSize) Parameter Description¶
The entire set of processed records is divided into blocks of blockSize records. Each block is then processed by a single thread in chunks of com.unidata.mdm.job.migrate.data.commit.interval records: information on that many records is held in memory, and the memory is cleared when moving on to the next chunk, until the records in the block run out.
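A minimal sketch of this processing model, assuming the behavior described above (this is not the actual migrateDataJob code; the class, variable names, and example figures are illustrative only):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BlockProcessingSketch {
        public static void main(String[] args) {
            long totalRecords = 100_000L;  // example data volume
            int blockSize = 10_000;        // job parameter "Block size"
            int commitInterval = 1_024;    // com.unidata.mdm.job.migrate.data.commit.interval
            int threads = 8;               // com.unidata.mdm.job.migrate.data.threads

            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long blocks = (totalRecords + blockSize - 1) / blockSize; // ceiling division

            for (long b = 0; b < blocks; b++) {
                final long start = b * blockSize;
                final long end = Math.min(start + blockSize, totalRecords);
                pool.submit(() -> {
                    // Within a block, records are handled in chunks of commitInterval:
                    // at most commitInterval records stay in memory, then the chunk
                    // is committed and the memory is released.
                    for (long chunk = start; chunk < end; chunk += commitInterval) {
                        long chunkEnd = Math.min(chunk + commitInterval, end);
                        // load records [chunk, chunkEnd), write them to 6.x, commit
                        System.out.printf("committed records %d..%d%n", chunk, chunkEnd);
                    }
                });
            }
            pool.shutdown();
        }
    }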
The com.unidata.mdm.job.migrate.data.commit.interval parameter usually does not need editing; the recommended value of 1024 is sufficient for most tasks. The larger this parameter, the more memory can be used at one time. If it is greater than blockSize, it is effectively equal to blockSize.
com.unidata.mdm.job.migrate.data.threads is the number of simultaneously running processing threads.
The com.unidata.mdm.job.migrate.data.commit.interval and com.unidata.mdm.job.migrate.data.threads parameters are set in backend.properties.
Choose com.unidata.mdm.job.migrate.data.threads according to the number of logical processor cores: use an equal or smaller number, depending on whether the processor carries other load.
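For example, a backend.properties fragment for a server with 8 logical cores (the thread count is an illustrative assumption; 1024 is the recommended commit interval mentioned above):

    # Records held in memory per thread before a commit (recommended value).
    com.unidata.mdm.job.migrate.data.commit.interval=1024
    # Simultaneous processing threads; assumes 8 logical cores with no other load.
    com.unidata.mdm.job.migrate.data.threads=8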
When a small blockSize is specified, it is easier to track the progress of the operation through the UI (startup manager > select startup > number of steps completed). From a performance point of view, it is better to use a blockSize large enough that the number of migrated records is approximately equal to N * blockSize * com.unidata.mdm.job.migrate.data.threads, where N is a small natural number, for example 1.
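For example (the figures are illustrative): with com.unidata.mdm.job.migrate.data.threads = 8 and roughly 800,000 records to migrate, blockSize = 100,000 gives 800,000 = 1 * 100,000 * 8, i.e. N = 1: each thread processes exactly one block, and all threads finish at about the same time.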
If blockSize is too large (for example, 500000), part of the data may not be written even though the operation completes successfully.
The blockSize setting balances the amount of data being processed against the number of threads. Creating too many threads is as bad as having one thread process too much data at once, so it is advisable to choose moderate values based on the available server resources.
blockSize must also be chosen with the total amount of data in mind, so that the number of partitions is not too large. For very large data sets, such as an address directory, the best option is 500-2000 partitions.
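For example (illustrative figures): migrating 50,000,000 address records with blockSize = 50,000 produces 50,000,000 / 50,000 = 1,000 partitions, within the recommended range of 500-2000.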
Data is processed sequentially: records > relations > classifiers > business processes > matching. Processing of one data type is completed before moving on to the next; only the data types that are present are processed.