Migration: Pick up where you left off or “start over”?

There are essentially two ways to migrate a large amount of data “quickly”:

  • Migrate data until the migration window closes. The next time the window opens, pick up where the migration left off.
  • Migrate data until the migration window closes. The next time the window opens, start at the beginning again and check for changes.

There are pros and cons to both methods. It’s tempting to think the first method would be faster and better, since it gets a “baseline” copy of all the data in place first. The data set would then just have to go through a few sync passes before you cut over to the new destination.

However, I believe the second method is better, because after the second run it gives the user more useful information about the data set being migrated. During the first migration window, the files are being copied to the destination for the first time, so there’s no need to check whether any data has changed. With the second method, when the second migration window opens, the data that has already been copied is checked to see whether the source has changed.

If the source does have changes, those changes are replicated to the destination, and then the migration starts picking up the files that have not yet been copied. Then the migration window may close again due to the amount of data that needs to be migrated.
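The flow of one “start over” window can be sketched roughly as below. This is only an illustrative simulation, not how any particular migration product works: the function `run_window`, the `WindowReport` fields, and the idea of a window “budget” counted in file operations are all assumptions made for the example, and plain dictionaries stand in for the source and destination.

```python
from dataclasses import dataclass


@dataclass
class WindowReport:
    checked: int = 0    # already-copied files re-checked this window
    changed: int = 0    # of those, how many had changed at the source
    copied: int = 0     # files copied to the destination for the first time
    remaining: int = 0  # files still missing from the destination


def run_window(source, dest, budget):
    """Simulate one migration window using the 'start over' method.

    source: dict of path -> version (a stand-in for size, timestamp, or hash)
    dest:   dict of path -> version already at the destination (updated in place)
    budget: hypothetical number of files the window has time to touch
    """
    report = WindowReport()

    # Pass 1: start at the beginning and re-check what is already copied.
    for path in sorted(source):
        if budget == 0:
            break
        if path in dest:
            budget -= 1
            report.checked += 1
            if dest[path] != source[path]:
                dest[path] = source[path]  # replicate the change
                report.changed += 1

    # Pass 2: pick up the files that have not been copied yet.
    for path in sorted(source):
        if budget == 0:
            break
        if path not in dest:
            dest[path] = source[path]
            budget -= 1
            report.copied += 1

    report.remaining = sum(1 for path in source if path not in dest)
    return report
```

The `checked` and `changed` counts in the report are the extra information the second method provides. A “pick up where you left off” run would skip the first pass entirely and so would have nothing to report about change at the source.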

At this point, one could argue that instead of checking the existing files for changes, the migration should go directly to the data that was not migrated in the first migration window. If you do this, however, the user gets no information about how much the source data may have changed between migration windows.

If so much data has changed that the second migration window only gets through re-syncing the data that was migrated during the first window, then a different migration strategy may be required. For example, the user may need to break the source down into smaller chunks, with correspondingly smaller migration tasks. If no data change is detected, the migration can quickly finish the sync check and continue with the remaining data.
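That decision can be written down as a small rule of thumb. The sketch below is an assumption-laden illustration: the function name `suggest_next_step`, its three inputs, and the returned labels are all made up for this example, and a real decision would also weigh things like file sizes and window length.

```python
def suggest_next_step(checked: int, changed: int, copied: int) -> str:
    """Suggest what to do after a migration window, based on its counts.

    checked: already-migrated files that were re-checked this window
    changed: how many of those had changed at the source
    copied:  files copied for the first time this window
    """
    if checked == 0:
        # First window: everything was a fresh copy, nothing to compare yet.
        return "baseline"
    if changed == 0:
        # The sync check found nothing, so the rest of the window was
        # available for new data. Keep going as planned.
        return "continue"
    if copied == 0:
        # The whole window went to re-syncing earlier data. Consider
        # breaking the source into smaller migration tasks.
        return "split"
    # Some change, but the migration still made forward progress.
    return "continue-and-watch"
```

For instance, a window that re-checks 500 files, finds 450 of them changed, and copies nothing new would come back as "split".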

Keep in mind that this strategy of using the information on the data change is only helpful if the process that checks for data change can be done quickly. StorageX is able to check for data change rapidly; in fact, tests have shown that StorageX is one of the fastest utilities for performing this check. StorageX was able to check one dataset for changes completely in just over 4 minutes. (Note that there were no changes in the source to be replicated.) When used on that same dataset, Robocopy took just over an hour…
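How StorageX or Robocopy performs the change check internally is not covered here. One common reason such a check can be fast, though, is that it can compare file metadata instead of reading file contents. The sketch below assumes that approach; `has_changed` is a hypothetical helper that compares only size and modification time, so it would miss a change that preserves both.

```python
import os


def has_changed(src_path: str, dst_path: str) -> bool:
    """Return True if the source file appears to differ from the destination.

    Only metadata is compared (size and modification time), so no file
    contents are read. A missing destination counts as changed.
    """
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True
    src = os.stat(src_path)
    # Truncate timestamps to whole seconds to tolerate file systems that
    # store modification times at different precisions.
    return src.st_size != dst.st_size or int(src.st_mtime) > int(dst.st_mtime)
```

This only gives sensible answers if the copy step preserves modification times on the destination, as `shutil.copy2` does in Python.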
