Database Sync is Hard

"Some problems are hard. Really hard. So hard that the slightest thought of wrestling with said problem causes cold sweats, sleepless nights and panic attacks. Data synchronization, for me, is one of those problems.

Just to be crystal clear, I'm not talking about configuring MySQL / Oracle replication, or a simple one-way sync of some data for use offline. I'm talking about true multi-master replication with potentially hundreds of systems.

These remote databases are not always connected to the home office, will be edited independently, and need to be able to synchronize all of their changes with a central database while at the same time receiving changes submitted by other remote systems," wrote Nathan McMinn, who was relieved when he finally found and implemented Pervasync.

Read full story

What's Wrong with Traditional Change Capture

In data synchronization, you track the changes to records on the originating DB and apply them on the destination DB.

This is commonly done by adding a timestamp column to your table. You update the timestamp column whenever you change a row. At sync time, a sync client would come in with a last sync timestamp and you select all the server changes that have newer timestamps than the last sync timestamp.

This simple approach has a serious flaw. We know that changes in a database transaction are not visible to others until a commit is done. Let’s say at time T1 you modified a record R1. The timestamp column would have a value of T1. Then before you commit, someone synced at T2 (T2 > T1). This sync could not pick up the changes to record R1 since the transaction was not committed yet and the change was invisible. At T3 you committed the transaction and later at T4 (T4>T3>T2>T1) you do another sync. Guess what, you still won’t get the changes to R1! It was committed but it has a timestamp T1 that is older than last sync time T2. The client will forever miss the change no matter how many times you sync.

It's a timing issue. This traditional change capture approach may work 99% of the time. But you may not be able to afford missing one percent of the changes. You may have to periodically do a full refresh to pick up the missed ones. Not cool!

With Pervasync's innovative two-phase change capture approach, you can rest assured that no changes are lost.

What about Data Subsetting

When you sync data from a central server to multiple clients, very often, you don't want a full replica. For example, you may want users to only get their own orders but not others'.

Most sync frameworks allow you to set filters based on column values. For our example, you may set a filter criteria on the "USER" column of the "ORDERS" table to match the sync user.

But what about the "ORDER_ITEMS" table which does not have a "USER" column? In database systems, you are used to using sub-queries like "SELECT * FROM ORDER_ITEMS WHERE ORDER_ID IN (SELECT ID FROM ORDERS WHERE USER=:SYNC_USER)" to slice and dice data. But no sync frameworks support sub-queries that involve multiple tables for data subsetting. That is, except for Pervasync.

Conflict Detection and Resolution

When thinking of synchronization, people get nervous about the possibility of stale changes overwriting new data. You could detect this conflict by marking each record with a version number. If the version numbers have changed when you apply a change, a conflict is detected. You then resolve the conflict.

Nobody wants to deal with dreaded conflicts. Pervasync provides strategies that you can use to prevent the conflicts from happening in the first place.

Real world synchronizations are often plagued with fake conflicts. You sync changes to server and the changes are applied. However, because of a network error, you didn't receive the confirmation. You would resend the changes in next sync, causing fake conflicts. Again, Pervasync has measures handing issues like this so you are left with no worries.

Report abuse