In his recent blog post, Henrik Liliendahl Sørensen touched on the topic of data matching. He highlighted the considerations around where data matching should be done.

I am a big proponent of avoiding duplicates by taking advantage of matching at the point of entry. But in reality, master data records get captured in different applications that are not equipped with matching or any other duplicate prevention mechanism. Not having a centralized master data management system that can address this problem is one of the key challenges organizations face today.

Once you embark on the master data journey, data matching becomes a crucial aspect and can help you at different stages of the implementation. Below, I discuss 5 stages of an MDM project where data matching is used.

Initial load

As you start your MDM initiative, approach the solution by identifying the sources of master data, bringing at least 2 of them into the master data hub and running the data matching process. This is an important step: it gives you a framework for integrating further source systems later, and it lets you pick the right matching technique for your scenario.
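As a rough illustration of that first matching run, here is a minimal Python sketch that compares two sources on a normalized name. The record layout, the `crm` and `billing` source names, and the 0.85 threshold are all assumptions made for the example, and the standard library's `difflib` stands in for whatever matching engine your MDM product provides.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity of two names after normalization, between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_sources(source_a, source_b, threshold=0.85):
    """Return (id_a, id_b, score) for each cross-source pair at or above threshold."""
    pairs = []
    for rec_a in source_a:
        for rec_b in source_b:
            score = similarity(rec_a["name"], rec_b["name"])
            if score >= threshold:
                pairs.append((rec_a["id"], rec_b["id"], round(score, 2)))
    return pairs


# Hypothetical sample data from two source systems.
crm = [{"id": "C1", "name": "Acme Corp."}, {"id": "C2", "name": "Globex Inc"}]
billing = [{"id": "B1", "name": "ACME Corp"}, {"id": "B2", "name": "Initech LLC"}]

matches = match_sources(crm, billing)
print(matches)
```

Comparing every record with every other record is fine for a trial on two small sources, but it will not scale to a full hub without some way of narrowing the candidate pairs.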

Batch data loads

The ongoing integration of internal and external data sources usually requires a batch data load option. One approach you can take here is to load party data into the hub and then trigger a periodic (nightly) matching process to identify duplicates. This requirement usually arises during the consolidation and co-existence phases of MDM, where you create a golden profile for reporting and other analytical purposes.
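A nightly job of this kind could be sketched as follows. This is only an illustration under stated assumptions: the blocking key (postal code plus first letter of the name), the field names and the threshold are all hypothetical choices, and a real hub would read from and write to its own store rather than a Python list.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking key: postal code plus first letter of the name."""
    return (record["postal_code"], record["name"][:1].lower())


def find_duplicate_groups(records, threshold=0.85):
    """Group ids of records whose names are similar within the same block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)

    # Union-find so that A~B and B~C end up in one group.
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if score >= threshold:
                parent[find(a["id"])] = find(b["id"])

    groups = defaultdict(list)
    for rec in records:
        groups[find(rec["id"])].append(rec["id"])
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical party records already loaded into the hub.
hub = [
    {"id": 1, "name": "John Smith", "postal_code": "10001"},
    {"id": 2, "name": "Jon Smith", "postal_code": "10001"},
    {"id": 3, "name": "Jane Doe", "postal_code": "10001"},
    {"id": 4, "name": "John Smith", "postal_code": "94105"},
]

duplicate_groups = find_duplicate_groups(hub)
print(duplicate_groups)
```

Note that record 4 is not grouped with records 1 and 2 because it falls into a different block. That is the trade-off of blocking: far fewer comparisons, at the cost of missing duplicates whose blocking fields disagree.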

Real time interactions

As you start moving towards a transactional hub architecture, you need to turn on real-time matching and linking of records. In this phase, applications interact directly with the master data hub via APIs and services. A party record flowing into MDM is matched in real time; duplicates are identified and merged into a surviving record. This is an ideal state where your party data is centralized, well maintained and acts as a foundation for things such as real-time analytics.
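To make the match, merge and link flow concrete, here is a toy in-memory sketch. The `PartyHub` class, its `upsert` method and the survivorship rule (keep existing values, fill only the gaps) are assumptions invented for this example; they do not represent any particular MDM product's API.

```python
from difflib import SequenceMatcher


class PartyHub:
    """Toy in-memory hub; a real MDM hub would expose this behind a service."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold
        self.records = {}  # surviving records keyed by hub id
        self.links = {}    # source record id -> hub id
        self._next_id = 1

    def _find_match(self, incoming):
        """Return the hub id of the first record matching on email or name."""
        for hub_id, rec in self.records.items():
            same_email = bool(incoming.get("email")) and incoming.get("email") == rec.get("email")
            score = SequenceMatcher(None, incoming["name"].lower(), rec["name"].lower()).ratio()
            if same_email or score >= self.threshold:
                return hub_id
        return None

    def upsert(self, source_id, incoming):
        """Match the incoming record, then merge it or create a new survivor."""
        hub_id = self._find_match(incoming)
        if hub_id is None:
            hub_id = self._next_id
            self._next_id += 1
            self.records[hub_id] = dict(incoming)
        else:
            survivor = self.records[hub_id]
            for field, value in incoming.items():
                # Assumed survivorship rule: keep existing values, fill only gaps.
                if value and not survivor.get(field):
                    survivor[field] = value
        self.links[source_id] = hub_id
        return hub_id


hub = PartyHub()
first = hub.upsert("crm-17", {"name": "Maria Garcia", "email": "", "phone": "555-0100"})
second = hub.upsert("web-42", {"name": "Maria Garcia", "email": "maria@example.com", "phone": ""})
third = hub.upsert("erp-9", {"name": "Wei Chen", "email": "wei@example.com", "phone": ""})
print(first, second, third, hub.records[1])
```

The second call lands on the same hub record as the first and contributes the missing email, while the `links` dictionary preserves which source record maps to which survivor.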

Mergers and Acquisitions

Mergers and acquisitions bring unique challenges, and matching comes in handy here again by helping you mash up newly acquired customer data with your MDM hub. The usual approach is to take a production-like environment (or a dedicated M&A environment) and do multiple passes of the data matching exercise to find the right approach to integration.
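One simple way to picture those multiple passes is to rerun the same match at different thresholds and compare how many candidate pairs each pass produces. The sketch below assumes name-only matching with `difflib` and uses made-up sample names and thresholds; in practice each pass might also vary the fields compared or the matching rules.

```python
from difflib import SequenceMatcher


def count_matches(acquired, hub, threshold):
    """Count acquired/hub name pairs scoring at or above the threshold."""
    return sum(
        1
        for a in acquired
        for h in hub
        if SequenceMatcher(None, a.lower(), h.lower()).ratio() >= threshold
    )


# Hypothetical names from the acquired company and from the existing hub.
acquired_names = ["Jon Smith", "Jane Doe", "Robert Brown"]
hub_names = ["John Smith", "Jane Doe", "Roberta Browne"]

# One pass per threshold, run in the M&A environment rather than production.
pass_results = {
    threshold: count_matches(acquired_names, hub_names, threshold)
    for threshold in (0.90, 0.93, 0.95)
}
print(pass_results)
```

Reviewing the pairs that drop out between passes is what tells you whether a stricter threshold is removing false positives or genuine duplicates.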


A newer capability that some early pioneers in data matching are offering is a Google-like search within MDM. Here, data is matched as you type your search criteria, which helps you avoid duplication at the point of entry. I will explore this further in a future post.
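A very small sketch of the idea, assuming simple prefix matching on the words of a name rather than the fuzzy matching a real product would likely use. The `suggest` function and the sample records are hypothetical.

```python
def suggest(query, records, limit=5):
    """Return names where every typed token is a prefix of some word in the name."""
    tokens = query.lower().split()
    if not tokens:
        return []
    hits = []
    for rec in records:
        words = rec["name"].lower().split()
        if all(any(word.startswith(token) for word in words) for token in tokens):
            hits.append(rec["name"])
    return sorted(hits)[:limit]


# Hypothetical records already in the hub.
existing = [
    {"name": "Acme Corporation"},
    {"name": "Acme Holdings"},
    {"name": "Apex Corporation"},
]

# Each call represents the search box contents after another keystroke.
for typed in ("ac", "acme co"):
    print(typed, "->", suggest(typed, existing))
```

Showing these suggestions while the user is still typing gives them a chance to pick the existing record instead of creating a new one.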

What are the other situations you have come across where data matching is used? I would love to hear your opinions and thoughts via comments.

Image Courtesy of phanlop88/