5 Ways Data Matching Is Used In MDM Implementation

In his recent blog post, Henrik Liliendahl Sørensen touched on the topic of data matching. He highlighted the considerations around where data matching should be done.

I am a big proponent of avoiding duplicates by matching at the point of entry. In reality, though, master data records get captured in different applications that are not equipped with matching or any other duplicate-prevention mechanism. Not having a centralized master data management system that can address this problem is one of the key challenges organizations face today.

Once you embark on the master data journey, data matching becomes a crucial aspect and can help you at different stages of the implementation. Below, I discuss five stages of an MDM project where data matching is used.

Initial load

As you start your MDM initiative, approach the solution by identifying the sources of master data, bringing at least two of them into the master data hub, and running the data matching process. This is an important step: it provides a framework for future source-system integration and allows you to pick the right matching technique for your scenario.
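To make this concrete, here is a minimal sketch of an initial-load matching pass between two sources, using Python's standard-library `difflib` for fuzzy name comparison. The source names, fields, and threshold are all invented for illustration; a real hub would use far richer normalization and matching techniques.

```python
from difflib import SequenceMatcher

def normalize(rec):
    """Crude normalization: lowercase and collapse whitespace."""
    return {k: " ".join(str(v).lower().split()) for k, v in rec.items()}

def similarity(a, b):
    """Fuzzy similarity between two normalized strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()

def match_sources(source_a, source_b, threshold=0.6):
    """Compare every record in source_a against source_b and return
    pairs whose name similarity meets the (illustrative) threshold."""
    norm_a = [normalize(r) for r in source_a]
    norm_b = [normalize(r) for r in source_b]
    matches = []
    for ra in norm_a:
        for rb in norm_b:
            score = similarity(ra["name"], rb["name"])
            if score >= threshold:
                matches.append((ra["name"], rb["name"], round(score, 2)))
    return matches

# Two hypothetical sources feeding the hub for the first time.
crm = [{"name": "Acme Corporation"}, {"name": "Globex Inc"}]
erp = [{"name": "ACME Corp."}, {"name": "Initech LLC"}]
print(match_sources(crm, erp))  # [('acme corporation', 'acme corp.', 0.69)]
```

Running the same load at several thresholds is a cheap way to evaluate which matching technique and cutoff suit your data before committing to an approach.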

Batch data loads

The ongoing integration of internal and external data sources usually requires a batch data load option. One approach you can take here is to load party data into the hub and then trigger a periodic (e.g., nightly) matching process to identify duplicates. This requirement usually arises during the consolidation and coexistence phases of MDM, where you create a golden profile for reporting and other analytical purposes.
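A nightly batch pass is usually made tractable by "blocking": only records that share a cheap key are compared pairwise, rather than comparing every record against every other. The field names and the blocking key below are invented for illustration; production systems typically use phonetic codes such as Soundex and multiple keys per record.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def block_key(rec):
    """Blocking key: first letter of the name plus postal code.
    Only records sharing a key are ever compared, which keeps the
    nightly run far cheaper than full pairwise comparison."""
    name = rec["name"].lower().strip()
    return (name[:1], rec.get("zip", ""))

def nightly_match(records, threshold=0.8):
    """Group records by blocking key, then flag candidate duplicate
    pairs within each block for steward review or auto-merge."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    duplicates = []
    for group in blocks.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                a = group[i]["name"].lower()
                b = group[j]["name"].lower()
                if SequenceMatcher(None, a, b).ratio() >= threshold:
                    duplicates.append((group[i]["id"], group[j]["id"]))
    return duplicates

hub = [
    {"id": 1, "name": "John Smith", "zip": "10001"},
    {"id": 2, "name": "Jon Smith", "zip": "10001"},
    {"id": 3, "name": "Jane Doe", "zip": "94105"},
]
print(nightly_match(hub))  # [(1, 2)]
```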

Real time interactions

As you start moving towards a transactional hub architecture, you need to turn on real-time matching and linking of records. In this phase, applications interact directly with the master data hub via APIs and services. A party record flowing into MDM is matched in real time; duplicates are identified and merged into a surviving record. This is the ideal state: your party data is centralized, well maintained, and acts as a foundation for capabilities such as real-time analytics.
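The real-time flow can be sketched as a single match-and-merge call per incoming record. Everything here is a simplification for illustration: the in-memory "hub", the threshold, and the survivorship rule (non-empty incoming fields win) stand in for what a real transactional hub does behind its services.

```python
from difflib import SequenceMatcher

HUB = {}        # id -> record; stands in for the MDM hub's store
NEXT_ID = [1]

def find_match(record, threshold=0.85):
    """Return the id of an existing hub record the incoming party
    matches, or None if it looks new."""
    for rid, existing in HUB.items():
        score = SequenceMatcher(
            None, record["name"].lower(), existing["name"].lower()
        ).ratio()
        if score >= threshold:
            return rid
    return None

def upsert_party(record):
    """Real-time match-and-merge: link the incoming record to a
    surviving golden record, or create a new one. Survivorship is
    deliberately naive: non-empty incoming fields overwrite."""
    rid = find_match(record)
    if rid is None:
        rid = NEXT_ID[0]
        NEXT_ID[0] += 1
        HUB[rid] = dict(record)
    else:
        HUB[rid].update({k: v for k, v in record.items() if v})
    return rid

a = upsert_party({"name": "Maria Garcia", "phone": ""})
b = upsert_party({"name": "Maria  Garcia", "phone": "555-0100"})
print(a == b, HUB[a]["phone"])  # True 555-0100
```

In practice this logic sits behind the hub's APIs and services, so every consuming application gets the same match decision at the moment of capture.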

Mergers and Acquisitions

Mergers and acquisitions bring unique challenges, and matching comes in handy here again by helping you mash up newly acquired customer data with your MDM hub. The usual approach is to take a production-like environment (or a dedicated M&A environment) and run multiple passes of the data matching exercise to find the right approach to integration.
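One way to structure those multiple passes is a threshold sweep: match the acquired book against the hub at several cutoffs and look for the point where the match count jumps, which usually signals false positives creeping in. The data and thresholds below are made up for illustration.

```python
from difflib import SequenceMatcher

def match_count(acquired, hub, threshold):
    """Count acquired records that match at least one existing hub
    record at the given similarity threshold."""
    hits = 0
    for a in acquired:
        for h in hub:
            if SequenceMatcher(None, a.lower(), h.lower()).ratio() >= threshold:
                hits += 1
                break
    return hits

def threshold_sweep(acquired, hub, thresholds=(0.95, 0.85, 0.75, 0.65)):
    """One matching pass per threshold. A steep jump between two
    adjacent thresholds suggests the lower one is over-matching."""
    return {t: match_count(acquired, hub, t) for t in thresholds}

hub = ["Acme Corp.", "Globex Inc", "Initech"]
acquired = ["ACME Corp", "Umbrella LLC", "Initech Inc."]
print(threshold_sweep(acquired, hub))
# {0.95: 0, 0.85: 1, 0.75: 1, 0.65: 2}
```

Running this in the M&A environment first lets you settle on a cutoff before any merge touches production data.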

Searching

A newer capability some of the early pioneers are offering in data matching is a Google-like search within MDM. Here, data is matched as you type your search criteria, helping you avoid duplication at the point of entry. I will explore this in more depth in a future post.
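Search-as-you-type duplicate prevention can be approximated by scoring the partial query against existing hub names and surfacing the closest candidates before a new record is created. The names, score floor, and result limit below are illustrative only.

```python
from difflib import SequenceMatcher

HUB_NAMES = ["Acme Corporation", "Acme Consulting", "Globex Inc", "Initech"]

def suggest(partial, names=HUB_NAMES, limit=3, floor=0.3):
    """Score the partial query against each hub name as the user
    types, returning the closest candidates so a clerk sees likely
    existing records before creating a duplicate."""
    q = partial.lower().strip()
    scored = [
        (SequenceMatcher(None, q, name.lower()).ratio(), name)
        for name in names
    ]
    return [name for score, name in sorted(scored, reverse=True)
            if score >= floor][:limit]

print(suggest("acme co"))  # ['Acme Consulting', 'Acme Corporation']
```

Note that the same matching settings driving batch deduplication can often be reused (or deliberately relaxed) for this interactive search path.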

What are the other situations you have come across where data matching is used? I would love to hear your opinions and thoughts via comments.

Image Courtesy of phanlop88/freedigitalphotos.net

COMMENTS

24 Thoughts on 5 Ways Data Matching Is Used In MDM Implementation
    FX Nicolas
    14 Nov 2014
     1:46am

    One relatively frequent pattern for using data matching is what I call a “contribution model”, which mixes “App Data Consolidation” and “User Collaboration”. This pattern applies when you want data coming from applications to merge in a consolidation-hub style (initial/batch data loads), combined with information directly authored in a centralized-hub style through MDM workflows and UI. This requires data matching and merging at record and field level, with complex survivorship rules (mostly field-level).
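    The field-level survivorship this comment describes can be illustrated with a tiny sketch: each attribute carries its own source ranking, and the golden value for that field comes from the most trusted source that actually supplied one. The source names and rankings below are invented; real trust frameworks add timestamps, decay, and validation on top.

```python
# Per-field source ranking: hypothetical sources, highest trust first.
TRUST = {"email": ["steward", "crm", "erp"],
         "phone": ["crm", "steward", "erp"]}

def survive(field, contributions):
    """contributions: {source_name: value}. Return the value from the
    highest-ranked source for this field that provided a non-empty
    value, or None if no source did."""
    for source in TRUST[field]:
        value = contributions.get(source)
        if value:
            return value
    return None

golden = {
    "email": survive("email", {"crm": "a@x.com", "steward": "b@x.com"}),
    "phone": survive("phone", {"erp": "111", "steward": "222"}),
}
print(golden)  # {'email': 'b@x.com', 'phone': '222'}
```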

    Prashant
    14 Nov 2014
     11:29am

    FX,

    Thank you for your comment. Very apt observation, and I do see this often in implementation phases, where consolidation continues to occur while master data management directly serves the pilot applications that pioneer use of the hub.

    Coming to merging records with complex field-level survivorship rules, it's a big topic in itself. There are only a few vendors who do a great job at this by providing a trust framework that is flexible and allows customers to use it in the way that best suits their specific needs. I will write more on this in the coming months.

    Thank You
    -Prash

    Andrew Meyer
    15 Nov 2014
     7:54am

    There’s another issue with data matching, and linking in particular: when do you know you’re done? If you have 90M records that came from 4 or 5 different sources and you’re trying to link and/or dedup them, how often do you go through the process? Especially considering the cost: if you process in batches of 6M, it takes about 3 days to process.

    I’d be interested if you have any rules-of-thumb or if you’ve done any analysis about how to make that decision.

      FX Nicolas
      18 Nov 2014
       1:35am

      Good question Andrew.
      The short answer for the initial load is “binning” to “divide and conquer”. See http://www.semarchy.com/doc/SEMDG/html/Integration-Process-Design.html#Matching for more details.

      Then to avoid excessive re-processing all the records at every change, the system must be smart: The key here is to keep track of the automated matching decision taken by the system, PLUS the user choices (if the system allows manual match/merge), and be able to re-match & merge ONLY new or updated records. This is not a simple technical problem, but I think we cracked that nut.
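      The binning-plus-incremental idea described here can be caricatured in a few lines: bins persist between runs, and only an incoming record's own bin is scanned, so unchanged records are never reprocessed. The single-letter bin key and the exact-match comparison are deliberate oversimplifications for illustration.

```python
from collections import defaultdict

def bin_key(name):
    """Binning ("divide and conquer"): only records in the same bin
    are ever compared. A single first letter is used here; real
    systems use phonetic codes or multiple keys per record."""
    return name.strip().lower()[:1]

class IncrementalMatcher:
    """Keep bins between runs and match only new or updated records,
    instead of reprocessing the whole hub on every change."""
    def __init__(self):
        self.bins = defaultdict(list)   # key -> already-matched names

    def add(self, name):
        """Match one incoming record against its bin only. Returns
        True if it duplicates an existing record."""
        candidates = self.bins[bin_key(name)]
        dup = name.lower() in (c.lower() for c in candidates)
        if not dup:
            candidates.append(name)
        return dup

m = IncrementalMatcher()
print(m.add("Acme"), m.add("Globex"), m.add("ACME"))  # False False True
```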

    Rakesh
    15 Nov 2014
     9:27pm

    Good article. De-duplication itself is a project at many places and never ends. Please do write more about trust framework and how successful these frameworks are.

    Thanks

    Bob Tapscott
    18 Nov 2014
     6:04am

    This is always one of the challenges of mergers and acquisitions, and of developing an ecosystem of companies where connecting disparate yet potentially duplicate data sources is critical. The sensitivity of the data needs to be taken into consideration. If, for example, you have a high probability of duplicate records, relationships, and volume discounts for the same provider, then by all means try to renegotiate one (larger) volume discount; that's part of the business case for MDM. With customers, however, it is different. For example, when one bank recognized in their merged CIF identical names, dates of birth, and SSNs but different addresses across seven accounts, it seemed sensible that the one outlier address was outdated, and a combined CIF and statement made sense to demonstrate to the customer the importance the bank placed on the overall relationship. Unfortunately, the one outlier was a charge card for the mistress, and combining its statement with the wife's statement ended up in a legal quagmire and a lost customer relationship. With duplicate customer data, it is often smart to incentivize the customer to connect the disparate dots.

    Ankit Tripathi
    19 Nov 2014
     2:56am

    One method of MDM implementation you touched on was searching. Cloud MDM tools give us the perfect way to search for duplicates at the point of entry.

    They basically take the matching settings from the batch job that identifies duplicates, apply that logic to record search, and surface potential duplicates at the point of entry. One more important thing to consider here is that we can actually toggle between the matching settings we would use for searching and the ones we would use for finding duplicates in the system to merge. Looking forward to more cloud-based MDM implementation information.

      Adam Hintz
      21 Nov 2014
       3:26am

      Ankit,

      Why would cloud MDM (as opposed to on premise) offer the perfect way to search at the point of entry?

      I’m not sure how location of the data is relevant as long as you’re searching a master list at the time of creation.

      Regards,
      -Adam

    Bob Orf
    5 Dec 2014
     5:50am

    We recommend that a full batch process be run annually due to changes in address coding guides. For example, a zip code change may cause a split if the old zip resides in the data warehouse and the new zip appears on an incoming record. Many times, we also recommend an annual NCOA process run concurrently with this to avoid a similar problem: an old address in the database that will not match a new address for the same person on an incoming feed. Of course, if the source has a customer number, then the data can be changed, but in most instances there are external feeds that will not.

    Linda Boudreau
    23 Apr 2015
     8:10pm

    Good post. There is so much great work being done with data matching tools in various industries such as financial services and health care, especially with data-driven decision making becoming a bigger trend. It will be interesting to see the impact of these changes down the road.

    Linda Boudreau
    Data Ladder

    Fuzzy Matching for Oil/Gas MDM using Python | Adventures in Python, Petroleum Data & GIS
    10 Jun 2015
     9:46pm

    […] Data Management tools on the market, you will find that most of them provide functionality for matching records from various source systems and blending or merging them into a single, trusted data set, a practice also dubbed “Golden […]

    Ravi
    18 Oct 2016
     2:52am

    Really useful inputs. Thanks for bringing the data matching scenarios together.

    Naeem Saif
    7 Nov 2016
     12:42am

    Hey,
    Sir, I am a beginner in MDM. I am using Informatica Hub.
    I have gone through various documents and videos and now have a good idea of what the purpose is.
    I have performed the following process: created landing tables, base objects, staging tables, mappings, cleanse functions, and the stage and load process.
    Now I want to do match, merge, and tokenize, but I am not able to understand how it is done. I have been trying and cannot manage it, as I don't have that much knowledge of Informatica Hub or how these processes work.
    Any guidance or help would be appreciated.
    Thanks

    Nicolas
    20 Dec 2016
     5:36am

    Hi,
    We are using MDM in our bank project, and it's a totally outdated and inappropriate framework, even for matching functions…
    All team members only wish to get out of MDM now!

