Identifying the Right Sources of Master Data

Among the several challenges we face when kick-starting an MDM implementation is deciding which sources to include in the initial phase of deployment. Alongside crucial activities such as data collection, transformation, normalization, standardization and matching, source identification is a critical factor in realizing MDM benefits early on.

The proven way to implement MDM is to start with a small set of data sources and grow incrementally. Once we identify the sources that hold the right entities, dependent domains and attributes, we can lay effective groundwork for:

Creating a broad set of rules to cleanse the data
Building standardization engines applicable to all relevant data entities, and
Constructing rules to identify suspects so as to create a single version of truth (as discussed in my earlier post)
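To make the standardization and suspect-identification steps a little more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the suffix list, the `standardize_name` and `find_suspects` helpers, the 0.85 threshold and the sample records are all hypothetical, and a real MDM hub would use its own matching engine rather than `difflib`.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical list of legal-form suffixes to drop; a real project maintains its own.
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "limited", "llc", "corp", "corporation"}


def standardize_name(raw: str) -> str:
    """Lowercase, strip punctuation, collapse spaces and drop legal-form suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", raw.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def find_suspects(records, threshold=0.85):
    """Return (id_a, id_b, score) for pairs whose standardized names look alike."""
    standardized = {rid: standardize_name(name) for rid, name in records.items()}
    suspects = []
    for (id_a, name_a), (id_b, name_b) in combinations(standardized.items(), 2):
        score = SequenceMatcher(None, name_a, name_b).ratio()
        if score >= threshold:  # assumed cut-off, tune it against real data
            suspects.append((id_a, id_b, round(score, 2)))
    return suspects


# Made-up records from three hypothetical source systems.
records = {
    "CRM-1": "Acme Widgets, Inc.",
    "ERP-7": "ACME WIDGETS",
    "WEB-3": "Acme Widgets Ltd",
    "CRM-9": "Globex Corporation",
}
suspects = find_suspects(records)
for pair in suspects:
    print(pair)
```

The point of standardizing first is that the matching rule then compares like with like, so one generic rule can cover records arriving from several sources.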

Getting things right at the beginning is a critical aspect of an MDM project, as it lays the foundation for future source system integration plans. It also makes an enterprise-wide MDM rollout easier, since additional sources of data can then be added to the MDM hub.

So the question is how to choose the sources that go into MDM during this inaugural phase, given that organizations have a huge application landscape and often do not know which systems are responsible for which master data. The exercise is also very revealing for many customer representatives themselves, who discover dozens of databases containing data they did not know existed.

Depending on the master domain you are implementing, you would usually start by listing the most trusted data sources the company currently uses for its customer-facing applications. For example, if you are implementing a customer master, you will ask: which system currently manages customer names, current addresses and contact information? It’s easier said than done, though, as you will find that the organization has multiple siloed applications, each holding this information for a specific line of business. Each division, department and business process has customer information that its business owners consider complete.

One strong belief in the MDM arena is that the larger the data, the larger the data quality issues, and larger still the number of duplicate records. In a nutshell, we would usually choose the data sources that own the largest number of customer records. This lets us set up rules that are as accurate and generic as possible, so that a wider set of data issues can be addressed upfront.

Also, remember that you’ll need as much information as possible to do adequate data matching. So put the emphasis on the completeness of the attributes used for matching: the source you choose should have them densely filled. To discover more about the source data, you will need a quick initial profiling phase before making these decisions. Data profiling tools help in scanning data for missing values, incorrect values and elements violating business rules. This will allow you to estimate the effort required for cleanup work more accurately. Profiling will also help you weigh each source carefully and judge whether it is a reliable source of master data.
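As a rough illustration of what a quick profiling pass could look like, here is a minimal Python sketch that reports the record count, the fill rate of each matching attribute and the number of values that break a simple business rule. The attribute names, the email rule, the `profile_source` helper and the sample rows are all assumptions made for this example; a dedicated profiling tool will do far more.

```python
import re

# Hypothetical attribute names and rules; substitute the ones your matching relies on.
MATCH_ATTRIBUTES = ["name", "address", "email"]
RULES = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def profile_source(rows):
    """Return the record count plus fill rate and rule violations per attribute."""
    total = len(rows)
    report = {"records": total, "attributes": {}}
    for attr in MATCH_ATTRIBUTES:
        filled = [r[attr] for r in rows if not is_missing(r.get(attr))]
        rule = RULES.get(attr)
        violations = sum(1 for v in filled if rule and not rule(str(v)))
        report["attributes"][attr] = {
            "fill_rate": len(filled) / total if total else 0.0,
            "violations": violations,
        }
    return report


# Made-up extract from a hypothetical CRM source.
crm_rows = [
    {"name": "Ann Lee", "address": "1 Main St", "email": "ann@example.com"},
    {"name": "Bob Ray", "address": "", "email": "bob-at-example"},
    {"name": "", "address": None, "email": "cy@example.com"},
    {"name": "Dee Fox", "address": "9 Elm Rd", "email": None},
]
report = profile_source(crm_rows)
for attr, stats in report["attributes"].items():
    print(attr, stats)
```

Running the same report against each candidate source gives a like-for-like view of volume, completeness and rule violations, which is the kind of evidence needed to weigh one source against another.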

How do you analyze the data? And how do you determine the correct sources of master data? Please share your experience and opinions via comments. Thank you.

COMMENTS

21 Thoughts on Identifying the Right Sources of Master Data

Henrik Liliendahl Sørensen

27 Jan 2012

8:46am

Prash, selecting the sources is indeed very essential.

One aspect I have been working with a lot is how to involve external sources as well.

In the customer data arena these will be things such as address directories (as we also discussed earlier here on the blog in relation to geocoding), business directories for B2B data and consumer/citizen directories for B2C, depending very much on the countries and industry in question.

These sources may be very helpful in standardization and data matching, and including such sources in future data entry will have a great impact on data quality if you are able to build this into your business processes.

Josh Buckler

27 Jan 2012

11:48am

Hey Prashant,

Selecting the best source of data to start with is a recurring challenge for many of our customers. Henrik brings up a great point: some external sources, such as Dun and Bradstreet for B2B data, can provide an initial source of truth. The bigger challenge is that, regardless of the accuracy of the external source, that data will still need to match up with the potential mess you’ll be dealing with internally. For example, how easily can you link helpIT systems with helpIT, helpIT inc, helpIT systems inc, or HSI?

I mentioned in a blog post I wrote about the Retail Single Customer View (http://www.helpit.com/cleandata/?p=138) that a customer of ours selected their website data as a starting point. Customers care most about receiving their order, so it is in their own best interest to provide accurate information; otherwise they will have to deal with logistics headaches down the road.

It’s even possible with some software, including helpIT’s applications, to score the quality of the information within each record based on completeness and accuracy.

So maybe the answer is a combination of identifying the source that cares most about the accuracy of its information and a record quality scoring methodology.

John Owens

30 Jan 2012

9:13pm

Hi Prashant

Thanks for the post. However, I have several very serious concerns about the overall approach you are advocating.

The first of these is that you at no point mention the MOST CRITICAL element for Master Data Definition and Management, which is the LOGICAL DATA MODEL (LDM). If you have not got this you cannot be said to be managing your master data. In fact, it would be impossible. It would be like claiming that you could manage the electrics in a large building without having a wiring diagram.

Secondly, Master Data Elements must be DEFINED by senior management; they cannot be inferred from existing data. How an enterprise currently categorises and groups its data may be right or it may be very wrong. What it OUGHT to be cannot be inferred from the data itself. It must be defined, and this definition will be shown in the LDM.

Thirdly, normalising existing data is a laborious, archaic and error-prone activity that should be avoided at all costs. This is a thoroughly outdated exercise called Relational Data Analysis (RDA), which I used to lecture on 20 years ago and which has been totally superseded by the Relational Data Model.

If those practising Master Data Management within an enterprise are to be taken seriously, then they must be seen to be operating at the highest level of quality, using all of the very best techniques. They cannot be seen as a center of excellence if they are leaving out vital elements, such as the LDM, and using flawed techniques such as RDA.

Regards
John

John Owens

31 Jan 2012

4:53am

Hi Prashant

Thanks for the feedback and the context.

I agree that once the LDM is in place you can cross-check with existing data to see if you have missed any.

However, I would strongly suggest that you always normalise in the LDM and then map all of your existing data onto that.

A properly drawn LDM will be fully normalised to 5NF.

Once again, thanks for the feedback.

Kind regards
John

Tom

8 Feb 2012

1:07pm

Interesting – thanks.

A good post about the importance of a single customer view on this site.

Thanks again,
Tom

MDM – A GEEK'S POINT OF VIEW