Typical master data consolidation starts with combining the operational master records from all the data silos where they exist. The key step is creating master data indexes to support a single view; knowing and asking the right questions during this phase can save a lot of time and rework.

In an earlier post on this blog, I examined the ways in which we can identify the right sources of Master Data. Once these data sources are identified, the next step is to select from them the data elements that conform to the definition of master data.

Some of the characteristics of this exercise are –

  1. Finalizing the core data elements, which constitute the master data entities.
  2. Classifying and standardizing common attributes that are managed by individual silo systems and may bear different meanings at the outset but represent the same element.
  3. Identifying the reference data sets (both internal and external) and mapping them to a comprehensible list of enterprise-wide accessible codes.
  4. Isolating the known inaccuracies in the data, especially the ones that are causing costly damage.
  5. Measuring the quality of the data by using data profiling and discovery tools.
  6. Recognizing and defining complex relationships between key master data entities.
  7. Determining the fetch mechanism to gather initial and delta changes from a variety of data sources.
  8. Coming up with a strategy to extract data into the master data hub using an iterative, repeatable process.
  9. Creating a framework for fixing duplicated data, error processing and exception handling.
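
The standardization and duplicate-handling steps above (items 2 and 9) can be sketched in a few lines. This is a minimal illustration, not a real MDM implementation: the silo names, field names, status codes and match-key logic are all assumptions made up for the example.

```python
# Each silo encodes the same "customer status" attribute differently;
# this table maps silo-specific codes to one enterprise-wide code (step 2).
STATUS_MAP = {
    "A": "ACTIVE", "ACT": "ACTIVE", "Active": "ACTIVE",
    "I": "INACTIVE", "INACT": "INACTIVE",
}

def standardize(record):
    """Replace a silo-specific status code with the enterprise-wide code."""
    record["status"] = STATUS_MAP.get(record["status"], "UNKNOWN")
    return record

def match_key(record):
    """Build a simple match key from normalized name and email."""
    return (record["name"].strip().lower(), record["email"].strip().lower())

def consolidate(records):
    """Index records by match key; a key with more than one record is a
    candidate duplicate to route to exception handling (step 9)."""
    index = {}
    for rec in map(standardize, records):
        index.setdefault(match_key(rec), []).append(rec)
    return index

# Two silo records for the same customer, differing in case, whitespace
# and status encoding, collapse onto a single match key.
crm = {"name": "Jane Doe ", "email": "JANE@X.COM", "status": "ACT"}
erp = {"name": "jane doe", "email": "jane@x.com", "status": "A"}
merged = consolidate([crm, erp])
```

Real consolidations would of course use probabilistic or fuzzy matching rather than exact normalized keys, but the shape of the pipeline, standardize then index then resolve collisions, stays the same.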

To be able to do all of these activities, we need a good understanding of the sources of master data. The section below lists the key questions that need to be asked and answered to address the aspects above.

  1. What are the key entities that the source system manages and requires for smooth functioning?
  2. What are the essential elements that constitute each of the entities listed above?
  3. What is the business name of a given data element? What does the element mean and what does it represent?
  4. What are the format and length of each of the data elements? Are rules and logic defined to ensure the values adhere to the definition of the field?
  5. Is the data element a mandatory field? Conditionally required? A derived value? Does it have a default value?
  6. If an element is referenced by a code lookup, what are the possible values?
  7. In case of reference codes, is there a central reference data repository that can manage this data?
  8. Are there any requirements for logical grouping of the data elements?
  9. How are different entities linked with each other? What is the cardinality of the relationship?
  10. How are these master data elements created, read, updated, searched and deleted?
  11. What are the ways in which data can be extracted from the source? (Replication based, Time Stamp based, Table comparison, Log based etc.)
  12. What is the frequency at which data can be synchronized to the master data hub? Can the system support real-time invocation of services?
  13. What is the estimated number of records for initial and delta load? What is the rate of increase year-on-year?
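
Questions 11 through 13 come together in the extraction design. As one illustration, a timestamp-based delta extraction keeps a high-water mark and pulls only rows changed since the previous sync. The sketch below is a simplified assumption of how such a fetch might look; the field names (`id`, `last_modified`) are invented for the example.

```python
from datetime import datetime

def extract_delta(rows, last_sync):
    """Return rows modified after the previous sync, plus the new
    high-water mark to persist for the next run."""
    delta = [r for r in rows if r["last_modified"] > last_sync]
    new_mark = max((r["last_modified"] for r in delta), default=last_sync)
    return delta, new_mark

# Hypothetical source rows with change timestamps.
source = [
    {"id": 1, "last_modified": datetime(2024, 1, 1)},
    {"id": 2, "last_modified": datetime(2024, 2, 1)},
    {"id": 3, "last_modified": datetime(2024, 3, 1)},
]

# Only rows changed after the stored high-water mark are fetched.
delta, mark = extract_delta(source, datetime(2024, 1, 15))
```

Timestamp-based extraction is simple but misses hard deletes and depends on reliable change timestamps in the source; log-based or table-comparison approaches trade complexity for completeness.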

While this list is not exhaustive, it gives us a starting point on the right questions to ask during this common phase of an MDM endeavor.

What do you think? Please share your thoughts via comments.
