In my previous post, I gave an overview of Reference Data and discussed some of the key features of a Reference Data Management (RDM) system. This is the second post in the series, and here we will look at some of the challenges organizations face when dealing with Reference Data.

Multiple codes and their mapping:
A simple example of reference data is country codes. Wikipedia tells us that several different systems have been developed to represent countries and dependent areas. The best known of these, ISO 3166-1, defines three sets of country codes. For instance, depending on whether you use the Alpha-2, Alpha-3 or Numeric code set, the United States of America is represented as US, USA or 840 respectively.
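To make the three representations concrete, here is a minimal sketch in Python of a lookup that accepts any of the ISO 3166-1 forms and returns the same country record. The `Country` type, the `lookup_country` function and the three-row table are illustrative assumptions, not part of any RDM product. A real system would load the full code list from a maintained source.

```python
from typing import NamedTuple, Optional


class Country(NamedTuple):
    name: str
    alpha2: str
    alpha3: str
    numeric: str  # stored as text so leading zeros are preserved


# Small illustrative subset of ISO 3166-1; not the full list.
COUNTRIES = [
    Country("United States of America", "US", "USA", "840"),
    Country("United Kingdom", "GB", "GBR", "826"),
    Country("Germany", "DE", "DEU", "276"),
]

# Index every representation to the same record.
_INDEX = {}
for country in COUNTRIES:
    for code in (country.alpha2, country.alpha3, country.numeric):
        _INDEX[code] = country


def lookup_country(code) -> Optional[Country]:
    """Return the country for an Alpha-2, Alpha-3 or Numeric code, or None."""
    key = str(code).strip().upper()
    if key.isdigit():
        key = key.zfill(3)  # numeric codes are three digits wide
    return _INDEX.get(key)


print(lookup_country("us"))
print(lookup_country("USA"))
print(lookup_country(840))
```

All three calls resolve to the same record, which is the property a mapping layer needs before codes from different systems can be compared.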

I talked about NAICS codes in my previous post. They pose another complex problem, because industries were earlier classified using Standard Industrial Classification (SIC) codes. Although NAICS replaced SIC in 1997, some regulatory reports still require SIC codes. The European Community uses the NACE classification, which is functionally similar to NAICS but follows a different coding scheme. If you are a global organization doing business in different countries, you have to map these codes correctly for your reporting to be compliant.

Data Integration Challenges:
Typically, the same reference data is represented by different codes across your enterprise applications. This happens when application owners choose the codes and descriptions that best suit them. It often bites us during data integration projects, because of the effort involved in mapping and reconciling codes between different data sources.

We often face reference data challenges during MDM implementations. For example, an online bill payment service may be represented as OBS in the billing system, whereas in the customer information file it may be called BILLPAY. We have to be able to map both OBS in the billing system and BILLPAY in the customer information file to one single service type in MDM. This mapping becomes critical when you are designing MDM to create a single view of customers and their contracts.
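The OBS and BILLPAY example above can be sketched as a cross-reference table keyed by source system and source code. The system names `BILLING` and `CIF`, the master code `ONLINE_BILL_PAY` and the function name are hypothetical, chosen only for illustration.

```python
# Cross-reference: (source system, source code) -> master service type in MDM.
# All names here are hypothetical examples.
SERVICE_TYPE_MAP = {
    ("BILLING", "OBS"): "ONLINE_BILL_PAY",
    ("CIF", "BILLPAY"): "ONLINE_BILL_PAY",
}


def to_master_service_type(source_system: str, source_code: str) -> str:
    """Translate a source-specific service code to the single MDM service type."""
    key = (source_system.strip().upper(), source_code.strip().upper())
    try:
        return SERVICE_TYPE_MAP[key]
    except KeyError:
        # Fail loudly: a silently unmapped code would break the single view.
        raise LookupError(
            f"No MDM mapping for code {source_code!r} from {source_system!r}"
        ) from None


print(to_master_service_type("BILLING", "OBS"))
print(to_master_service_type("CIF", "BILLPAY"))
```

Keying on the source system as well as the code matters, because the same code value can mean different things in different applications.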

Data integration projects often have to deal with thousands of such code value pairs. To ensure that ETL teams transform data appropriately, we have to make sure that the source code values are correctly mapped to the target codes. It’s important to note that bad reference data mapping is one of the main reasons data integration projects fail.
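One way to catch mapping gaps before an ETL load is to scan the source rows for code values that have no target mapping. The sketch below assumes the rows are plain dictionaries. The field name `svc` and the function `find_unmapped_codes` are hypothetical.

```python
from collections import Counter


def find_unmapped_codes(rows, code_map, field):
    """Count source code values in `field` that have no entry in `code_map`."""
    missing = Counter()
    for row in rows:
        code = row.get(field)
        if code not in code_map:
            missing[code] += 1
    return dict(missing)


# Hypothetical source extract and source-to-target mapping.
source_rows = [
    {"customer": 1, "svc": "OBS"},
    {"customer": 2, "svc": "XYZ"},
    {"customer": 3, "svc": "XYZ"},
    {"customer": 4, "svc": "OBS"},
]
target_map = {"OBS": "ONLINE_BILL_PAY"}

print(find_unmapped_codes(source_rows, target_map, "svc"))
```

Reporting a count per unmapped value helps the team prioritize which mappings to fix first.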

Inconsistent enterprise-wide representation:
Most organizations do not have a consistent representation of reference data at the enterprise level. As we discussed earlier, this data is often application specific, which leads to it being managed in silos. Whether a reference code is internal or external, it tends to change over time. When codes change, not being able to maintain this reference data centrally can cause significant overhead for the enterprise, both in effort and in dollars.

Added to this are mergers and acquisitions, rapid growth in the volume and complexity of reference data, a lack of governance, and the absence of an enterprise-wide single view. Together, these factors create major operational risk.

In my next post, I will discuss how a reference data management system can help resolve these challenges and help you treat your reference data as an enterprise asset.

Hope you liked this post. Please provide your views on this topic via comments.