Part 2: Reference Data Management Challenges

Part 2: Reference Data Management Challenges

In my previous post, I gave an overview of Reference Data and discussed some of the key features of a Reference Data Management (RDM) System. This is second post in the series and here we will look into some of the challenges organization face dealing with Reference Data.

Multiple codes and their mapping:
A simple example of reference data is country codes. Wikipedia tells us that there are several different systems being developed to represent countries and dependent areas. The best known method being ISO 3166-1 for country codes has three sets of code representations. For instance, depending on whether you are using Alpha-2, Alpha-3 or Numeric encoding method, United States of America may be represented as US, USA or 840 respectively.

I talked about NAICS codes in my previous post. This one is another complex problem as these codes were earlier represented as Standard Industrial Classification (SIC) codes. Although NAICS replaced SIC format in 1997 some of the regulatory reports still need SIC codes. European community uses NACE classification which is functionally similar to NAICS but follows different coding. If you are a global organization doing business in different countries, you have to map these codes correctly to be compliant in your reporting.

Data Integration Challenges:
Typically, reference data is represented by different codes across your enterprise applications. This happens when application owners choose codes and description which best suit them. This specific aspect often bites us during data integration projects because of the effort involved in mapping and reconciliation between different data sources.

We often face reference data related challenges during MDM implementations. For example, an online bill payment service may be represented in billing system as OBS where as in customer information file it may be called as BILLPAY. We have to be able to map both OBS in billing system with BILLPAY service in customer information file to one single service type in MDM. This mapping becomes very critical when you are designing MDM to create single view of customers and their contracts.

Data integration projects often have to deal with thousands of such code value pairs. To ensure ETL teams transform data appropriately, we have to make sure code values of source are correctly mapped to codes of target. It’s important to note here that one of the main reasons for data integration project failure is the bad reference data mapping.

Inconsistent enterprise wide representation:
Most of the organization will not have a consistent representation of reference data at an enterprise level. As we discussed earlier, this data is often application specific and leads to data management in a silo fashion. Whether it’s an internal or external reference code, every data has a tendency to change over a period of time. When these codes change, not being able to maintain this reference data centrally can cause significant overhead for the enterprise both in terms of effort and dollars.

Added to this are the mergers and acquisitions, rapid growth in volume, complexity of reference data, and lack of governance and absence of enterprise wide single view. These aspects related to reference data cause major operational risk.

In my next post I will discuss how reference data management system can help resolve these challenges and help you treat your reference data as an enterprise asset.

Hope you liked this post. Please provide your views on this topic via comments.

COMMENTS

3 Thoughts on Part 2: Reference Data Management Challenges

David Jaques-Watson

3 Apr 2013

11:31pm

Some of those “simple” code tables, like country codes, are not so simple when you put them to use. They only make sense in a context (when they are playing a role), and the context is often different from the “base” tables.

For example, it is possible to have a “Country of Birth” view that allows a person with a DoB of 1954 to be born in the USSR – but that should not be possible if you are currently three years old. Similarly, the ISO 3166 code set includes Guernsey, Jersey, and the Isle of Man – but most organisations would say you can’t be born there, it’s the UK. These entries are only important to financial institutions – given they have their own currencies, are “in-but-not-in” the Eurozone, and are tax havens – so you may need a “Country of Currency” view. Similarly, if you are an organisation that processes refugees, you may need “Country of Origin” that shows Eritrea before, during and after its takeover and release by Ethiopia, and a “Country of Citizenship” that allows you to add “United Nations” recognised refugees (stateless people granted UN recognition & paperwork).

And probably remove entries such as Antarctica. I’ve seen a system that contained three people who were born there, and two who were citizens – neither of which is possible (it was a simple DQ error of encoding AQ instead of AU, but hey, if it can go wrong, it will).

Thomas Tong

9 May 2013

9:11am

Great comment David and excellent illustration. Over the last 20 years, I have seen RDM dramatically change and mature in context to the complexity of business. As we grow our business and personal footprints over time, we have more “references” to address. For me, there are different types of reference data, as context is king. There are “global references”, such as country codes… “industry references”… “domain references”…, etc. If you want to create something that is sustainable, such different types of references must be “cross-references” and often business rules must be applied to make such complex reference data have value for a local scenario/local business process.

The process of merging/harominzing reference data should be an ongoing activity…similar to MDM…little less pace…. as usually such merging/harmonization is needed during mergers/acquisitions or simply a large system update wherein you are combining smaller local systems into an enterprise platform.

si downes

18 Nov 2016

10:53pm

I am currently developing a logical data model to be implemented as a canonical message model in Mule.
In relation to reference data management, am interested in your take on data that has been normalised out in the LDM but is captured and stored in a flatter structure.
For example, Consents are captured as a set of radio buttons on a form, and stored as a set of attributes on the Applicant table in Salesforce.
The Logical Data Model has a Consent entity and Consent Type reference entity.
So even though it is not stored, I am going to manage these possible categories as reference data sets and create business keys, etc.
The same for, say, Address Types – an application form captures a number of addresses (correspondence, residential) and they are stored in flat form, but I will define a Reference Data set of ‘Address Type’, initial values being correspondence and residential.

I can’t see another way of managing this sufficiently but am interested in others experiences

MDM – A GEEK'S POINT OF VIEW