Deceased People Have Long Names


Attributes are the lowest level of data that gives us knowledge and insight about customers. There are attributes of specified length and type, simple and complex fields, lookup tables (or code tables), single- and multi-value attributes (typically found in product information management systems), and so on. Each of these attributes is chosen to serve a certain predefined purpose. It’s interesting how an attribute meant for one purpose, when used for a totally different purpose, leads to bizarre situations.

While working with a client recently, we came across a curious scenario. Records consolidated from a new source of customer data started failing to load into MDM. We took a sample and started analyzing why this was happening. Much to our surprise, the failures were due to the length of the last name field. The field had names that were anywhere from 30 to 50 characters long, while our system was designed to handle a maximum of 30 characters (not a limitation, but by design). A closer look at the data showed that the names had values such as ‘Sebastian Deceased 2011/05/13 Fernando’, which clearly exceeded the allowed limit on the last name column.
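A check like the following would have flagged these records before the load. This is only a minimal sketch: the 30-character limit and the sample value come from the scenario above, while the function name and the list-based input are assumptions made for illustration.

```python
# Limit on the last name column, as described in the scenario above.
MAX_LAST_NAME_LENGTH = 30


def find_overlong(values, limit=MAX_LAST_NAME_LENGTH):
    """Return (value, length) pairs for every value longer than the limit."""
    return [(value, len(value)) for value in values if len(value) > limit]


# Hypothetical sample: one ordinary last name and the value quoted above.
sample = ["Fernando", "Sebastian Deceased 2011/05/13 Fernando"]
overlong = find_overlong(sample)

for value, length in overlong:
    print(f"{length} characters: {value}")
```

Run against the full source extract, the same check would list every record that the target column cannot hold.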

So what made someone enter a name like that in the source system? It doesn’t take a genius to figure out that the source system in question had no place to capture the status of a customer. So they decided to capture this data in the name field, as it was long enough to hold this information. And we have to give them credit for capturing this information consistently for all the deceased customers, as we found a bunch of records all following a similar pattern.

Two things went wrong in this scenario from an MDM implementation perspective:

First, the source data was not profiled. Second, there was no proactive data quality check.

I have advocated data profiling as an important step several times in my earlier blog posts. In this scenario, profiling the source system data would have helped us capture the variations of the name attribute, its patterns, the length of the field, and its conformity to name standards. In an ideal scenario, with the knowledge gained from data profiling, we would have come up with better transformation rules to handle this data in the ETL (Extract, Transform, Load) process. Specifically, in this case, we would parse the name field to get key information about the customer entity, such as First Name, Last Name, Current Status (which is deceased), and the date on which the customer died.
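As a rough illustration of such a transformation rule, the sketch below parses the pattern seen in the sample value. It is not the rule used on the project. It assumes every affected record follows the shape "first name, the word Deceased, a yyyy/mm/dd date, last name", and the names `ParsedName`, `parse_name`, and the status labels `DECEASED` and `UNKNOWN` are invented for this example.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    first_name: Optional[str]
    last_name: str
    status: str
    deceased_date: Optional[date]


# Assumed shape: "<first> Deceased <yyyy/mm/dd> <last>".
DECEASED_PATTERN = re.compile(
    r"^(?P<first>.+?)\s+Deceased\s+(?P<date>\d{4}/\d{2}/\d{2})\s+(?P<last>.+)$",
    re.IGNORECASE,
)


def parse_name(raw: str) -> ParsedName:
    """Split a repurposed name value into name, status, and date parts."""
    text = raw.strip()
    match = DECEASED_PATTERN.match(text)
    if match is not None:
        try:
            died = datetime.strptime(match["date"], "%Y/%m/%d").date()
        except ValueError:
            died = None  # Looks like a date but is not a valid one.
        if died is not None:
            return ParsedName(match["first"], match["last"], "DECEASED", died)
    # No recognizable pattern: pass the value through untouched.
    return ParsedName(None, text, "UNKNOWN", None)


parsed = parse_name("Sebastian Deceased 2011/05/13 Fernando")
print(parsed)
```

Values that do not match the pattern are returned unchanged with an `UNKNOWN` status, so the rule never guesses at records it does not understand.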

In the same project, we also found defaulted date of birth fields, wrong province/state codes, and misleading gender types (let me not go into detail on this one). All of these raise the same complication as the name field and needed to be analyzed and assessed for quality.
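Defaulted values of this kind tend to show up in a simple frequency profile, because one value appears far more often than real data would allow. The sketch below shows the idea. The sample dates, including the `1900-01-01` default, are invented for illustration and are not taken from the project.

```python
from collections import Counter


def most_common_share(values):
    """Return the most frequent value and the fraction of rows that hold it."""
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    return value, count / len(values)


# Invented sample of date of birth values; 1900-01-01 plays the default.
dobs = ["1900-01-01", "1975-03-02", "1900-01-01", "1988-11-30", "1900-01-01"]
value, share = most_common_share(dobs)
print(f"{value} appears in {share:.0%} of rows")
```

A single date of birth shared by a large fraction of customers is a strong hint that the field was defaulted rather than captured.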

We often hit such bottlenecks during MDM implementations, caused by repurposed attributes and workarounds implemented to capture data. MDM implementations can get a lot simpler if they are supported by a well-established proactive data quality step and backed by a data governance program. Glitches such as the one discussed here could have been handled easily if there had been a good definition of what a deceased customer means for the organization, and if that definition had been represented uniformly across all the enterprise systems. But that’s not a slam dunk.

By the way, why do we care about dead people, you ask? Well, that’s a good question! They just taught us to profile our data, you see.
