Recently, Henrik Liliendahl Sørensen (@hlsdk) wrote a blog post discussing the data matching challenges involved in dealing with small-scale business owners.

Unlike individual customers and business customers, these small-scale business owners fall into an intermediate category, causing a lot of confusion in our data matching rules.

We compare records of the same type to get accurate matching results: person to person, organization to organization, address to address, and so on. This not only simplifies the matching process but also helps in identifying duplicates. Failing to identify the type of a record, or placing a record in the wrong category, causes discrepancies in the data matching process.
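The same-type rule can be sketched as a type-gated dispatcher. This is a minimal illustration with hypothetical field names and a deliberately crude string-similarity comparator, not a production matching engine:

```python
from difflib import SequenceMatcher

def _name_sim(x: str, y: str) -> float:
    """Crude fuzzy similarity between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

def person_similarity(a: dict, b: dict) -> float:
    return _name_sim(a.get("last_name", ""), b.get("last_name", ""))

def org_similarity(a: dict, b: dict) -> float:
    return _name_sim(a.get("legal_name", ""), b.get("legal_name", ""))

def match(a: dict, b: dict) -> float:
    """Return a similarity score; records of different types never match."""
    if a.get("type") != b.get("type"):
        return 0.0  # apples vs. oranges: never compare across types
    comparators = {"person": person_similarity, "organization": org_similarity}
    fn = comparators.get(a.get("type"))
    return fn(a, b) if fn else 0.0
```

The gate at the top is the point: however good the fuzzy logic is, it only ever runs on records of the same declared type, which is exactly why a misclassified record silently escapes matching.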

Imagine a scenario where the hub receives a person record from a channel with only the last name field populated. The front desk staff had no good way to capture organization customer details, so they put the organization name in the last name field because it was mandatory (believe me, this happens very often). If matching is configured to compare only person-person and organization-organization records, we surely have a problem.
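One way to catch this before matching runs is a simple pre-match check that flags person records whose last name looks like a business name. This is a heuristic sketch with assumed field names (`first_name`, `last_name`) and an illustrative, far-from-complete list of legal-entity markers:

```python
# Hypothetical legal-entity suffixes that suggest an organization name.
ORG_MARKERS = {"llc", "inc", "ltd", "corp", "gmbh", "plc", "pvt"}

def looks_like_organization(record: dict) -> bool:
    """Flag a 'person' record whose last name resembles a business name."""
    last = (record.get("last_name") or "").lower()
    tokens = {t.strip(".,") for t in last.split()}
    # Heuristic 1: a legal-entity suffix in the last name.
    if tokens & ORG_MARKERS:
        return True
    # Heuristic 2: no first name plus a long multi-word "last name".
    if not record.get("first_name") and len(last.split()) >= 3:
        return True
    return False
```

Records flagged this way could be routed to a steward queue or reclassified as organizations before the matching rules see them.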

So what do we need to do to address this kind of issue?

One would argue that our matching rules need to be more efficient. They should be able to detect what type of record it is before applying fuzzy matching logic to the critical data elements. The fundamental problem I see here is that we are trying to compare one type of data (apples) with another (oranges).


Categorizing similar records is as important as standardization and consistent representation of data.

My sure-fire answer to this issue is upfront, proactive data quality management. We can add simple validations, like making first name and last name mandatory for a person, and capturing the legal name and type of business for an organization record. I know the rules are sometimes hard to implement in customer-facing applications, but the data quality control mechanisms built into the solution should take care of these transformations.
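Those validations can be expressed as per-type rule checks at the point where records enter the hub. A minimal sketch, assuming hypothetical field names and record types; a real data quality layer would also handle transformations and routing, not just rejection:

```python
def validate(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    rtype = record.get("type")
    if rtype == "person":
        if not record.get("first_name"):
            errors.append("first_name is mandatory for a person")
        if not record.get("last_name"):
            errors.append("last_name is mandatory for a person")
    elif rtype == "organization":
        if not record.get("legal_name"):
            errors.append("legal_name is mandatory for an organization")
        if not record.get("business_type"):
            errors.append("business_type is mandatory for an organization")
    else:
        errors.append("record type must be classified before matching")
    return errors
```

A record like `{"type": "person", "last_name": "Acme Trading LLC"}` would be stopped here for its missing first name, forcing the classification question to be answered before the record ever reaches the matching rules.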

Upfront data quality management in your solution should handle this, thus boosting the matching process and helping improve the duplicate consolidation percentage.

Above all, the staff handling data need to be educated. You wouldn't want loopholes in the applications feeding data to allow wrong classification of master data records. Even if they do, the data quality control workflow should address this.