Knowledge post
2024-01-10

The challenges of Data Quality – How MDM helps with ERP modernizations

Good data quality is the foundation of successful MDM projects — here's how to achieve it.
Bosse Axhill
Business Analyst

We have previously talked about the importance of managing changes in MDM projects. Now we turn to another area that affects everything: data quality. Together with Peter Karlsson we explore why data quality is usually the biggest challenge, and how the right methods and tools can turn it into one of the project's biggest success factors.

Data quality is one of the most important things to get to grips with during business system changes and modernization projects. Peter Karlsson, once again: as a supplier you came in with a fairly blank sheet of paper. Staying on this topic, what was the first step you took?

– The first thing we wanted to do was to form our own picture of the challenges ahead of us. It was actually very easy to get answers from the business, but harder to get the right and sufficiently detailed answers. The first answers rarely reflect reality. They are usually that you are good at cleaning and clearing out, that you keep your data organized, and that it is accurate and reliable. Unfortunately, that is too often not the whole truth.

How does that come about?

– You have to keep in mind that these people come from a functioning business. Orders come in, orders are dispatched and customers receive their goods, so from that perspective everything seems to be fine. What you don't see is that along the way there are many individuals who patch and fix things manually. Those actions cost both time and money and, in addition, create a dependence on specific people, and with it a vulnerability.

Okay, so the goal is to get past the initial notion that everything is right. How do you guys do that?

– We start by requesting a subset of data for a number of entities that are central to that particular business: Customer, Product, Supplier and so on.

Do you get a clear picture of these entities?

– No, unfortunately it is never that simple. In this particular case the subsets were extracted from ERP systems that have been in operation for almost 40 years, and it was not surprising that a customer card registered in the late 20th century differed significantly from one registered this year.

– When we do the first analysis of this data, we use data profiling tools, but without going into detail and studying exactly what is in each field. Instead, the goal in this step is to produce distribution curves for things like fill rate – for example, what percentage of customers and suppliers have the organization number field filled in.
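To make the fill-rate idea concrete, here is a minimal sketch in Python with pandas. It only illustrates the principle, not the profiling tool used in the project, and the file name and field handling are invented for the example.

```python
import pandas as pd

# Hypothetical extract of the Supplier entity from the legacy ERP system.
suppliers = pd.read_csv("supplier_extract.csv", dtype=str)

def filled(col: pd.Series) -> pd.Series:
    # True where the cell is neither missing nor just whitespace.
    return col.notna() & col.astype(str).str.strip().ne("")

# Fill rate per field: the share of rows where the field holds a value.
fill_rate = suppliers.apply(filled).mean().sort_values()
print(fill_rate.to_string(float_format=lambda v: f"{v:.1%}"))

# A low fill rate on a critical field such as organization number is an
# immediate candidate for early cleansing and enrichment.
```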

What are you using the results of that analysis for?

– The value of this is being able to start cleansing early in the project, and fill rate is typical low-hanging fruit: easy to analyze and easy to hand over for enrichment while we delve deeper into the analysis of the rest of the data. In addition, the first analysis gives both us in the project and the client organization an indication of where data quality actually stands in general. The result is a basis for discussion and decisions about which steps to take next, and in what order.

Is that correction done in the data profiling tool?

– No, you correct it in the source system. Data profiling tools are primarily for identifying problems, not correcting them. If you fix things directly at the source, all subsequent modernization work becomes easier. It's a bit like the slightly ugly expression "shit in, shit out": if the incorrect or missing data remains in the source system, it risks contaminating everything throughout the modernization, because handling it along the way means implementing, and strictly following, temporary and costly procedures. Better, then, to correct at the source so that you know it is quality data you are getting out.

Are there other gains with this first analysis?

– Yes, absolutely. It forces the business to actually take a closer look at its data and examine it more critically. Among other things, there are often clear reasons why important data such as organization numbers has been missing for a long time without being detected – for example, customers who are no longer actually customers and should reasonably be deactivated, or even cleaned away. Identifying those cases is also a very important part of maintaining high data quality.

How well did the quality level match your expectations?

– I must admit that I was surprised at how low the quality actually was. But it also depends on how you define quality. After all, the business has worked, even before the modernization, and when it is people who read, say, the delivery addresses, they have usually managed to deliver, even if the field contains other data besides the address itself, or even wrong and misleading data.

What are you referring to, then? Do you have an example?

– Yes, for example, these can be instructions on which door to deliver packages to, phone numbers to call for someone to open, special instructions for delivery on Mondays, others for Thursdays.

– None of this is data that belongs in an address field, but in the absence of other places to store the information, it has ended up there anyway. On the question of what counts as good and bad, you can say that the data is rich and informative, which is good, but its structure makes it hard to work with, which is bad. All such anomalies need to be identified, and better ways to store and manage them need to be created. So, all in all, we have yet another potential improvement identified already in the initial analysis of data quality.

When certain fields are used for several purposes as you describe above, and the reason is that there have not been enough fields to hold the values – don't you then have to correct in the target system rather than in the source?

– Yes, that's right, and it's not entirely black and white either. The starting point is always to correct at the source, but where that is not possible, or where it is more efficient or gives better results to receive the incorrect data and correct it with the help of the new tools, you of course do so. Preferably with as much automation as possible.

I suspected as much. Can you tell me more? Any good examples?

– Absolutely, we have a concrete example around the concept of "Country". In this case, we found that most variations were recurring and uniform. Depending on who registered the record and what their mother tongue was, Greece could be entered as "Grekland" by someone with Swedish as their mother tongue, as "Grækenland" by someone who speaks Danish, or in some cases only as the nationality designation "GR". For this, we created new quality rules that automatically identify the country during onboarding and put the common, standardized name on the country field of the supplier record.
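As an illustration of what such a rule does, here is a minimal Python sketch; the actual rules were built in IDMC, and the variant lists below are invented for the example.

```python
# Known variants per country, keyed by ISO 3166-1 alpha-2 code (sample data).
COUNTRY_VARIANTS = {
    "GR": {"greece", "grekland", "grækenland", "gr"},
    "DE": {"germany", "tyskland", "deutschland", "de"},
}

# The common name that the rule puts on the supplier record.
CANONICAL_NAME = {"GR": "Greece", "DE": "Germany"}

def normalize_country(raw: str) -> str | None:
    """Return the canonical country name, or None for values needing review."""
    value = raw.strip().lower()
    for code, variants in COUNTRY_VARIANTS.items():
        if value in variants:
            return CANONICAL_NAME[code]
    return None

print(normalize_country("Grekland"))  # -> Greece
print(normalize_country(" GR "))      # -> Greece
```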

But doesn't that mean the problems will continue? The same people who registered with different languages and codes in the old systems will surely continue to do so in the new one.

– True, but we will make sure that it resolves itself. In the new MDM system, each country is a predefined code. Instead of manually typing in the country name, the user has to select the country from a ready-made list. This also makes it possible to manage the language-specific name of each country: a Swedish user sees the option "Grekland", a Danish user looking at the same supplier record sees "Grækenland" and a British user sees "Greece", but all three are simply views of the exact same registration. In addition, it is possible to link synonyms and acronyms to each country, so each integrating system receives exactly the country identification it needs, e.g. the nationality designation "GR" or the ISO code for Greece, "GRC".
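The idea of one registration with many representations can be sketched as a small reference-data structure. This is only an illustration of the principle, not the actual MDM data model; the language codes and system names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class CountryCode:
    iso_alpha2: str                # the single key behind every registration
    labels: dict[str, str]         # language -> display name shown to the user
    synonyms: dict[str, str] = field(default_factory=dict)  # target system -> identifier

greece = CountryCode(
    iso_alpha2="GR",
    labels={"sv": "Grekland", "da": "Grækenland", "en": "Greece"},
    synonyms={"legacy_erp": "GR", "finance_system": "GRC"},
)

# One registration, many representations: the user sees a localized name,
# while each integrating system receives the identifier it expects.
print(greece.labels["da"])                # Grækenland
print(greece.synonyms["finance_system"])  # GRC
```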

Do you do this using the tools available in the MDM suite?

– Yes, the data quality rules are created using the built-in quality engine in IDMC.

“Claire is the AI engine itself and is tied to most of the modules included in the MDM suite from Informatica. A valuable feature of data profiling is that everything that is put in there is input to Claire.”

– Peter Karlsson, Senior Solution Specialist

You said that Country was often entered in recurring variants, which made it easy to identify automatically which country code to assign. But how do you deal with the exceptions, misspellings and so on? Are they set aside for manual processing, or have you created automated support for those as well?

– Yes, actually we have. There is built-in support in the system for validating variations in spelling, where the system makes a qualified guess and returns suggestions. This path does create some manual work, but with very much of it prepared; the manual effort is reduced to little more than decision-making, based on clear, prepared documentation being served up. We have used this technique in many more cases than countries. Email addresses, for example, must follow a clear format with the character "@" followed by a domain and a period, and so on, which is validated using data quality rules. In addition, we use mail servers, which in many cases can tell us whether the specified email address actually exists and is active. Even street addresses and place names can be validated by including postcodes, and services such as "Data as a Service" can automatically ensure the correct spelling based on matches against verified addresses.
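A simplified sketch of the two techniques mentioned, a format rule for email addresses and a "qualified guess" for misspelled countries, here in plain Python rather than IDMC rules, and without the mail-server lookup; the reference country list is invented for the example.

```python
import re
from difflib import get_close_matches

# Structural email rule: something@domain.tld (no mail-server lookup here).
EMAIL_RULE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def email_format_ok(address: str) -> bool:
    return bool(EMAIL_RULE.match(address.strip()))

# Reference list used for the "qualified guess" on misspelled countries.
KNOWN_COUNTRIES = ["Greece", "Germany", "Denmark", "Sweden"]

def suggest_country(raw: str) -> list[str]:
    """Return close matches for a human to approve, rather than auto-correcting."""
    return get_close_matches(raw.strip().title(), KNOWN_COUNTRIES, n=3, cutoff=0.6)

print(email_format_ok("orders@example.com"))  # True
print(email_format_ok("orders@example"))      # False: no top-level domain
print(suggest_country("Greeece"))             # ['Greece']
```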

– Precisely for the address fields, we have also built automatic separation into the cleansing process. The data model has one field for street name, another for street number, a third for stair or floor, and so on. In legacy systems this is often entered as a single composed line, and that separation no longer has to be done manually, because the quality rules are defined so that the output is a complete set of correct data, separated and stored per field.
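A hedged sketch of what such a separation step might look like for the simplest case, a single line such as "Storgatan 12, 3 tr"; the real implementation relies on the quality rules and address services described above.

```python
import re

# Very simplified pattern: street name, house number, optional stair/floor part.
ADDRESS_PATTERN = re.compile(
    r"^(?P<street>.+?)\s+(?P<number>\d+\w?)(?:\s*,\s*(?P<stair>.+))?$"
)

def split_address(line: str) -> dict[str, str | None]:
    """Split one composed legacy address line into separate target fields."""
    match = ADDRESS_PATTERN.match(line.strip())
    if not match:
        # Lines the pattern cannot handle are left for manual review instead.
        return {"street": None, "number": None, "stair": None}
    return match.groupdict()

print(split_address("Storgatan 12, 3 tr"))
# {'street': 'Storgatan', 'number': '12', 'stair': '3 tr'}
```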

Have you worked with Artificial Intelligence in data quality?

– Yes, in Informatica's world the tool is called "Claire". Claire is the AI engine itself and is tied to most of the modules included in the MDM suite from Informatica. A valuable feature of data profiling is that everything that is put in there is input to Claire. That data is then used by Claire to suggest sets of quality rules to apply to the respective fields, and to draw conclusions about what kind of data actually belongs in a particular field. This also facilitates data modeling work involving many hundreds of fields: since we who do the modeling get input from Claire at first sight about what a field is reasonably used for, and in addition get a ready-made set of rules to apply, the modernization work is greatly facilitated and streamlined.

How much impact does AI have in this work?

– It depends on how much you choose to use it, but in simple terms it becomes like an extra colleague, or rather a whole group of new colleagues. Partly in connection with the modernization project, but above all in the day-to-day management of MDM data, many time-consuming tasks no longer have to be done manually because Claire does the work for you. It is therefore a question of saving time and resources, with increased efficiency as a result.

The project has been going on for quite a few months, and you are still not live. The first questions about data quality were based on the data that was available at the time. That data may have changed considerably since then – how do you deal with that?

– We have solved this by constantly adjusting our data quality rules. We have continuously requested updated data and, through it, been able to validate, further develop and implement customized rules based on the current reality.

Don't you get to do the same job over and over again?

– Well, in a way you can say that, but the repetitive part is largely done automatically. When we receive the new, updated data, it runs through our already implemented data profiling and then through the existing quality rules. It is the output of these that shows us what further actions and changes we need to make. So it is not really a question of redoing the same work, but rather of using what we have already done to find further opportunities for improvement.
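The repeatable part can be pictured as a small pipeline that is simply rerun on every new extract. This sketch reuses the hypothetical normalize_country and email_format_ok functions from the earlier examples; the file name, column names and previous counts are all invented for illustration.

```python
import pandas as pd

def run_quality_rules(extract: pd.DataFrame) -> dict[str, int]:
    """Count rule violations per field on a refreshed extract."""
    countries = extract["country"].fillna("")
    emails = extract["email"].fillna("")
    return {
        "country_unknown": sum(normalize_country(v) is None for v in countries),
        "email_invalid": sum(not email_format_ok(v) for v in emails),
    }

current = run_quality_rules(pd.read_csv("supplier_extract_latest.csv", dtype=str))

# Hypothetical counts from the previous delivery, kept only to show the idea
# of comparing runs; the real figures come from the earlier rule output.
previous = {"country_unknown": 412, "email_invalid": 187}
for rule, count in current.items():
    print(f"{rule}: {count} (previously {previous[rule]})")
```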

It sounds very much like you manage to get value out of quality management even before the project is finished?

– Yes, that's really the case. Every piece of the puzzle we lay has the potential to create value when all the later pieces are laid.

A recipe for success?

– Exactly!
