Data Quality Definition
Data quality is a measure of how fit data is for its intended purpose. Data is considered high quality when it is accurate, complete, consistent across systems, and up to date. When it falls short on any of these dimensions, the decisions and processes that rely on it become less reliable.
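To make that concrete, here is a minimal Python sketch that turns two of those dimensions, completeness and timeliness, into numbers over a handful of hypothetical product records. The field names and the one-year freshness window are assumptions for illustration; the point is that quality becomes actionable once it is measured.

```python
from datetime import date

# Hypothetical product records; one is missing weight_kg and one
# has not been updated in over two years.
products = [
    {"sku": "A1", "weight_kg": 1.2,  "last_updated": date(2024, 5, 1)},
    {"sku": "A2", "weight_kg": None, "last_updated": date(2022, 3, 9)},
    {"sku": "A3", "weight_kg": 0.4,  "last_updated": date(2024, 6, 20)},
]

# Completeness: the share of records where the field is actually filled in.
completeness = sum(p["weight_kg"] is not None for p in products) / len(products)

# Timeliness: the share of records updated within an assumed one-year window.
cutoff = date(2023, 6, 20)
timeliness = sum(p["last_updated"] >= cutoff for p in products) / len(products)

print(f"completeness={completeness:.0%}, timeliness={timeliness:.0%}")
# completeness=67%, timeliness=67%
```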
What makes data low quality?
Problems tend to come from the same places: data entered manually and inconsistently, systems that do not share a common format, records that are never updated after the initial entry, or merges between datasets that do not account for duplicates. A customer database where the same person appears three times under slightly different names is a data quality problem, and so is a product catalogue where half the entries are missing a weight field, even when the underlying information is technically correct.
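As a rough illustration of the duplicate problem, the sketch below uses Python's standard-library SequenceMatcher to flag near-duplicate names. The customer records and the 0.85 similarity threshold are hypothetical, and real matching tools are considerably more sophisticated.

```python
from difflib import SequenceMatcher

# Hypothetical customer records; ids 1-3 differ only in punctuation,
# case, and spacing, not in who they refer to.
customers = [
    {"id": 1, "name": "Jane O'Connor"},
    {"id": 2, "name": "jane oconnor"},
    {"id": 3, "name": "Jane O Connor"},
    {"id": 4, "name": "Piotr Nowak"},
]

def normalise(name: str) -> str:
    # Strip case and punctuation so superficial differences disappear.
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ").strip()

def likely_duplicates(records, threshold=0.85):
    # Compare every pair of normalised names; pairs above the
    # similarity threshold are flagged for review.
    flagged = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            score = SequenceMatcher(None, normalise(a["name"]),
                                    normalise(b["name"])).ratio()
            if score >= threshold:
                flagged.append((a["id"], b["id"], round(score, 2)))
    return flagged

print(likely_duplicates(customers))
# [(1, 2, 1.0), (1, 3, 0.96), (2, 3, 0.96)]
```

Flagged pairs go to a person or a stewardship queue rather than being merged automatically, since two genuinely different people can have very similar names.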
Why does it matter?
Poor data quality has a way of hiding until it causes something to go wrong. A marketing team sends a campaign to lapsed customers who actually purchased last week. A finance report double-counts revenue because two systems recorded the same transaction differently. A logistics operation ships to an outdated address. In each case the underlying issue is data that was not accurate, complete, or consistent enough to be trusted.
The further bad data travels through an organisation before anyone catches it, the more expensive it becomes to fix.
How is data quality maintained?
It is generally easier to prevent quality problems at the point of entry than to clean them up later. In practice that means validation rules on forms and APIs, clear data ownership so someone is responsible for keeping records current, and automated checks that flag anomalies as data moves through pipelines.
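A minimal sketch of what point-of-entry validation can look like, assuming a hypothetical customer signup payload; the field names and rules are illustrative, not a prescribed schema.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_customer(payload: dict) -> list[str]:
    errors = []
    # Completeness: required fields must be present and non-empty.
    for field in ("name", "email", "country"):
        if not payload.get(field):
            errors.append(f"missing required field: {field}")
    # Accuracy: values must match the expected format.
    if payload.get("email") and not EMAIL_RE.match(payload["email"]):
        errors.append("email is not a valid address")
    # Consistency: one canonical representation (2-letter ISO country codes).
    if payload.get("country") and len(payload["country"]) != 2:
        errors.append("country must be a 2-letter ISO 3166-1 code")
    return errors

# A bad record is rejected before it enters any downstream system.
print(validate_customer({"name": "Ana", "email": "ana@@example",
                         "country": "Portugal"}))
# ['email is not a valid address', 'country must be a 2-letter ISO 3166-1 code']
```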
Data lineage supports quality work by making it possible to trace where a problem originated, rather than discovering a bad figure in a report with no way to find its source. Master Data Management (MDM) directly addresses one of the most common quality problems: multiple conflicting versions of the same record across different systems. By establishing a single authoritative version, MDM removes a whole category of inconsistency that would otherwise have to be managed record by record.
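The sketch below shows the MDM idea in miniature: three hypothetical systems hold conflicting versions of one customer, and a simple survivorship rule (freshest non-empty value wins per field) produces a single authoritative record. Real MDM platforms add record matching, stewardship workflows, and per-field source-of-truth rules; the systems and records here are invented for illustration.

```python
from datetime import date

# Conflicting versions of the same customer across three systems.
versions = [
    {"source": "crm",     "updated": date(2024, 1, 10),
     "email": "j.doe@example.com",    "phone": ""},
    {"source": "billing", "updated": date(2024, 6, 2),
     "email": "jane.doe@example.com", "phone": "+44 20 7946 0000"},
    {"source": "support", "updated": date(2023, 11, 5),
     "email": "j.doe@example.com",    "phone": "+44 20 7946 0999"},
]

def golden_record(versions, fields=("email", "phone")):
    merged = {}
    # Visit newest records first, so the freshest non-empty value wins.
    for record in sorted(versions, key=lambda r: r["updated"], reverse=True):
        for field in fields:
            if field not in merged and record.get(field):
                merged[field] = record[field]
    return merged

print(golden_record(versions))
# {'email': 'jane.doe@example.com', 'phone': '+44 20 7946 0000'}
```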
Who is responsible for it?
Data quality is rarely owned by a single team. Data engineers build the checks and pipelines that catch problems early. Business teams are often closest to the data and best placed to spot when something does not look right. In organisations that take it seriously, a data governance function sets the standards and coordinates across both.