GLOSSARY

Data Quality

Data quality is the degree to which a dataset fits its intended use, measured across accuracy, completeness, consistency, timeliness, validity, and uniqueness, and enforced by automated testing.

Quick answer
Data quality is how well data fits the purpose it is used for, measured across standard dimensions: accuracy, completeness, consistency, timeliness, uniqueness, and validity. Modern programs instrument rule-based checks and statistical tests in the pipeline itself — using Great Expectations, Soda, Monte Carlo, or dbt tests — and track incident mean-time-to-detect and mean-time-to-resolve per dataset.

WHAT IT IS

The DAMA-DMBOK and the Wang & Strong data quality dimensions framework (MIT, 1996) define the canonical dimensions. Modern practice encodes them as assertions (expectations, tests) that run with every pipeline; tools like Great Expectations, dbt tests, Monte Carlo, Soda, and Anomalo flag failures before they reach dashboards. Observability platforms add machine-learned anomaly detection for the issues assertions can't specify in advance.
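
A minimal sketch of the assertion pattern in plain Python with pandas (the table and column names are hypothetical; dedicated tools express the same checks declaratively and version them alongside the pipeline):

    import pandas as pd

    # Hypothetical pipeline output; table and column names are illustrative.
    orders = pd.read_parquet("orders.parquet")

    failures = []

    # Completeness: no missing customer IDs.
    if orders["customer_id"].isna().any():
        failures.append("completeness: customer_id has nulls")

    # Uniqueness: order_id must be a primary key.
    if orders["order_id"].duplicated().any():
        failures.append("uniqueness: duplicate order_id values")

    # Validity: amounts must be non-negative.
    if (orders["amount"] < 0).any():
        failures.append("validity: negative amounts")

    # Timeliness: newest record under 24 hours old (loaded_at assumed tz-aware UTC).
    if pd.Timestamp.now(tz="UTC") - orders["loaded_at"].max() > pd.Timedelta(hours=24):
        failures.append("timeliness: data is stale")

    # Stop the pipeline before bad data reaches dashboards.
    if failures:
        raise RuntimeError("data quality checks failed: " + "; ".join(failures))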

HOW IT WORKS

A data quality program defines SLAs for critical datasets, owners for each dataset, a remediation workflow, and a visible scorecard. Alerts go to the team that can fix the problem, not to a mailing list nobody reads.
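
One way to make that concrete is a small per-dataset registry pairing each SLA with its owner and alert route (a sketch; the field names, datasets, and channels are illustrative, not a specific tool's schema):

    from dataclasses import dataclass

    @dataclass
    class DatasetSLA:
        dataset: str          # fully qualified table name
        owner_team: str       # the team that can actually fix the data
        freshness_hours: int  # maximum acceptable staleness
        alert_channel: str    # where failures are routed

    REGISTRY = [
        DatasetSLA("warehouse.finance.revenue_daily", "finance-data", 6, "#finance-data-alerts"),
        DatasetSLA("warehouse.ml.feature_store", "ml-platform", 24, "#ml-platform-alerts"),
    ]

    def route_alert(dataset: str, message: str) -> str:
        # Route to the owning team's channel, never a catch-all list.
        sla = next(s for s in REGISTRY if s.dataset == dataset)
        return f"[{sla.alert_channel}] {sla.owner_team}: {dataset} - {message}"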

WHEN TO USE

Invest in data quality the first time a number reported to leadership turns out to be wrong, when ML models degrade silently, or when regulators require attestable evidence of data lineage and accuracy.

Related questions

What is data quality?
Data quality is how well data fits the purpose it is used for. The standard dimensions are accuracy (right value), completeness (no missing values), consistency (same value across systems), timeliness (current enough), uniqueness (no duplicates), and validity (conforms to business rules). Fitness is purpose-specific, not absolute.
How is data quality measured?
By running rule-based checks and statistical tests against the dataset on a schedule, reporting pass/fail per rule, and tracking quality scores at the dataset and column level over time. Modern tooling (Great Expectations, Soda, Monte Carlo, dbt tests) codifies these checks in the pipeline itself.
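A minimal sketch of scoring at the column and dataset level (the completeness rule is illustrative; production tooling persists a score per run so it can be trended over time):

    import pandas as pd

    def column_scores(df: pd.DataFrame) -> dict[str, float]:
        """Fraction of rows passing a basic completeness check, per column."""
        return {col: 1.0 - df[col].isna().mean() for col in df.columns}

    def dataset_score(df: pd.DataFrame) -> float:
        """Aggregate dataset-level score: the mean of the column scores."""
        scores = column_scores(df)
        return sum(scores.values()) / len(scores)

    # Example: one null out of six values yields an overall score of ~0.833.
    df = pd.DataFrame({"id": [1, 2, 3], "email": ["a@x.co", None, "c@x.co"]})
    print(column_scores(df), dataset_score(df))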
Who is accountable for data quality?
The domain business owner of the data, not IT. IT can provide the tools to detect issues, but the business is the only party that knows whether an anomaly represents a bug, a pipeline break, or a genuine change in the world. Quality dies in organizations that delegate ownership to IT alone.
What is the cost of poor data quality?
Gartner's long-running estimate puts the average annual cost at roughly $12.9M per organization in direct losses, before counting eroded trust in analytics and abandoned initiatives. The compound cost of lost decisions typically exceeds the direct cost by a wide margin.
How does NUUN Digital remediate data quality?
We instrument the top-priority pipelines first, wire alerts into the team that can fix them, and measure incident mean-time-to-detect and mean-time-to-resolve. We do not pursue 100% quality — we pursue enough quality for the decisions the data supports.
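The incident math is simple once occurrence, detection, and resolution timestamps are logged per dataset (a sketch with illustrative timestamps):

    from datetime import datetime
    from statistics import mean

    # Hypothetical incident log: (data went bad, alert fired, fix deployed).
    incidents = [
        (datetime(2024, 5, 1, 2, 0), datetime(2024, 5, 1, 3, 0), datetime(2024, 5, 1, 9, 0)),
        (datetime(2024, 5, 7, 0, 0), datetime(2024, 5, 7, 0, 30), datetime(2024, 5, 7, 4, 30)),
    ]

    # MTTD: average gap from occurrence to detection; MTTR: detection to fix.
    mttd = mean((detected - occurred).total_seconds() / 3600 for occurred, detected, _ in incidents)
    mttr = mean((resolved - detected).total_seconds() / 3600 for _, detected, resolved in incidents)

    print(f"MTTD: {mttd:.2f}h, MTTR: {mttr:.2f}h")  # MTTD: 0.75h, MTTR: 5.00h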

Need this term in action?