
12:30 PM
Jonas Olsson
Commentary

Data Quality: The Misguided Quest for the Truth

There are a number of reasons for the high failure rate in data warehouse projects, but I believe that one of the primary culprits is the misguided quest for the truth.

The truth -- it’s what everyone wants. Give it to me straight. I can handle it. The truth is a noble pursuit -- the quest for the God Particle or the meaning of life. The quest for the truth in science, philosophy, and literature is one thing that separates man from animal. However, in the data management world, this same quest is what separates success from failure, but not how you might think.

It is widely accepted that between 50% and 80% of all data warehouse projects fail. As someone whose company provides a data warehouse for the financial industry, I find this an embarrassing problem, and one that we, as data professionals, have spent too little time trying to solve.

There are a number of reasons for the high failure rate in data warehouse projects, but I believe that one of the primary culprits is the misguided quest for the truth. That is, we mistakenly assume that data quality, which we can all agree is of the utmost importance, is a matter of truth. Equating quality with truth causes us to commit inordinate amounts of time and resources to a futile quest, for truth is based on context, and context changes from user to user.

However, if we instead realize that the truth can be determined only by context, then we are free to change the equation from “quality equals truth,” to “quality equals factually correct in accordance with the definitions of data.”

With this radically simplified approach to quality, we can make the whole QA process much more efficient by dividing it into two steps: one prior to loading the data and the other when reading (interpreting) the data. This allows us to verify that the data is factually correct prior to loading and, at the same time, to support multiple definitions or interpretations of the data when reading it. Supporting multiple definitions of the same concept is becoming increasingly important as most industries grow more complicated, a complexity that is reflected in the data an organization works with.
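As a rough illustration of the first step, the sketch below checks incoming records against the data definitions before loading and says nothing about how any department will later interpret them. The record layout (`PositionRecord`), the field names, and the three rules are assumptions made up for this example, not a description of any particular product.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PositionRecord:
    # Hypothetical raw fact as delivered by a source system.
    security_id: str
    quantity: float
    clean_price: float
    currency: str


def validation_errors(rec: PositionRecord) -> List[str]:
    """Check a record against the data definitions only.

    Nothing here decides which department's number is "true";
    it only asks whether the fact is well-formed.
    """
    errors = []
    if not rec.security_id:
        errors.append("security_id is required")
    if rec.clean_price < 0:
        errors.append("clean_price must be non-negative")
    if len(rec.currency) != 3 or not rec.currency.isalpha():
        errors.append("currency must be a three-letter code")
    return errors


def load(
    records: Iterable[PositionRecord],
) -> Tuple[List[PositionRecord], List[Tuple[PositionRecord, List[str]]]]:
    """Accept factually valid records; return the rest with the reasons."""
    accepted, rejected = [], []
    for rec in records:
        errs = validation_errors(rec)
        if errs:
            rejected.append((rec, errs))
        else:
            accepted.append(rec)
    return accepted, rejected
```

Interpretation, the second step, is deliberately absent from this code: under the approach described here it happens when the data is read, so the load-time check stays small and stable even as definitions multiply.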

For example, a bank’s risk department may define market value differently from the way the wealth management department does. Both departments work with market value, but each has its own definition, or context, for how to arrive at that number. One may include accrued interest for fixed income assets, while the other does not. Both are true, but nonetheless, we waste months or even years building data warehouses that will offer just one of these departments -- or perhaps neither of them -- the truth as they define it.

Instead of determining a single overriding truth, we need to evolve with the business and design systems that let users find their own truth as defined by their context. So if the quest for a single version of truth should really be context-focused, how do you make this a reality in today’s data warehouse? Well, as with most problems that are disassembled and analyzed, there is a logical process to follow to capture this more realistic understanding of quality.

Specifically, companies need to get out of the old data warehouse extract/transform/load (ETL) protocol and move to extract/load/transform (ELT), in which transformation -- and context -- follow loading. ELT has a big-data-like feel to it, in that data professionals can bring in data more quickly and easily, allowing each individual department to interpret the data in the way it feels adds the maximum value to its business. Moving the transformation layer after loading addresses the challenge of multiple versions of the truth very effectively, because it lets business strategy take the lead over a data management tactic, and strategy always beats tactics.

ELT is also much better for today’s dynamic data environments and makes it easier to store data from multiple sources, to support multiple definitions of the truth, and to support multiple interpretations of the same fact. For example, you can have multiple definitions of market value, one based simply on underlying securities and another including such arcane details as unsettled accrued interest, restitutions, and so on.
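To make the ELT idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The facts are loaded once, untransformed, and each definition of market value lives in its own view created after loading. The table name, view names, columns, and sample figures are all invented for illustration, and which department would use which view is left open, since that depends on the organization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: store the facts as delivered, with no context applied.
conn.execute(
    """
    CREATE TABLE raw_holdings (
        security_id      TEXT NOT NULL,
        quantity         REAL NOT NULL,
        clean_price      REAL NOT NULL,  -- per unit, excluding accrued interest
        accrued_interest REAL NOT NULL   -- per unit
    )
    """
)
conn.executemany(
    "INSERT INTO raw_holdings VALUES (?, ?, ?, ?)",
    [("BOND1", 100.0, 98.5, 1.25), ("BOND2", 50.0, 101.0, 0.5)],
)

# Transform (after loading): each view encodes one context's definition.
conn.execute(
    """
    CREATE VIEW mv_excluding_accrued AS
    SELECT security_id, quantity * clean_price AS market_value
    FROM raw_holdings
    """
)
conn.execute(
    """
    CREATE VIEW mv_including_accrued AS
    SELECT security_id,
           quantity * (clean_price + accrued_interest) AS market_value
    FROM raw_holdings
    """
)

KNOWN_VIEWS = {"mv_excluding_accrued", "mv_including_accrued"}


def total_market_value(view_name: str) -> float:
    """Sum market value under one definition, chosen by the reader."""
    if view_name not in KNOWN_VIEWS:
        raise ValueError(f"unknown definition: {view_name}")
    # Safe to interpolate: view_name was checked against a fixed set.
    (total,) = conn.execute(
        f"SELECT SUM(market_value) FROM {view_name}"
    ).fetchone()
    return total
```

With the sample rows above, `total_market_value("mv_excluding_accrued")` returns 14900.0 and `total_market_value("mv_including_accrued")` returns 15050.0. Both figures come from the same loaded facts, and adding a third definition means adding a view, not reloading the warehouse.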

There is much to be excited about in data management, with an incredible amount of innovation around the idea of getting more out of data. But, when creating a forest, it can be distracting to focus on the trees. With that in mind, focusing too much on the data can endanger the whole data warehouse project.

Jonas Olsson is the CEO and founder of Graz, a provider of data warehouse and business intelligence software built specifically for the needs of investment managers, insurers and banks worldwide. Olsson founded Graz in 2000 as an IT services firm, and transitioned it ...