Treat Your Data Like Caviar

Recently, I was served Eleven Madison Park’s take on eggs Benedict: a truly decadent dish of quail egg, ham, asparagus, and caviar, all served in an attractive Art Deco tin.

True to tradition, the utensil was an elegant mother-of-pearl spoon. The story behind the delicate spoon is that metal would taint the flavor of the premium caviar. But when you think about it, the tradition doesn’t make much sense. Surely the time the caviar spends in contact with the tin it was served in far outweighs the few seconds it spends on my spoon.

The CAViaR model for data governance is in many ways like the traditions around sturgeon roe: we want to protect and preserve the value of something precious and scarce, in this case good data. I can’t take credit for developing the CAViaR model; that credit goes to Anne Ahola Ward and her book “The SEO Battlefield.” A quick Google search for the term doesn’t turn up much, which is a shame, because her model is clear and well formed.

What is the CAViaR model?

In short, the CAViaR model ensures that data is Complete, Accurate, Valid, and Restricted. That means you’ve done the due diligence and have the proper processes in place to ensure not just the integrity of your data, but its security and usefulness as well. We’ll dive into all four qualities in short order.

Complete

Is your data truly representative? Have you eliminated gaps? Did you work to remove selection bias from the collection process? Gaps are easy to find; selection bias is much more insidious.
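
Both kinds of completeness problem can be checked mechanically to some degree. Below is a minimal Python sketch. It assumes daily records keyed by date, and it assumes you know the expected population share of some grouping, such as region or customer segment. The function names and the 5% tolerance are hypothetical choices for illustration, not part of the CAViaR model itself.

```python
from datetime import date, timedelta


def find_date_gaps(observed: list[date]) -> list[date]:
    """Return each calendar day between the first and last observation with no record."""
    if not observed:
        return []
    present = set(observed)
    day, end = min(present), max(present)
    missing = []
    while day <= end:
        if day not in present:
            missing.append(day)
        day += timedelta(days=1)
    return missing


def representation_gaps(
    sample_counts: dict[str, int],
    population_share: dict[str, float],
    tolerance: float = 0.05,  # hypothetical threshold; tune for your data
) -> dict[str, float]:
    """Compare each group's share of the sample with its expected population share.

    Returns {group: observed_share - expected_share} for groups that differ by
    more than `tolerance`. A large gap hints at selection bias; it does not prove it.
    """
    total = sum(sample_counts.values())
    gaps = {}
    for group, expected in population_share.items():
        observed = sample_counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 4)
    return gaps
```

Suppose 80% of a sample comes from a region that makes up only half the population. That region would be flagged with a gap of +0.3. This check only catches bias along groupings you thought to measure, which is exactly why selection bias is the harder of the two problems.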

Accurate

Are you measuring what you say you’re measuring? Are you over-scrubbing the data? What gaps in your measurement process might miss or exaggerate data points? Are you properly scrubbing out noise, or are you accidentally eliminating significant outliers?
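
One way to avoid over-scrubbing is to flag suspected outliers for review instead of deleting them. The sketch below uses Tukey’s interquartile-range (IQR) fences. That is one common rule among many and an assumption on my part, not something the CAViaR model prescribes. The name `flag_outliers` and the default multiplier `k=1.5` are illustrative.

```python
from statistics import quantiles


def flag_outliers(values: list[float], k: float = 1.5) -> list[tuple[int, float]]:
    """Return (index, value) pairs outside Tukey's fences, without modifying `values`.

    Flagged points are candidates for human review, not automatic deletion:
    some of them may be the most significant observations in the data set.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points; needs at least two values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]
```

For example, `flag_outliers([10, 11, 12, 11, 10, 12, 11, 95])` returns `[(7, 95)]`. The reading at index 7 is surfaced for a person to judge rather than silently dropped.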

Valid

Did you really measure what you say you measured, or are you actually tracking a measure that is only roughly correlated with it? Do you have too much noise?

Restricted

In the current age, this is the most overlooked requirement. Are you properly controlling who can view the data? Do you have appropriate encryption on the data at rest? Are you tracking access? How will you know if there’s a data breach? Do you know what data you have, and where all the PII, PCI, and HIPAA-sensitive records are in your infrastructure?
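
Knowing where sensitive records live starts with looking for them. This is a deliberately rough sketch that assumes free-text fields. It flags strings shaped like email addresses, and digit runs shaped like payment card numbers that also pass the Luhn checksum. The patterns and function names are hypothetical, and they will produce both false positives and misses. A real inventory of PII, PCI, and HIPAA data would rely on a dedicated data-discovery tool.

```python
import re

# Hypothetical, intentionally loose patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13-19 digits, optional space/dash separators


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum used by payment card numbers, ignoring non-digits."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


def scan_for_sensitive(text: str) -> dict[str, list[str]]:
    """Return likely email addresses and Luhn-valid card-like numbers found in `text`."""
    emails = EMAIL_RE.findall(text)
    cards = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    return {"email": emails, "card": cards}
```

Running this over a sample of each table or log source gives a first, imperfect map of where sensitive fields turn up. That map is what access controls, encryption, and access tracking then need to cover.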

All of these tests are a necessity, and they should form the foundation for any data operations you perform. In the modern enterprise, data is as precious as fine osetra caviar and should be treated with the same respect and ceremony.