Bytes2Bites: May 2012

Thursday, May 31, 2012

Rehashing a point of view

Being the 'new guy on the block' with regard to the storage industry, I've given myself license to wonder about everything, open old wounds and simply ponder the question, 'why?' This is not a new place for me. I've been curious about how everything works and why things are as they are ever since I was a little kid burning holes in my Mom's carpet. Um, little kids don't know the difference between AC and DC power. So, in my mind, a battery-operated electric motor would run REAL fast if you hooked the wires to an extension cord. Oops! To this day, I don't know how I survived my curiosity as a child. But, I digress.

Most recently, I've been chewing on the notion of structured vs. unstructured data. For years, I've had a notion of what constituted structured data, and all else was, by definition, unstructured. Right? Admittedly, my parameters were pretty simple. Any data stored in a database was considered structured, making any data stored in some other format (can you say files?) unstructured.

But, is this really an accurate way to think of information? I figured I'm not smart enough to be the first person to ever consider this so I hit Google. Not surprisingly, I found a number of relevant hits, from blog entries to academic papers, on the subject. Really? Academic papers? Ok.

Anyway, after reading and digesting, I've come to the conclusion that whether data counts as structured or unstructured depends on who or what is attempting to use it. For example, information stored in a database is most certainly structured from the point of view of another computer application, yet showing the raw data in its table format to a person, especially a non-technologist, would most likely prove confusing. On the other hand, a human being sitting down to read the most current corporate memo could easily argue that the information is HIGHLY structured, yet that structure may not be nearly as apparent to a computer.
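To make the point of view idea concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the `offices` table, the memo text and the two helper functions are assumptions, not anything from a real system. It stores the same fact twice, once as a database row and once as a memo, and shows how differently a program has to work to get at each.

```python
import re
import sqlite3

# Hypothetical example data: one fact, stored two ways.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offices (city TEXT, opens_on TEXT)")
conn.execute("INSERT INTO offices VALUES ('Chicago', '2012-07-01')")

memo = (
    "Team, we are pleased to announce that our new "
    "Chicago office will open on July 1, 2012."
)


def date_from_table(city):
    """The program's view: the schema says exactly where the answer lives."""
    row = conn.execute(
        "SELECT opens_on FROM offices WHERE city = ?", (city,)
    ).fetchone()
    return row[0] if row else None


def date_from_memo(text, city):
    """The program's view of prose: a brittle guess at how the memo is worded."""
    pattern = rf"{re.escape(city)} office will open on (\w+ \d{{1,2}}, \d{{4}})"
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(date_from_table("Chicago"))
print(date_from_memo(memo, "Chicago"))
# A person reads this rewording without effort; the pattern above misses it.
print(date_from_memo("The Chicago office opens July 1.", "Chicago"))
```

The row is trivial for the program and opaque to a non-technologist, while the memo is the reverse. Neither one is 'unstructured' in any absolute sense.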

So what's my point in all this? Simple. In general, 'structured vs. unstructured' is a false comparison unless it is applied to a specific point of view. What most people REALLY mean when they say 'structured vs. unstructured' is 'information stored in a database vs. information stored in a file'.

So, back to being the 'new guy on the block': this seems to have some interesting implications with regard to discussing storage solutions. When a vendor says they are good with unstructured data, do they really mean information stored in files? I suspect I know the answer.

Any thoughts out there?

Friday, May 18, 2012

Garbage In - Garbage Out

Over the past few weeks, I've been engaged in a search for a new home. Part and parcel of this activity is, of course, interfacing with the financial services industry. Oh joy! Now, those of you who work in the industry, please bear with me. I know the vast majority of individuals who dedicate their time to helping the rest of us with our financial needs are competent, friendly and really do care, so I'm not writing this post to in any way insult the people of the industry. That said, I'd like to use the systems of the financial services industry, in particular those of the credit scoring agencies, to make a point, so hang with me.

History lesson: Way back in 1961, my parents opted to name me after my dad: Robert Lee Caudill, Jr. Nice sounding name, right? I've always been proud to use it. However, like many people's names, mine has picked up a number of different versions over the years. My friends all call me "Bobby Caudill". At work, I've been "Robert Caudill", "Robert L. Caudill", "Robert Lee Caudill", "Robert Caudill, Jr." and any other twist you can imagine. And, as luck would have it, my dad's name has been similarly altered.

Why am I going on about this? Enter the credit reporting agencies. Can you imagine the havoc this situation has wreaked on my credit report? (Yes, my dad's report is all screwed up too!) For years, we've been trying to unwind our credit profiles, and after at least 25 years of trying, we are no closer to getting it done once and for all. Every time either of us makes a life change, we both end up disputing entries on our credit reports. Sad as it is, we've long since accepted this as an unfortunate reality we must deal with in this age of technology.
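For the curious, here is a small Python sketch of how a mix-up like this could happen. To be clear, this is NOT how any credit agency actually matches records; I have no idea what they really do. The `naive_key` rule, the `owners_by_key` helper and the sample records are all assumptions I made up to illustrate matching on a name alone.

```python
import re

# Hypothetical matching rule: generational suffixes are treated as noise.
SUFFIXES = {"jr", "sr", "ii", "iii"}

# Illustrative records: (who the account really belongs to, name as written).
records = [
    ("son", "Robert Lee Caudill, Jr."),
    ("son", "Robert L. Caudill"),
    ("son", "Robert Caudill"),
    ("son", "Bobby Caudill"),
    ("father", "Robert Lee Caudill"),
    ("father", "Robert Caudill"),
]


def naive_key(name):
    """Reduce a name to (first, last), dropping middle names and suffixes."""
    tokens = re.findall(r"[a-z]+", name.lower())
    tokens = [t for t in tokens if t not in SUFFIXES]
    return (tokens[0], tokens[-1])


def owners_by_key(rows):
    """Group records by matching key and collect who really owns each group."""
    groups = {}
    for owner, name in rows:
        groups.setdefault(naive_key(name), set()).add(owner)
    return groups


merged = owners_by_key(records)
for key, owners in sorted(merged.items()):
    print(key, sorted(owners))
```

With this rule, every 'Robert' record lands in one profile shared by father and son, while 'Bobby Caudill' splits off as if he were a third person. Keeping the suffix would not save the day either, since half the time it was never written down in the first place.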

So, data. What an interesting thing it is. Practically everything we do today is influenced or even controlled by data that's been collected, mined, sliced, diced and analyzed to death. When the data in question is guaranteed accurate and authentic AND the questions being asked of it are appropriate and well thought out, the results can be quite useful. But what happens when the data is merely assumed to be accurate and/or the questions being asked are simply not appropriate? Well, as I can attest based on my current credit score, anything can happen.

As the world continues down the path of storing and analyzing every single bit (yes, I mean 'bit' in the context of a 'byte') of digitally generated 'information', I often wonder just how often people (and systems) come to the wrong conclusion because the data set has incorrect information.

Working for Cleversafe, I talk a great deal about how we help customers indefinitely store huge volumes of data, keeping it reliably accessible, safe and secure for all their future processing and analytic needs. With my focus being on government customers, it is quite rewarding to know that I am helping to preserve our nation's information through to the end of the republic. Yet, sometimes I wonder: how much of the data being stored and used for big data projects is truly accurate?

Garbage in, Garbage out? Only time will tell.