Tuesday, July 24, 2012

Caught between tomorrow's onslaught and today's data requirements?

The press is alive with stories predicting a huge wave of data coming. Some say that within a short period of time, it's going to dwarf all the digital data created and stored to date. Wow. Like any other hot topic, the speculation is quite wild, yet somewhere in there is a truth: the realm of digital data is growing and there's no foreseeable end to it. As a government-focused guy, I know this phenomenon is top of mind with folks at all levels of government.

Now, let's switch gears for a moment. Last week, I had lunch with Adelaide O'Brien from IDC to catch up on my new gig at Cleversafe, and she brought up the subject of records management within the context of the impending Data Tsunami. At first, I thought, Records Management? That's a stale conversation, all that work has already been done. Silly me. I should know better! Adelaide gave me a number of things to think about that are anything BUT stale, yet are not necessarily weighing heavily on many government people's minds (or, at the very least, not getting any press). Governance, FOIA and eDiscovery are the more compelling topics. As we talked, I began to realize there may be something of a disconnect between the senior IT leaders and those a little farther down in the organization. (Bear in mind, these are not the words of Adelaide or representative of her position; they are my impressions only.) The gist is this: the senior folks seem confident that their agencies are successfully handling records management and FOIA requests and are well positioned for eDiscovery, yet the folks doing the work are not as confident, citing many examples of records being kept too long, the need to break FOIA responses down to paper, etc.

The conversation left me wondering how government can prepare for the next wave of data without continuing to address its current state.

I would love to hear from others who can continue to help me shape my thoughts on this issue. Am I on the right track? Am I missing the point altogether?


Wednesday, July 11, 2012

Whatcha gonna do with all that data?

Ah, big data. Such a fun topic of discussion. Central to many conversations is Hadoop, and for good reason. You would be hard-pressed to find a more suitable framework for analyzing large data sets. The simple idea of taking compute to the data rather than bringing data to the compute resources has changed the game.

So, of course, the promise of a tool such as Hadoop would lead one to believe that an organization should be able to run an analysis on any data set, regardless of scale. However (don't you hate when someone writes 'However'? You know something is coming that you don't want to hear!), there are some challenges to overcome to allow Hadoop to live up to its full potential.

As many of you most likely know, Hadoop uses a file system (HDFS) to store the data to be analyzed. And, like most file system storage approaches, it has limitations. Doesn't it strike you as odd that a framework designed to analyze HUGE amounts of data is dependent on a file system that runs out of gas long before the framework does? Well, it did strike some folks at Cleversafe and Lockheed Martin as odd.

To that end, we've announced a new offering that will address this limitation by replacing HDFS with Cleversafe's object storage solution. Based on Information Dispersal, this offering will allow organizations to enjoy all the scalability, reliability and efficiency of Cleversafe's technology within their Hadoop environments.

To learn more about this, I encourage you to read Bob Gourley's impressions and the article that ran in GCN on July 10th.

I'd love to hear your thoughts...