It never ceases to amaze me how frantically people rush to shoehorn their products, companies or internal IT strategies into the latest buzzword. Every large banking, investment and insurance company claims to have a big data strategy. The big joke is that these strategies are often very small.
Let me get right to the point: big data is way, way more than storing, organizing and crunching all the log files you used to throw away. Defining big data as what can be done with Hadoop or Splunk amounts to a hijacking of the term. I can't blame vendors for wanting to be synonymous with the next big thing. But it's a joke.
Sure, log files are part of big data, which is generally understood to be data too large or complex for traditional databases to handle, but for which software tools are emerging that can tame these beastly information stores and extract value (I'm paraphrasing Wikipedia's Big Data page here). But log files are just a subset of big data, and they're not even the most valuable part!
Ask any marketer if clickstream data is important to understanding their customers better and they will say "yes." Ask them if the ability to integrate, correlate and analyze all customer communications, including social media, email, online customer chat, call center notes and open-ended survey questions, is important and they will say "heck, yes!" You see, log files help answer only the "what" questions (what are customers doing on our web site), but don't touch the "why," which lives in the unstructured content.
Darin Stewart, a Research Director at Gartner, has just published a report titled "Big Content: The Unstructured Side of Big Data." In it, he notes that "Unstructured content represents as much as 80% of an organization's total information assets." The report goes on to state that "The true potential of big data is only realized when the source information pool is a hybrid of structured and unstructured information."
Granted, unstructured content has long been an unsolved problem. But technologies are emerging that make that goldmine of insight accessible to analytical frameworks. Big data is essentially ALL data. Pinning your career on a big data strategy that deals just with log files is risky and virtually guarantees you will be delivering legacy on arrival.
And that's No Joke.
About The Author: Julio Gómez is General Manager, Financial Services, for Attivio, a Boston-area software company whose core product, Active Intelligence Engine™ (AIE), is a unified information access (UIA) platform. AIE integrates all types of data and content in a universal index to deliver complete information enriched with sophisticated analytics.