Bringing a proof-of-concept project into production is only the beginning. Once in production, Hadoop differs greatly from other information technologies. Deploy SAP or Salesforce, for example, and the transition typically means a shift into a lower-intensity "maintenance" mode, where less attention and fewer resources are required. With Hadoop, in contrast, delivery of the first production application is just the start of the journey. Trust me: Pressure will soon mount to develop new applications. Those new applications will require integration with new data sources, and your users will want to run more and more exploratory jobs.

In companies experiencing this kind of "success disaster" with Hadoop, keeping up with demand for expansion and new use cases often requires more effort than getting the initial application into production.

IT managers must attend to many areas to ensure the ongoing success of a Hadoop initiative, but here are five challenges you should address proactively:

1. Keeping your software up to date:

Hadoop is a rapidly evolving framework. Unfortunately, updating Hadoop software is challenging, especially on heavily used clusters. As a result, many organizations get stuck on a 3-year-old version and, before they know it, even thinking about an upgrade is a huge effort. Although challenging, it's worth instituting a program of regular, incremental updates to the Hadoop software stack. To facilitate these updates, establish a regularly recurring maintenance window for the cluster. Yes, the concept of a maintenance window feels retrograde to many IT organizations, but it's preferable to falling behind the fast-moving Hadoop ecosystem.

2. Scaling your cluster:

Going from a half-rack to a full one brings one set of challenges; expanding from one rack to two brings different trials; going from two racks to four ... you get the idea. Each time you grow your cluster, there are new issues. Fortunately, Hadoop scales relatively easily, and it comes with built-in tools for common tasks like rebalancing data across disks. Still, the logistics of expanding the physical infrastructure can be thorny because, as your cluster grows, new tuning settings are required, and problems that used to be rare (like failed disks) start to occur regularly. Critical Hadoop software services, such as your NameNode and ResourceManager, may need to be upgraded as well. Unfortunately, there's no silver bullet for addressing these problems. The best approach is to get ahead of the curve: plan for expansion well before it becomes critical. One way to achieve this is to add a bit of capacity every quarter or even every month, on a regularly scheduled program.
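
To make "getting ahead of the curve" concrete, here is a rough Python sketch of the arithmetic behind a scheduled expansion program. It is an illustration only, not part of Hadoop: the function names, the compound-growth assumption, and the 80 percent utilization ceiling are all hypothetical choices, and you would substitute numbers measured on your own cluster.

```python
import math


def periods_until_threshold(used_tb, capacity_tb, growth_per_period, threshold=0.8):
    """Return the number of whole periods (months, quarters) until used
    storage reaches threshold * capacity.

    Assumes usage compounds at a fixed rate each period, which is a
    simplification; the 0.8 default is an illustrative ceiling only.
    """
    if used_tb <= 0 or capacity_tb <= 0:
        raise ValueError("used_tb and capacity_tb must be positive")
    if growth_per_period <= 0:
        raise ValueError("growth_per_period must be positive")
    limit = threshold * capacity_tb
    if used_tb >= limit:
        return 0  # already at or past the ceiling
    # Smallest n such that used_tb * (1 + g)^n >= limit
    return math.ceil(math.log(limit / used_tb) / math.log(1 + growth_per_period))


def capacity_to_add(used_tb, capacity_tb, growth_per_period, threshold=0.8):
    """Return the capacity (TB) to add now so that utilization stays at or
    below the threshold after one more period of growth."""
    projected_used = used_tb * (1 + growth_per_period)
    required_capacity = projected_used / threshold
    return max(0.0, required_capacity - capacity_tb)


if __name__ == "__main__":
    # Hypothetical cluster: 400 TB used of 1,000 TB, growing 10% per quarter.
    print(periods_until_threshold(400, 1000, 0.10))
    # Hypothetical cluster already near the ceiling: 760 TB used of 1,000 TB.
    print(round(capacity_to_add(760, 1000, 0.10), 1))
```

Running a projection like this each quarter turns expansion into a routine purchase rather than an emergency, which is the point of a regularly scheduled program.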


... Read full story on InformationWeek

