Saturday, October 16, 2010

Apache Tika

I am amazed by the kind of content extraction facilities that have been made available in the open source world. Apache Tika has gone a long way towards helping developers come up with better crawlers. I was happy to use it in a project.

HADOOP+PIG+HIVE

It's been fun working on humongous data with the abstractions created by the range of different projects under Hadoop. Pig and Hive are the query languages that provide even more idiot-proof layers on top of it. Then there are the aspects of keeping a cluster running and managing data sizes with various kinds of compression, initially driven by LZO (elephant-bird) and now drifting towards block compression. A scheduler like Azkaban makes one's life easier. Humongous data and heavy loading, that's the way to go.

Monday, October 04, 2010

Murder + Capistrano

Deployments are a painful task in any production environment, and they show their ugly face most in a resource-heavy one. Puppet could be useful for OS-centric deb/config management. I recently got a chance to use a combo of Capistrano and Murder. Murder is a beautiful concept, a torrent-inspired production deploy, and Capistrano brings in the rest of the abstracted deployment functionality.

In the end I wonder why so many deployment tools are written in Ruby, or support a tool written in Ruby by shipping as a gem.

Saturday, October 02, 2010

Ant, Maven, Ivy

Moving from the trustworthy Ant to the more abstracted Maven. The one thing I realised: unless it is applied company-wide, Maven becomes hard to work with, although its USP is well established. Time to dabble with Ivy, for it promises a lot with transitive dependencies.

One interesting realisation in Maven is that you can fake the repo if you have a local set of jars on your machine (for jars not present in the repo):

<dependency>
      <groupId>...</groupId>
      <artifactId>....</artifactId>
      <version>...</version>
      <scope>system</scope>
      <systemPath>/absolute path to lib/XXX.jar</systemPath>
</dependency>
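
To make the placeholders above concrete, here is a minimal sketch with made-up values. The coordinates (com.example, legacy-lib, 1.0) and the jar location (lib/legacy-lib-1.0.jar inside the project) are purely hypothetical and picked for illustration; ${project.basedir} is used so that the path resolves to an absolute one.

```xml
<!-- Hypothetical coordinates and path, for illustration only -->
<dependency>
      <groupId>com.example</groupId>
      <artifactId>legacy-lib</artifactId>
      <version>1.0</version>
      <!-- system scope: the jar is taken from systemPath, not looked up in a repo -->
      <scope>system</scope>
      <!-- assumes the jar sits in a lib/ folder next to the pom.xml -->
      <systemPath>${project.basedir}/lib/legacy-lib-1.0.jar</systemPath>
</dependency>
```

With system scope Maven takes the jar straight from the given path, so that path has to exist on every machine that runs the build.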

And I am leaning more and more towards a combo of Ant + Ivy. For my use case, and even generally, it seems the most promising.

JAVACC

http://en.wikipedia.org/wiki/JavaCC

As it turns out, I finally got to use it in a system. Not bad at all.