Source Guru

Tag: pdi

Everything but the Kitchen Sink

by Mez on Dec.07, 2009, under Personal

Transforming data is hard. When I joined my current company, there were stupendous amounts of Perl/PHP/Bash/<insert random programming language here> scripts that would run on a cron job and do magic things to our data. They’d create reports, they’d tell the purchasers when we were running out of stock, they’d synchronise data between our frontend and backend databases, they’d collect, they’d collate, they’d do everything and anything.

Except, with all these scripts, in all these random languages, written by a multitude of previous developers (at different skill levels), they weren’t particularly maintainable (and sometimes, they weren’t particularly readable or understandable either – imagine a 6000 line Perl script that pretty much ran different permutations of the same data over and over again).

Enter Pentaho, and specifically its “Kettle” project (since renamed “Pentaho Data Integration”), a tool that lets you manipulate your data in pretty much any way you can imagine, in the simplest and easiest way imaginable.

That’s right, it’s a GUI for data manipulation.

I know a lot of you are probably sceptical right now.  The first time I ever saw this was when a previous boss of mine put it forward as a potential solution for one of our problems (getting our orders from the front end database down to the office/warehouse).  I saw it, and I thought “GUI? Nah, that’s not how real programmers do things!”, so after the development team put forward another proposal to solve this, and it got accepted, I thought I’d never see the thing again.

That was until my current boss started playing with it, trying to work out what it was doing so that he could get these evil GUI-based scripts into something manageable, like nice, pretty code.  Thing is, when my boss plays with things that he doesn’t know about, he tends to read up, research, and, 9 times out of 10, change his mind.

We wiped the previous server (it was rather noisy! We’re glad it’s no longer switched on!) and set up a new server to house our “BI platform”.  Starting off with a few scripts, my boss learnt to love this tool, and then, as I’m his “2nd in command” (aka general lackey), he started making me learn how to use it.

Again, I was sceptical. I didn’t want to learn, and I put up resistance, but my boss was going away for nearly a month, and by this time, a few of our key business processes relied on Kettle, so, grudgingly, I sat down, and started to learn.

You may be wondering now why I started off this story talking about all those magical and wonderful scripts that no one seemed to know the inner workings of.  These scripts, as I’ve already mentioned, were unwieldy, and at times, god-damned awful.  The plan was to move them to the BI system (as my boss had been doing already).

I like to think of Kettle as a bridge between the process-flow diagram and the code.  I started converting these scripts, and I was astonished to find that most of the conversions turned a long Perl script into 3 or 4 “Integration steps”.
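To give a feel for what I mean by “steps”, here is a rough sketch in plain Python. This is not Kettle’s API or file format, just an imitation of the idea: each step takes a stream of rows and hands a stream of rows to the next one. Every name in it (`table_input`, `filter_low_stock`, `select_fields`, `run_pipeline`) and the sample stock data are made up for illustration.

```python
from typing import Callable, Iterable, Iterator

Row = dict
Step = Callable[[Iterable[Row]], Iterator[Row]]


def table_input(rows: list) -> Iterator[Row]:
    """Stand-in for a 'read from the database' step; it just yields in-memory rows."""
    yield from rows


def filter_low_stock(threshold: int) -> Step:
    """Build a step that keeps only rows whose stock is below the threshold."""
    def step(rows: Iterable[Row]) -> Iterator[Row]:
        return (row for row in rows if row["stock"] < threshold)
    return step


def select_fields(*fields: str) -> Step:
    """Build a step that keeps only the named fields of each row."""
    def step(rows: Iterable[Row]) -> Iterator[Row]:
        return ({name: row[name] for name in fields} for row in rows)
    return step


def run_pipeline(source: Iterable[Row], steps: list) -> list:
    """Chain the steps in order, feeding each one the output of the previous."""
    stream = source
    for step in steps:
        stream = step(stream)
    return list(stream)


# Hypothetical sample data standing in for a stock table.
stock = [
    {"sku": "A1", "name": "Kettle", "stock": 2},
    {"sku": "B2", "name": "Sink", "stock": 40},
    {"sku": "C3", "name": "Teapot", "stock": 0},
]

low_stock_report = run_pipeline(
    table_input(stock),
    [filter_low_stock(threshold=5), select_fields("sku", "stock")],
)
print(low_stock_report)
# [{'sku': 'A1', 'stock': 2}, {'sku': 'C3', 'stock': 0}]
```

The whole “script” is the short list of steps passed to `run_pipeline`, which reads a lot like the boxes and arrows of a process-flow diagram.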

I’m totally besotted with this program now.  Any time I have to do data manipulation, I turn to it.  I can’t describe how (once you’ve got used to its quirks) easy it is to use, how simple it is, and how much it just makes sense. Best of all, most of those evil scripts are gone now, replaced with “pretty” diagrams that do the work for you.

If you have to play with large data sets on a regular basis, I urge you to try it out.  You can buy me a beer for recommending it next time you see me at $conference.

