Source Guru

Data-Scraping Evilness

by Mez on Nov.05, 2008, under Geeky

So, I’m sure most of you are aware of Facebook – if not, where have you been?

Anyway, I run a site that promotes local events and gigs for a specific group of people, and a lot of the work is keeping it up to date: grabbing the info from various websites and plonking it into the format the site uses, which can eat up hours.

This morning I noticed that I kept getting invites to events on Facebook that I should be adding to the website. I also noticed that the emails I got from Facebook were all in the same form.

Regex anyone?

preg_match("/^Event: (.*)\n.*\"(.*)\"\nWhat: (.*)\nHost: (.*)\nStart Time: (.*)\nEnd Time: (.*)\nWhere: (.*)\n\nTo see more details and RSVP, follow the link below:\n(.*)\n/m", $email, $matches);
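To make the capture groups a bit easier to follow, here is a minimal sketch of the same pattern run against a made-up email. The sample text is an assumption reconstructed from the pattern itself, not a real Facebook message, and the `parse_invite` helper and its field names are hypothetical labels of my own.

```php
<?php
// Hypothetical helper: applies the pattern from the post and labels the captures.
function parse_invite(string $email): ?array
{
    $pattern = "/^Event: (.*)\n.*\"(.*)\"\nWhat: (.*)\nHost: (.*)\nStart Time: (.*)\nEnd Time: (.*)\nWhere: (.*)\n\nTo see more details and RSVP, follow the link below:\n(.*)\n/m";

    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $email, $matches) !== 1) {
        return null;
    }

    // Assumed names for the eight capture groups, in pattern order.
    // 'quoted' is whatever sits between the double quotes on the second line.
    $keys = ['event', 'quoted', 'what', 'host', 'start', 'end', 'where', 'link'];

    // $matches[0] is the whole match, so skip it before pairing up.
    return array_combine($keys, array_slice($matches, 1));
}

// Made-up sample shaped to fit the pattern; every value here is invented.
$sample = "Event: Example Gig\n"
    . "Someone invited you to \"Example Gig\"\n"
    . "What: Concert\n"
    . "Host: Example Promoter\n"
    . "Start Time: Saturday, November 8 at 8:00pm\n"
    . "End Time: Saturday, November 8 at 11:00pm\n"
    . "Where: Example Venue\n"
    . "\n"
    . "To see more details and RSVP, follow the link below:\n"
    . "http://www.example.com/event/123\n";

$fields = parse_invite($sample);
if ($fields !== null) {
    echo $fields['event'], ' at ', $fields['where'], "\n";
}
```

Returning `null` when nothing matches keeps ordinary, non-invite emails from being treated as events. Note that the pattern expects plain `\n` line endings, so a message with `\r\n` endings would probably need normalising first.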

So, yup, that’s what I did. I registered a new Facebook account and poked its emails through a script, and now, through the magic of regular expressions, when someone invites that special user to an event it automatically gets added to the site (via a bit of PHP + procmail magic, with sanity checks!).
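For the curious, the sanity checks could look something like the sketch below. This is only a guess at the kind of thing that is useful here, not the actual checks the script runs: the `invite_problems` function, the field names, and the rule that the link must point at facebook.com are all assumptions for illustration.

```php
<?php
// Hypothetical sanity checks for a parsed invite.
// Returns a list of problems; an empty list means the invite looks usable.
function invite_problems(array $fields): array
{
    $problems = [];

    // Assumed set of fields an event listing cannot do without.
    foreach (['event', 'start', 'where', 'link'] as $key) {
        if (!isset($fields[$key]) || trim($fields[$key]) === '') {
            $problems[] = "missing $key";
        }
    }

    // Assumed rule: only trust links whose host is facebook.com or a subdomain.
    $host = parse_url($fields['link'] ?? '', PHP_URL_HOST);
    if (!is_string($host) || !preg_match('/(^|\.)facebook\.com$/i', $host)) {
        $problems[] = 'link is not on facebook.com';
    }

    return $problems;
}

// In a procmail setup the raw message arrives on STDIN, because procmail
// pipes it to the script. Here a made-up, already-parsed invite stands in.
$fields = [
    'event' => 'Example Gig',
    'start' => 'Saturday, November 8 at 8:00pm',
    'where' => 'Example Venue',
    'link'  => 'http://www.facebook.com/event/123',
];

$problems = invite_problems($fields);
if ($problems === []) {
    echo "looks fine, add it to the site\n";
} else {
    echo 'skipping: ', implode(', ', $problems), "\n";
}
```

Collecting every problem rather than bailing at the first one makes it easier to see why a particular email got skipped.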

It was certainly interesting to get going, and well, quite fun… but I feel a bit dirty doing it ;)

Anyway, now all I need to do is write screen scrapers for the other websites I get the gig listings from, and then, hopefully, I’ll have everything automated!

If only…

