Data-Scraping Evilness
by Mez on Nov.05, 2008, under Geeky
So, I’m sure most of you are aware of Facebook – if not, where have you been?
Anyway, I run a site that promotes local events and gigs for a specific group of people, and a lot of the work for the site is in keeping it up to date: going and grabbing the info from various different websites, and plonking it into the format that’s used by the website, which can consume hours.
This morning I noticed that I kept getting invites to events on Facebook that I should be adding to the website. I also noticed that the emails I got from Facebook were all in the same form.
Regex anyone?
preg_match("/^Event: (.*)\n.*\"(.*)\"\nWhat: (.*)\nHost: (.*)\nStart Time: (.*)\nEnd Time: (.*)\nWhere: (.*)\n\nTo see more details and RSVP, follow the link below:\n(.*)\n/m", $email, $matches);
So, yup, that’s what I did. I poked it all through a script and registered a new Facebook account, and now, through the magic of Regular Expressions, when someone invites that special user to an event, it automatically gets added to the site (through a bit of PHP + procmail magic, with sanity checks!).
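For the curious, here is a minimal sketch of what the PHP end of that could look like. It is not the actual script running on the site: the function name parse_event_invite, the array keys, the particular sanity checks, and the sample email are all made up for illustration, and only the regex itself is the one shown above. It assumes procmail (or anything else) hands the raw email text to PHP as a string.

```php
<?php
// Hypothetical helper: name, array keys and checks are illustrative only.
function parse_event_invite(string $email): ?array
{
    // Same pattern as above, split over several lines for readability.
    $pattern = "/^Event: (.*)\n.*\"(.*)\"\nWhat: (.*)\nHost: (.*)\n"
             . "Start Time: (.*)\nEnd Time: (.*)\nWhere: (.*)\n\n"
             . "To see more details and RSVP, follow the link below:\n(.*)\n/m";

    // Mail often arrives with CRLF line endings; the pattern expects bare LF.
    $email = str_replace("\r\n", "\n", $email);

    if (preg_match($pattern, $email, $m) !== 1) {
        return null; // Not an invite in the expected layout.
    }

    $event = [
        'name'   => trim($m[1]),
        'quoted' => trim($m[2]), // whatever sits between the quotes on line two
        'what'   => trim($m[3]),
        'host'   => trim($m[4]),
        'start'  => trim($m[5]), // kept as raw text; the date format is not assumed
        'end'    => trim($m[6]),
        'where'  => trim($m[7]),
        'url'    => trim($m[8]),
    ];

    // Sanity check 1: an event needs a name and a start time.
    if ($event['name'] === '' || $event['start'] === '') {
        return null;
    }

    // Sanity check 2: the link must be a well-formed URL on facebook.com.
    if (filter_var($event['url'], FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $host = (string) parse_url($event['url'], PHP_URL_HOST);
    if (preg_match('/(^|\.)facebook\.com$/i', $host) !== 1) {
        return null;
    }

    return $event;
}

// Made-up sample email, only to show the call.
$sample = "Subject: invite\n\n"
        . "Event: Example Gig Night\n\"Made-up tagline\"\nWhat: Concert\n"
        . "Host: Example Promoter\nStart Time: Friday, November 7 at 8:00pm\n"
        . "End Time: Friday, November 7 at 11:00pm\nWhere: The Example Arms\n\n"
        . "To see more details and RSVP, follow the link below:\n"
        . "http://www.facebook.com/event.php?eid=0\n\nThanks\n";

$event = parse_event_invite($sample);
if ($event !== null) {
    echo $event['name'], ' at ', $event['where'], "\n";
}
```

Returning null for anything that fails a check means unexpected mail is simply ignored rather than half-imported, which seems the safer default for something that publishes automatically.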
It was certainly interesting to get going, and well, quite fun… but I feel a bit dirty doing it.
Anyway, now all I need to do is create screen scrapers for the different websites that I get the gig listings from, and then, hopefully, I’ll be able to have everything automated!
If only…

November 6th, 2008 on 12:18 am
See recent article on this:
http://ask.slashdot.org/article.pl?sid=08/10/27/1245219
and a reply that is very relevant:
http://ask.slashdot.org/comments.pl?sid=1008923&cid=25527129
November 6th, 2008 on 12:26 am
Dude, this is one of the things RSS was invented for. The different websites should be convinced to provide the data as feeds.
November 6th, 2008 on 1:14 am
Is it considered scraping if you are using an email sent to yourself? You are given implicit license to use that email for whatever personal purposes you might have. If you are also a person who happens to run a site that displays events and you are getting event email from another source, then by all means use it.
It isn’t like you are actively downloading all event pages on Facebook and scraping them; Facebook actually SENT you the data.
+1 from me!
The more of Facebook’s walled-garden data that gets out, the better.
November 6th, 2008 on 2:36 pm
Facebook has an iCal export of events you are invited to. Click on “Events” under the “Applications” listing, and then on “Export events” to get the URL for the iCal feed.
And encourage other sites to support iCal…
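As a rough illustration of why an iCal feed is easier to consume than email, here is a minimal PHP sketch that pulls the events out of iCalendar text. The function name parse_ical_events and the sample data are made up, and this is a deliberately simplified reader rather than a full parser: it unfolds continuation lines and strips property parameters, but it does not unescape text values or handle quoted parameters that contain a colon.

```php
<?php
// Hypothetical helper: a simplified iCalendar reader, not a complete parser.
function parse_ical_events(string $ics): array
{
    // Unfold: a line break followed by a space or tab continues the previous line.
    $ics = preg_replace("/\r?\n[ \t]/", '', $ics);

    $events  = [];
    $current = null; // null while we are outside a VEVENT block

    foreach (preg_split("/\r?\n/", $ics) as $line) {
        if ($line === 'BEGIN:VEVENT') {
            $current = [];
        } elseif ($line === 'END:VEVENT') {
            if ($current !== null) {
                $events[] = $current;
            }
            $current = null;
        } elseif ($current !== null && strpos($line, ':') !== false) {
            list($name, $value) = explode(':', $line, 2);
            // Drop parameters such as ";TZID=Europe/London" from the name.
            $name = strtoupper(explode(';', $name)[0]);
            $current[$name] = $value;
        }
    }

    return $events;
}

// Made-up sample feed, only to show the call.
$feed = "BEGIN:VCALENDAR\r\nVERSION:2.0\r\nBEGIN:VEVENT\r\n"
      . "SUMMARY:Example Gig Night\r\n"
      . "DTSTART;TZID=Europe/London:20081107T200000\r\n"
      . "LOCATION:The Example\r\n  Arms\r\n"
      . "END:VEVENT\r\nEND:VCALENDAR\r\n";

foreach (parse_ical_events($feed) as $event) {
    echo $event['SUMMARY'], ' @ ', $event['LOCATION'], "\n";
}
```

In real use the text would come from the export URL mentioned above (for example via file_get_contents), and a maintained iCalendar library would be a better choice than this sketch for anything beyond simple feeds.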