Mining emails

Journal entry
October 28, 2002

Just for kicks I created a tiny web spider in Python (less than 2 KB, about 30 minutes of coding) designed to pull email addresses off websites. The script is small and its rules for recognizing website and email addresses are really simple, but it is still pretty effective.
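The post doesn't include the script, so here is a minimal sketch of what "really simple" recognition rules might look like, assuming plain regular expressions run over a page's raw HTML. The patterns and the names `find_emails` and `find_urls` are illustrative assumptions, not the original code. They will miss obfuscated addresses and can pick up the occasional false match.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately simple patterns (not the original script's rules).
# An email: word characters, dots, plus or minus signs, an "@", then a domain
# made of dot-separated labels.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A URL: http or https, a host name, then an optional path up to whitespace,
# a quote, or an angle bracket.
URL_RE = re.compile(r"https?://[\w.-]+(?:/[^\s\"'<>]*)?")


def find_emails(page: str) -> set[str]:
    """Return the distinct email-like strings found in a page's text."""
    return set(EMAIL_RE.findall(page))


def find_urls(page: str) -> list[str]:
    """Return the http(s) URLs found in a page's text, in document order."""
    return URL_RE.findall(page)
```

Both patterns use only non-capturing groups, so `findall` returns whole matches rather than group contents.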

Starting at my website and crawling no further than one link away (following only links to the homepages of websites), it runs for half a minute and returns 18 email addresses after crawling through 28 URLs. If I let it crawl up to two links away from my website, it crawls through 306 websites and returns 151 email addresses.
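A crawl-distance limit like this amounts to a breadth-first search that stops expanding pages once they are a set number of links from the start. Here is a minimal sketch, assuming a hypothetical `get_links(url)` callback that returns the links found on a page (stubbed with an in-memory dict in the usage below rather than real HTTP fetches). As in the post, every link is reduced to its site's homepage before being followed.

```python
from collections import deque
from urllib.parse import urlsplit


def homepage(url):
    """Reduce a URL to its site's homepage, e.g. http://a.example/x -> http://a.example/."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def crawl(start, get_links, max_depth):
    """Breadth-first crawl from `start`, following only homepage links and
    going at most `max_depth` links away. Returns the URLs visited, in order.

    `get_links` is a hypothetical callback: url -> list of linked URLs.
    """
    start = homepage(start)
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # at the distance limit: visit the page but don't expand it
        for link in get_links(url):
            home = homepage(link)
            if home not in seen:
                seen.add(home)
                queue.append((home, depth + 1))
    return visited


# Usage with a tiny fake web instead of real fetches:
web = {
    "http://a.example/": ["http://b.example/page", "http://c.example/"],
    "http://b.example/": ["http://d.example/x"],
    "http://c.example/": ["http://a.example/"],
}
print(crawl("http://a.example/index.html", lambda u: web.get(u, []), 1))
```

Each extra step of distance pulls in the neighbours of every page already reached. That is why the number of pages visited (and addresses found) grows so quickly when the limit goes from one link to two.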

Sheesh, is it really that easy? No wonder I’m getting more and more spam. I do wonder, though, how there can be a market for selling email addresses when they are this easy to farm.

Categories
Selling out
Did you know?
Jakob is an independent web application developer who builds awesome stuff for the web. You can hire him to build awesome stuff for you.