I recently discovered that a blog called (seriously) "Google Chrome Browser" was reblogging my site. (It of course has NO relationship to Google or the lovely folks on the Chrome team.)
This is a splog or "spam blog." It's less of a blog and more of a 'suck your feed in and reblog it.' Basically every post is duplicated or sucked in via RSS from somewhere else. I get this many times a week and have for years.
However, this particular site started showing up ahead of mine in searches and that's not cool.
Worse yet, they have almost 25k followers on Twitter. I've asked them a few times to stop doing this, but this time I got tired of it.
They're even 'hotlinking' my images, which means that all my PNGs are still hosted on my site. When you visit their site, the text is from my RSS but I pay for the images bandwidth. The irony of this is thick. Not to mention my copyright notice is intact on their site. ;)
When an image is linked to from another domain the HTTP_REFERER header is populated with the location that the image is linked from. That means when my web server gets a request for 'foo.png' from the Google Chrome Browser blog I can see the page that asked for that image.
For example:
Request URL:http://www.hanselman.com/blog/content/binary/Windows-Live-Writer/How-to-run-a-Virtual-Conference-for-10_E53C/image_5.png
Request Method:GET
Referer:http://google-chrome-browser.com/penny-pinching-cloud-how-run-two-day-virtual-conference-10
Because this differentiates the GET request that means I can do something about it. This brings up a few important things to remember in general about the web that I feel a lot of programmers forget about:
- The Internet is not a black box.
- You can do something about it.
That said, I want to detect these requests and serve a different image.
If I was using Apache and had an .htaccess file, I might do this:
RewriteCond %{HTTP:Referer} ^.*http://(?:www\.)?computersblogsexample.info.*$
RewriteHeader Referer: .* damn\.spammers
RewriteCond %{HTTP:Referer} ^.*http://(?:www\.)?google-chrome-browser.*$
RewriteHeader Referer: .* damn\.spammers
#make more of these for each evil spammer
RewriteCond %{HTTP:Referer} ^.*damn\.spammers.*$
RewriteRule ^.*\.(?:gif|jpg|png)$ /images/splog.png [NC,L]
Since I'm using IIS, I'll do similar rewrites in my web.config. I could do a whitelist where I only allow hotlinking from a few places, or a blacklist where I only block a few folks. Here's a blacklist.
I could have just made a single rule and put this bad domain in it but it would have only worked for one domain, so instead my buddy Ruslan suggested that I make a rewritemap and refer to it from the rule. This way I can add more domains to block as the evil spreads.
It was important to exclude the splog.png file that I am going to redirect the bad guy to, otherwise I'll get into a redirect loop where I redirect requests for the splog.png back to itself!
The result is effective. If you visit their site, I'll issue an HTTP 307 (Moved Temporarily) and then you'll see my splog.png image everywhere that they've hotlinked my image.
If you wanted to change the blacklist to a white list, you'd reverse the values of allow and block in the rewrite map:
Nice, simple and clean. I don't plan on playing "whac a mole" with sploggers as it's a losing game, but I will bring down the ban-hammer on particularly obnoxious examples of content theft, especially when they mess with my Google Juice.
© 2013 Scott Hanselman. All rights reserved.