Recently, there’s been a discussion on The WELL about how some people are having trouble viewing pages on Fotolog.net. It turns out that Fotolog is restricting access to images based on the HTTP Referer (yes, that’s how it’s spelled in HTTP) header in an unfortunate way. Surely their goal is to prevent bandwidth thieves (intentional or otherwise) from embedding Fotolog images directly into their own pages. That’s a serious problem, but Fotolog’s solution is not good.
Rules to limit image access based on Referer are pretty common. It’s one of the first things most people think of when they discover they have a problem with people using their images on other sites. And it tends to work fairly well. But a lot of people who implement Referer restrictions don’t really understand all the implications, limitations and tricky configuration issues. There are three key things to always remember when setting up Referer-based access control: requests without a Referer shouldn’t be denied, legitimate Referers aren’t always what you expect, and the Referer header can be spoofed.
The first issue is the one that tends to cause the most problems, and it’s the one that Fotolog appears to have gotten wrong. It’s a big mistake to set up your access controls so that requests without Referers are denied. There are several legitimate access cases in which the Referer may not exist. Some desktop firewall products such as ZoneAlarm have features to remove the Referer header from HTTP requests. Whether this is a wise feature to enable is up for debate, but it’s out there and it is used. Some proxy servers and anonymizers also remove the Referer from requests. And there are a few HTTP clients and browsers floating around that don’t send a Referer header. All of these products are likely to be used by some portion of your site’s legitimate viewers. If you don’t allow requests without Referers, you’re going to be keeping these people out of your site. Permitting Refererless requests shouldn’t significantly decrease the effectiveness of your access rules. As long as they deny requests with invalid (but not empty) Referers, they’ll make embedding your images in other sites’ pages largely useless. The vast majority of viewers of the outside pages will send a Referer and will be denied.
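To make that policy concrete, here is a minimal sketch in Python of the decision itself: serve the request when there is no Referer, and deny it only when a Referer is present and points somewhere you don’t recognize. The function name and the example.com host list are my own placeholders, not anything Fotolog uses, and a real site would normally express this in its web server configuration rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical host names for illustration; substitute your own site's names.
ALLOWED_HOSTS = {"www.example.com", "example.com"}


def allow_image_request(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True if an image request should be served."""
    # Header missing or blank: let the request through.
    if referer is None or not referer.strip():
        return True
    try:
        # urlsplit lowercases the host; hostname is None if there is no host.
        host = urlsplit(referer.strip()).hostname
    except ValueError:
        # Malformed value: treat it as an invalid (non-empty) Referer.
        return False
    # A Referer is present, so it must name one of our own hosts.
    return host in allowed_hosts
```

Note that the only requests refused are the ones that positively identify another site as the referring page.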
My second point, that legitimate Referers aren’t always what you expect, is a less common problem, but it still catches a lot of people. Before you turn on Referer checking, you should be absolutely sure about where your legitimate accesses come from. The best way to do this is to look at the Referers in your web logs and classify each of them as allowed or disallowed. There are a lot of off-site Referers that people want to allow. For example, do you ever sell anything on eBay with the images on your server? If so, you’d better let in Referers from eBay or it’s going to be embarrassing. Ever email friends direct links to an image? If you do, you might want to allow Referers from Hotmail and Yahoo! Mail. Also keep in mind the various names that your own site may go by. Both www.sharding.org and sharding.org work to get to my site, so I’d have to be careful to construct my Referer rules to allow them both.
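As a rough illustration of that log review, the sketch below tallies Referer values by host so you can see every name that shows up before deciding what to allow. It assumes you have already pulled the Referer field out of each log line. The function name is made up for this example, and "-" is treated as "no Referer" because that is what Apache’s combined log format records when the header is absent.

```python
from collections import Counter
from urllib.parse import urlsplit


def tally_referer_hosts(referers):
    """Count requests per referring host from a list of Referer strings."""
    counts = Counter()
    for referer in referers:
        # Empty string or "-" means the client sent no Referer.
        if not referer or referer == "-":
            counts["(none)"] += 1
            continue
        try:
            host = urlsplit(referer).hostname  # lowercased, or None
        except ValueError:
            host = None
        counts[host or "(unparseable)"] += 1
    return counts


# Sample values for illustration only; auctions.example is a stand-in host.
sample = [
    "http://www.sharding.org/index.html",
    "http://sharding.org/",
    "-",
    "http://auctions.example/item?id=1",
]
for host, count in tally_referer_hosts(sample).most_common():
    print(count, host)
```

Run against real logs, a tally like this makes it obvious that www.sharding.org and sharding.org are separate entries that both need to be allowed.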
For some types of access control, it’s also very important to remember that the Referer header can be spoofed. Generally, if this fact causes a problem for your site, it means that you’re going about something the wrong way. You should never rely on the accuracy of data supplied by a user for security. This includes everything in the HTTP headers. Blocking access by Referer tends to work pretty well for preventing people from embedding your images in other pages because it makes their page not work for most of their visitors. That’s probably a reasonable (though surely not foolproof) use of Referer. On the other hand, I’ve seen some sites do things like try to keep people from downloading their photos by enforcing a Referer constraint and disabling right-click with JavaScript. I’ve seen other sites that “password protect” areas of their site by making you go through an authentication page and from then on only checking that your Referer header indicates that you’re coming from one of their other pages. There are so many things wrong with these strategies that I could write a novel about them. I’ll just say this: access control based on Referer is at best a mild deterrent.
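To show how little effort spoofing takes, here is a small sketch using Python’s standard urllib.request module. It only builds a request object and never sends it. The URLs and the helper name are placeholders I made up for illustration.

```python
import urllib.request


def build_request_with_referer(url, claimed_referer):
    """Build (but do not send) a request that claims an arbitrary Referer."""
    # The client chooses this value; the server cannot verify it.
    return urllib.request.Request(url, headers={"Referer": claimed_referer})


request = build_request_with_referer(
    "http://www.example.com/members/photo.jpg",
    "http://www.example.com/members/index.html",
)
print(request.get_header("Referer"))
```

Anyone who can write those few lines can claim to be arriving from your “protected” page, which is why a Referer check is no substitute for real authentication.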
If you decide to implement any Referer-based controls, I urge you to not only keep these points in mind, but also test them extensively. Clear your cache between tests. And test from other computers, networks and browsers. One common mistake I’ve seen people make is that they’ll view the page, put in the access controls, reload the page and assume that since the images are there, the configuration is correct. Unfortunately, if the image is in your browser’s cache (and you haven’t done something to force the browser to reload it from the server), it may display fine for you while it will fail for everyone else.
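One way to take the browser cache out of the picture is to script the checks. The sketch below, which uses only Python’s standard library, requests an image with several Referer values and reports any case that doesn’t behave as expected. The function names are hypothetical, the URLs in the usage note are placeholders, and it is a starting point under those assumptions rather than a complete test suite.

```python
import urllib.error
import urllib.request


def fetch_status(url, referer=None):
    """Request url, optionally with a Referer, and return the HTTP status."""
    headers = {"Referer": referer} if referer else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is what we want.
        return error.code


def run_referer_checks(url, cases, fetch=fetch_status):
    """Return (referer, status) for each case that did not behave as expected.

    cases is a list of (referer, should_succeed) pairs; referer may be None.
    """
    failures = []
    for referer, should_succeed in cases:
        status = fetch(url, referer)
        succeeded = 200 <= status < 300
        if succeeded != should_succeed:
            failures.append((referer, status))
    return failures
```

A typical list of cases would pair None with True (no Referer should work), a page on your own site with True, and a page on some unrelated site with False. An empty result means every case matched. It’s still worth repeating the checks from another network, since a script like this won’t reveal what a proxy or firewall along the way does to the header.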
Presumably you think you have some good content if you are concerned enough about other people using it to want to put in Referer checks. If you have good content, you want people to see it. So the number one rule should always be to make sure things work properly for legitimate visitors. Keep that in mind as you set up your rules and your visitors will thank you.
Update (12:55 PDT, 03-13-2003): Fotolog has corrected their configuration. They should now be allowing requests for images without Referer headers.