Sean Harding/blog
Spammer reconnaissance part 1
Monday, February 24th, 2003

Last spring, I started wondering about the web email address harvesters spammers use. I knew they were hitting my site, because I get spam to addresses only shown there. But I had no idea which of the entries in my Apache logs corresponded to the spammers’ harvesters. I didn’t know how many different harvesters were coming around. I had no clue how long it took between the time an address is harvested off the web and the time the first piece of spam comes in. And I didn’t have any way to "take back" addresses after they were harvested.

So, on a boring weekend day I decided to make a simple system to help me gather information. I wrote a tiny bit of code to generate a unique email address for every page load on my main web site. Every time one of those pages is fetched, the email address at the bottom will be different. It’s basically an encrypted identifier that I can later correlate with log entries. Incoming mail to anything in the subdomain used for those addresses goes through a bit of software that decrypts the ID (the left-hand side of the address) and makes sure it’s a valid generated address. This validation step has the added benefit that I can add any harvested address to a blacklist as soon as I receive spam on it, preventing any future spam to that address. And since I can correlate it with the logs to find out who harvested it, I can also invalidate any other addresses sent to the same client. And this all happens without inconveniencing people who want to send me mail from my web pages; there’s still a perfectly valid, clickable email address on every page.
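A minimal sketch of the idea in Python, under some loudly stated assumptions. It signs the identifier with an HMAC rather than encrypting it (Python’s standard library has no cipher, so this is a swapped-in technique, not the scheme described above), and the key, the example.org subdomain, and the function names make_address, decode_address and accept_mail are all hypothetical.

```python
import base64
import hashlib
import hmac
import struct

SECRET_KEY = b"change-me"       # hypothetical signing key
MAIL_DOMAIN = "h.example.org"   # hypothetical subdomain for generated addresses
TAG_LEN = 6                     # bytes of HMAC kept in the address


def _tag(payload: bytes) -> bytes:
    """Truncated HMAC-SHA256 over the packed page-load ID."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()[:TAG_LEN]


def make_address(page_load_id: int) -> str:
    """Build the address shown on one page load."""
    payload = struct.pack(">Q", page_load_id)
    local = base64.b32encode(payload + _tag(payload)).decode("ascii")
    return f"{local.rstrip('=').lower()}@{MAIL_DOMAIN}"


def decode_address(address: str):
    """Return the page-load ID if the address is one we generated, else None."""
    local, _, domain = address.partition("@")
    if domain.lower() != MAIL_DOMAIN:
        return None
    # Restore the base32 padding that make_address stripped.
    padded = local.upper() + "=" * (-len(local) % 8)
    try:
        raw = base64.b32decode(padded)
    except ValueError:  # also covers binascii.Error (bad characters or padding)
        return None
    if len(raw) != 8 + TAG_LEN:
        return None
    payload, tag = raw[:8], raw[8:]
    if not hmac.compare_digest(tag, _tag(payload)):
        return None
    return struct.unpack(">Q", payload)[0]


def accept_mail(address: str, blacklist: set) -> bool:
    """Accept mail only for valid generated addresses that are not blacklisted."""
    page_load_id = decode_address(address)
    return page_load_id is not None and page_load_id not in blacklist
```

In this sketch, blacklisting a harvested address just means adding its decoded ID to the blacklist set. Invalidating everything served to the same client would mean adding every ID that the logs tie to that client.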

This little experiment hasn’t produced any groundbreaking information, but I have found a few interesting tidbits. I was surprised by how little of the spam I’ve received turned out to be from recent address harvesting. In the 8 months or so that I’ve been doing this, there have only been about fifteen spam messages sent to these addresses. It’s possible that it takes longer than 8 months for the addresses to get into wide circulation, so I’ll have to keep watching to see if the spam ratio ramps up.

Here are the access log entries that directly resulted in spam:

- - [15/Sep/2002:13:22:24 -0700] "GET / HTTP/1.1" 200 2584 "-" "Internet Explore 5.x"
- - [31/Oct/2002:21:54:45 -0800] "GET /sean/ HTTP/1.1" 200 2445 "-" "-"
- - [02/Jul/2002:01:37:38 -0700] "GET /s/music/ HTTP/1.1" 200 7183 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)"
- - [10/Oct/2002:00:28:47 -0700] "GET /sean/ HTTP/1.1" 200 2445 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
- - [18/Jan/2003:23:59:00 -0800] "GET /sean/ HTTP/1.0" 200 2433 "-" "Mozilla/3.0 (compatible)"
- - [26/Oct/2002:04:12:30 -0700] "GET /s/words/story/20020301a/index.html HTTP/1.1" 200 2402 "" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
- - [27/Nov/2002:18:58:49 -0800] "GET /sean/ HTTP/1.1" 200 2445 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
- - [10/Sep/2002:16:47:36 -0700] "GET /s/ HTTP/1.1" 200 2445 "" "-"
- - [14/Nov/2002:09:01:48 -0800] "GET /sean/ HTTP/1.1" 200 2445 "-" "Internet Explore 5.x"
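Entries like these appear to follow Apache’s "combined" log format, so pulling them apart for correlation could look roughly like the Python sketch below. The names LOG_RE, parse_entry and SAMPLE are made up for illustration. The sample line uses a placeholder host (192.0.2.1, a documentation address) because the client address is not shown in the entries above. The regular expression does not handle escaped quotes inside a field.

```python
import re
from datetime import datetime

# Combined format: host ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line: str):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert "15/Sep/2002:13:22:24 -0700" into a timezone-aware datetime.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


# Placeholder host; the rest follows the shape of the entries listed above.
SAMPLE = ('192.0.2.1 - - [15/Sep/2002:13:22:24 -0700] '
          '"GET / HTTP/1.1" 200 2584 "-" "Internet Explore 5.x"')
```

Calling parse_entry(SAMPLE) gives a dict whose "time" value can be subtracted from the arrival time of the first spam to get the harvest-to-spam delay.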

A couple of the clients were nice enough to actually send along real referer information. That kind of surprised me. One sent an obviously faked referer of "". A couple of them (both from IP addresses in China) sent the bogus user-agent string "Internet Explore 5.x." The rest either sent no user-agent header at all or had one that looks fairly "normal."

The shortest time between an address being harvested and receiving its first spam is seven hours. The longest is 117 days. It seems that it either happens within a day or takes weeks. I’ve considered adding in a timestamp and having the addresses only good for four hours or so to allow legitimate messages to get through without allowing spam in, but I haven’t gotten around to implementing that yet.
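The timestamp idea could be sketched in Python along these lines. This is an illustration only, not an implementation that exists: the key, the names make_local_part and is_fresh, and the choice of an HMAC-signed hex timestamp as the left-hand side of the address are all assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"     # hypothetical signing key
TTL_SECONDS = 4 * 60 * 60     # the four-hour window mentioned above
TAG_HEX_LEN = 12              # hex digits of HMAC kept in the local part


def _tag(issued_hex: str) -> str:
    """Truncated HMAC-SHA256 over the hex-encoded issue time."""
    digest = hmac.new(SECRET_KEY, issued_hex.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:TAG_HEX_LEN]


def make_local_part(issued_at: int) -> str:
    """Encode the issue time (Unix seconds) plus a tag that proves we made it."""
    issued_hex = f"{issued_at:08x}"
    return issued_hex + _tag(issued_hex)


def is_fresh(local_part: str, now: int) -> bool:
    """True only for an untampered local part issued within the last TTL_SECONDS."""
    local_part = local_part.lower()
    if not local_part.isascii() or len(local_part) != 8 + TAG_HEX_LEN:
        return False
    issued_hex, tag = local_part[:8], local_part[8:]
    if not hmac.compare_digest(tag, _tag(issued_hex)):
        return False
    age = now - int(issued_hex, 16)
    return 0 <= age <= TTL_SECONDS
```

Since the shortest delay seen so far is seven hours, a four-hour window like this would have rejected every spam message counted above, while anyone who clicks the address soon after loading the page would still get through.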

I was hoping to find some more decisive patterns in spam harvesters that would allow me to block a significant number of them. I’m not surprised that I didn’t find any, but I am a little disappointed. I’ll probably start blocking anything that identifies itself as "Internet Explore" or sends a referer of "". If my page gets linked from the main page of Microsoft’s web site, I’ll be in trouble, but I’m not losing any sleep over that.
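The blocking rule itself is simple enough to sketch in Python. The names BOGUS_AGENT_RE, BLOCKED_REFERERS and should_block are hypothetical, and the referer set is left empty because the faked referer value is not preserved in the text above.

```python
import re

# Matches "Internet Explore" but not the correctly spelled "Internet Explorer".
BOGUS_AGENT_RE = re.compile(r"Internet Explore(?!r)")

# The faked referer value is not shown above, so this set starts empty;
# add the exact string(s) to block.
BLOCKED_REFERERS = frozenset()


def should_block(user_agent, referer) -> bool:
    """Return True if the request headers match a known harvester pattern."""
    if BOGUS_AGENT_RE.search(user_agent or ""):
        return True
    return (referer or "") in BLOCKED_REFERERS
```

For example, should_block("Internet Explore 5.x", "-") is true, while a request with an ordinary MSIE user-agent string and an unlisted referer passes through.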

Airline ticket pricing
Wednesday, February 19th, 2003

Lately I’ve been making arrangements for some spring and summer vacations. Scheduling trips, planning to meet with various friends and making reservations with hotels is always a challenge, but it’s worth it in the end. However, choosing and buying airline tickets is a uniquely frustrating experience.

It’s bad enough to try to figure out which airport to go to, which airline to fly, when to leave, when to come back and how many connections are acceptable. The apparent randomness of the pricing model makes getting a decent fare feel like an exercise in futility. Usually, I try to balance comfort and convenience (not having to make 5 connections, not having a 14 hour layover in Pittsburgh, not competing for general admission seats on Southwest) with reasonable price. So I make an attempt to leave at less popular times of the day. I try to fly into and out of major airports. I am relatively flexible about which airlines I’ll fly.

But in the end, it still never seems to matter. Most recently, my girlfriend and I spent probably 30 minutes on trying different options to find a good price with a schedule that accommodates both of us. We finally found it, went through the ordering process and at the end were told that the price had changed and we’d have to pay $40 more. Uh, I don’t think so! At first I thought we must have just taken too long to get through the pipeline. So I found the flight again (still listed at the original lower price) and went through the entire process as quickly as possible. Once again, they raised the price at the very end of the order. At that point, we gave up and went directly to the airline’s website and bought the tickets there. For a few bucks less than even Expedia’s lower price.

KARE11 in Minneapolis has done some fascinating research into airline pricing (link thanks to Al’s Morning Meeting). They did a survey of people on a Northwest Airlines flight. Sixty-five people responded to their survey and from those, there were 50 unique prices for the flight. Even more amazing, an airline analyst KARE11 spoke to found 1,369 possible fares for the flight they’d studied. This is a flight with only 124 seats! If you do any airline travel at all, I encourage you to go read the entire article.

Sunday, February 16th, 2003

Eric has some good commentary on the perils of ballot initiative systems such as the one we have here in Washington State. Each time Tim Eyman introduces a new cut-the-taxes initiative he strikes another blow at what makes our country run. Regardless of whether one agrees completely with how the government spends our money (few people do), we must realize that simply cutting off the money that funds programs is not the way to reform it. Taking money away from public transit will not solve road problems. Refusing to vote for school bonds will not fix the education system. Irresponsible initiatives that blindly cut as many taxes as they can almost unfailingly do irreparable harm to society without having any impact on the problem their proponents claim they solve.

If this keeps up, I’ll be very tempted to move to a state with a more reasonable political system.

Arrogance in the Amazon
Thursday, February 13th, 2003

Tonight was the big night: the premiere of Survivor: The Amazon. I was not disappointed. The big twist this year is that the tribes are split along gender lines, and I think it’s brilliant. Predictably, the guys (Tambaqui tribe) instantly were sure they’d win every challenge, and they made no attempt to mask their confidence. I think it’s been demonstrated time and time again that cockiness does not win this game, especially in the beginning. But the guys this season started off with more arrogance than I think I’ve seen in all of the other seasons combined.

The men did clearly beat the ladies (Jaburu tribe) in the race to get fire. They used the kerosene from their lanterns to start it and got it going in a matter of minutes. The women used nothing but sparks and kindling, spending hours on the project. As we watched this spectacle, my girlfriend and I thought that surely the women must have missed the kerosene in their kit. Or they didn’t know what it was. Because if they knew they had kerosene, why on earth would they waste their time trying to start a fire without it??? Yet moments after showing that the women finally got their fire started, Deena was shown explaining that they had been using the kerosene lamps for lighting. Perhaps this was just an editing trick, but it certainly gave the impression that the women knew they had kerosene but didn’t think to use a little of it to help start the fire. Duh!

The guys also outdid the women in the shelter department. The ladies weren’t even in that race. On the first day, the men built what has to be the nicest shelter I’ve ever seen on a Survivor season. The women had four logs lying on the ground. I don’t think that’ll cut it.

The first immunity challenge required a combination of physical and mental skills. It had a mix of puzzles and races with the members of the tribe tied together. They collected keys along the way to unlock themselves and split into smaller and smaller groups. This challenge is where the guys’ cockiness began to bite them in the ass. They had a solid lead in the beginning. The women took waaaayyy too long on the first puzzle and I was sure they’d lost the whole challenge right there. But the crack team of Ryan and Daniel managed to blow their team’s massive lead by being unable to make it across a set of balance beams. Then the men sealed their fate by taking forever on the second puzzle while the women made it look easy. Jaburu won immunity and the men had to face the fact that they weren’t as invincible as they’d thought.

I think this is going to turn out to be a pretty interesting season. I’ll be very curious to see how the challenges play out over the coming weeks. I’m also looking forward to getting a better idea of how the power structure in Tambaqui is going to work. Each time I thought I had it figured out this week, something threw me for a loop.

It might be natural for me, as a man, to root for the men in this game. But I just can’t do that. The arrogance they’ve shown makes me want to see them lose again and again. If they don’t get their act together, the women are going to pick them off one by one on their way to the final three. I think this is going to be fun.

More spam
Thursday, February 13th, 2003

Here are some stats from my spam blacklists over the past few months. This is based on around 80,500 blocked messages. Sampling I’ve done indicates that over 99% of these messages are truly spam. In fact, I’m only aware of one case of a false positive with this system.

Remember, this isn’t really related to where the spam I do receive comes from. This only counts messages that were blocked by my Sendmail blacklists before they got to my spam filters.
