Sean Harding/blog
What a guy
Friday, February 28th, 2003

Andrew Burnett, the man who was sentenced to prison in 2001 for throwing Leo the dog into traffic during an argument with his owner, has decided to prove once again what a classy man he is. He’s suing the San Jose Mercury News and Leo’s owner Sara McBurnett for damages he suffered due to the publicity around the case. The Mercury News has an article (via Romenesko).

Dar Williams @ Easy Street
Thursday, February 27th, 2003

This evening, Dar Williams did a free mini-show at Easy Street Records on Queen Anne. It’s a bummer that she didn’t play a full show this time around. Seeing a few songs is better than seeing none, but a full-length Dar show is a thing of beauty.

She played a few songs from her new album, "The Beauty of the Rain" ("Farewell to the Old Me," "Fishing in the Morning," "I Saw a Bird Fly Away" and "The One Who Knows") and one of her older ones ("Christians and the Pagans"). Then she came out and did "February" as an encore. It was a very nice set.


There were a lot of people there. Quite a few more than I’d expected. I’d estimate the turnout at about 200, but I’m notoriously inept at judging crowd size. Sadly, due to the layout of the space (merchandise racks everywhere) and the number of people who got there before us, we didn’t get to be very close. That, coupled with lighting too low for available light shooting and the anemic flash on my digital camera, means that I didn’t get any usable pictures of Dar at all. I could have brought a film camera with fast film or an external flash, but I didn’t really feel like lugging that stuff around. And I was expecting to be a lot closer. I learned my lesson about getting to a free Dar appearance late. I’ve been to enough Dar shows that I really should have known better.

Striking workers and innocent bystanders
Thursday, February 27th, 2003

Erin at Yale brings up some interesting issues around how labor disputes affect people on the fringes of the dispute. Graduate students at Yale are preparing to strike, and in the process, they’re dragging every undergraduate into the debate.

Regardless of where I stand on a given issue, I usually think that actions with massive repercussions for uninvolved parties are a bad tactic. For example, I almost always side with teachers when they are in negotiations. I fully believe that teachers in this country are underpaid and are very often forced to work in very bad conditions. I strongly support any effort to correct these problems. But I generally disagree with teacher strikes because they force so many innocent parties to be involved against their will. It’s not the students’ fault that teachers are underpaid. It’s not the parents’ fault if the contract isn’t fair. Yet these are the very people who are most damaged when teachers go on strike.

This is a difficult subject. On one hand, it’s unfair to inconvenience the students because of problems that are out of their control (and very often out of their realm of understanding as well). On the other hand, drastic measures are sometimes needed to effect change. If there’s a simple binary choice between having the workers be unfairly treated and having innocent bystanders drawn into the debate because of a strike, which would I choose? I just can’t answer that question…

Spammer reconnaissance part 1
Monday, February 24th, 2003

Last spring, I started wondering about the web email address harvesters that spammers use. I knew they were hitting my site: I get spam to addresses only shown there. But I had no idea which of the entries in my Apache logs corresponded to the spammers’ harvesters. I didn’t know how many different harvesters were coming around. I had no clue how much time passed between an address being harvested off the web and the first piece of spam arriving. And I didn’t have any way to "take back" addresses after they were harvested.

So, on a boring weekend day I decided to make a simple system to help me gather information. I wrote a tiny bit of code to generate a unique email address for every page load on my main web site. Every time one of those pages is fetched, the email address at the bottom will be different. It’s basically an encrypted identifier that I can later correlate with log entries. Incoming mail to anything in the subdomain used for those addresses goes through a bit of software that decrypts the ID (the left hand side of the address) and makes sure it’s a valid generated address. This validation step has the added benefit that I can add any harvested address to a blacklist as soon as I receive spam on it, preventing any future spam to that address. And since I can correlate it with the logs to find out who harvested it, I can also invalidate any other addresses sent to the same client. And this all happens without inconveniencing people who want to send me mail from my web pages; there’s still a perfectly valid, clickable email address on every page.
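A minimal sketch of how a scheme like this might look in Python follows. It is not the code behind the site, and it makes one deliberate swap: instead of encrypting the identifier, it signs a random per-request ID with an HMAC, which the standard library supports directly and which still lets the mail side reject addresses that were never generated. The key, the subdomain and every function name here are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import secrets

# Everything named here is hypothetical: the key, the subdomain and the
# function names are placeholders, not details from the real site.
SECRET_KEY = b"change-me"
MAIL_DOMAIN = "mail.example.com"
ID_LEN = 10    # bytes of random request ID
TAG_LEN = 10   # bytes of HMAC kept as the signature

revoked_ids = set()   # IDs whose addresses have already attracted spam


def new_request_id() -> bytes:
    """Pick a random ID for one page load; log it next to the request."""
    return secrets.token_bytes(ID_LEN)


def make_address(request_id: bytes) -> str:
    """Build the address shown on the page for this request."""
    tag = hmac.new(SECRET_KEY, request_id, hashlib.sha256).digest()[:TAG_LEN]
    # 20 bytes encode to exactly 32 base32 characters, so no padding is needed.
    local = base64.b32encode(request_id + tag).decode("ascii").lower()
    return f"{local}@{MAIL_DOMAIN}"


def parse_address(address: str):
    """Return the request ID if we generated this address, else None."""
    local, _, domain = address.partition("@")
    if domain.lower() != MAIL_DOMAIN or len(local) != 32:
        return None
    try:
        raw = base64.b32decode(local.upper())
    except ValueError:   # also covers binascii.Error for non-base32 input
        return None
    request_id, tag = raw[:ID_LEN], raw[ID_LEN:]
    expected = hmac.new(SECRET_KEY, request_id, hashlib.sha256).digest()[:TAG_LEN]
    return request_id if hmac.compare_digest(tag, expected) else None


def is_deliverable(address: str) -> bool:
    """Accept mail only for genuine addresses that have not been revoked."""
    request_id = parse_address(address)
    return request_id is not None and request_id not in revoked_ids
```

In this sketch, the web side would call `new_request_id()` once per page load, write the ID into the access log and print `make_address(...)` at the bottom of the page. When spam arrives, adding the recovered ID to `revoked_ids` plays the role of the blacklist.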

This little experiment hasn’t produced any groundbreaking information, but I have found a few interesting tidbits. I was surprised by how little of the spam I’ve received turned out to be from recent address harvesting. In the 8 months or so that I’ve been doing this, there have only been about fifteen spam messages sent to these addresses. It’s possible that it takes longer than 8 months for the addresses to get into wide circulation, so I’ll have to keep watching to see if the spam ratio ramps up.

Here are the access log entries that directly resulted in spam:

- - [15/Sep/2002:13:22:24 -0700] "GET / HTTP/1.1" 200 2584 "-" "Internet Explore 5.x"
- - [31/Oct/2002:21:54:45 -0800] "GET /sean/ HTTP/1.1" 200 2445 "-" "-"
- - [02/Jul/2002:01:37:38 -0700] "GET /s/music/ HTTP/1.1" 200 7183 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)"
- - [10/Oct/2002:00:28:47 -0700] "GET /sean/ HTTP/1.1" 200 2445 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
- - [18/Jan/2003:23:59:00 -0800] "GET /sean/ HTTP/1.0" 200 2433 "-" "Mozilla/3.0 (compatible)"
- - [26/Oct/2002:04:12:30 -0700] "GET /s/words/story/20020301a/index.html HTTP/1.1" 200 2402 "" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
- - [27/Nov/2002:18:58:49 -0800] "GET /sean/ HTTP/1.1" 200 2445 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
- - [10/Sep/2002:16:47:36 -0700] "GET /s/ HTTP/1.1" 200 2445 "" "-"
- - [14/Nov/2002:09:01:48 -0800] "GET /sean/ HTTP/1.1" 200 2445 "-" "Internet Explore 5.x"

A couple of the clients were nice enough to actually send along real referer information. That kind of surprised me. One sent an obviously faked referer of "" A couple of them (both from IP addresses in China) have the bogus user-agent string "Internet Explore 5.x." The rest either sent no user-agent header at all or had one that looks fairly "normal."
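Pulling the referer and user-agent out of entries like these is easy to script. The sketch below assumes the logs are in Apache's "combined" format and uses a hand-written regular expression. The function names are made up for illustration, and the hosts in the sample lines are placeholder documentation addresses (192.0.2.x), since the real client IPs aren't shown above.

```python
import re
from collections import Counter

# Apache "combined" format:
#   host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def tally_agents(lines):
    """Count how often each user-agent string appears."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[entry["agent"]] += 1
    return counts


# Three of the entries above, with placeholder hosts filled in.
SAMPLE = [
    '192.0.2.1 - - [15/Sep/2002:13:22:24 -0700] "GET / HTTP/1.1" 200 2584 "-" "Internet Explore 5.x"',
    '192.0.2.2 - - [31/Oct/2002:21:54:45 -0800] "GET /sean/ HTTP/1.1" 200 2445 "-" "-"',
    '192.0.2.3 - - [14/Nov/2002:09:01:48 -0800] "GET /sean/ HTTP/1.1" 200 2445 "-" "Internet Explore 5.x"',
]

if __name__ == "__main__":
    for agent, count in tally_agents(SAMPLE).most_common():
        print(count, agent)
```

A regular expression this simple will trip over quoted fields that contain escaped quotes, so treat it as a starting point rather than a complete parser.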

The shortest time between the address being harvested and receiving its first spam is seven hours. The longest is 117 days. It seems that spam almost always arrives within a day; otherwise it takes weeks. I’ve considered adding in a timestamp and having the addresses only good for four hours or so to allow legitimate messages to get through without allowing spam in, but I haven’t gotten around to implementing that yet.
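Both the delay measurement and the expiry idea come down to simple timestamp arithmetic. Here is a rough Python sketch, assuming the harvest time is read from the access log and the issue time is somehow carried in the address. The function names and the `MAX_AGE` constant are hypothetical, and the spam arrival time in the usage example is invented purely to show the calculation.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=4)   # the "four hours or so" window, as an assumption


def parse_apache_time(stamp: str) -> datetime:
    """Parse an access-log timestamp such as '15/Sep/2002:13:22:24 -0700'."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def harvest_delay(harvested_at: datetime, first_spam_at: datetime) -> timedelta:
    """Time between the page fetch and the first spam to that address."""
    return first_spam_at - harvested_at


def is_fresh(issued_at: datetime, received_at: datetime,
             max_age: timedelta = MAX_AGE) -> bool:
    """True if mail arrived while the address was still inside its window."""
    age = received_at - issued_at
    return timedelta(0) <= age <= max_age


if __name__ == "__main__":
    harvested = parse_apache_time("15/Sep/2002:13:22:24 -0700")
    # Invented arrival time, seven hours after the fetch, for illustration only.
    arrived = harvested + timedelta(hours=7)
    print(harvest_delay(harvested, arrived))   # 7:00:00
    print(is_fresh(harvested, arrived))        # False
```

With a four hour window, even the fastest spam seen so far (seven hours) would have been rejected, while someone clicking the address straight off the page would still get through.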

I was hoping to find some more decisive patterns in spam harvesters that would allow me to block a significant number of them. I’m not surprised that I didn’t find them, but I am a little disappointed. I’ll probably start blocking anything that identifies itself as "Internet Explore" or sends a referer of "" If my page gets linked from the main page of Microsoft’s web site, I’ll be in trouble, but I’m not losing any sleep over that.
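The blocking rule itself is small enough to sketch in a few lines of Python. This only illustrates the logic; on a real Apache setup the check would more likely live in the server configuration. The function name is made up, and the referer value is a hypothetical placeholder because the faked referer isn't reproduced here.

```python
# Prefix sent by the harvesters seen in the logs. Going by the other log
# entries, genuine Internet Explorer identifies itself as
# "Mozilla/4.0 (compatible; MSIE ...)", so this prefix should not match it.
BOGUS_AGENT_PREFIX = "Internet Explore"

# Hypothetical placeholder; substitute the faked referer actually observed.
FAKE_REFERERS = {"http://fake-referer.example/"}


def should_block(user_agent, referer) -> bool:
    """Return True if a request matches one of the known harvester patterns."""
    if user_agent and user_agent.startswith(BOGUS_AGENT_PREFIX):
        return True
    if referer in FAKE_REFERERS:
        return True
    return False
```

Requests with no user-agent at all are deliberately let through here, since blocking them wasn't part of the plan described above.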

Airline ticket pricing
Wednesday, February 19th, 2003

Lately I’ve been making arrangements for some spring and summer vacations. Scheduling trips, planning to meet with various friends and making reservations with hotels is always a challenge, but it’s worth it in the end. However, choosing and buying airline tickets is a uniquely frustrating experience.

It’s bad enough to try to figure out which airport to go to, which airline to fly, when to leave, when to come back and how many connections are acceptable. The apparent randomness of the pricing model makes getting a decent fare feel like an exercise in futility. Usually, I try to balance comfort and convenience (not having to make 5 connections, not having a 14 hour layover in Pittsburgh, not competing for general admission seats on Southwest) with reasonable price. So I make an attempt to leave at less popular times of the day. I try to fly into and out of major airports. I am relatively flexible about which airlines I’ll fly.

But in the end, it still never seems to matter. Most recently, my girlfriend and I spent probably 30 minutes on Expedia trying different options to find a good price with a schedule that accommodates both of us. We finally found it, went through the ordering process and at the end were told that the price had changed and we’d have to pay $40 more. Uh, I don’t think so! At first I thought we must have just taken too long to get through the pipeline. So I found the flight again (still listed at the original lower price) and went through the entire process as quickly as possible. Once again, they raised the price at the very end of the order. At that point, we gave up and went directly to the airline’s website and bought the tickets there. For a few bucks less than even Expedia’s lower price.

KARE11 in Minneapolis has done some fascinating research into airline pricing (link thanks to Al’s Morning Meeting). They did a survey of people on a Northwest Airlines flight. Sixty-five people responded to their survey and from those, there were 50 unique prices for the flight. Even more amazing, an airline analyst KARE11 spoke to found 1,369 possible fares for the flight they’d studied. This is a flight with only 124 seats! If you do any airline travel at all, I encourage you to go read the entire article.
