Changed the HTTP
status code for server overload from 200 to 503.
Reason:
Originally I set the overload status code to 503, which is the correct code to
return under such circumstances. However, I noticed that many exploitative
download programs just go faster when they detect an overload. The thinking of
the programmers is purely selfish. They think that if there is an overload, they
should try harder to get a bigger share of a scarce resource. In this case,
though, the congestion is caused by the selfish downloaders themselves. So the
more selfishly they behave, the worse the congestion becomes. Therefore I changed the return
code to 200. That ensured that selfish rampant download programs would just get
a big hunk of junk. A human user would see the congestion warnings on web pages
and slow down. So that seemed like a good idea.
The problem with using the status code 200 is that some search robots will fill
their indexes with junk. That's not good if you actually want to be
indexed. At first, this was not a problem because the reputable search robots
were well behaved. However, some time in 2005, Googlebot changed its moderate
download rate (about 2 or 4 downloads per minute) to the worst possible
rampant download behaviour. It simply tried to download entire web sites at
the absolute maximum download rate. As a result, the Google search results were
polluted by listings of bwshare congestion warning pages. Although Google does
have a service to reduce load on web servers if you send them a nice e-mail, my
last request for civilized behaviour fell on deaf ears. Googlebot continued to
get worse and worse, frequently filling its index with hundreds of junk pages
instead of my real web pages.
Now I've finally changed the HTTP result code for bwshare congestion back to
503. Googlebot sort of behaves correctly now. Googlebot sees the 503 code and
slows down for a short while. Then when it gets a 200 status, it goes very fast
again. So now its behaviour alternates between slow and fast. It doesn't adjust
in a really efficient manner. It still behaves as selfishly as possible.
However, this is better than nothing. If you set the download rate to 8 files
per minute, Googlebot downloads at an efficiency of around 50%. That is,
Googlebot downloads about 50% real files and about 50% warning messages with the
503 return code.
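For illustration, here is a minimal sketch (not the actual bwshare code) of how
an Apache 2 content handler can refuse a request with 503 when the module
decides the server is overloaded. The is_overloaded() check is a hypothetical
placeholder for the module's own rate accounting.

    /* Minimal sketch, not the actual bwshare code: an Apache 2 handler
     * which answers with 503 (Service Unavailable) when the module's
     * rate accounting says the client is downloading too fast. */
    #include "httpd.h"
    #include "http_config.h"
    #include "http_protocol.h"
    #include "ap_config.h"

    /* Hypothetical placeholder for the real shared-memory rate check. */
    static int is_overloaded(request_rec *r)
    {
        (void) r;
        return 0;
    }

    static int throttle_handler(request_rec *r)
    {
        if (is_overloaded(r)) {
            /* A well-behaved client or robot sees 503 and backs off.
             * Returning 200 with a warning page would let a search
             * robot fill its index with junk. */
            return HTTP_SERVICE_UNAVAILABLE;
        }
        return DECLINED;    /* let the normal handlers serve the file */
    }

    static void throttle_register_hooks(apr_pool_t *p)
    {
        (void) p;
        ap_hook_handler(throttle_handler, NULL, NULL, APR_HOOK_FIRST);
    }

    module AP_MODULE_DECLARE_DATA throttle_example_module =
    {
        STANDARD20_MODULE_STUFF,
        NULL, NULL, NULL, NULL, NULL,
        throttle_register_hooks
    };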
Fixed a semaphore leak.
This bug arose in version 0.1.5 because someone sent me a
bug fix for a problem where the configuration file parameters were not being
read correctly in Apache 2. The old code before version 0.1.5 worked fine in
Apache version 1, but the abysmal Apache 2 module developer documentation made
it difficult for me to fix the configuration error. So I just added the bug
fix
without checking to see if it was correct by my usual method of code
analysis. I used the primitive technique of code testing to establish that it
seemed to do no harm.
However, I later noticed that the Linux kernel was experiencing exhaustion of
semaphore arrays. This made it necessary to reboot the machine to clear the
semaphore arrays, because I didn't know how to determine which semaphores were
orphans. Now I know how to clear out the orphan semaphores manually.
But it's best to not generate the orphans in the first place. The semaphore
leaks were occurring every time I ran apachectl
to stop, start or
restart the Apache server, or even to just get information on the status of the
running process.
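For reference, a leaked semaphore set can be cleared by hand: the ids can be
listed with ipcs and removed with ipcrm, and the small sketch below does the
same removal from C. It is only an illustration, not part of bwshare itself.

    /* Minimal sketch: remove one leaked System V semaphore set by id.
     * The id can be found by listing the semaphore sets with "ipcs -s". */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    int main(int argc, char *argv[])
    {
        int semid;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <semid>\n", argv[0]);
            return 1;
        }
        semid = atoi(argv[1]);

        /* IPC_RMID removes the semaphore set immediately. */
        if (semctl(semid, 0, IPC_RMID) == -1) {
            perror("semctl(IPC_RMID)");
            return 1;
        }
        printf("removed semaphore set %d\n", semid);
        return 0;
    }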
So now I've fixed the bug. It was just a matter of activating some old features
which were already in the code, waiting to be switched on.
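For what it's worth, a common pattern for avoiding this kind of leak in an
Apache 2 module is to register a cleanup on the pool that the semaphore belongs
to, so the kernel resource is released whenever the pool is destroyed on a stop
or restart. The sketch below only illustrates that general pattern; it is not
the actual bwshare fix, and the names in it are made up.

    /* Illustrative sketch only, not the actual bwshare fix: create a
     * System V semaphore set in post_config and register a pool cleanup
     * so the set is removed whenever Apache destroys the configuration
     * pool, e.g. on stop or restart. */
    #include "httpd.h"
    #include "http_config.h"
    #include "apr_pools.h"
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    static int example_semid = -1;

    static apr_status_t example_remove_sem(void *data)
    {
        (void) data;
        if (example_semid != -1) {
            semctl(example_semid, 0, IPC_RMID); /* give it back to the kernel */
            example_semid = -1;
        }
        return APR_SUCCESS;
    }

    static int example_post_config(apr_pool_t *pconf, apr_pool_t *plog,
                                   apr_pool_t *ptemp, server_rec *s)
    {
        (void) plog; (void) ptemp; (void) s;
        example_semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
        /* The cleanup runs when pconf is cleared, so restarts do not leak. */
        apr_pool_cleanup_register(pconf, NULL, example_remove_sem,
                                  apr_pool_cleanup_null);
        return OK;
    }

    static void example_register_hooks(apr_pool_t *p)
    {
        (void) p;
        ap_hook_post_config(example_post_config, NULL, NULL, APR_HOOK_MIDDLE);
    }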