Internet Explorer image caching revisited

A few days ago “I complained about Internet Explorer”:http://blog.cfrq.net/chk/archives/2004/08/21/argh-msie-and-bandwidth/ and the way it keeps re-downloading my banner images.

Searching Google leads me to believe that MSIE doesn’t send If-Modified-Since: headers for images (and possibly other files, such as CSS); instead, it expects to see an Expires: header in the HTTP response. (Apparently it will also honour Cache-Control: headers.) The beauty of standards is that there are so many to choose from…

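More Googling led me to the following configuration directives for Apache’s mod_expires, which add an Expires: header (and a matching Cache-Control: max-age) to image responses:

   <IfModule mod_expires.c>
     # Expire images one week after the time of each request
     ExpiresActive On
     ExpiresByType image/gif "access plus 1 week"
     ExpiresByType image/jpeg "access plus 1 week"
     ExpiresByType image/png "access plus 1 week"
   </IfModule>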

(It’s possible that image/* will work; I haven’t tried it.)
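For anyone wondering what those directives actually put on the wire, here is a minimal Python sketch of the headers an “access plus 1 week” rule produces. It’s an illustration, not mod_expires itself: the name `expiry_headers` is made up, and it assumes mod_expires sets both Expires: and a matching Cache-Control: max-age.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

LIFETIME = timedelta(weeks=1)  # "access plus 1 week"
# MIME types taken from the ExpiresByType lines above
CACHEABLE = {"image/gif", "image/jpeg", "image/png"}


def expiry_headers(content_type, now=None):
    """Caching headers for one response, computed from the time of the request."""
    if content_type not in CACHEABLE:
        return {}  # no rule for this type: add nothing
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        # HTTP dates are always expressed in GMT
        "Expires": format_datetime(now + LIFETIME, usegmt=True),
        "Cache-Control": "max-age=%d" % int(LIFETIME.total_seconds()),
    }


print(expiry_headers("image/jpeg", datetime(2004, 8, 26, 10, 50, tzinfo=timezone.utc)))
# {'Expires': 'Thu, 02 Sep 2004 10:50:00 GMT', 'Cache-Control': 'max-age=604800'}
```

With a one-week lifetime, a well-behaved browser can reuse its cached banner without asking the server at all until the week is up.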

I hope this helps someone else; I also hope it helps me remember next time :-)

posted at 10:50 am on Thursday, August 26, 2004 in Site News | Comments Off on Internet Explorer image caching revisited

Argh; MSIE and bandwidth

It appears that if you set the cache settings in IE to “Automatically” or “Every visit to the page”, then every time you visit a page at blog.cfrq.net, IE fetches all of the page objects (page, CSS, favicon, embedded images). For some of them it sends the If-Modified-Since: header (I see 304 responses for the blog CSS, for example), but it doesn’t seem to send If-Modified-Since: for the banner JPEGs. This means that MSIE visitors download the same banners again and again as they browse the site. That not only wastes my bandwidth, it also hurts their experience, since they have to wait for the banner to download on every page view.

I’ve noticed IE doing this before on the client side with image-intensive applications (like MovableType :-), but I hadn’t investigated until recently, when a small increase in visitors to my blog site _doubled_ the bandwidth used…

Is this a known IE bug? Is there anything I can do on the server side to work around it? The investigation continues…
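For reference, this is roughly the decision a server makes when a browser does send If-Modified-Since:. It’s a minimal Python sketch of a conditional GET, not Apache’s code; `conditional_status` is a hypothetical name, and the comparison uses whole seconds because HTTP dates only have one-second resolution.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified: timezone-aware datetime of the file's modification time.
    if_modified_since: raw If-Modified-Since: header value, or None if absent.
    """
    if if_modified_since is None:
        return 200  # no validator: send the whole file (what the banners get)
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
    # Compare whole seconds only, matching HTTP date resolution
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: the browser reuses its cached copy
    return 200


mtime = datetime(2004, 8, 21, 8, 44, tzinfo=timezone.utc)
print(conditional_status(mtime, format_datetime(mtime, usegmt=True)))  # 304
```

A 304 costs a few hundred bytes of headers; a missing header means the full JPEG every time.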

posted at 8:44 am on Saturday, August 21, 2004 in Rants, Site News | Comments (2)
  1. Reid says:

    You could conditionally use a low-res substitute for IE users.

  2. Harald says:

    An excellent suggestion, and trivial to implement. Since WordPress already shoves a bunch of rewrite rules into a .htaccess file, it was easy to add another one that conditionally rewrites the .jpg URLs for MSIE users. I’ve compressed the JPEGs to about 20% of their original size. The quality suffers, but less than I expected it would…
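The actual rule lives in .htaccess as a mod_rewrite condition on the User-Agent; the Python below is only a sketch of the same decision. The naming scheme (banner.jpg becomes banner-lowres.jpg) is a hypothetical stand-in, since the real file names aren’t given here.

```python
def rewrite_for_msie(path, user_agent):
    """Point MSIE at a smaller, recompressed copy of each .jpg; leave others alone."""
    if "MSIE" not in (user_agent or ""):
        return path
    stem, dot, ext = path.rpartition(".")
    if not dot or ext.lower() != "jpg" or stem.endswith("-lowres"):
        return path  # not a JPEG, or already pointing at the low-res copy
    return f"{stem}-lowres.{ext}"  # hypothetical naming scheme


ie6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
print(rewrite_for_msie("/images/banner.jpg", ie6))  # /images/banner-lowres.jpg
```

Matching on the “MSIE” substring will also catch any browser that pretends to be IE in its User-Agent, which is probably harmless here.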

spam source

Ok, so it turns out that all (well, 125 of 126 :-) of the spam I’m getting these days is coming through my pobox.com address. The greylisting is working fine, in other words :-)

It’s been great having a portable email address, but now that I pay real money for my own domains, maybe it is time to switch over. I can do more accurate spam filtering on my personal server than they can on their shared servers. Unfortunately, the massive spam volumes floating around these days are forcing us to these drastic measures. I’m beginning to believe the pessimists; e-mail is dying…

posted at 10:57 pm on Sunday, August 08, 2004 in General, Security, Site News | Comments (1)
  1. Re: oops
    So after looking at “the mail I accidentally misfiled”:http://blog.cfrq.net/chk/archives/2004/08/16/oops/ there were, in fact, about 150 spam (almost 50%).

    pobox.com has completely revamped their spam filtering service since I last looked; I can n…

greylist results revisited

So maybe I spoke too soon; in “greylist results”:http://blog.cfrq.net/chk/archives/2004/07/14/greylist-results/ I said that my spam volume had gone way down. Well, it has come back up again. I’ll have to write scripts to prove it, but I have a theory.

Machines owned by spammers are being used relatively infrequently, maybe to reduce the chances of getting detected and blacklisted? So the first time a spam host shows up, it gets greylisted. But if it shows up again a day or a week later, it gets past the greylist filter, because its entry is still in the cache (it hasn’t expired yet).

Maybe a fix would be to use two cache timeouts. The first would be for machines that have not yet successfully delivered a message (i.e. by retrying the original delivery), and would be relatively short, probably less than a day. The second would be the existing long timeout for machines that have already passed the first test.

That would eliminate spam machines that only show up infrequently. I don’t know whether it is worth the effort, though.
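Here’s a sketch of the two-timeout idea in Python, assuming greylist entries are keyed on the usual (client IP, sender, recipient) triplet. The class name and all three time limits are illustrative assumptions (12 hours stands in for “less than a day”); this is not how postgrey is implemented.

```python
import time


class TwoStageGreylist:
    """Greylist with a short lifetime for unproven senders, a long one for proven ones.

    Entries are keyed on a (client IP, envelope sender, envelope recipient) triplet.
    All three time limits are illustrative guesses, in seconds.
    """

    def __init__(self, delay=300, pending_ttl=12 * 3600, passed_ttl=35 * 86400):
        self.delay = delay              # minimum wait before a retry is accepted
        self.pending_ttl = pending_ttl  # how long a never-retried triplet is remembered
        self.passed_ttl = passed_ttl    # how long a successful triplet stays whitelisted
        self.pending = {}               # triplet -> time of first (deferred) attempt
        self.passed = {}                # triplet -> time of most recent accepted delivery

    def check(self, triplet, now=None):
        """Return "accept" or "defer" for one delivery attempt."""
        now = time.time() if now is None else now
        last_ok = self.passed.get(triplet)
        if last_ok is not None and now - last_ok <= self.passed_ttl:
            self.passed[triplet] = now  # refresh the long-lived entry
            return "accept"
        first = self.pending.get(triplet)
        if first is None or now - first > self.pending_ttl:
            self.pending[triplet] = now  # new, or came back too late: start over
            return "defer"
        if now - first < self.delay:
            return "defer"  # retried too quickly
        del self.pending[triplet]
        self.passed[triplet] = now  # proper retry: promote to the long timeout
        return "accept"
```

Build it with pending_ttl equal to passed_ttl and you get the single-timeout behaviour I have now, in which a spam host that returns two days later is let straight through.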

On the plus side, greylisting _is_ still keeping out the virus traffic…

posted at 12:18 pm on Sunday, August 08, 2004 in Security, Site News | Comments Off on greylist results revisited

Comments

At the request of “a loyal reader”:http://rae.tnir.org/ I’ve reintroduced comments on the main page and in the RSS and Atom feeds. Enjoy.

posted at 1:03 pm on Wednesday, July 28, 2004 in Site News | Comments (1)
  1. Reid says:

    Thanks Harald! Btw, my web site is just http://rae.tnir.org these days. :-)

greylist results

It’s been a week since “I installed postgrey”:http://blog.cfrq.net/chk/archives/2004/07/06/greylist/.

Wow!

My spam volume has dropped back to manageable levels: 10-20 per day (maximum). Even better, I’m no longer getting dozens of those encrypted ZIP file viruses every day; greylisting stops them all dead (at least so far :-).

I suppose the spammers will eventually figure it out and start running mail queues, but (in theory) it should be easier to pick those up via DNS block lists…

posted at 2:36 pm on Wednesday, July 14, 2004 in Site News | Comments Off on greylist results

greylist

Spam volumes have been rising continually around here. I started my foray into automated spam filtering a couple of years back; at the time, I was receiving about 100 per _quarter_. Now I’m getting almost 100 per _day_.

I needed an excuse to upgrade my “postfix”:http://www.postfix.org/ install to the new “2.1 release”:ftp://ftp.utoronto.ca/mirror/packages/postfix/index.html, so I decided to install “postgrey”:http://isg.ee.ethz.ch/tools/postgrey/, a “greylisting”:http://projects.puremagic.com/greylisting/ daemon. So far I’m using it after all of my other spamtraps, but it seems to be working reasonably well. I’ll be watching the logs for a while to make sure…

In a nutshell, greylisting relies on the fact that spammers use dump-and-run tactics, while legitimate email gets queued at the sender. So, when a new, previously unknown client connects, the mail server sends a “temporary deny” (try again later). If the client is a spammer, it probably won’t come back, so the spam is never delivered. If the sender is legitimate, it will retry, and our server will let the retry through.
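Postfix hands this decision to postgrey through its policy delegation protocol: it sends name=value attribute lines ending with a blank line, and the daemon answers with an action= line. The Python sketch below shows the greylisting core under simplifying assumptions (an in-memory dict, an illustrative 300-second delay, no expiry at all); it is not postgrey’s code.

```python
import time

GREYLIST_DELAY = 300  # seconds before a retry is accepted (illustrative value)

first_seen = {}  # (client_address, sender, recipient) -> time of first attempt


def parse_policy_request(text):
    """Parse one Postfix policy request: name=value lines ended by a blank line."""
    attrs = {}
    for line in text.splitlines():
        if not line:
            break  # blank line terminates the request
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def policy_response(attrs, now=None):
    """Defer unknown triplets until they have waited GREYLIST_DELAY seconds."""
    now = time.time() if now is None else now
    triplet = (attrs.get("client_address"), attrs.get("sender"), attrs.get("recipient"))
    first = first_seen.setdefault(triplet, now)
    if now - first < GREYLIST_DELAY:
        # temporary failure: a real MTA keeps the message queued and retries later
        return "action=DEFER_IF_PERMIT Greylisted, please try again later\n\n"
    return "action=DUNNO\n\n"  # no opinion: let the remaining restrictions decide
```

A dump-and-run spammer never makes the second attempt, so its triplet just sits in the table.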

Pretty cool, if you ask me :-)

posted at 9:40 pm on Tuesday, July 06, 2004 in Site News | Comments Off on greylist

Simple Page Editors

I’m currently using “Whisper”:http://www.whisper.cx/ for the static content around here, but I tripped over “EditThisPage”:http://editthispagephp.sourceforge.net/home/index.php the other day, and it looks useful too.

More and more of these things are cropping up, probably as a backlash against how complicated (and fragmented!) the Wiki space is getting…

posted at 2:11 pm on Friday, June 04, 2004 in Links, Site News | Comments Off on Simple Page Editors

comment spam

Some idiot script kiddie wiped out our bandwidth again today. He could be using an automated tool, or he could be doing it manually. He’s trying to post comment spam to blog.org, but he keeps fetching the same pages over and over again (presumably to see whether his comments are getting published).

The problem is that David’s pages are large (and getting larger all the time), an average of 200Kb each. So this spammer has single-handedly downloaded at least 70Mb of data today, or roughly 350 page fetches!

It’s one thing to try to abuse my server to get a site ranked higher in Google. It’s another thing entirely to waste _my_ bandwidth in the process!

64.57.64.0/19, 66.154.0.0/18, and 66.154.64.0/19 just made it into the blackhole list…
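To check whether a particular address from the logs falls inside those ranges, Python’s standard ipaddress module does the CIDR arithmetic. This sketch only tests membership; it assumes nothing about how the blackhole list is actually enforced on the server, and `is_blackholed` is a made-up name.

```python
import ipaddress

# The three ranges added to the blackhole list
BLACKHOLE = [ipaddress.ip_network(cidr)
             for cidr in ("64.57.64.0/19", "66.154.0.0/18", "66.154.64.0/19")]


def is_blackholed(addr):
    """True if the dotted-quad address falls inside any blackholed range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLACKHOLE)


print(is_blackholed("64.57.70.1"), is_blackholed("66.154.96.0"))  # True False
```

Between them, the three ranges cover 32,768 addresses: two /19s of 8,192 each plus a /18 of 16,384.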

posted at 1:18 pm on Thursday, June 03, 2004 in Security, Site News | Comments (4)
  1. David Brake says:

    I was kept busy removing the comment spam this created on the other end today as well (unfortunately, the script kiddies are starting to randomise their IP addresses and choose from long lists of URLs, so IP address or URL blocking is less effective). Makes me think the only long-term solution to comment spam may be one of these type-in-the-numbers-from-an-image plug-ins. Though apparently determined spammers are actually doing it by hand! AARGH!

  2. joy says:

    What about comment moderation in WP?

  3. Harald says:

    I’m using WP, and (as you can see) comment moderation is working.

    David’s still using MovableType, and his weblog is quite popular…

  4. I would recommend you set up some type of image number system so bots can’t spam!

First WP Problem

WordPress apparently doesn’t let me (easily) put fake HTML tags in my posts, even if I use HTML entities like &lt; (it seems there’s a double decode going on).

(To get that to appear, I had to type & amp ; amp ; lt ; without the spaces.)

I like to do things like <grin> in my posts…

*Update:* it looks like the problem is the fixEntities() function in the textile2 plugin…
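The rule of thumb is that every decode in the pipeline needs its own level of escaping. This Python sketch uses the standard html module to show why “&lt;” only survives as visible text if it is typed with two levels of escaping once something decodes it an extra time. It doesn’t reproduce fixEntities() itself, and `escape_levels` is a hypothetical helper.

```python
import html


def escape_levels(text, levels):
    """Escape `text` for HTML `levels` times, so it survives that many decodes."""
    for _ in range(levels):
        text = html.escape(text, quote=False)
    return text


on_screen = "&lt;"                  # the text I want readers to see
print(escape_levels(on_screen, 1))  # &amp;lt;      enough if only the browser decodes
print(escape_levels(on_screen, 2))  # &amp;amp;lt;  needed when a plugin decodes once more
# With only one level of escaping, two decodes turn the entity into a real "<"
print(html.unescape(html.unescape("&lt;")))  # <
```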

posted at 10:27 pm on Wednesday, June 02, 2004 in Site News | Comments (2)
  1. Reid says:

    So did you fix the plugin and send the changes upstream?

    Can you tell I am browsing your site-related entries to see how the whole WordPress thing went? :-)

  2. Harald says:

    No, my WordPress Fu isn’t good enough yet; I’m still working around the problem.

Iñtërnâtiônàlizætiøn

I ran this test a long time ago with Movable Type (and had to make a whole bunch of changes to get it to work properly). I thought I’d try it again with WordPress…

How does my weblog perform with Unicode? See also: “Survival guide to i18n”:http://intertwingly.net/stories/2004/04/14/i18n.html. Some tests:

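bq. これは日本語のテキストです。読めますか
Let’s see how Unicode and weblogs does with Japanese :) これは日本語のテキストです。読めますか?…

(The Japanese reads: “This is Japanese text. Can you read it?”)

bq. Let us test some Hindi Text
देखें हिन्दी कैसी नजर आती है। अरे वाह ये तो नजर आती है।

(The Hindi reads: “Let’s see how Hindi looks. Oh wow, it does show up!”)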

And check…

(via “Anne van Kesteren”:http://annevankesteren.nl/archives/2004/05/unicode via “Russell Beattie Notebook”:http://www.russellbeattie.com/notebook/1007860.html#1007929)

posted at 5:21 pm on Tuesday, June 01, 2004 in Site News | Comments (3)
  1. Harald says:

    How about comments?

    Στο κι όταν διοίκηση μπορούσε. Ώρα πω κάνε διοικητικό δημιουργική, ανά βγήκε ζητήσεις τα, μάτσο περίπου ποσοστό πω και. Ένα τα πακέτο πρώτοι, μια πηγαίου μεταφραστής δε, να κλπ επεξεργασία επιχειρηματίες. Θα για’ ερωτήσεις δοκιμάσεις. Αν άτομο διαδίκτυο διαπιστώνεις όλη.

  2. Reid says:

    Looks good. I notice, btw, that the comment was converted into HTML numbered entities instead of staying unicode. Or is that the way it is supposed to work?

    Of course, the final result will depend on the user’s web browser being able to display the unicode text correctly.

  3. Harald says:

    *sigh*; I hadn’t noticed that. No, that’s _not_ how it is supposed to work; time to investigate a little, I guess…
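For what it’s worth, converting to numeric references is lossless; it just makes the stored text larger and stops it being plain UTF-8. Python’s `xmlcharrefreplace` error handler performs the same kind of conversion, which makes the round trip easy to check. This is an illustration only, not what WordPress actually runs, and `to_numeric_refs` is a made-up name.

```python
import html


def to_numeric_refs(text):
    """Replace each non-ASCII character with a decimal reference such as &#931;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


greek = "Στο κι όταν"
refs = to_numeric_refs(greek)
print(refs[:18])                     # &#931;&#964;&#959;
print(html.unescape(refs) == greek)  # True: nothing is lost, only re-encoded
```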

Traffic Analysis

We actually didn’t get that much traffic last night from the slashdot crowd, other than one Australian tool who kept fetching image files over and over again with various random query arguments: 1734 fetches of one image, and 1144 fetches of a slightly larger one. It might have been a browser bug, but somehow I doubt it. It was single-handedly responsible for about 250Mb of traffic in a few minutes. Fortunately that was at 1 AM, so I don’t think anyone would have noticed. Into the black hole…

Meanwhile, some other jerk in Japan has been downloading pages from blog.org over and over again, resulting in almost a gigabyte of traffic in the last two days!!! He fetched the same (large) pages 200 or so times each, sometimes only minutes apart. Unbelievable! Also into the black hole…

By comparison, all traffic from slashdot.org _and_ all traffic for the referenced paper combined came to only about 180Mb over the last week. Even without the redirects in place, we would only have transferred between 170Mb and 480Mb of additional data (depending on the number of clients that support gzip compression).
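Spotting clients like these comes down to counting fetches per client and URL in the access log. Here is a Python sketch, assuming Apache’s combined log format; it strips query strings so that random ?arguments don’t disguise repeated fetches of the same image. The function name and the default threshold of 100 are arbitrary choices.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def repeat_fetchers(lines, threshold=100):
    """Map (client, path) -> (fetch count, bytes sent) for heavy repeaters."""
    hits = Counter()
    volume = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines in some other format
        path = m.group("url").split("?", 1)[0]  # ignore random query arguments
        key = (m.group("host"), path)
        hits[key] += 1
        if m.group("bytes") != "-":
            volume[key] += int(m.group("bytes"))
    return {k: (n, volume[k]) for k, n in hits.items() if n >= threshold}
```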

I hate computers :-)

posted at 10:05 am on Thursday, May 27, 2004 in Site News | Comments Off on Traffic Analysis

slashdotted!

Several years ago (back when this machine was still a 486, actually), I put a global Apache rewrite rule on the server to deny access to anyone who came here from slashdot. This was to avoid the so-called “slashdot effect”:http://en.wikipedia.org/wiki/Slashdot_effect.

Well, the rule has finally been triggered, thanks to “Extensible Programming for the 21st Century”:http://developers.slashdot.org/article.pl?sid=04/05/26/2231214 (A link to one of Greg’s articles).

Apparently denying the page was somewhat confusing, so I changed the rule to redirect to “this page”:http://www.cfrq.net/slashdot.html instead.

The server is holding up remarkably well under the load (much better than it did with comment spammers before I rate-limited the mt-comments.cgi scripts). Still, there is a lot of dynamic content here, and I don’t think Michelle wants the bandwidth headaches, so the rule stays.
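The rule itself is a mod_rewrite condition on the Referer: header; the Python below is a sketch of the same test rather than the actual rule, and treating slashdot.org plus any of its subdomains as a match is my assumption. A request with no Referer: header at all is simply not redirected.

```python
from urllib.parse import urlsplit

REDIRECT_TARGET = "http://www.cfrq.net/slashdot.html"


def slashdot_redirect(referer):
    """Return the redirect URL for visitors arriving from slashdot.org, else None."""
    if not referer:
        return None  # no Referer: header (typed URL, or a browser that omits it)
    host = (urlsplit(referer).hostname or "").lower()
    if host == "slashdot.org" or host.endswith(".slashdot.org"):
        return REDIRECT_TARGET
    return None


print(slashdot_redirect("http://developers.slashdot.org/article.pl?sid=04/05/26/2231214"))
# http://www.cfrq.net/slashdot.html
```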

posted at 9:19 pm on Wednesday, May 26, 2004 in Site News | Comments (4)
  1. joy says:

    You slashdot denying heathen! :-P

  2. Mark says:

    I read the article at http://www.third-bit.com/~gvwilson/xmlprog.html (my browser shows it was the link I visited) as referred by /. shortly after it was posted, and did not notice any lag, /. effect or redirection.

    I thought it was hosted at UofT or HP.

    If it was hosted on your box then, good job! What sort of net connection do you have?

    Cheers

  3. Harald says:

    This site is currently redirecting automatically to the pyre.third-bit.com mirror (which is at UofT), but only for slashdot referers, so you might have read it here instead.

    As for our network connection, we are trying to use as little as possible of our generous host’s 3Mb/768Kb business-class DSL…

  4. Mark says:

    I was referred from slashdot, but due to a bug/feature of Galeon (as shipped with RH9.0) the referer field is not set when you open a new tab on a link.

    So http://www.third-bit.com served the page (your server survived). :)

    Here’s a test where pone.html was loaded by typing it into the address bar, ptwo.html was loaded by opening a new tab on a link from pone.html, and pthree.html was opened by clicking on a link from ptwo.html.

    Note referer is “-” except when it is set by the direct click loading of pthree.html.

    127.0.0.1 - - [28/May/2004:19:57:53 -0400] "GET /pone.html HTTP/1.1" 200 342 "-" "Mozilla/5.0 Galeon/1.2.7 (X11; Linux i686; U;) Gecko/20030131"
    127.0.0.1 - - [28/May/2004:19:57:57 -0400] "GET /ptwo.html HTTP/1.1" 200 346 "-" "Mozilla/5.0 Galeon/1.2.7 (X11; Linux i686; U;) Gecko/20030131"
    127.0.0.1 - - [28/May/2004:19:58:01 -0400] "GET /pthree.html HTTP/1.1" 200 342 "http://127.0.0.1/ptwo.html" "Mozilla/5.0 Galeon/1.2.7 (X11; Linux i686; U;) Gecko/20030131"

    Cheers.

Movies update

I copied my “movie list”:http://blog.cfrq.net/chk/static/movies.html from the “blog entry”:http://blog.cfrq.net/chk/archives/000659.html over to my static content pages. The “new movie list”:http://blog.cfrq.net/chk/static/movies.html is the one I’ll try to keep up to date :-)

posted at 4:48 pm on Monday, April 19, 2004 in Site News | Comments (1)
  1. Jeff K says:

    Hey, where’s “Fahrenheit 9/11”? [n.b. I’m not a big fan of Moore’s agenda, but he collects a lot of important facts while pursuing it].

Bandwidth

It’s amazing what happens when you add a banner graphic to the weblog. The banner is larger than the main index page, and certainly larger than the individual pages; bandwidth has spiked quite a bit since I added it.

I did some checking, and noticed a surprisingly large number of clients fetching the .jpg over and over again, instead of pulling it out of local cache; what’s up with that?

posted at 8:54 am on Wednesday, March 10, 2004 in Site News | Comments Off on Bandwidth

Content Management Systems

In “Comes in Two Sizes”:http://www.third-bit.com/~gvwilson/blog/archives/000029.html Greg Wilson comments on Wiki technology.

One of the frustrations I find with wiki software (and, actually, open source software in general) is the proliferation of almost identical versions of a tool. There are too many wiki implementations out there, and each one seems to have one or two good features, but is also missing one or two important features.

TWiki is too large and feature-rich (and has a weird hybrid version of Wiki syntax), but it has real authentication (unlike most of the others) and a working XML-RPC interface (useful for integrating with, say, Movable Type :-), so that’s why it is installed.

On the other hand, I’m using MoinMoin at work, because I don’t need authentication there, and it is easier to set up and use.

Anyway, for things like “the rolemaster pages”:http://www.cfrq.net/~rolemaster/ what I’m really looking for is a Content Management System that makes it easy for the casual user to create linked documents. This means, for example, that it should use Textile or Wiki syntax for editing. But the most important feature for me would be extending the power of WikiWords to arbitrary phrases or keywords. For example, if I create a document with the title “Greg Wilson”, I’d like every instance of the string “Greg Wilson” to be replaced with a hyperlink. This makes it trivial to create content: just create the page, rebuild, and every reference to the topic will be magically linked, without using WikiWords. (The problem with WikiWords is that you have to remember to use them, and they don’t always fit comfortably. My Rolemaster characters don’t have last names, for example, so I’d have to name Alex as AlEx, or CharacterAlex, or something similar to get standard wikis to work.)
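The auto-linking idea is easy to sketch: build one regular expression from all the page titles, longest first, and substitute links in a single pass so nothing gets linked twice. This Python version is hypothetical (`autolink` and the page map are made up for illustration) and assumes plain-text input rather than HTML.

```python
import html
import re


def autolink(text, pages):
    """Link every whole-word occurrence of a page title in `text` to that page.

    `pages` maps titles (e.g. "Greg Wilson") to URLs. Longer titles are tried
    first, so "Greg Wilson" wins over a shorter page titled "Greg".
    """
    if not pages:
        return text
    titles = sorted(pages, key=len, reverse=True)
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(re.escape(t) for t in titles))

    def link(match):
        title = match.group(0)
        return '<a href="%s">%s</a>' % (html.escape(pages[title]), title)

    return pattern.sub(link, text)  # one pass: inserted links are never re-scanned


pages = {"Greg Wilson": "/wiki/GregWilson", "Alex": "/wiki/Alex", "Greg": "/wiki/Greg"}
print(autolink("Alex met Greg Wilson, not Alexander.", pages))
```

Because of the word boundaries, “Alexander” is left alone, so a character named Alex never needs a WikiWord.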

WordPress comes close; there’s a plugin for keyword processing that could probably be extended to dynamically generate the list of keywords from the database. If only I had some time to play :-)

posted at 10:31 pm on Friday, February 27, 2004 in Site News | Comments (1)
  1. Reid says:

    Hm, can’t tell if all of the posting was by you or by Greg Wilson, and I’m too lazy to click the link, so there you go.

    If you have a word that isn’t InterCaps, you can (with TWiki anyway) use a syntax like [[wiki][text]], which will use the ‘wiki’ part as the link href and display it as ‘text’. So to be bizarre about it, you *could* use [[Alex][Alex]]. :-)

Willow Quotes

That was a quote from Willow Rosenberg, btw; it’s evil Willow from the “third season”:http://www.tvtome.com/tvtome/servlet/EpisodeGuideSummary/showid-10/season-3/ episode “The Wish”:http://www.tvtome.com/tvtome/servlet/GuidePageServlet/showid-10/epid-43/ . It goes with “This is the part that’s less fun. When there isn’t any screaming.” :-)

There’s a Willow fan site titled “Bored Now”:http://www.borednow.envy.nu/, and I found sound clips at “Willow Sounds”:http://dogwood.phpwebhosting.com/~tvshrine/willow.htm.

posted at 9:35 am on Thursday, February 12, 2004 in Site News | Comments Off on Willow Quotes

yawn

“Bored now.”:http://blog.cfrq.net/chk/archives/000233.html

posted at 9:53 pm on Wednesday, February 11, 2004 in Site News | Comments (6)
  1. David Brake says:

    Sorry it’s boring you – what would make it exciting? More comments? ;-)

  2. Greg Wilson says:

    For twenty-five points, what 80s pop lyricist wrote:

    “Life was easier when it was boring.”

    As for “bored now”, yeah, I think blogs need to scratch an itch. Raymond Chen keeps writing “The Old New Thing” to explore how we got here (where “here” is the current tangle of Microsoft technologies). Jon Udell blogs because he gets paid to write about the bleeding edge, and the only way to do that is to play there. Miles Thibault (a student of mine at U of T) has just started a blog (on my orders) where he’ll write about his explorations of C-Python’s implementation, and so on. So, what’s your itch?

  3. Debbie says:

    (The Following Is A Completely Unedited Response Dictated Through Viavoice; Lack Of Editing Was Prompted By The Unfortunate Discovery That “Press Delete” Is Sometimes Misinterpreted As “Press Escape”, Which In Livejournal And Mt Comments Has Disastrous Consequences. Apologies In Advance For Incomprehensibility. )

    Now that was interesting; why would ViaVoice capitalize that macro?

    Anyway, I’ve been blocking for about seven years knell endive fine to the my interest comes and goes. I have found that adding photos makes it more interesting for me, as well as giving myself an assignment topic.

    For me, however, one of the major incentives (before my tendinitis made it hard to write as quickly as I think because I have to use ViaVoice) Was to improve my writing and my ability to write even when I didn’t feel like writing. the latter is an extremely useful skill for someone who writes for a living, especially magazine writing.

    what are your favorite kinds of entries, the ones you most in choy writing? Perhaps you could focus on those kinds of entries.

    Wow, I have no idea how much of the above is going to be understandable. :-)

  4. Debbie Ohi says:

    Apologies for this completely contentless comment whose sole purpose is to enable me to enter my correct personal posting information for future use.

  5. Debbie Ohi says:

    and of course I entered it incorrectly. Here’s another attempt.

  6. Debbie Ohi says:

    While we’re on the topic of Movable Type comments, have you had any problems with spam postings in your comment sections (not counting mine, of course :-))? I’m starting to encounter them more: comments masquerading as actual remarks about my blog that are in truth just links to commercial sites.

Minor redesign

A couple of recent weblog entries finally inspired me to get a round tuit and make the text on my site resizable. I’m also playing with CSS image replacement techniques to bring you the sunrise (set?) banner. The old pretty borders with offset text boxes were fun to lay out, but I’m bored with them now so I’ve removed most of the borders in this revision. Finally, I was tired of having separate “layout” and “colour with layout” stylesheets, so I’ve collapsed them back into one file.

* “Full Page Zoom”:http://simon.incutio.com/archive/2003/11/09/fullPageZoom gave me the idea to use box sizes and margins in ems instead of pixels. The min-width: attribute on the sidebar works in Mozilla (to truncate the little graph) but fails miserably in IE 6, so don’t make your fonts too small.

* “font size rounding”:http://www.nedbatchelder.com/blog/20031118T075030.html gave me the trick I needed to make the font sizing the same in IE and Firebird.

posted at 1:11 pm on Saturday, November 22, 2003 in Site News | Comments (1)
  1. joy says:

    ooh ooh I like the new title image. It rocks! (Canadian of course ;-P )

Comments RSS feed

I’ve implemented a comments RSS feed, should anyone want to see reader comments to my weblog…

“http://blog.cfrq.net/chk/comments.xml”:http://blog.cfrq.net/chk/comments.xml

Enjoy :-)

posted at 3:46 pm on Wednesday, November 19, 2003 in Site News | Comments Off on Comments RSS feed