XML Stylesheet and UTF-8

I saw "Russell’s post":http://www.russellbeattie.com/notebook/1004309.html about adding an XSL stylesheet to a site’s RSS feed, so that people who click the feed link get a readable, styled page instead of raw XML.
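
A browser finds an XSL stylesheet for an XML document through an @xml-stylesheet@ processing instruction, which must sit in the prolog before the root element. The Python sketch below inserts one into an existing feed. It assumes a hypothetical stylesheet path @/styles/rss.xsl@, and the function name @add_stylesheet_pi@ is illustrative. It is not Movable Type code.

```python
import re

# Hypothetical stylesheet path; point this at wherever your XSL file lives.
STYLESHEET_HREF = "/styles/rss.xsl"

# Matches the XML declaration (e.g. <?xml version="1.0"?>) but not other
# processing instructions such as <?xml-stylesheet ...?>, because the
# declaration always has whitespace right after "xml".
XML_DECL = re.compile(r"<\?xml\s[^>]*\?>")


def add_stylesheet_pi(feed_xml: str, href: str = STYLESHEET_HREF) -> str:
    """Return feed_xml with an xml-stylesheet processing instruction added.

    The instruction has to appear before the root element, so it is placed
    right after the XML declaration when one is present.
    """
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>'
    decl = XML_DECL.match(feed_xml)
    if decl:
        end = decl.end()
        return feed_xml[:end] + "\n" + pi + feed_xml[end:]
    # No declaration: the instruction can open the document.
    return pi + "\n" + feed_xml


if __name__ == "__main__":
    feed = '<?xml version="1.0" encoding="utf-8"?>\n<rss version="2.0"/>'
    print(add_stylesheet_pi(feed))
```

In a Movable Type setup it is simpler to type the same line straight into the feed template, just after the XML declaration. The function only makes the placement rule explicit.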

While copying this to my blog, I rediscovered that "SmartyPants":http://daringfireball.net/projects/smartypants/ was emitting "HTML entities":http://www.htmlhelp.com/reference/html40/entities/ as decimal character references (such as @&#8220;@), which look ugly in XML. (Named entities such as @&ldquo;@ don’t look any better.) My blog has used XHTML and UTF-8 since I started reading "dive into mark":http://www.diveintomark.org, so I modified my copy of SmartyPants to emit UTF-8-encoded characters instead of HTML entities. Along the way, I had to:
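
SmartyPants itself is a Perl plugin, and the post doesn’t show the actual change. As an illustration only, this Python sketch performs the equivalent post-processing step: it turns decimal character references back into real characters, which are then written out as UTF-8. The function name @decimal_refs_to_text@ is hypothetical. Hexadecimal references (@&#x201C;@) and named entities are deliberately left alone.

```python
import re

# Decimal numeric character references such as &#8220; (left double quote).
DECIMAL_REF = re.compile(r"&#([0-9]+);")


def decimal_refs_to_text(s: str) -> str:
    """Replace each decimal character reference with the character it names.

    Assumes every reference names a valid Unicode code point; chr() raises
    ValueError for numbers above 0x10FFFF.
    """
    return DECIMAL_REF.sub(lambda m: chr(int(m.group(1))), s)


if __name__ == "__main__":
    smartened = "&#8220;Hello,&#8221; it&#8217;s UTF-8"
    text = decimal_refs_to_text(smartened)
    print(text)                   # “Hello,” it’s UTF-8
    print(text.encode("utf-8"))  # each curly quote becomes a 3-byte UTF-8 sequence
```

Escaped markup such as @&amp;@ passes through unchanged, because it doesn’t match the @&#digits;@ pattern.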

* turn off Movable Type’s HTML entity processing by setting @NoHTMLEntities 1@ in @mt.cfg@. Otherwise, HTML::Entities converted my UTF-8 characters back into HTML entities.
* explicitly set my Movable Type charset to UTF-8 (via @PublishCharset@ in @mt.cfg@). Movable Type also uses this setting on its editing pages, so I can now paste non-ASCII characters into entries and have them come out the other end as UTF-8, instead of as raw ISO 8859-1 bytes (which aren’t valid UTF-8 on their own).
* remove a couple of leftover @charset=iso-8859-1@ declarations from my templates.
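
To show why the charset settings above matter at the byte level, here is a small Python illustration (not Movable Type code). The word "café" and the helper name @check_utf8@ are arbitrary choices for the example. A Latin-1 "é" is the single byte @0xE9@, which is not a complete UTF-8 sequence; conversely, UTF-8 bytes read under a stale @charset=iso-8859-1@ declaration turn into mojibake.

```python
def check_utf8(raw: bytes) -> str:
    """Say whether a byte string is valid UTF-8."""
    try:
        return "valid UTF-8: " + raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"


word = "café"
latin1_bytes = word.encode("iso-8859-1")  # b'caf\xe9': é is the single byte 0xE9
utf8_bytes = word.encode("utf-8")         # b'caf\xc3\xa9': é is two bytes

if __name__ == "__main__":
    print(check_utf8(latin1_bytes))  # not valid UTF-8
    print(check_utf8(utf8_bytes))    # valid UTF-8: café
    # A leftover charset=iso-8859-1 declaration makes a browser read the
    # UTF-8 bytes as Latin-1, one character per byte:
    print(utf8_bytes.decode("iso-8859-1"))  # cafÃ©
```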

Only the "RSS 1.0 feed":/chk/index.rdf works so far; I don’t have a corresponding XSL stylesheet for RSS 2.0…
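
The post doesn’t say what an RSS 2.0 stylesheet would need. One structural difference any stylesheet has to handle: in RSS 1.0, namespaced @item@ elements are children of @rdf:RDF@ (siblings of @channel@), while in RSS 2.0 un-namespaced @item@ elements sit inside @channel@. The Python sketch below uses ElementTree rather than XSLT, purely to show the two layouts; the sample feeds and the name @item_titles@ are made up. An XSLT 1.0 stylesheet would need matching XPath expressions, with the RSS 1.0 namespace bound to a prefix.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"

# Minimal hypothetical feeds, trimmed to the elements that matter here.
RSS1_SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.com/"><title>Example</title></channel>
  <item rdf:about="http://example.com/1"><title>First post</title></item>
</rdf:RDF>"""

RSS2_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
  <item><title>First post</title></item>
</channel></rss>"""


def item_titles(feed_xml: str) -> list[str]:
    """Return the item titles from an RSS 1.0 or RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    if root.tag == f"{{{RDF_NS}}}RDF":
        # RSS 1.0: items are direct children of rdf:RDF, in the RSS 1.0 namespace.
        items = root.findall(f"{{{RSS1_NS}}}item")
        title_tag = f"{{{RSS1_NS}}}title"
    else:
        # RSS 2.0: items are nested inside <channel> and have no namespace.
        items = root.findall("channel/item")
        title_tag = "title"
    return [item.findtext(title_tag, default="") for item in items]


if __name__ == "__main__":
    print(item_titles(RSS1_SAMPLE))
    print(item_titles(RSS2_SAMPLE))
```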

posted at 8:06 pm on Thursday, September 18, 2003 in Site News
