xml library that ships with PLT, and it seems to work well enough, so we'll go with that.

    (require (lib "xml.ss" "xml")
             (lib "match.ss")
             (lib "url.ss" "net")
             (lib "1.ss" "srfi"))

    ; url is a url as defined by url.ss
    (define (get-rss url)
      (xml->xexpr
       ;; drop whitespace-only strings inside <rss>, <channel> and <item>
       ((eliminate-whitespace '(rss channel item) (lambda (x) x))
        ;; fetch the document and parse it into XML structs
        (document-element (call/input-url url get-pure-port read-xml)))))

The get-rss function retrieves XML from a URL and converts it into an S-Expression. The eliminate-whitespace function returns a function that removes strings containing only whitespace from the elements named in its first argument (passing the identity function as the second argument selects those named elements; passing not would select every other element instead). This cleans up the S-Expression so that the match expression we'll write is simpler: we don't need to account for whitespace, which isn't significant to us anyway.
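To see what get-rss hands to the matcher without touching the network, the sketch below runs the same pipeline over an inline string instead of a URL. It is written for modern Racket (PLT Scheme's successor), whose xml library still provides read-xml, eliminate-whitespace and xml->xexpr; the helper string->rss-xexpr and the sample feed are hypothetical, made up for illustration.

```racket
#lang racket
;; Same pipeline as get-rss, but reading from a string port instead of
;; call/input-url, so it runs offline. string->rss-xexpr is hypothetical.
(require xml)

(define (string->rss-xexpr str)
  (xml->xexpr
   ;; identity selects the listed elements for whitespace removal
   ((eliminate-whitespace '(rss channel item) identity)
    (document-element (read-xml (open-input-string str))))))

;; A made-up RSS 2.0 document with the usual indentation whitespace.
(define sample
  "<rss version=\"2.0\">
     <channel>
       <title>Example</title>
       <item>
         <title>Hello</title>
         <link>http://example.com/1</link>
         <description>First post</description>
       </item>
     </channel>
   </rss>")

(string->rss-xexpr sample)
;; => '(rss ((version "2.0"))
;;          (channel ()
;;            (title () "Example")
;;            (item ()
;;              (title () "Hello")
;;              (link () "http://example.com/1")
;;              (description () "First post"))))
```

With the whitespace strings gone, every child of <rss>, <channel> and <item> is an element, and each element keeps its (possibly empty) attribute list in second position, which is what the match patterns skip with _.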
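
    (define (rss->item rss)
      ;; good-item returns #f unless p is a non-empty list, otherwise p itself
      (let ((good-item (lambda (p) (and (pair? p) p))))
        (filter good-item
                (match rss
                  ;; RSS 2.0 and 0.91: <rss> contains <channel>, which contains the items
                  (('rss _ ('channel _ . items))
                   (map (match-lambda
                          (('item _ ('title _ title) ('link _ link) ('description _ . desc) . _)
                           (list link title desc))
                          (('item ('title _ title) ('link _ link) (_ . _) ('body _ . body))
                           (list link title body))
                          (_ '()))          ; not an item we recognize
                        items))
                  ;; RSS 1.0 (RDF): the items are siblings of <channel> inside <rdf:RDF>;
                  ;; the _ after the attributes skips one node before <channel>
                  ;; (rdf:RDF is not in the eliminate-whitespace list)
                  (('rdf:RDF (_ ...) _ ('channel . _) . items)
                   (map (match-lambda
                          (('item _ ('title _ title) ('link _ link) ('description _ . desc) . _)
                           (list link title desc))
                          (_ '()))
                        items))))))
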
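The following sketch exercises the RSS 2.0 branch of rss->item on an inline S-Expression, so the nested match and the filtering step can be tried without fetching a feed. It is a translation to modern Racket's racket/match, where list and list* patterns stand in for match.ss's dotted patterns, and (filter pair? ...) replaces good-item, keeping exactly the same elements. The name rss2->items and the sample feed are hypothetical.

```racket
#lang racket
;; RSS 2.0 branch of rss->item, translated to racket/match (hypothetical name).
(define (rss2->items rss)
  (filter pair?                              ; same effect as good-item
          (match rss
            [(list 'rss _ (list* 'channel _ children))
             (map (match-lambda
                    ;; positional match: title, then link, then description
                    [(list* 'item _
                            (list 'title _ title)
                            (list 'link _ link)
                            (list* 'description _ desc)
                            _)
                     (list link title desc)]
                    [_ '()])                 ; anything else becomes '()
                  children)]
            [_ '()])))

;; Made-up feed: one item in the expected order, one reordered.
(define feed
  '(rss ((version "2.0"))
        (channel ()
          (title () "Example")
          (item ()
            (title () "Hello")
            (link () "http://example.com/1")
            (description () "First post"))
          (item ()                           ; same fields, different order
            (link () "http://example.com/2")
            (title () "Reordered")
            (description () "Second post")))))

(rss2->items feed)
;; => '(("http://example.com/1" "Hello" ("First post")))
```

The channel's own <title> and the second <item>, whose <link> precedes its <title>, both fall through to the catch-all clause and are filtered out, so only the first item survives.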
rss->item handles RSS 2.0 and 0.91 in its first case and RSS 1.0 (RDF) in its second. The matching is nested: the outer match locates the elements that may be items (the children of <channel> for RSS 2.0 and 0.91, or the elements following <channel> inside <rdf:RDF> for RSS 1.0), and match-lambda is then mapped over those elements to pick out each <item>. For an item, the result is a list of its link, title, and description; anything else produces an empty list, so we apply SRFI-1's filter to the whole result to keep only the non-empty lists. Our good-item function returns #f for an empty list and the match itself otherwise, so it works directly as filter's predicate. This match is a bit fragile, since RSS doesn't dictate the order in which child elements appear under <item>: an item whose <link> comes before its <title>, for example, falls through to the empty-list case and is silently dropped. In practice, however, it works well enough.

- Feed Validation
- RSS 2.0 Specification
- RSS 1.0 Specification
- Parsing RSS At All Costs
- HtmlPrag

-- HectorEGomezMorales - 05 May 2004
-- GordonWeakliem - 06 Aug 2004
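One way to remove the ordering dependence of rss->item is to look up each child of <item> by tag name rather than by position. The sketch below does that in modern Racket; it is not the original author's code, and the helpers child-content, item->fields and channel-items are hypothetical. Like the original pattern, it requires a title and link holding exactly one piece of content each, and it assumes xml->xexpr's default output shape, where every element carries an attribute list.

```racket
#lang racket
;; Order-independent item extraction (hypothetical helpers, not the original).
;; Assumes xexprs shaped like (tag (attr ...) content ...).

;; Content of the first child element named `tag`, or #f if there is none.
(define (child-content tag children)
  (for/first ([c (in-list children)]
              #:when (and (pair? c) (eq? (car c) tag)))
    (cddr c)))                               ; drop tag name and attributes

;; (item attrs child ...) -> (link title desc), or '() if something is missing.
(define item->fields
  (match-lambda
    [(list* 'item _ children)
     (match (list (child-content 'title children)
                  (child-content 'link children)
                  (child-content 'description children))
       ;; one piece of content for title and link, as in ('title _ title)
       [(list (list title) (list link) (? list? desc))
        (list link title desc)]
       [_ '()])]
    [_ '()]))

;; Items from an RSS 2.0 / 0.91 xexpr, keeping only successful matches.
(define (channel-items rss)
  (match rss
    [(list 'rss _ (list* 'channel _ children))
     (filter pair? (map item->fields children))]
    [_ '()]))

;; Made-up feed: the second item lists its children in a different order.
(define feed
  '(rss ((version "2.0"))
        (channel ()
          (title () "Example")
          (item ()
            (title () "Hello")
            (link () "http://example.com/1")
            (description () "First post"))
          (item ()
            (link () "http://example.com/2")
            (title () "Reordered")
            (description () "Second post")))))

(channel-items feed)
;; => '(("http://example.com/1" "Hello" ("First post"))
;;      ("http://example.com/2" "Reordered" ("Second post")))
```

The trade-off is a linear scan of the item's children per field instead of a single pattern, which is negligible for typical feed items; an RSS 1.0 branch would apply the same item->fields to the elements following <channel>.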