Parsing Large XML Files with SXML's lazy:xml->sxml and lazy:sxpath

Problem

Some XML files are too large to hold in memory all at once without significant overhead, and the traditional alternative, SAX, is often too low-level for writing a simple solution.

Solution

The sxml PLaneT package provides a lazy pull-style parser that we can use to progressively parse through a large XML file.

At this point, lazy-doc is a lazy sxml tree, which we can then query using lazy:sxpath.
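
A minimal sketch of these two steps might look like the following. It rests on several assumptions: the PLaneT require spec (package owner and version numbers) is a guess and may need adjusting for your installation, lazy:xml->sxml is assumed to take an input port and a list of namespace prefix assignments in the same way as ssax:xml->sxml, and lazy:sxpath is assumed to take a path list and return a query procedure in the same way as sxpath. The tiny inline document and the name items are made up for illustration.

```scheme
;; ASSUMPTION: this require spec (owner, package name, version) is a guess.
(require (planet "sxml.ss" ("lizorkin" "sxml.plt" 1 4)))

;; A small stand-in for a large file; a real program would open a file port.
(define in
  (open-input-string "<doc><item>one</item><item>two</item></doc>"))

;; ASSUMPTION: arguments mirror ssax:xml->sxml (port, namespace prefixes).
;; '() means no namespace prefix assignments.
(define lazy-doc (lazy:xml->sxml in '()))

;; ASSUMPTION: lazy:sxpath mirrors sxpath, so the path is a list of steps
;; and the result of lazy:sxpath is a procedure applied to the document.
(define items ((lazy:sxpath '(doc item)) lazy-doc))
```

Nothing here reads the whole document up front; items is the lazy result that the rest of the recipe walks through.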

When we run this, we see that we get an odd stream: the first element is a result, and we can force the second element to access more results. This allows us to do parsing in pull-parsing style.
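
To make the shape of an odd stream concrete, here is a hand-built one using only the standard delay and force. It is an illustration of the idea, not output from lazy:sxpath, and the names make-results and take-results are hypothetical helpers invented for this sketch.

```scheme
;; Illustration only: a hand-built odd stream, not a lazy:sxpath result.
;; make-results and take-results are hypothetical names.

;; An odd stream of the integers n..limit: a two-element list whose first
;; element is a value and whose second element is a promise for the rest.
(define (make-results n limit)
  (if (> n limit)
      '()
      (list n (delay (make-results (+ n 1) limit)))))

;; Pull at most k values, forcing a promise only when another value is needed.
(define (take-results stream k)
  (cond ((or (= k 0) (null? stream)) '())
        ((= k 1) (list (car stream)))
        (else (cons (car stream)
                    (take-results (force (cadr stream)) (- k 1))))))

;; Building the stream computes only the first element.
(define results (make-results 1 1000000))

(car results)             ; => 1
(take-results results 3)  ; => (1 2 3)
```

The consumer decides how far to go: each force pulls one more step out of the producer, which is what lets us stop early on a very large input.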

As a more realistic example, the same approach can drive a test module that parses some essential details from the Gene Ontology, a standard hierarchy of terms used in bioinformatics.

Discussion

Contributors

-- DannyYoo - 08 May 2006

CookbookForm
TopicType: Recipe
ParentTopic: XmlChapter
TopicOrder: 999

 
 
Copyright © 2004 by the contributing authors. All material on the Schematics Cookbook web site is the property of the contributing authors.
The copyright for certain compilations of material taken from this website is held by the SchematicsEditorsGroup - see ContributorAgreement & LGPL.
Other than such compilations, this material can be redistributed and/or modified under the terms of the GNU Lesser General Public License (LGPL), version 2.1, as published by the Free Software Foundation.