NAME
    XML::TreePP -- Pure Perl implementation for parsing/writing XML files

SYNOPSIS
    Parse an XML file into a hash tree:

        use XML::TreePP;
        my $tpp = XML::TreePP->new();
        my $tree = $tpp->parsefile( "index.rdf" );
        print "Title: ", $tree->{"rdf:RDF"}->{item}->[0]->{title}, "\n";
        print "URL: ", $tree->{"rdf:RDF"}->{item}->[0]->{link}, "\n";

    Write XML as a string from a hash tree:

        use XML::TreePP;
        my $tpp = XML::TreePP->new();
        my $tree = { rss => { channel => { item => [ {
            title   => "The Perl Directory",
            link    => "http://www.perl.org/",
        }, {
            title   => "The Comprehensive Perl Archive Network",
            link    => "http://cpan.perl.org/",
        } ] } } };
        my $xml = $tpp->write( $tree );
        print $xml;

    Get a remote XML file with HTTP GET and parse it into a hash tree:

        use XML::TreePP;
        my $tpp = XML::TreePP->new();
        my $tree = $tpp->parsehttp( GET => "http://use.perl.org/index.rss" );
        print "Title: ", $tree->{"rdf:RDF"}->{channel}->{title}, "\n";
        print "URL: ", $tree->{"rdf:RDF"}->{channel}->{link}, "\n";

    Get a remote XML file with HTTP POST and parse it into a hash tree:

        use XML::TreePP;
        my $tpp = XML::TreePP->new( force_array => [qw( item )] );
        my $cgiurl = "http://search.hatena.ne.jp/keyword";
        my $keyword = "ajax";
        my $cgiquery = "mode=rss2&word=".$keyword;
        my $tree = $tpp->parsehttp( POST => $cgiurl, $cgiquery );
        print "Link: ", $tree->{rss}->{channel}->{item}->[0]->{link}, "\n";
        print "Desc: ", $tree->{rss}->{channel}->{item}->[0]->{description}, "\n";

DESCRIPTION
    The XML::TreePP module parses an XML file and expands it into a hash
    tree. It can also generate an XML file from a hash tree. This is a pure
    Perl implementation. It can also fetch XML from a remote web server,
    much like the XMLHttpRequest object in JavaScript.

EXAMPLES
  Parse XML file
    Sample XML source:

        <family name="Kawasaki">
            <father>Yasuhisa</father>
            <mother>Chizuko</mother>
            <children>
                <girl>Shiori</girl>
                <boy>Yusuke</boy>
                <boy>Kairi</boy>
            </children>
        </family>

    Sample program to read an XML file and dump it:

        use XML::TreePP;
        use Data::Dumper;
        my $tpp = XML::TreePP->new();
        my $tree = $tpp->parsefile( "family.xml" );
        my $text = Dumper( $tree );
        print $text;

    Result dumped:

        $VAR1 = {
            'family' => {
                '-name' => 'Kawasaki',
                'father' => 'Yasuhisa',
                'mother' => 'Chizuko',
                'children' => {
                    'girl' => 'Shiori',
                    'boy' => [
                        'Yusuke',
                        'Kairi'
                    ],
                }
            }
        };

    Details:

        print $tree->{family}->{father};        # the father's given name.

    The prefix '-' is added to every attribute's name.

        print $tree->{family}->{"-name"};       # the family name of the family

    An array is used because the family has two boys.

        print $tree->{family}->{children}->{boy}->[1];  # The second boy's name
        print $tree->{family}->{children}->{girl};      # The girl's name

  Text node and attributes:
    If an element has both a text node and attributes, or both a text node
    and other child nodes, the value of the text node is stored under the
    "#text" key, alongside the other child nodes.

        use XML::TreePP;
        use Data::Dumper;
        my $tpp = XML::TreePP->new();
        my $source = '<span class="author">Kawasaki Yusuke</span>';
        my $tree = $tpp->parse( $source );
        my $text = Dumper( $tree );
        print $text;

    The dumped result is as follows:

        $VAR1 = {
            'span' => {
                '-class' => 'author',
                '#text'  => 'Kawasaki Yusuke'
            }
        };

    The special node name "#text" is used because this element has
    attribute(s) in addition to the text node. See also the "text_node_key"
    option.

METHODS
  new
    This constructor method returns a new XML::TreePP object configured
    with %options.

        $tpp = XML::TreePP->new( %options );

  set
    This method sets the value of the option "option_name". If
    $option_value is not defined, the option is deleted.

        $tpp->set( option_name => $option_value );

    See the OPTIONS sections below for details.

  get
    This method returns the current value of the option "option_name".

        $tpp->get( 'option_name' );

  parse
    This method reads XML source and returns it converted into a hash tree.
    The first argument is a scalar or a reference to a scalar.

        $tree = $tpp->parse( $source );

  parsefile
    This method reads an XML file and returns it converted into a hash
    tree. The first argument is a filename.

        $tree = $tpp->parsefile( $file );

  parsehttp
    This method receives an XML file from a remote server via HTTP and
    returns it converted into a hash tree.

        $tree = $tpp->parsehttp( $method, $url, $body, $head );

    $method is the HTTP method: GET/POST/PUT/DELETE.
    $url is the URI of an XML file.
    $body is the request body, used with the POST method.
    $head is the request headers, given as a hash ref.
    The LWP::UserAgent module or the HTTP::Lite module is required to
    fetch a file.

        ( $tree, $xml, $code ) = $tpp->parsehttp( $method, $url, $body, $head );

    In array context, this method also returns the raw XML source received
    and the HTTP response's status code.

  write
    This method converts a hash tree into XML source and returns it.

        $source = $tpp->write( $tree, $encode );

    $tree is a reference to a hash tree.

  writefile
    This method converts a hash tree into XML source and writes it to a
    file.

        $tpp->writefile( $file, $tree, $encode );

    $file is the name of the file to create. $tree is a reference to a
    hash tree.

OPTIONS FOR PARSING XML
    This module accepts the following option parameters:

  force_array
    This option allows you to specify a list of element names which should
    always be forced into an array representation.

        $tpp->set( force_array => [ 'rdf:li', 'item', '-xmlns' ] );

    The default value is null, which means that the context of each
    element determines whether it becomes an array or stays a scalar or
    hash. Note that the special wildcard name '*' means all elements.

  force_hash
    This option allows you to specify a list of element names which should
    always be forced into a hash representation.

        $tpp->set( force_hash => [ 'item', 'image' ] );

    The default value is null, which means that the context of each
    element determines whether it becomes a hash or stays a scalar text
    node. See also the "text_node_key" option below. Note that the special
    wildcard name '*' means all elements.

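The effect of force_array and force_hash is easiest to see on an element that occurs only once. The following is a minimal sketch, not part of the original documentation; the sample XML string and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::TreePP;

# Hypothetical sample: a feed that happens to contain a single <item>.
my $xml = '<rss><channel><item><title>Only one</title></item></channel></rss>';

# Default behaviour: a lone <item> becomes a hash, and a text-only
# <title> stays a plain scalar.
my $plain = XML::TreePP->new()->parse( $xml );
my $plain_item = $plain->{rss}->{channel}->{item};
print ref( $plain_item ), "\n";                 # HASH

# force_array: <item> is always an array reference, even with one element.
# force_hash:  <title> is always a hash, with its text under "#text".
my $tpp = XML::TreePP->new(
    force_array => [ 'item' ],
    force_hash  => [ 'title' ],
);
my $forced = $tpp->parse( $xml );
my $forced_item = $forced->{rss}->{channel}->{item};
print ref( $forced_item ), "\n";                # ARRAY
print $forced_item->[0]->{title}->{'#text'}, "\n";
```

Forcing the shape up front means calling code can use the same access path whether the document holds one item or many.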
  cdata_scalar_ref
    This option converts each CDATA section into a reference to a scalar
    when parsing XML source.

        $tpp->set( cdata_scalar_ref => 1 );

    The default value is false, which means that each CDATA section is
    converted into a plain scalar.

  user_agent
    This option allows you to specify the HTTP_USER_AGENT string used by
    the parsehttp() method.

        $tpp->set( user_agent => 'Mozilla/4.0 (compatible; ...)' );

    The default string is 'XML-TreePP/#.##', where '#.##' is replaced with
    the version number of this library.

  http_lite
    This option forces the parsehttp() method to use an HTTP::Lite
    instance.

        my $http = HTTP::Lite->new();
        $tpp->set( http_lite => $http );

  lwp_useragent
    This option forces the parsehttp() method to use an LWP::UserAgent
    instance.

        my $ua = LWP::UserAgent->new();
        $ua->timeout( 60 );
        $ua->env_proxy;
        $tpp->set( lwp_useragent => $ua );

    You may use this with LWP::UserAgent::WithCache.

  base_class
    This option blesses each element's hashref into a class. Each class is
    named as a direct child class of its parent element's class.

        $tpp->set( base_class => 'MyElement' );
        my $xml  = '<root><parent><child key="val">text</child></parent></root>';
        my $tree = $tpp->parse( $xml );
        print ref $tree->{root}->{parent}->{child}, "\n";

    The hash for the <child> element above is blessed into the
    "MyElement::root::parent::child" class. You may use this with
    Class::Accessor.

  elem_class
    This option blesses each element's hashref into a class. Each class is
    named flatly, as a direct child of "MyElement".

        $tpp->set( elem_class => 'MyElement' );
        my $xml  = '<root><parent><child key="val">text</child></parent></root>';
        my $tree = $tpp->parse( $xml );
        print ref $tree->{root}->{parent}->{child}, "\n";

    The hash for the <child> element above is blessed into the
    "MyElement::child" class.

OPTIONS FOR WRITING XML
  first_out
    This option allows you to specify a list of element/attribute names
    which should always appear first in the output XML code.

        $tpp->set( first_out => [ 'link', 'title', '-type' ] );

    The default value is null, which means that alphabetical order is
    used.

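As an illustration of first_out, the sketch below writes a small hash tree whose keys would otherwise come out alphabetically. The tree contents and variable names are hypothetical and chosen only to make the reordering visible.

```perl
use strict;
use warnings;
use XML::TreePP;

# Hypothetical tree: alphabetical order would give author, link, title.
my $tree = {
    entry => {
        author => 'Yusuke',
        link   => 'http://www.perl.org/',
        title  => 'The Perl Directory',
    },
};

my $tpp = XML::TreePP->new();

# Ask for <title> and then <link> to be written before anything else.
$tpp->set( first_out => [ 'title', 'link' ] );

my $xml = $tpp->write( $tree );
print $xml;

# Record where each element starts, to check the resulting order.
my $pos_title  = index( $xml, '<title>' );
my $pos_link   = index( $xml, '<link>' );
my $pos_author = index( $xml, '<author>' );
```

With this setting, <title> and <link> should precede <author> in the output, while names not listed in first_out keep the default ordering.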
  last_out
    This option allows you to specify a list of element/attribute names
    which should always appear last in the output XML code.

        $tpp->set( last_out => [ 'items', 'item', 'entry' ] );

  indent
    This option makes the output more human-readable by indenting it
    appropriately.

        $tpp->set( indent => 2 );

    This doesn't strictly follow the XML Document Spec but does look nice.

  xml_decl
    By default, this module generates an XML declaration when writing XML
    code. This option lets you change or omit it.

        $tpp->set( xml_decl => '' );

  output_encoding
    This option allows you to specify the encoding of the XML file
    generated by the write/writefile methods.

        $tpp->set( output_encoding => 'UTF-8' );

    On Perl 5.8.0 and later, you can select any encoding supported by
    Encode.pm. On Perl 5.6.x and before with Jcode.pm, you can use
    "Shift_JIS", "EUC-JP", "ISO-2022-JP" and "UTF-8". The default value is
    "UTF-8", which is the recommended encoding.

OPTIONS FOR BOTH
  utf8_flag
    This option turns the utf8 flag on for every element's value parsed,
    and turns it on for generated XML code as well.

        $tpp->set( utf8_flag => 1 );

    Perl 5.8.1 or later is required to use this.

  attr_prefix
    This option allows you to specify the prefix character(s) inserted
    before each attribute name.

        $tpp->set( attr_prefix => '@' );

    The default character is '-'. Set it to '@' to access attribute values
    as in E4X, ECMAScript for XML. A zero-length prefix '' is allowed as
    well; it means no prefix is added.

  text_node_key
    This option allows you to specify the hash key used for text nodes.

        $tpp->set( text_node_key => '#text' );

    The default key is "#text".

  ignore_error
    By default, this module calls the Carp::croak function on an error.
    This option makes it ignore all errors and simply return.

        $tpp->set( ignore_error => 1 );

  use_ixhash
    This option preserves the order in which elements appear in the XML.
    The Tie::IxHash module is required.

        $tpp->set( use_ixhash => 1 );

    This makes parsing performance slow.
    (about 100% slower than default)

AUTHOR
    Yusuke Kawasaki, http://www.kawa.net/

COPYRIGHT AND LICENSE
    Copyright (c) 2006-2007 Yusuke Kawasaki. All rights reserved. This
    program is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.