HTML-Miner

NAME
    HTML::Miner - This module 'mines' (hopefully) useful information from a URL or an HTML snippet.

VERSION
    Version 0.03

SYNOPSIS
    HTML::Miner 'mines' (hopefully) useful information from a URL or an HTML snippet. The following HTML elements can be extracted:

    Find all links and, for each link, extract:
        URL Title
        URL href
        URL Anchor Text
        URL Domain
        URL Protocol
        URL URI
        URL Absolute location

    Find all images and, for each image, extract:
        IMG Source URL
        IMG Absolute Source URL
        IMG Source Domain

    Extract meta elements such as:
        Page Title
        Page Description
        Page Keywords
        Page RSS Feeds

    Find the final destination URL of a potentially redirecting URL.

  Example ( Object Oriented Usage )
        use HTML::Miner;

        my $html = "some html";
        # or $html = do{ local $/; <DATA> }; with __DATA__ provided

        my $html_miner = HTML::Miner->new (
            CURRENT_URL      => '' ,
            CURRENT_URL_HTML => $html
        );

        my $meta_data = $html_miner->get_meta_elements() ;
        my $links     = $html_miner->get_links();
        my $images    = $html_miner->get_images();

        my ( $clear_url, $protocol, $domain, $uri ) = $html_miner->break_url();

        my $out = HTML::Miner::get_redirect_destination( "redirectingurl_here.html" ) ;

        $out = HTML::Miner::get_absolute_url( "", "../../about/" );

  Example ( Direct access of Methods )
        use HTML::Miner;

        my $html = "some html";
        # or $html = do{ local $/; <DATA> }; with __DATA__ provided

        my $url = "";

        my $meta_data = HTML::Miner::get_meta_elements( $url, $html ) ;
        my $links     = HTML::Miner::get_links( $url, $html );
        my $images    = HTML::Miner::get_images( $url, $html );

        my ( $clear_url, $protocol, $domain, $uri ) = HTML::Miner::break_url( $url );

        my $out = HTML::Miner::get_redirect_destination( "redirectingurl_here.html" ) ;

        $out = HTML::Miner::get_absolute_url( "", "../../about/" );

  Testing HTML
        __DATA__
        SiteTitle Link1 Link2 Link3 image2 link3 link3

  Example Outputs
        my $meta_data = $html_miner->get_meta_elements() ;
        # $meta_data->{ TITLE }            => "SiteTitle"
        # $meta_data->{ DESC }             => "desc of site"
        # $meta_data->{ KEYWORDS }->[0]    => "kw1"
        # $meta_data->{ RSS }->[0]->{TYPE} => "application/atom+xml"

        my $links = $html_miner->get_links();
        # $links->[0]->{ DOMAIN }         => ""
        # $links->[0]->{ ANCHOR }         => "Link1"
        # $links->[2]->{ ABS_URL }        => ""
        # $links->[1]->{ DOMAIN_IS_BASE } => 1
        # $links->[1]->{ TITLE }          => "title2"

        my $images = $html_miner->get_images();
        # $images->[0]->{ IMG_LOC }    => ""
        # $images->[2]->{ ALT }        => "link3"
        # $images->[0]->{ IMG_DOMAIN } => ""
        # $images->[3]->{ ABS_LOC }    => ""

        my ( $clear_url, $protocol, $domain, $uri ) = $html_miner->break_url();
        # $clear_url => ""
        # $protocol  => "http"
        # $domain    => ""
        # $uri       => "/"

        my $out = HTML::Miner::get_redirect_destination( "redirectingurl_here.html" );
        # $out => 'redirected_to'

        $out = HTML::Miner::get_absolute_url( "", "../../about/" );
        # $out => ""

        $out = HTML::Miner::get_absolute_url( "", "index2.html" );
        # $out => ""

        $out = HTML::Miner::get_absolute_url( "", "../../index.html" );
        # $out => ""

        $out = HTML::Miner::get_absolute_url( "", "/about/" );
        # $out => ""

        $out = HTML::Miner::get_absolute_url( "www.perl.comhelp/faq/", "" );
        # $out => ""

EXPORT
    This module does not export anything through @EXPORT; however, it does export the following functions through @EXPORT_OK:

        get_links
        get_absolute_url
        break_url
        get_redirect_destination
        get_images
        get_meta_elements

INSTALLATION
    To install this module, run the following commands:

        perl Makefile.PL
        make
        make test
        make install

SUPPORT AND DOCUMENTATION
    After installing, you can find documentation for this module with the perldoc command.

        perldoc HTML::Miner

    You can also look for information at:

        RT, CPAN's request tracker
        AnnoCPAN, Annotated CPAN documentation
        CPAN Ratings
        Search CPAN

COPYRIGHT AND LICENCE
    Copyright (C) 2009 4am Design and Technology Labs Pvt. Ltd., all rights reserved.

    This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
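The SYNOPSIS lists the per-link fields HTML::Miner extracts (title, href, anchor text and so on). As a rough illustration of what that extraction involves, here is a minimal core-Perl sketch that pulls href, title and anchor text out of <a> tags with regular expressions. It is not HTML::Miner's implementation: the sub name links_sketch and its lowercase hash keys are hypothetical, and the regexes assume quoted attributes and no nested <a> tags.

```perl
use strict;
use warnings;

# Hypothetical sketch, not HTML::Miner's code: collects href, title and
# anchor text from <a> tags. Only handles well-behaved markup (quoted
# attribute values, no nested <a> elements).
sub links_sketch {
    my ($html) = @_;
    my @links;
    while ( $html =~ m{<a\b([^>]*)>(.*?)</a>}gis ) {
        my ( $attrs, $text ) = ( $1, $2 );    # copy captures before the next match
        my %link;
        $link{href}  = $1 if $attrs =~ m{\bhref\s*=\s*["']([^"']*)["']}i;
        $link{title} = $1 if $attrs =~ m{\btitle\s*=\s*["']([^"']*)["']}i;
        ( $link{anchor} = $text ) =~ s{<[^>]*>}{}g;    # strip inner tags such as <em>
        push @links, \%link;
    }
    return \@links;
}

my $links = links_sketch(q{<p><a href="/about/" title="About us">About <em>us</em></a></p>});
for my $link (@$links) {
    printf "%s | %s | %s\n", $link->{href} // '', $link->{title} // '', $link->{anchor};
}
```

Regular expressions keep the sketch dependency-free; for arbitrary real-world markup an HTML parser is the safer choice.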
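For the image fields in the SYNOPSIS (source URL, absolute source URL, source domain), a similar hypothetical sketch collects src and alt from <img> tags and takes the host part of any absolute src value. images_sketch and its keys are illustrative names, not HTML::Miner's API, and resolving relative src values to absolute ones is left out here.

```perl
use strict;
use warnings;

# Hypothetical sketch, not HTML::Miner's code: collects src and alt from
# <img> tags and records the host of absolute src URLs (any port is dropped).
sub images_sketch {
    my ($html) = @_;
    my @images;
    while ( $html =~ m{<img\b([^>]*)>}gi ) {
        my $attrs = $1;
        my %img;
        $img{src} = $1 if $attrs =~ m{\bsrc\s*=\s*["']([^"']*)["']}i;
        $img{alt} = $1 if $attrs =~ m{\balt\s*=\s*["']([^"']*)["']}i;

        # Only absolute URLs (scheme://host...) have a domain to report.
        if ( defined $img{src}
            && $img{src} =~ m{^[a-zA-Z][a-zA-Z0-9+.-]*://([^/?#:]+)} )
        {
            $img{domain} = $1;
        }
        push @images, \%img;
    }
    return \@images;
}

my $images = images_sketch(q{<img src="http://img.example.com/logo.png" alt="logo">});
print "$images->[0]{src} ($images->[0]{domain})\n";
```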
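The meta elements in the feature list (title, description, keywords, RSS feeds) can be approximated the same way. The sketch below is hypothetical (meta_sketch is not an HTML::Miner function); it assumes keywords are comma-separated and treats any <link rel="alternate"> as a feed, recording its type attribute, which is the kind of value shown as TYPE ("application/atom+xml") in the example outputs.

```perl
use strict;
use warnings;

# Hypothetical sketch, not HTML::Miner's code: reads <title>, the
# "description" and "keywords" <meta> tags, and <link rel="alternate"> feeds.
sub meta_sketch {
    my ($html) = @_;
    my %meta = ( keywords => [], feeds => [] );

    $meta{title} = $1 if $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is;

    while ( $html =~ m{<meta\b([^>]*)>}gi ) {
        my $attrs = $1;
        # name and content are matched separately, so attribute order is irrelevant
        my ($name)    = $attrs =~ m{\bname\s*=\s*["']([^"']*)["']}i;
        my ($content) = $attrs =~ m{\bcontent\s*=\s*["']([^"']*)["']}i;
        next unless defined $name && defined $content;

        if ( lc($name) eq 'description' ) {
            $meta{description} = $content;
        }
        elsif ( lc($name) eq 'keywords' ) {
            push @{ $meta{keywords} }, grep { length } split /\s*,\s*/, $content;
        }
    }

    while ( $html =~ m{<link\b([^>]*)>}gi ) {
        my $attrs = $1;
        next unless $attrs =~ m{\brel\s*=\s*["']alternate["']}i;
        my ($type) = $attrs =~ m{\btype\s*=\s*["']([^"']*)["']}i;
        my ($href) = $attrs =~ m{\bhref\s*=\s*["']([^"']*)["']}i;
        push @{ $meta{feeds} }, { type => $type, href => $href };
    }
    return \%meta;
}

my $meta = meta_sketch(q{<title>SiteTitle</title><meta name="keywords" content="kw1, kw2">});
print "$meta->{title}: @{ $meta->{keywords} }\n";
```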
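get_redirect_destination reports where a redirecting URL finally ends up. A minimal sketch of the same idea with the core HTTP::Tiny module (bundled with Perl since 5.14) is shown below. It is an assumption about how such a lookup can be done, not HTML::Miner's code, and redirect_destination_sketch is a hypothetical name. HTTP::Tiny follows redirects itself (up to max_redirect) and stores the URL of the final response in $response->{url}.

```perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Hypothetical sketch, not HTML::Miner's code: follows up to 5 redirects and
# returns the URL of the final response, or undef if the request failed.
# https URLs additionally need IO::Socket::SSL and Net::SSLeay installed.
sub redirect_destination_sketch {
    my ($url) = @_;
    my $http = HTTP::Tiny->new( max_redirect => 5, timeout => 10 );
    my $response = $http->head($url);    # HEAD avoids downloading the body
    return $response->{success} ? $response->{url} : undef;
}

# Example call (needs network access; the URL is a placeholder):
# my $final = redirect_destination_sketch('http://www.example.com/old-page');
```

A HEAD request keeps the lookup cheap, but some servers reject HEAD; in that case $http->get($url) is the fallback, and its response carries the same url field.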
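break_url returns a cleaned URL, the protocol, the domain and the URI (path). Assuming a simple absolute URL of the form protocol://domain/path, that split can be sketched with one regular expression. split_url_sketch is a hypothetical helper, not HTML::Miner's code, and it drops any query string or fragment from the cleaned URL.

```perl
use strict;
use warnings;

# Hypothetical helper, not HTML::Miner's code: splits a simple absolute URL
# into (clean URL, protocol, domain, URI). Query string and fragment are
# dropped; an empty path is returned as "/". Returns an empty list if the
# input does not look like protocol://domain...
sub split_url_sketch {
    my ($url) = @_;
    my ( $protocol, $domain, $path ) =
        $url =~ m{^([a-zA-Z][a-zA-Z0-9+.-]*)://([^/?#]+)([^?#]*)}
        or return;
    $path = '/' if $path eq '';
    return ( "$protocol://$domain$path", $protocol, $domain, $path );
}

my ( $clear_url, $protocol, $domain, $uri ) =
    split_url_sketch('http://www.example.com/about/index.html?x=1#top');
print "$clear_url | $protocol | $domain | $uri\n";
```

Returning "/" for an empty path is consistent with the $uri value shown in the example outputs.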
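get_absolute_url turns a link such as "../../about/" into an absolute URL relative to the page it appears on. The sketch below shows one way to do that resolution, following the main cases of RFC 3986 section 5.2 (already-absolute references, root-relative paths, and relative paths with "." and ".." segments). absolute_url_sketch and the example.com URLs are hypothetical; HTML::Miner's own rules may differ, and query strings and fragments are not handled. Outside a dependency-free sketch, the CPAN URI module's URI->new_abs performs this resolution.

```perl
use strict;
use warnings;

# Hypothetical helper, not HTML::Miner's code. Resolves $rel against the
# absolute URL $base of the page the link was found on.
sub absolute_url_sketch {
    my ( $base, $rel ) = @_;

    # Already absolute (has a scheme): nothing to resolve.
    return $rel if $rel =~ m{^[a-zA-Z][a-zA-Z0-9+.-]*://};

    my ( $origin, $base_path ) =
        $base =~ m{^([a-zA-Z][a-zA-Z0-9+.-]*://[^/?#]+)([^?#]*)}
        or die "base must be an absolute URL: $base\n";
    $base_path = '/' if $base_path eq '';

    my $path;
    if ( $rel =~ m{^/} ) {
        $path = $rel;                                # root-relative: /about/
    }
    else {
        ( my $dir = $base_path ) =~ s{[^/]*\z}{};    # drop last segment of base
        $path = $dir . $rel;                         # document-relative
    }

    # Remove "." and ".." segments (RFC 3986, section 5.2.4).
    my @segs = split m{/}, $path, -1;
    shift @segs;                                     # empty segment before the leading "/"
    my @out;
    for my $i ( 0 .. $#segs ) {
        my $seg  = $segs[$i];
        my $last = ( $i == $#segs );
        if ( $seg eq '..' ) {
            pop @out;                                # never climbs above the root
            push @out, '' if $last;                  # "/a/b/.." -> "/a/"
        }
        elsif ( $seg eq '.' ) {
            push @out, '' if $last;                  # "/a/." -> "/a/"
        }
        else {
            push @out, $seg;
        }
    }
    return $origin . '/' . join( '/', @out );
}

print absolute_url_sketch( 'http://www.example.com/a/b/c/index.html', '../../about/' ), "\n";
```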
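Because the functions are listed in @EXPORT_OK rather than @EXPORT, callers must name the ones they want, for example use HTML::Miner qw( get_links break_url );. The self-contained sketch below reproduces that arrangement with a hypothetical stand-in package, My::Miner, so it runs without HTML::Miner installed; the subs in it are placeholders, not HTML::Miner's real functions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::Miner: nothing is exported by default,
# but the subs in @EXPORT_OK can be requested by name.
package My::Miner;
use Exporter 'import';
our @EXPORT    = ();
our @EXPORT_OK = qw( break_url get_links );

sub break_url { my ($url) = @_; return "break_url called with $url" }
sub get_links { return [] }

package main;

# With the real module this would be: use HTML::Miner qw( break_url );
# Here import is called at run time, after @EXPORT_OK has been filled in.
My::Miner->import('break_url');

my $msg = break_url('http://www.example.com/');
print "$msg\n";

# get_links was not requested, so it was not imported into main.
print defined &main::get_links ? "get_links imported\n" : "get_links not imported\n";
```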