NYTProf Performance Profile
For reply.pl
Run on Thu Oct 21 22:40:13 2010
Reported on Thu Oct 21 22:44:38 2010

Filename: /home/hinrik/perl5/perlbrew/perls/perl-5.13.5/lib/site_perl/5.13.5/Regexp/Common.pm
Statements: Executed 1022 statements in 5.39ms
Subroutines
Calls  P   F   Exclusive Time  Inclusive Time  Subroutine
19     19  18  5.79ms          24.0ms          Regexp::Common::import (recurses: max depth 1, inclusive time 1.49ms)
16     16  13  1.28ms          1.73ms          Regexp::Common::pattern
180    3   1   221µs           221µs           Regexp::Common::CORE:match (opcode)
16     1   1   208µs           208µs           Regexp::Common::get_cache
19     1   1   132µs           132µs           Regexp::Common::TIEHASH
70     1   1   111µs           111µs           Regexp::Common::CORE:regcomp (opcode)
1      1   1   47µs            47µs            Regexp::Common::BEGIN@3
33     1   1   43µs            43µs            Regexp::Common::CORE:subst (opcode)
1      1   1   15µs            32µs            Regexp::Common::BEGIN@18
1      1   1   14µs            55µs            Regexp::Common::Entry::BEGIN@257
1      1   1   13µs            67µs            Regexp::Common::BEGIN@163
1      1   1   13µs            17µs            Regexp::Common::BEGIN@4
1      1   1   12µs            119µs           Regexp::Common::BEGIN@19
1      1   1   12µs            34µs            Regexp::Common::BEGIN@117
1      1   1   12µs            41µs            Regexp::Common::BEGIN@13
1      1   1   11µs            33µs            Regexp::Common::BEGIN@60
1      1   1   11µs            34µs            Regexp::Common::BEGIN@69
1      1   1   10µs            31µs            Regexp::Common::BEGIN@128
1      1   1   7µs             7µs             Regexp::Common::BEGIN@6
1      1   1   6µs             6µs             Regexp::Common::CORE:qr (opcode)
0      0   0   0s              0s              Regexp::Common::AUTOLOAD
0      0   0   0s              0s              Regexp::Common::DESTROY
0      0   0   0s              0s              Regexp::Common::Entry::__ANON__[:268]
0      0   0   0s              0s              Regexp::Common::Entry::_clone_with
0      0   0   0s              0s              Regexp::Common::FETCH
0      0   0   0s              0s              Regexp::Common::__ANON__[:14]
0      0   0   0s              0s              Regexp::Common::__ANON__[:188]
0      0   0   0s              0s              Regexp::Common::__ANON__[:231]
0      0   0   0s              0s              Regexp::Common::_carp
0      0   0   0s              0s              Regexp::Common::_croak
0      0   0   0s              0s              Regexp::Common::_decache
0      0   0   0s              0s              Regexp::Common::croak_version
0      0   0   0s              0s              Regexp::Common::generic_match
0      0   0   0s              0s              Regexp::Common::generic_subs
0      0   0   0s              0s              Regexp::Common::matches
0      0   0   0s              0s              Regexp::Common::new
0      0   0   0s              0s              Regexp::Common::subs
Line  Statements  Time on line  Calls  Time in subs  Code
1package Regexp::Common;
2
3253µs147µs
# spent 47µs within Regexp::Common::BEGIN@3 which was called: # once (47µs+0s) by Hailo::Tokenizer::Words::BEGIN@13 at line 3
use 5.00473;
# spent 47µs making 1 call to Regexp::Common::BEGIN@3
4255µs222µs
# spent 17µs (13+5) within Regexp::Common::BEGIN@4 which was called: # once (13µs+5µs) by Hailo::Tokenizer::Words::BEGIN@13 at line 4
use strict;
# spent 17µs making 1 call to Regexp::Common::BEGIN@4 # spent 5µs making 1 call to strict::import
5
6
# spent 7µs within Regexp::Common::BEGIN@6 which was called: # once (7µs+0s) by Hailo::Tokenizer::Words::BEGIN@13 at line 16
BEGIN {
7 # This makes sure 'use warnings' doesn't bomb out on 5.005_*;
8 # warnings won't be enabled on those old versions though.
9 # Since all other files use this file, we can use 'use warnings'
10 # elsewhere as well, but *AFTER* 'use Regexp::Common'.
1118µs if ($] < 5.006) {
12 $INC {"warnings.pm"} = 1;
13254µs270µs
# spent 41µs (12+29) within Regexp::Common::BEGIN@13 which was called: # once (12µs+29µs) by Hailo::Tokenizer::Words::BEGIN@13 at line 13
no strict 'refs';
# spent 41µs making 1 call to Regexp::Common::BEGIN@13 # spent 29µs making 1 call to strict::unimport
14 *{"warnings::unimport"} = sub {0};
15 }
16120µs17µs}
# spent 7µs making 1 call to Regexp::Common::BEGIN@6
17
18228µs249µs
# spent 32µs (15+17) within Regexp::Common::BEGIN@18 which was called: # once (15µs+17µs) by Hailo::Tokenizer::Words::BEGIN@13 at line 18
use warnings;
# spent 32µs making 1 call to Regexp::Common::BEGIN@18 # spent 17µs making 1 call to warnings::import
192188µs2225µs
# spent 119µs (12+107) within Regexp::Common::BEGIN@19 which was called: # once (12µs+107µs) by Hailo::Tokenizer::Words::BEGIN@13 at line 19
use vars qw /$VERSION %RE %sub_interface $AUTOLOAD/;
# spent 119µs making 1 call to Regexp::Common::BEGIN@19 # spent 106µs making 1 call to vars::import
20
2112µs$VERSION = '2010010201';
22
23
24sub _croak {
25 require Carp;
26 goto &Carp::croak;
27}
28
29sub _carp {
30 require Carp;
31 goto &Carp::carp;
32}
33
34sub new {
35 my ($class, @data) = @_;
36 my %self;
37 tie %self, $class, @data;
38 return \%self;
39}
40
41
# spent 132µs within Regexp::Common::TIEHASH which was called 19 times, avg 7µs/call: # 19 times (132µs+0s) by Regexp::Common::import at line 58, avg 7µs/call
sub TIEHASH {
4238165µs my ($class, @data) = @_;
43 bless \@data, $class;
44}
45
46sub FETCH {
47 my ($self, $extra) = @_;
48 return bless ref($self)->new(@$self, $extra), ref($self);
49}
50
51120µsmy %imports = map {$_ => "Regexp::Common::$_"}
52 qw /balanced CC comment delimited lingua list
53 net number profanity SEN URI whitespace
54 zip/;
55
56
# spent 24.0ms (5.79+18.2) within Regexp::Common::import which was called 19 times, avg 1.26ms/call: # once (4.52ms+19.5ms) by Hailo::Tokenizer::Words::BEGIN@13 at line 13 of Hailo/Tokenizer/Words.pm # once (72µs+-72µs) by Regexp::Common::URI::RFC1035::BEGIN@3 at line 3 of Regexp/Common/URI/RFC1035.pm # once (77µs+-77µs) by Regexp::Common::URI::prospero::BEGIN@3 at line 3 of Regexp/Common/URI/prospero.pm # once (68µs+-68µs) by Regexp::Common::URI::pop::BEGIN@3 at line 3 of Regexp/Common/URI/pop.pm # once (71µs+-71µs) by Regexp::Common::URI::BEGIN@3 at line 3 of Regexp/Common/URI.pm # once (66µs+-66µs) by Regexp::Common::URI::BEGIN@14 at line 14 of Regexp/Common/URI.pm # once (70µs+-70µs) by Regexp::Common::URI::tel::BEGIN@3 at line 3 of Regexp/Common/URI/tel.pm # once (69µs+-69µs) by Regexp::Common::URI::wais::BEGIN@3 at line 3 of Regexp/Common/URI/wais.pm # once (68µs+-68µs) by Regexp::Common::URI::RFC2384::BEGIN@4 at line 4 of Regexp/Common/URI/RFC2384.pm # once (71µs+-71µs) by Regexp::Common::URI::tv::BEGIN@6 at line 6 of Regexp/Common/URI/tv.pm # once (71µs+-71µs) by Regexp::Common::URI::gopher::BEGIN@3 at line 3 of Regexp/Common/URI/gopher.pm # once (71µs+-71µs) by Regexp::Common::URI::file::BEGIN@3 at line 3 of Regexp/Common/URI/file.pm # once (70µs+-70µs) by Regexp::Common::URI::RFC1738::BEGIN@3 at line 3 of Regexp/Common/URI/RFC1738.pm # once (67µs+-67µs) by Regexp::Common::URI::fax::BEGIN@3 at line 3 of Regexp/Common/URI/fax.pm # once (75µs+-75µs) by Regexp::Common::URI::http::BEGIN@3 at line 3 of Regexp/Common/URI/http.pm # once (72µs+-72µs) by Regexp::Common::URI::ftp::BEGIN@3 at line 3 of Regexp/Common/URI/ftp.pm # once (67µs+-67µs) by Regexp::Common::URI::telnet::BEGIN@3 at line 3 of Regexp/Common/URI/telnet.pm # once (71µs+-71µs) by Regexp::Common::URI::RFC2396::BEGIN@3 at line 3 of Regexp/Common/URI/RFC2396.pm # once (69µs+-69µs) by Regexp::Common::URI::news::BEGIN@3 at line 3 of Regexp/Common/URI/news.pm
sub import {
575251.44ms shift; # Shift off the class.
58139µs19132µs tie %RE, __PACKAGE__;
# spent 132µs making 19 calls to Regexp::Common::TIEHASH, avg 7µs/call
59 {
60275µs255µs
# spent 33µs (11+22) within Regexp::Common::BEGIN@60 which was called: # once (11µs+22µs) by Hailo::Tokenizer::Words::BEGIN@13 at line 60
no strict 'refs';
# spent 33µs making 1 call to Regexp::Common::BEGIN@60 # spent 22µs making 1 call to strict::unimport
61 *{caller() . "::RE"} = \%RE;
62 }
63
64 my $saw_import;
65 my $no_defaults;
66 my %exclude;
675568µs foreach my $entry (grep {!/^RE_/} @_) {
# spent 68µs making 55 calls to Regexp::Common::CORE:match, avg 1µs/call
68 if ($entry eq 'pattern') {
692253µs256µs
# spent 34µs (11+22) within Regexp::Common::BEGIN@69 which was called: # once (11µs+22µs) by Hailo::Tokenizer::Words::BEGIN@13 at line 69
no strict 'refs';
# spent 34µs making 1 call to Regexp::Common::BEGIN@69 # spent 22µs making 1 call to strict::unimport
70 *{caller() . "::pattern"} = \&pattern;
71 next;
72 }
73 # This used to prevent $; from being set. We still recognize it,
74 # but we won't do anything.
75 if ($entry eq 'clean') {
76 next;
77 }
78 if ($entry eq 'no_defaults') {
79 $no_defaults ++;
80 next;
81 }
82 if (my $module = $imports {$entry}) {
83 $saw_import ++;
84 eval "require $module;";
# spent 117µs executing statements in string eval
85 die $@ if $@;
86 next;
87 }
88 if ($entry =~ /^!(.*)/ && $imports {$1}) {
89 $exclude {$1} ++;
90 next;
91 }
92 # As a last resort, try to load the argument.
93 my $module = $entry =~ /^Regexp::Common/
94 ? $entry
95 : "Regexp::Common::" . $entry;
96 eval "require $module;";
97 die $@ if $@;
98 }
99
100 unless ($saw_import || $no_defaults) {
101 foreach my $module (values %imports) {
102 next if $exclude {$module};
103 eval "require $module;";
104 die $@ if $@;
105 }
106 }
107
108 my %exported;
1095563µs foreach my $entry (grep {/^RE_/} @_) {
# spent 63µs making 55 calls to Regexp::Common::CORE:match, avg 1µs/call
110 if ($entry =~ /^RE_(\w+_)?ALL$/) {
111 my $m = defined $1 ? $1 : "";
112 my $re = qr /^RE_${m}.*$/;
113 while (my ($sub, $interface) = each %sub_interface) {
114 next if $exported {$sub};
115 next unless $sub =~ /$re/;
116 {
117267µs256µs
# spent 34µs (12+22) within Regexp::Common::BEGIN@117 which was called: # once (12µs+22µs) by Hailo::Tokenizer::Words::BEGIN@13 at line 117
no strict 'refs';
# spent 34µs making 1 call to Regexp::Common::BEGIN@117 # spent 22µs making 1 call to strict::unimport
118 *{caller() . "::$sub"} = $interface;
119 }
120 $exported {$sub} ++;
121 }
122 }
123 else {
124 next if $exported {$entry};
125 _croak "Can't export unknown subroutine &$entry"
126 unless $sub_interface {$entry};
127 {
1282283µs252µs
# spent 31µs (10+21) within Regexp::Common::BEGIN@128 which was called: # once (10µs+21µs) by Hailo::Tokenizer::Words::BEGIN@13 at line 128
no strict 'refs';
# spent 31µs making 1 call to Regexp::Common::BEGIN@128 # spent 21µs making 1 call to strict::unimport
129 *{caller() . "::$entry"} = $sub_interface {$entry};
130 }
131 $exported {$entry} ++;
132 }
133 }
134}
135
136sub AUTOLOAD { _croak "Can't $AUTOLOAD" }
137
138sub DESTROY {}
139
1401600nsmy %cache;
141
142118µs16µsmy $fpat = qr/^(-\w+)/;
# spent 6µs making 1 call to Regexp::Common::CORE:qr
143
144sub _decache {
145 my @args = @{tied %{$_[0]}};
146 my @nonflags = grep {!/$fpat/} @args;
147 my $cache = get_cache(@nonflags);
148 _croak "Can't create unknown regex: \$RE{"
149 . join("}{",@args) . "}"
150 unless exists $cache->{__VAL__};
151 _croak "Perl $] does not support the pattern "
152 . "\$RE{" . join("}{",@args)
153 . "}.\nYou need Perl $cache->{__VAL__}{version} or later"
154 unless ($cache->{__VAL__}{version}||0) <= $];
155 my %flags = ( %{$cache->{__VAL__}{default}},
156 map { /$fpat\Q$;\E(.*)/ ? ($1 => $2)
157 : /$fpat/ ? ($1 => undef)
158 : ()
159 } @args);
160 $cache->{__VAL__}->_clone_with(\@args, \%flags);
161}
162
1632613µs2120µs
# spent 67µs (13+53) within Regexp::Common::BEGIN@163 which was called: # once (13µs+53µs) by Hailo::Tokenizer::Words::BEGIN@13 at line 163
use overload q{""} => \&_decache;
# spent 67µs making 1 call to Regexp::Common::BEGIN@163 # spent 53µs making 1 call to overload::import
164
165
166
# spent 208µs within Regexp::Common::get_cache which was called 16 times, avg 13µs/call: # 16 times (208µs+0s) by Regexp::Common::pattern at line 205, avg 13µs/call
sub get_cache {
16781223µs my $cache = \%cache;
168 foreach (@_) {
169 $cache = $cache->{$_}
170 || ($cache->{$_} = {});
171 }
172 return $cache;
173}
174
175sub croak_version {
176 my ($entry, @args) = @_;
177}
178
179
# spent 1.73ms (1.28+453µs) within Regexp::Common::pattern which was called 16 times, avg 108µs/call: # once (122µs+48µs) by Regexp::Common::import at line 21 of Regexp/Common/URI/fax.pm # once (92µs+37µs) by Regexp::Common::import at line 36 of Regexp/Common/URI/ftp.pm # once (85µs+31µs) by Regexp::Common::import at line 25 of Regexp/Common/URI/tel.pm # once (84µs+31µs) by Regexp::Common::import at line 25 of Regexp/Common/URI/fax.pm # once (90µs+24µs) by Regexp::Common::import at line 19 of Regexp/Common/URI/telnet.pm # once (84µs+28µs) by Regexp::Common::import at line 20 of Regexp/Common/URI/file.pm # once (78µs+30µs) by Regexp::Common::import at line 37 of Regexp/Common/URI/gopher.pm # once (76µs+32µs) by Regexp::Common::import at line 22 of Regexp/Common/URI/pop.pm # once (76µs+29µs) by Regexp::Common::import at line 26 of Regexp/Common/URI/http.pm # once (75µs+26µs) by Regexp::Common::import at line 21 of Regexp/Common/URI/prospero.pm # once (75µs+25µs) by Regexp::Common::import at line 21 of Regexp/Common/URI/tel.pm # once (73µs+26µs) by Regexp::Common::import at line 21 of Regexp/Common/URI/wais.pm # once (73µs+26µs) by Regexp::Common::import at line 25 of Regexp/Common/URI/news.pm # once (72µs+23µs) by Regexp::Common::import at line 22 of Regexp/Common/URI/tv.pm # once (71µs+23µs) by Regexp::Common::import at line 29 of Regexp/Common/URI/news.pm # once (53µs+15µs) by Regexp::Common::import at line 43 of Regexp/Common/URI.pm
sub pattern {
1803511.53ms my %spec = @_;
181 _croak 'pattern() requires argument: name => [ @list ]'
182 unless $spec{name} && ref $spec{name} eq 'ARRAY';
183 _croak 'pattern() requires argument: create => $sub_ref_or_string'
184 unless $spec{create};
185
186 if (ref $spec{create} ne "CODE") {
187 my $fixed_str = "$spec{create}";
188 $spec{create} = sub { $fixed_str }
189 }
190
191 my @nonflags;
192 my %default;
193 foreach ( @{$spec{name}} ) {
194140202µs if (/$fpat=(.*)/) {
# spent 111µs making 70 calls to Regexp::Common::CORE:regcomp, avg 2µs/call # spent 90µs making 70 calls to Regexp::Common::CORE:match, avg 1µs/call
195 $default{$1} = $2;
196 }
197 elsif (/$fpat\s*$/) {
198 $default{$1} = undef;
199 }
200 else {
201 push @nonflags, $_;
202 }
203 }
204
20516208µs my $entry = get_cache(@nonflags);
# spent 208µs making 16 calls to Regexp::Common::get_cache, avg 13µs/call
206
207 if ($entry->{__VAL__}) {
208 _carp "Overriding \$RE{"
209 . join("}{",@nonflags)
210 . "}";
211 }
212
213 $entry->{__VAL__} = bless {
214 create => $spec{create},
215 match => $spec{match} || \&generic_match,
216 subs => $spec{subs} || \&generic_subs,
217 version => $spec{version},
218 default => \%default,
219 }, 'Regexp::Common::Entry';
220
2213343µs foreach (@nonflags) {s/\W/X/g}
# spent 43µs making 33 calls to Regexp::Common::CORE:subst, avg 1µs/call
222 my $subname = "RE_" . join ("_", @nonflags);
223 $sub_interface{$subname} = sub {
224 push @_ => undef if @_ % 2;
225 my %flags = @_;
226 my $pat = $spec{create}->($entry->{__VAL__},
227 {%default, %flags}, \@nonflags);
228 if (exists $flags{-keep}) { $pat =~ s/\Q(?k:/(/g; }
229 else { $pat =~ s/\Q(?k:/(?:/g; }
230 return exists $flags {-i} ? qr /(?i:$pat)/ : qr/$pat/;
231 };
232
233 return 1;
234}
235
236sub generic_match {$_ [1] =~ /$_[0]/}
237sub generic_subs {$_ [1] =~ s/$_[0]/$_[2]/}
238
239sub matches {
240 my ($self, $str) = @_;
241 my $entry = $self -> _decache;
242 $entry -> {match} -> ($entry, $str);
243}
244
245sub subs {
246 my ($self, $str, $newstr) = @_;
247 my $entry = $self -> _decache;
248 $entry -> {subs} -> ($entry, $str, $newstr);
249 return $str;
250}
251
252
253package Regexp::Common::Entry;
254# use Carp;
255
256use overload
257
# spent 55µs (14+42) within Regexp::Common::Entry::BEGIN@257 which was called: # once (14µs+42µs) by Hailo::Tokenizer::Words::BEGIN@13 at line 268
q{""} => sub {
258 my ($self) = @_;
259 my $pat = $self->{create}->($self, $self->{flags}, $self->{args});
260 if (exists $self->{flags}{-keep}) {
261 $pat =~ s/\Q(?k:/(/g;
262 }
263 else {
264 $pat =~ s/\Q(?k:/(?:/g;
265 }
266 if (exists $self->{flags}{-i}) { $pat = "(?i)$pat" }
267 return $pat;
2682256µs297µs };
# spent 55µs making 1 call to Regexp::Common::Entry::BEGIN@257 # spent 42µs making 1 call to overload::import
269
270sub _clone_with {
271 my ($self, $args, $flags) = @_;
272 bless { %$self, args=>$args, flags=>$flags }, ref $self;
273}
274
275
276=pod
277
278=head1 NAME
279
280Regexp::Common - Provide commonly requested regular expressions
281
282=head1 SYNOPSIS
283
284 # STANDARD USAGE
285
286 use Regexp::Common;
287
288 while (<>) {
289 /$RE{num}{real}/ and print q{a number};
290 /$RE{quoted}/ and print q{a ['"`] quoted string};
291 /$RE{delimited}{-delim=>'/'}/ and print q{a /.../ sequence};
292 /$RE{balanced}{-parens=>'()'}/ and print q{balanced parentheses};
293 /$RE{profanity}/ and print q{a #*@%-ing word};
294 }
295
296
297 # SUBROUTINE-BASED INTERFACE
298
299 use Regexp::Common 'RE_ALL';
300
301 while (<>) {
302 $_ =~ RE_num_real() and print q{a number};
303 $_ =~ RE_quoted() and print q{a ['"`] quoted string};
304 $_ =~ RE_delimited(-delim=>'/') and print q{a /.../ sequence};
305 $_ =~ RE_balanced(-parens=>'()') and print q{balanced parentheses};
306 $_ =~ RE_profanity() and print q{a #*@%-ing word};
307 }
308
309
310 # IN-LINE MATCHING...
311
312 if ( $RE{num}{int}->matches($text) ) {...}
313
314
315 # ...AND SUBSTITUTION
316
317 my $cropped = $RE{ws}{crop}->subs($uncropped);
318
319
320 # ROLL-YOUR-OWN PATTERNS
321
322 use Regexp::Common 'pattern';
323
324 pattern name => ['name', 'mine'],
325 create => '(?i:J[.]?\s+A[.]?\s+Perl-Hacker)',
326 ;
327
328 my $name_matcher = $RE{name}{mine};
329
330 pattern name => [ 'lineof', '-char=_' ],
331 create => sub {
332 my ($self, $flags) = @_;
333 my $char = quotemeta $flags->{-char};
334 return "(?:^$char+\$)";
335 },
336 matches => sub {
337 my ($self, $str) = @_;
338 return $str !~ /[^\Q$self->{flags}{-char}\E]/;
339 },
340 subs => sub {
341 my ($self, $str, $replacement) = @_;
342 $_[1] =~ s/^\Q$self->{flags}{-char}\E+$//g;
343 },
344 ;
345
346 my $asterisks = $RE{lineof}{-char=>'*'};
347
348 # DECIDING WHICH PATTERNS TO LOAD.
349
350 use Regexp::Common qw /comment number/; # Comment and number patterns.
351 use Regexp::Common qw /no_defaults/; # Don't load any patterns.
352 use Regexp::Common qw /!delimited/; # All, but delimited patterns.
353
354
355=head1 DESCRIPTION
356
357By default, this module exports a single hash (C<%RE>) that stores or generates
358commonly needed regular expressions (see L<"List of available patterns">).
359
360There is an alternative, subroutine-based syntax described in
361L<"Subroutine-based interface">.
362
363
364=head2 General syntax for requesting patterns
365
366To access a particular pattern, C<%RE> is treated as a hierarchical hash of
367hashes (of hashes...), with each successive key being an identifier. For
368example, to access the pattern that matches real numbers, you
369specify:
370
371 $RE{num}{real}
372
373and to access the pattern that matches integers:
374
375 $RE{num}{int}
376
377Deeper layers of the hash are used to specify I<flags>: arguments that
378modify the resulting pattern in some way. The keys used to access these
379layers are prefixed with a minus sign and may have a value; if a value
380is given, it is supplied as a multidimensional key, as in C<< {-flag => value} >>.
381For example, to access the pattern that
382matches base-2 real numbers with embedded commas separating
383groups of three digits (e.g. 10,101,110.110101101):
384
385 $RE{num}{real}{-base => 2}{-sep => ','}{-group => 3}
386
387Through the magic of Perl, these flag layers may be specified in any order
388(and even interspersed through the identifier keys!)
389so you could get the same pattern with:
390
391 $RE{num}{real}{-sep => ','}{-group => 3}{-base => 2}
392
393or:
394
395 $RE{num}{-base => 2}{real}{-group => 3}{-sep => ','}
396
397or even:
398
399 $RE{-base => 2}{-group => 3}{-sep => ','}{num}{real}
400
401etc.
402
403Note, however, that the relative order amongst the identifier keys
404I<is> significant. That is:
405
406 $RE{list}{set}
407
408would not be the same as:
409
410 $RE{set}{list}
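
The currying and the order sensitivity both come from the tied hash built by C<TIEHASH>, C<FETCH> and C<new> in the listing above: every key lookup returns a fresh tied hash that remembers all keys seen so far. The following is a minimal core-Perl sketch of that idea, not the module's own code; the package C<KeyChain> and its C<keys_of> helper are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical package mirroring TIEHASH/FETCH/new from the listing:
# each FETCH returns a fresh tied hash that remembers every key so far.
package KeyChain;

sub new {
    my ($class, @keys) = @_;
    tie my %self, $class, @keys;   # calls TIEHASH below
    return bless \%self, $class;
}

sub TIEHASH {
    my ($class, @keys) = @_;
    return bless [@keys], $class;  # the tie object is just the key list
}

sub FETCH {
    my ($self, $key) = @_;
    return ref($self)->new(@$self, $key);   # extend the chain by one key
}

# Hypothetical helper: recover the accumulated keys from a chain.
sub keys_of {
    my ($href) = @_;
    return @{ tied %$href };
}

package main;

my $chain = KeyChain->new;
print join(',', KeyChain::keys_of($chain->{num}{real})), "\n";   # num,real
```

Because the keys are kept in order, C<< $chain->{list}{set} >> and C<< $chain->{set}{list} >> produce different key lists, which is why the order of identifier keys matters.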
411
412=head2 Flag syntax
413
414In versions prior to 2.113, flags could also be written as
415C<{"-flag=value"}>. This no longer works, although C<{"-flag$;value"}>
416still does. However, C<< {-flag => 'value'} >> is the preferred syntax.
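
The C<< {-flag => 'value'} >> form works because Perl joins a list inside a hash subscript with C<$;> (its multidimensional-key emulation). This short core-Perl sketch shows the resulting key and splits it again in the spirit of C<_decache> (line 156); C<$fpat> is copied from line 142, and the C<$sep> variable is an assumption used in place of the module's inline C<\Q$;\E>.

```perl
use strict;
use warnings;

# A list inside a hash subscript is joined with $; ("\034" by default),
# so {-base => 2} is stored under a single key.
my %seen;
$seen{-base => 2} = 1;
my ($key) = keys %seen;

# Split it again, as _decache does; $sep plays the role of \Q$;\E.
my $fpat = qr/^(-\w+)/;
my $sep  = quotemeta $;;
my ($flag, $value) = $key =~ /$fpat$sep(.*)/;
printf "%s => %s\n", $flag, $value;   # -base => 2
```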
417
418=head2 Universal flags
419
420Normally, flags are specific to a single pattern.
421However, there are two flags that all patterns may specify.
422
423=over 4
424
425=item C<-keep>
426
427By default, the patterns provided by C<%RE> contain no capturing
428parentheses. However, if the C<-keep> flag is specified (it requires
429no value) then any significant substrings that the pattern matches
430are captured. For example:
431
432 if ($str =~ $RE{num}{real}{-keep}) {
433 $number = $1;
434 $whole = $3;
435 $decimals = $5;
436 }
437
438Special care is needed if a "kept" pattern is interpolated into a
439larger regular expression, as the presence of other capturing
440parentheses is likely to change the "number variables" into which significant
441substrings are saved.
442
443See also L<"Adding new regular expressions">, which describes how to create
444new patterns with "optional" capturing brackets that respond to C<-keep>.
445
446=item C<-i>
447
448Some patterns or subpatterns only match lowercase or uppercase letters.
449If one wants to do case-insensitive matching, one option is to use
450the C</i> regexp modifier, or the special sequence C<(?i)>. But if the
451subroutine-based interface is used, one does not have this option. The
452C<-i> flag solves this problem: with it, the pattern does
453case-insensitive matching.
454
455=back
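
Both universal flags are applied by rewriting the generated pattern text, as in lines 228-230 of the subroutine interface in the listing. A minimal sketch of that post-processing follows; C<keep_pattern> is a hypothetical helper and the signed-integer pattern is a made-up example, not one of the module's patterns.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring lines 228-230 of the listing.
sub keep_pattern {
    my ($pat, %flags) = @_;
    if (exists $flags{-keep}) { $pat =~ s/\Q(?k:/(/g }    # (?k:...) -> (...)
    else                      { $pat =~ s/\Q(?k:/(?:/g }  # (?k:...) -> (?:...)
    return exists $flags{-i} ? qr/(?i:$pat)/ : qr/$pat/;
}

my $raw = '(?k:[+-]?)(?k:\d+)';   # hypothetical signed-integer pattern

my @plain = ('-42' =~ keep_pattern($raw));
my @kept  = ('-42' =~ keep_pattern($raw, -keep => undef));
print scalar(@plain), " / @kept\n";   # 1 / - 42
```

Without C<-keep> the match succeeds but captures nothing (a list-context match then returns just C<1>); with C<-keep> each C<(?k:...)> group becomes a capturing group.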
456
457=head2 OO interface and inline matching/substitution
458
459The patterns returned from C<%RE> are objects, so rather than writing:
460
461 if ($str =~ /$RE{some}{pattern}/ ) {...}
462
463you can write:
464
465 if ( $RE{some}{pattern}->matches($str) ) {...}
466
467For matching this would seem to have no great advantage apart from readability
468(but see below).
469
470For substitutions, it has other significant benefits. Frequently you want to
471perform a substitution on a string without changing the original. Most people
472use this:
473
474 $changed = $original;
475 $changed =~ s/$RE{some}{pattern}/$replacement/;
476
477The more adept use:
478
479 ($changed = $original) =~ s/$RE{some}{pattern}/$replacement/;
480
481Regexp::Common allows you to write this:
482
483 $changed = $RE{some}{pattern}->subs($original=>$replacement);
484
485Apart from reducing precedence-angst, this approach has the added
486advantages that the substitution behaviour can be optimized independently of the
487regular expression, and the replacement string can be provided by
488default (see L<"Adding new regular expressions">).
489
490For example, in the implementation of this substitution:
491
492 $cropped = $RE{ws}{crop}->subs($uncropped);
493
494the default empty string is provided automatically, and the substitution is
495optimized to use:
496
497 $uncropped =~ s/^\s+//;
498 $uncropped =~ s/\s+$//;
499
500rather than:
501
502 $uncropped =~ s/^\s+|\s+$//g;
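
To illustrate how a pattern object can both interpolate into a regex and carry its own optimized substitution, here is a minimal core-Perl sketch. C<MiniPattern> is a hypothetical stand-in, not the Regexp::Common::Entry class; like the module's C<subs> method, it hands a copy of the string to the custom substitution routine, so the original is left untouched.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a pattern object: it stringifies to its
# regex source (so it still interpolates into m//) and provides
# matches()/subs() methods in the spirit of those described above.
package MiniPattern;
use overload q{""} => sub { $_[0]{pat} }, fallback => 1;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub matches {
    my ($self, $str) = @_;
    return $str =~ /$self->{pat}/;
}

# $str is a copy, so the caller's original string is never modified.
sub subs {
    my ($self, $str, $new) = @_;
    if ($self->{subs}) {
        $self->{subs}->($self, $str, $new);   # custom routine edits $_[1]
    }
    else {
        $new = '' unless defined $new;        # default replacement
        $str =~ s/$self->{pat}/$new/g;
    }
    return $str;
}

package main;

# A "crop" pattern whose substitution uses two anchored s/// instead of
# one s/^\s+|\s+$//g, mirroring the example in the text.
my $crop = MiniPattern->new(
    pat  => '^\s+|\s+$',
    subs => sub { $_[1] =~ s/^\s+//; $_[1] =~ s/\s+$//; },
);

my $original = "  padded  ";
my $cropped  = $crop->subs($original);
print "[$cropped] [$original]\n";   # [padded] [  padded  ]
```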
503
504
505=head2 Subroutine-based interface
506
507The hash-based interface was chosen because it allows regexes to be
508effortlessly interpolated, and because it also allows them to be
509"curried". For example:
510
511 my $num = $RE{num}{int};
512
513 my $commad = $num->{-sep=>','}{-group=>3};
514 my $duodecimal = $num->{-base=>12};
515
516
517However, the use of tied hashes does make the access to Regexp::Common
518patterns slower than it might otherwise be. In contexts where impatience
519overrules laziness, Regexp::Common provides an additional
520subroutine-based interface.
521
522For each (sub-)entry in the C<%RE> hash (C<$RE{key1}{key2}{etc}>), there
523is a corresponding exportable subroutine: C<RE_key1_key2_etc()>. The name of
524each subroutine is the underscore-separated concatenation of the I<non-flag>
525keys that locate the same pattern in C<%RE>. Flags are passed to the subroutine
526in its argument list. Thus:
527
528 use Regexp::Common qw( RE_ws_crop RE_num_real RE_profanity );
529
530 $str =~ RE_ws_crop() and die "Surrounded by whitespace";
531
532 $str =~ RE_num_real(-base=>8, -sep=>" ") or next;
533
534 $offensive = RE_profanity(-keep);
535 $str =~ s/$offensive/$bad{$1}++; "<expletive deleted>"/ge;
536
537Note that, unlike the hash-based interface (which returns objects), these
538subroutines return ordinary C<qr>'d regular expressions. Hence they do not
539curry, nor do they provide the OO match and substitution inlining described
540in the previous section.
541
542It is also possible to export subroutines for all available patterns like so:
543
544 use Regexp::Common 'RE_ALL';
545
546Or you can export all subroutines with a common prefix of keys like so:
547
548 use Regexp::Common 'RE_num_ALL';
549
550which will export C<RE_num_int> and C<RE_num_real> (and if you have
551created more patterns whose first key is I<num>, those will be exported
552as well). In general, I<RE_key1_..._keyn_ALL> will export all subroutines
553whose pattern names have first keys I<key1> ... I<keyn>.
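
The subroutine names and the C<RE_..._ALL> selection follow from two small rules in the listing: non-flag keys have non-word characters replaced by C<X> and are joined with underscores (lines 221-222), and an C<RE_..._ALL> request is matched against C<^RE_(\w+_)?ALL$> (line 110). The helpers C<sub_name> and C<select_all> below are hypothetical core-Perl sketches of those rules, not exported functions.

```perl
use strict;
use warnings;

# Hypothetical helper: build the exported name from the non-flag keys.
sub sub_name {
    my @keys = @_;
    s/\W/X/g for @keys;              # e.g. 'C++' becomes 'CXX'
    return 'RE_' . join('_', @keys);
}

# Hypothetical helper: which names an RE_..._ALL request would pick.
sub select_all {
    my ($request, @names) = @_;
    my ($prefix) = $request =~ /^RE_(\w+_)?ALL$/ or return;
    $prefix = '' unless defined $prefix;   # plain RE_ALL selects everything
    return grep { /^RE_\Q$prefix\E/ } @names;
}

my @names = map { sub_name(@$_) } [qw(num int)], [qw(num real)], [qw(comment C++)];
print "@names\n";                                          # RE_num_int RE_num_real RE_comment_CXX
print join(' ', select_all('RE_num_ALL', @names)), "\n";   # RE_num_int RE_num_real
```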
554
555
556=head2 Adding new regular expressions
557
558You can add your own regular expressions to the C<%RE> hash at run-time,
559using the exportable C<pattern> subroutine. It expects a hash-like list of
560key/value pairs that specify the behaviour of the pattern. The various
561possible argument pairs are:
562
563=over 4
564
565=item C<name =E<gt> [ @list ]>
566
567A required argument that specifies the name of the pattern, and any
568flags it may take, via a reference to a list of strings. For example:
569
570 pattern name => [qw( line of -char )],
571 # other args here
572 ;
573
574This specifies an entry C<$RE{line}{of}>, which may take a C<-char> flag.
575
576Flags may also be specified with a default value, which is then used whenever
577the flag is omitted. (Specifying the flag without an explicit value sets it
578to C<undef>.) For example:
579
580 pattern name => [qw( line of -char=_ )],
581 # default char is '_'
582 # other args here
583 ;
584
585
586=item C<create =E<gt> $sub_ref_or_string>
587
588A required argument that specifies either a string that is to be returned
589as the pattern:
590
591 pattern name => [qw( line of underscores )],
592 create => q/(?:^_+$)/
593 ;
594
595or a reference to a subroutine that will be called to create the pattern:
596
597 pattern name => [qw( line of -char=_ )],
598 create => sub {
599 my ($self, $flags) = @_;
600 my $char = quotemeta $flags->{-char};
601 return "(?:^$char+\$)";
602 },
603 ;
604
605If the subroutine version is used, the subroutine will be called with
606three arguments: a reference to the pattern object itself, a reference
607to a hash containing the flags and their values,
608and a reference to an array containing the non-flag keys.
609
610Whatever the subroutine returns is stringified as the pattern.
611
612No matter how the pattern is created, it is immediately postprocessed to
613include or exclude capturing parentheses (according to the value of the
614C<-keep> flag). To specify such "optional" capturing parentheses within
615the regular expression associated with C<create>, use the notation
616C<(?k:...)>. Any parentheses of this type will be converted to C<(...)>
617when the C<-keep> flag is specified, or C<(?:...)> when it is not.
618It is a Regexp::Common convention that the outermost capturing parentheses
619always capture the entire pattern, but this is not enforced.
620
621
622=item C<matches =E<gt> $sub_ref>
623
624An optional argument that specifies a subroutine that is to be called when
625the C<$RE{...}-E<gt>matches(...)> method of this pattern is invoked.
626
627The subroutine should expect two arguments: a reference to the pattern object
628itself, and the string to be matched against.
629
630It should return the same types of values as a C<m/.../> does.
631
632 pattern name => [qw( line of -char )],
633 create => sub {...},
634 matches => sub {
635 my ($self, $str) = @_;
636 $str !~ /[^\Q$self->{flags}{-char}\E]/;
637 },
638 ;
639
640
641=item C<subs =E<gt> $sub_ref>
642
643An optional argument that specifies a subroutine that is to be called when
644the C<$RE{...}-E<gt>subs(...)> method of this pattern is invoked.
645
646The subroutine should expect three arguments: a reference to the pattern object
647itself, the string to be changed, and the value to be substituted into it.
648The third argument may be C<undef>, indicating the default substitution is
649required.
650
651The subroutine should return the same types of values as an C<s/.../.../> does.
652
653For example:
654
655 pattern name => [ 'lineof', '-char=_' ],
656 create => sub {...},
657 subs => sub {
658 my ($self, $str, $ignore_replacement) = @_;
659 $_[1] =~ s/^\Q$self->{flags}{-char}\E+$//g;
660 },
661 ;
662
663Note that such a subroutine will almost always need to modify C<$_[1]> directly.
664
665
666=item C<version =E<gt> $minimum_perl_version>
667
668If this argument is given, it specifies the minimum version of perl required
669to use the new pattern. Attempts to use the pattern with earlier versions of
670perl will generate a fatal diagnostic.
671
672=back
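
The C<name> list is split into identifier keys and flag defaults by the loop at lines 193-203 of the listing. A minimal sketch of that parsing, with C<parse_name> as a hypothetical helper name and C<$fpat> copied from line 142:

```perl
use strict;
use warnings;

my $fpat = qr/^(-\w+)/;

# Hypothetical helper mirroring lines 193-203: each element is a flag
# with a default (-flag=value), a flag without one (-flag), or a key.
sub parse_name {
    my (@nonflags, %default);
    for (@_) {
        if    (/$fpat=(.*)/) { $default{$1} = $2    }
        elsif (/$fpat\s*$/)  { $default{$1} = undef }
        else                 { push @nonflags, $_   }
    }
    return (\@nonflags, \%default);
}

my ($keys, $defaults) = parse_name(qw( line of -char=_ ));
print "keys: @$keys\n";                              # keys: line of
print '-char default: ', $defaults->{-char}, "\n";   # -char default: _
```

A flag listed without C<=value> ends up with an C<undef> default, and only the non-flag keys (here C<line> and C<of>) locate the entry and name its C<RE_line_of> subroutine.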
673
674=head2 Loading specific sets of patterns.
675
676By default, all the sets of patterns listed below are made available.
677However, it is possible to indicate which sets of patterns should
678be made available: the wanted sets should be given as arguments to
679C<use>. Alternatively, it is also possible to indicate which sets of
680patterns should not be made available: those sets are given as
681arguments to the C<use> statement, but preceded by an exclamation
682mark. The argument I<no_defaults> indicates that none of the default patterns
683should be made available. This is useful for instance if all you want
684is the C<pattern()> subroutine.
685
686Examples:
687
688 use Regexp::Common qw /comment number/; # Comment and number patterns.
689 use Regexp::Common qw /no_defaults/; # Don't load any patterns.
690 use Regexp::Common qw /!delimited/; # All, but delimited patterns.
691
692It's also possible to load your own set of patterns. If you have a
693module C<Regexp::Common::my_patterns> that makes patterns available,
694you can have it made available with
695
696 use Regexp::Common qw /my_patterns/;
697
698Note that the default patterns will still be made available; the
699unmentioned default sets are left out only if you use I<no_defaults>
700or explicitly mention one of the default sets.
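
The selection rules above can be summarized in a simplified core-Perl sketch. The set names come from C<%imports> in the listing; C<modules_to_load> is a hypothetical helper that returns the module names to C<require> rather than loading them, and it models the documented behaviour rather than reproducing C<import> line by line.

```perl
use strict;
use warnings;

# Set names taken from %imports in the listing (line 51).
my @default_sets = qw(balanced CC comment delimited lingua list
                      net number profanity SEN URI whitespace zip);
my %is_default = map { $_ => 1 } @default_sets;

# Hypothetical helper: returns the modules that would be required.
sub modules_to_load {
    my (@explicit, %exclude);
    my ($saw_default, $no_defaults) = (0, 0);
    for my $arg (grep { !/^RE_/ } @_) {         # RE_* args are exports
        next if $arg eq 'pattern' || $arg eq 'clean';
        if ($arg eq 'no_defaults') {
            $no_defaults++;
        }
        elsif ($is_default{$arg}) {              # a named default set
            $saw_default++;
            push @explicit, $arg;
        }
        elsif ($arg =~ /^!(.*)/ && $is_default{$1}) {
            $exclude{$1}++;                      # "!set" excludes a set
        }
        else {
            push @explicit, $arg;                # user-supplied set
        }
    }
    my @sets = @explicit;
    push @sets, grep { !$exclude{$_} } @default_sets
        unless $saw_default || $no_defaults;
    return map { /^Regexp::Common/ ? $_ : "Regexp::Common::$_" } @sets;
}

my @m = modules_to_load(qw(comment number));
print "@m\n";   # Regexp::Common::comment Regexp::Common::number
```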
701
702=head2 List of available patterns
703
704The patterns listed below are currently available. Each set of patterns
705has its own manual page describing the details. For each pattern set
706named I<name>, the manual page I<Regexp::Common::name> describes the
707details.
708
709Currently available are:
710
711=over 4
712
713=item Regexp::Common::balanced
714
715Provides regexes for strings with balanced parenthesized delimiters.
716
717=item Regexp::Common::comment
718
719Provides regexes for comments of various languages (43 languages
720currently).
721
722=item Regexp::Common::delimited
723
724Provides regexes for delimited strings.
725
726=item Regexp::Common::lingua
727
728Provides regexes for palindromes.
729
730=item Regexp::Common::list
731
732Provides regexes for lists.
733
734=item Regexp::Common::net
735
736Provides regexes for IPv4 addresses and MAC addresses.
737
738=item Regexp::Common::number
739
740Provides regexes for numbers (integers and reals).
741
742=item Regexp::Common::profanity
743
744Provides regexes for profanity.
745
746=item Regexp::Common::whitespace
747
748Provides regexes for leading and trailing whitespace.
749
750=item Regexp::Common::zip
751
752Provides regexes for zip codes.
753
754=back
755
756=head2 Forthcoming patterns and features
757
758Future releases of the module will also provide patterns for the following:
759
760 * email addresses
761 * HTML/XML tags
762 * more numerical matchers
763 * mail headers (including multiline ones)
764 * more URLs
765 * telephone numbers of various countries
766 * currency (universal 3 letter format, Latin-1, currency names)
767 * dates
768 * binary formats (e.g. UUencoded, MIMEd)
769
770If you have other patterns or pattern generators that you think would be
771generally useful, please send them to the maintainer -- preferably as source
772code using the C<pattern> subroutine. Submissions that include a set of
773tests will be especially welcome.
774
775
776=head1 DIAGNOSTICS
777
778=over 4
779
780=item C<Can't export unknown subroutine %s>
781
782The subroutine-based interface didn't recognize the requested subroutine.
783Often caused by a spelling mistake or an incompletely specified name.
784
785
786=item C<Can't create unknown regex: $RE{...}>
787
788Regexp::Common doesn't have a generator for the requested pattern.
789Often indicates a misspelt or missing parameter.
790
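The message arises when a key path has no pattern generator behind it. The following is a simplified, hypothetical model of that lookup: a flat hash keyed by the joined path, with made-up patterns and an invented helper C<fetch_re>, rather than whatever machinery the module uses internally. It shows how a misspelt key such as C<{num}{itn}> is caught:

```perl
use strict;
use warnings;

# Hypothetical registry: pattern generators keyed by their key path
# joined with '/'. The patterns here are illustrative only.
my %GENERATOR = (
    'num/int'  => sub { qr/[-+]?\d+/ },
    'num/real' => sub { qr/[-+]?(?:\d+\.?\d*|\.\d+)/ },
);

# Return the regex for a key path, or die with the documented message.
sub fetch_re {
    my @path = @_;
    my $key  = join '/', @path;
    my $gen  = $GENERATOR{$key}
        or die "Can't create unknown regex: \$RE{" . join('}{', @path) . "}\n";
    return $gen->();
}
```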
791=item
792C<Perl %f does not support the pattern $RE{...}.
793You need Perl %f or later>
794
795The requested pattern requires advanced regex features (e.g. recursion)
796that are not available in your version of Perl. Time to upgrade.
797
798=item C<< pattern() requires argument: name => [ @list ] >>
799
800Every user-defined pattern specification must have a name.
801
802=item C<< pattern() requires argument: create => $sub_ref_or_string >>
803
804Every user-defined pattern specification must provide a pattern creation
805mechanism: either a pattern string or a reference to a subroutine that
806returns the pattern string.
807
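The two checks above can be pictured with a small validator. This is a hypothetical sketch: C<build_pattern> is an invented name, not part of Regexp::Common, and unlike the module's own handling of C<create> it calls a subroutine with no arguments.

```perl
use strict;
use warnings;

# Hypothetical validator mirroring the two pattern() diagnostics.
# Not Regexp::Common's code; create subs are called without arguments here.
sub build_pattern {
    my %spec = @_;
    die "pattern() requires argument: name => [ \@list ]\n"
        unless ref $spec{name} eq 'ARRAY' && @{ $spec{name} };
    die "pattern() requires argument: create => \$sub_ref_or_string\n"
        unless defined $spec{create}
            && (!ref $spec{create} || ref $spec{create} eq 'CODE');

    # Accept either a pattern string or a sub that returns one.
    my $source = ref $spec{create} ? $spec{create}->() : $spec{create};
    return (join('/', @{ $spec{name} }), qr/$source/);
}
```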
808=item C<Base must be between 1 and 36>
809
810The C<< $RE{num}{real}{-base=>'I<N>'} >> pattern uses the characters [0-9A-Z]
811to represent the digits of various bases. Hence it only produces
812regular expressions for bases up to 36 (hexatricensimal).
813
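The limit follows directly from the digit alphabet: [0-9A-Z] holds exactly 36 symbols, so a 37th digit has nothing to stand for it. A hypothetical helper, C<digit_class> (an invented name, uppercase only for simplicity), that builds the digit class for a given base makes this concrete:

```perl
use strict;
use warnings;

# Hypothetical helper: the character class of valid digits in base $base,
# drawn from [0-9A-Z] (uppercase only, for simplicity).
sub digit_class {
    my ($base) = @_;
    die "Base must be between 1 and 36\n"
        unless defined $base && $base =~ /^\d+$/ && $base >= 1 && $base <= 36;
    my @symbols = (0 .. 9, 'A' .. 'Z');    # 36 symbols in total
    return '[' . join('', @symbols[0 .. $base - 1]) . ']';
}
```

C<digit_class(36)> already uses every symbol in [0-9A-Z], which is why any larger base is rejected.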
814=item C<Must specify delimiter in $RE{delimited}>
815
816The pattern has no default delimiter.
817You need to write C<< $RE{delimited}{-delim=>'I<X>'} >> for some character I<X>.
818
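A delimited string is only defined relative to its delimiter, so there is no sensible default. The following hypothetical sketch, C<delimited_re> (an invented name), builds such a regex for a single delimiter character, assuming backslash as the escape character:

```perl
use strict;
use warnings;

# Hypothetical sketch of a delimited-string regex for one delimiter
# character; backslash is assumed to be the escape character.
sub delimited_re {
    my ($delim) = @_;
    die "Must specify delimiter in \$RE{delimited}\n"
        unless defined $delim && length($delim) == 1;
    my $d = quotemeta $delim;
    # opening delimiter, then non-delimiter/non-backslash characters
    # or backslash escapes, then the closing delimiter
    return qr/$d(?:[^\\$d]|\\.)*$d/s;
}
```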
819=back
820
821=head1 ACKNOWLEDGEMENTS
822
823Deepest thanks to the many people who have encouraged and contributed to this
824project, especially: Elijah, Jarkko, Tom, Nat, Ed, and Vivek.
825
826Further thanks go to: Alexandr Ciornii, Blair Zajac, Bob Stockdale,
827Charles Thomas, Chris Vertonghen, the CPAN Testers, David Hand,
828Fany, Geoffrey Leach, Hermann-Marcus Behrens, Jerome Quelin, Jim Cromie,
829Lars Wilke, Linda Julien, Mike Arms, Mike Castle, Mikko, Murat Uenalan,
830RafaE<235>l Garcia-Suarez, Ron Savage, Sam Vilain, Slaven Rezic, Smylers,
831Tim Maher, and all the others I've forgotten.
832
833=head1 AUTHOR
834
835Damian Conway (damian@conway.org)
836
837=head1 MAINTENANCE
838
839This package is maintained by Abigail S<(I<regexp-common@abigail.be>)>.
840
841=head1 BUGS AND IRRITATIONS
842
843Bound to be plenty.
844
845For a start, there are many common regexes missing.
846Send them in to I<regexp-common@abigail.be>.
847
848There are some POD issues when installing this module with a perl older
849than 5.6.0: some manual pages may not install at all, or may not install
850correctly. You might consider upgrading your perl.
851
852=head1 LICENSE and COPYRIGHT
853
854This software is Copyright (c) 2001 - 2009, Damian Conway and Abigail.
855
856This module is free software, and may be used under any of the following
857licenses:
858
859 1) The Perl Artistic License. See the file COPYRIGHT.AL.
860 2) The Perl Artistic License 2.0. See the file COPYRIGHT.AL2.
861 3) The BSD Licence. See the file COPYRIGHT.BSD.
862 4) The MIT Licence. See the file COPYRIGHT.MIT.
 
# spent 221µs within Regexp::Common::CORE:match which was called 180 times, avg 1µs/call:
# 70 times (90µs+0s) by Regexp::Common::pattern at line 194, avg 1µs/call
# 55 times (68µs+0s) by Regexp::Common::import at line 67, avg 1µs/call
# 55 times (63µs+0s) by Regexp::Common::import at line 109, avg 1µs/call
sub Regexp::Common::CORE:match; # opcode
# spent 6µs within Regexp::Common::CORE:qr which was called:
# once (6µs+0s) by Hailo::Tokenizer::Words::BEGIN@13 at line 142
sub Regexp::Common::CORE:qr; # opcode
# spent 111µs within Regexp::Common::CORE:regcomp which was called 70 times, avg 2µs/call:
# 70 times (111µs+0s) by Regexp::Common::pattern at line 194, avg 2µs/call
sub Regexp::Common::CORE:regcomp; # opcode
# spent 43µs within Regexp::Common::CORE:subst which was called 33 times, avg 1µs/call:
# 33 times (43µs+0s) by Regexp::Common::pattern at line 221, avg 1µs/call
sub Regexp::Common::CORE:subst; # opcode