NAME
"String::Tagged" - string buffers with value tags on extents
SYNOPSIS
use String::Tagged;
my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3, 9, bold => 1 );
$st->iter_substr_nooverlap(
   sub {
      my ( $substring, %tags ) = @_;
      print $tags{bold} ? "<b>$substring</b>"
                        : $substring;
   }
);
DESCRIPTION
This module implements an object class, instances of which store a
(mutable) string buffer that supports tags. A tag is a name/value pair
that applies to some non-empty extent of the underlying string.
The types of tag names ought to be strings, or at least values that are
well-behaved as strings, as the names will often be used as the keys in
hashes or applied to the "eq" operator.
The types of tag values are not restricted - any scalar will do. This
could be a simple integer or string, ARRAY or HASH reference, or even a
CODE reference containing an event handler of some kind.
Tags may be arbitrarily overlapped. Any given offset within the string
has, in effect, a set of uniquely named tags. Tags of different names are
independent. For tags of the same name, only the latest, shortest
tag takes effect.
For example, consider a string with three tags represented here:
Here is my string with tags
[-------------------------]    foo => 1
        [-------]              foo => 2
     [---]                     bar => 3
Every character in this string has a tag named "foo". The value of this
tag is 2 for the words "my" and "string" and the space in between, and 1
elsewhere. Additionally, the words "is" and "my" and the space between
them also have the tag "bar" with a value of 3.
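The diagram above can be reproduced with the methods documented below. This is a minimal sketch; the offsets are counted by hand from the example string, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Here is my string with tags" );

$st->apply_tag( 0, 27, foo => 1 );   # the whole string
$st->apply_tag( 8,  9, foo => 2 );   # "my string"
$st->apply_tag( 5,  5, bar => 3 );   # "is my"

# Index 0 is the "H" of "Here", 5 the "i" of "is", 8 the "m" of "my"
my $foo_at_here = $st->get_tag_at( 0, "foo" );
my $foo_at_my   = $st->get_tag_at( 8, "foo" );
my $bar_at_is   = $st->get_tag_at( 5, "bar" );
my $bar_at_here = $st->get_tag_at( 0, "bar" );

print "foo at 'Here': $foo_at_here\n";
print "foo at 'my':   $foo_at_my\n";
print "bar at 'is':   $bar_at_is\n";
print "bar at 'Here': ", ( defined $bar_at_here ? $bar_at_here : "(not applied)" ), "\n";
```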
Since "String::Tagged" does not understand the significance of the tag
values, it cannot detect whether two neighbouring tags really
contain the same semantic idea. Consider the following string:
A string with words
[-------]              type => "message"
         [--------]    type => "message"
This string contains two tags. "String::Tagged" will treat this as two
different tag values as far as "iter_tags_nooverlap()" is concerned,
even though "get_tag_at()" yields the same value for the "type" tag at
any position in the string.
NAMING
I spent a lot of time considering the name for this module. It seems
that a number of people across a number of languages all created similar
functionality, though named very differently. For the benefit of
keyword-based search tools and similar, here's a list of some other
names this sort of object might be known by:
* Extents
* Overlays
* Attribute or attributed strings
* Markup
* Out-of-band data
CONSTRUCTOR
$st = String::Tagged->new( $str )
Returns a new instance of a "String::Tagged" object. It will contain no
tags. If the optional $str argument is supplied, the string buffer will
be initialised from this value.
If $str is a "String::Tagged" object then its tags will be copied too;
i.e. the constructor also works as a clone method.
METHODS
$str = $st->str
"$st"
Returns the plain string contained within the object.
This method is also called for stringification; so the "String::Tagged"
object can be used in a plain string interpolation such as
my $message = String::Tagged->new( "Hello world" );
print "My message is $message\n";
$len = $st->length
$len = length( $st )
Returns the length of the plain string. Because stringification works on
this object class, the normal core "length()" function works correctly
on it.
$str = $st->substr( $start, $len )
Returns a substring of the plain string contained within the object.
$st->apply_tag( $start, $len, $name, $value )
Apply the named tag value to the given extent. The tag will start on the
character at the $start index, and continue for the next $len
characters.
If $start is given as -1, the tag will be considered to start "before"
the actual string. If $len is given as -1, the tag will be considered to
end "after" the end of the actual string. These special limits are used by
"set_substr()" when deciding whether to move a tag boundary. The start
of any tag that starts "before" the string is never moved, even if more
text is inserted at the beginning. Similarly, a tag that ends "after"
the end of the string will continue to the end even if more text is
appended.
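As a rough illustration of edge anchoring, the sketch below applies one tag with explicit limits and one with both limits given as -1, then appends text. The tag names "fixed" and "anchored" are made up for this example. Going by the rules documented for "set_substr()", the explicit tag would be expected to stay on "Hello" only, while the anchored tag follows the end of the string.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello" );

$st->apply_tag(  0,  5, fixed    => 1 );   # covers exactly "Hello"
$st->apply_tag( -1, -1, anchored => 1 );   # anchored "before" and "after"

$st->append( ", world" );

# Look at the final character of the extended string
my $last            = $st->length - 1;
my $anchored_at_end = $st->get_tag_at( $last, "anchored" );
my $fixed_at_end    = $st->get_tag_at( $last, "fixed" );

print "anchored at end: ", ( $anchored_at_end // "undef" ), "\n";
print "fixed at end:    ", ( $fixed_at_end    // "undef" ), "\n";
```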
$st->unapply_tag( $start, $len, $name )
Unapply the named tag value from the given extent. If the tag extends
beyond this extent, then any partial fragment of the tag will be left in
the string.
$st->delete_tag( $start, $len, $name )
Delete the named tag within the given extent. Entire tags are removed,
even if they extend beyond this extent.
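The difference between the two removal methods may be easier to see side by side. This sketch builds the same tagged string twice (using a helper, "make_string", that exists only for this example) and removes the same extent with each method.

```perl
use strict;
use warnings;
use String::Tagged;

# Hypothetical helper: a ten-character string fully covered by one tag
sub make_string {
    my $st = String::Tagged->new( "0123456789" );
    $st->apply_tag( 0, 10, mark => 1 );
    return $st;
}

my $unapplied = make_string();
$unapplied->unapply_tag( 3, 4, "mark" );   # clears positions 3..6 only

my $deleted = make_string();
$deleted->delete_tag( 3, 4, "mark" );      # removes the entire tag

for my $pos ( 0, 4, 8 ) {
    printf "pos %d: after unapply_tag=%s, after delete_tag=%s\n", $pos,
        ( defined( $unapplied->get_tag_at( $pos, "mark" ) ) ? "tagged" : "untagged" ),
        ( defined( $deleted->get_tag_at( $pos, "mark" ) )   ? "tagged" : "untagged" );
}
```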
$st->merge_tags( $eqsub )
Merge neighbouring or overlapping tags of the same name and equal
values.
For each pair of tags of the same name that apply on neighbouring or
overlapping extents, the $eqsub callback is called, as
$equal = $eqsub->( $name, $value_a, $value_b )
If this function returns true then the tags are merged.
The equality test function is free to perform any comparison of the
values that may be relevant to the application; for example it may
deeply compare referred structures and check for equivalence in some
application-defined manner. When two tags are merged, the first tag of
the pair is retained and the second is deleted. This may be relevant if
the tag value is a reference to some object.
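A small sketch of merging, reusing the two neighbouring "type" tags from the DESCRIPTION. The "count_tags" helper is not part of the module; it simply counts how many times "iter_tags()" invokes its callback. A plain "eq" comparison is assumed to be adequate here because both values are simple strings.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "A string with words" );
$st->apply_tag( 0,  9, type => "message" );   # "A string "
$st->apply_tag( 9, 10, type => "message" );   # "with words"

# Hypothetical helper: count the tags stored in a string
sub count_tags {
    my ( $str ) = @_;
    my $count = 0;
    $str->iter_tags( sub { $count++ } );
    return $count;
}

my $before = count_tags( $st );

$st->merge_tags( sub {
    my ( $name, $value_a, $value_b ) = @_;
    return $value_a eq $value_b;
} );

my $after = count_tags( $st );

print "tags before merge: $before\n";
print "tags after merge:  $after\n";
```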
$st->iter_extents( $callback, %opts )
Iterate the tags stored in the string. For each tag, the CODE reference
in $callback is invoked once, being passed an extent object that
represents the extent of the tag. These extents will have the
"anchor_before" and "anchor_after" flags defined.
$callback->( $extent, $tagname, $tagvalue )
Options passed in %opts may include:
start => INT
Start at the given position; defaults to 0.
end => INT
End after the given position; defaults to end of string. This option
overrides "len".
len => INT
End after the given length beyond the start position; defaults to
end of string. This option only applies if "end" is not given.
$st->iter_tags( $callback, %opts )
Iterate the tags stored in the string. For each tag, the CODE reference
in $callback is invoked once, being passed the start point and length of
the tag.
$callback->( $start, $length, $tagname, $tagvalue )
Options passed in %opts are the same as for "iter_extents()".
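For instance, the following sketch collects a one-line description of every tag by way of "iter_tags()". The tag names and values are invented for the example, and no particular callback order is relied upon.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag( 0, 5, word  => "first" );    # "Hello"
$st->apply_tag( 7, 5, word  => "second" );   # "world"
$st->apply_tag( 5, 1, punct => 1 );          # ","

my @seen;
$st->iter_tags( sub {
    my ( $start, $length, $name, $value ) = @_;
    push @seen, "$name=$value at $start+$length";
} );

print "$_\n" for @seen;
```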
$st->iter_extents_nooverlap( $callback, %opts )
Iterate non-overlapping extents of tags stored in the string. The CODE
reference in $callback is invoked for each extent in the string where no
tags change. The entire set of tags active in that extent is given to
the callback. Because the extent covers possibly-multiple tags, it will
not define the "anchor_before" and "anchor_after" flags.
$callback->( $extent, %tags )
The callback will be invoked over the entire length of the string,
including any extents with no tags applied.
Options may be passed in %opts to control the range of the string
iterated over, in the same way as the "iter_extents()" method.
$st->iter_tags_nooverlap( $callback, %opts )
Iterate extents of the string using "iter_extents_nooverlap()", but
passing the start and length of each extent to the callback instead of
the extent object.
$callback->( $start, $length, %tags )
Options may be passed in %opts to control the range of the string
iterated over, in the same way as the "iter_extents()" method.
$st->iter_substr_nooverlap( $callback, %opts )
Iterate extents of the string using "iter_extents_nooverlap()", but
passing the substring of data instead of the extent object.
$callback->( $substr, %tags )
Options may be passed in %opts to control the range of the string
iterated over, in the same way as the "iter_extents()" method.
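The non-overlapping iterators are convenient for rendering. The sketch below builds a fragment of HTML-like markup from a "bold" tag and also records the ranges seen by "iter_tags_nooverlap()". It performs no escaping of the text, so it is only suitable as an illustration.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3, 9, bold => 1 );   # "important"

# Render each run of unchanging tags, wrapping the bold ones
my $html = "";
$st->iter_substr_nooverlap( sub {
    my ( $substring, %tags ) = @_;
    $html .= $tags{bold} ? "<b>$substring</b>" : $substring;
} );

# Record the start and length of each run
my @ranges;
$st->iter_tags_nooverlap( sub {
    my ( $start, $length, %tags ) = @_;
    push @ranges, [ $start, $length ];
} );

print "$html\n";
print "run at $_->[0], length $_->[1]\n" for @ranges;
```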
@names = $st->tagnames
Returns the set of tag names used in the string, in no particular order.
$tags = $st->get_tags_at( $pos )
Returns a HASH reference of all the tag values active at the given
position.
$value = $st->get_tag_at( $pos, $name )
Returns the value of the named tag at the given position, or "undef" if
the tag is not applied there.
$extent = $st->get_tag_extent( $pos, $name )
If the named tag applies to the given position, returns the extent of
the tag at that position. If it does not, "undef" is returned. If an
extent is returned it will define the "anchor_before" and "anchor_after"
flags.
$extent = $st->get_tag_missing_extent( $pos, $name )
If the named tag does not apply at the given position, returns the
extent of the string around that position that does not have the tag. If
the tag does apply there, "undef" is returned. If an extent is returned it
will not define the "anchor_before" and "anchor_after" flags, as these do
not make sense for the range in which a tag is absent.
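A brief sketch of the two extent queries, using a single tag on the first word. The ranges noted in the comments follow from the descriptions above rather than from any additional guarantee.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag( 0, 5, word => 1 );   # "Hello"

# Position 2 lies inside the tag; position 8 lies outside it
my $present = $st->get_tag_extent( 2, "word" );
my $absent  = $st->get_tag_missing_extent( 8, "word" );

printf "tagged run:   %d to %d\n", $present->start, $present->end;
printf "untagged run: %d to %d\n", $absent->start,  $absent->end;

# Each query yields undef when asked about the other kind of position
my $no_extent  = $st->get_tag_extent( 8, "word" );
my $no_missing = $st->get_tag_missing_extent( 2, "word" );
```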
$st->set_substr( $start, $len, $newstr )
Modifies an extent of the underlying plain string to that given. The
extents of tags in the string are adjusted to cope with the modified
region, and the adjustment in length.
Tags entirely before the replaced extent remain unchanged.
Tags entirely within the replaced extent are deleted.
Tags entirely after the replaced extent are moved by an appropriate amount
to ensure they still apply to the same characters as before.
Tags that start before and end after the extent remain, and have their
lengths suitably adjusted.
Tags that span just the start or end of the extent, but not both, are
truncated, so as to remove the part of the tag applied on the modified
extent but preserving that applied outside.
If $newstr is a "String::Tagged" object, then its tags will be applied
to $st as appropriate. Edge-anchored tags in $newstr will not be
extended through $st, though they will apply as edge-anchored if they
now sit at the edge of the new string.
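Three of the adjustment rules above can be observed in one edit. In this sketch the tag names are invented, and the replaced extent is the two characters ", ", which are swapped for the three characters " - ".

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag( 7, 5, name  => 1 );   # "world": entirely after the edit
$st->apply_tag( 5, 1, punct => 1 );   # ",": entirely within the edit
$st->apply_tag( 3, 6, span  => 1 );   # "lo, wo": starts before, ends after

$st->set_substr( 5, 2, " - " );       # the string grows by one character

my $name_ext = $st->get_tag_extent( 8, "name" );
my $span_ext = $st->get_tag_extent( 3, "span" );
my @punct    = grep { defined $st->get_tag_at( $_, "punct" ) } 5 .. 7;

print "string: ", $st->str, "\n";
printf "name: %d to %d\n", $name_ext->start, $name_ext->end;
printf "span: %d to %d\n", $span_ext->start, $span_ext->end;
print "punct still present: ", ( @punct ? "yes" : "no" ), "\n";
```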
$st->insert( $start, $newstr )
Insert the given string at the given position. A shortcut around
"set_substr()".
If $newstr is a "String::Tagged" object, then its tags will be applied
to $st as appropriate. If $start is 0, any before-anchored tags in
$newstr will become before-anchored in $st.
$st->append( $newstr )
$st .= $newstr
Append to the underlying plain string. A shortcut around "set_substr()".
If $newstr is a "String::Tagged" object, then its tags will be applied
to $st as appropriate. Any after-anchored tags in $newstr will become
after-anchored in $st.
$st->append_tagged( $newstr, %tags )
Append to the underlying plain string, and apply the given tags to the
newly-inserted extent.
$ret = $st->concat( $other )
$ret = $st . $other
Returns a new "String::Tagged" containing the two strings concatenated
together, preserving any tags present. This method overloads the normal
string concatenation operator, so expressions involving "String::Tagged"
values retain their tags.
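A short sketch combining "append_tagged()" with the overloaded concatenation operator. The tag names "colour" and "note" are arbitrary choices for the example.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Status: " );
$st->append_tagged( "OK", colour => "green" );   # tags only the new text

my $suffix = String::Tagged->new( " (checked)" );
$suffix->apply_tag( 2, 7, note => 1 );           # "checked"

# The "." operator returns a new object; $st and $suffix are left as they were
my $joined = $st . $suffix;

print "joined: $joined\n";
print "colour at 8: ", $joined->get_tag_at(  8, "colour" ), "\n";
print "note at 12:  ", $joined->get_tag_at( 12, "note" ),   "\n";
```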
$ret = $st->debug_sprintf
Returns a representation of the string data and all the tags, suitable
for debug printing or other similar use. This is a format such as is
given in the DESCRIPTION section above.
The output will consist of a number of lines, the first containing the
plain underlying string, then one line per tag. The line shows the
extent of the tag given by "[---]" markers, or a "|" in the special case
of a tag covering only a single character. Special markings of "<" and
">" indicate tags which are "before" or "after" anchored.
For example:
 Hello, world
 [---]         word       => 1
<[----------]> everywhere => 1
       |       space      => 1
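A string like the one in this example could be set up as follows. This is only a sketch: the exact spacing of the printed lines is left to the method, so no output is reproduced here.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag(  0,  5, word       => 1 );   # "Hello"
$st->apply_tag( -1, -1, everywhere => 1 );   # anchored at both edges
$st->apply_tag(  6,  1, space      => 1 );   # the single space character

my $dump = $st->debug_sprintf;
print $dump;
```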
Extent Objects
These objects represent a range of characters within the containing
"String::Tagged" object. The range they represent is fixed at the time
of creation. If the containing string is modified by a call to
"set_substr()" then the effect on the extent object is not defined.
These objects should be considered as relatively short-lived - used
briefly for the purpose of querying the result of an operation, then
discarded soon after.
$extent->string
Returns the containing "String::Tagged" object.
$extent->start
Returns the start index of the extent. This is the index of the first
character within the extent.
$extent->end
Returns the end index of the extent. This is the index of the first
character beyond the end of the extent.
$extent->anchor_before
True if this extent begins "before" the start of the string. Only
certain methods return extents with this flag defined.
$extent->anchor_after
True if this extent ends "after" the end of the string. Only certain
methods return extents with this flag defined.
$extent->length
Returns the number of characters within the extent.
$extent->substr
Returns the substring of the underlying plain string buffer contained by
the extent.
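To make the accessors concrete, this sketch fetches the extent of a tag and reads its fields straight away, in keeping with the advice that extents are short-lived. Note that "end" is exclusive, so the length is the difference between "end" and "start".

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag( 7, 5, word => 1 );   # "world"

my $extent = $st->get_tag_extent( 9, "word" );

# Copy the values out immediately rather than keeping the extent around
my ( $start, $end, $length ) = ( $extent->start, $extent->end, $extent->length );
my $text = $extent->substr;

print "start=$start end=$end length=$length text=$text\n";
```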
TODO
* There are likely variations on the rules for "set_substr()" that
could equally apply to some uses of tagged strings. Consider whether
the behaviour of modification is chosen per-method, per-tag, or
per-string.
AUTHOR
Paul Evans