SYNOPSIS

    use Data::CSel qw(csel);

    my @cells = csel("Table[name=~/data/i] TCell[value isnt '']:first", $tree);

    # ditto, but wrap result using a Data::CSel::Selection
    my $res = csel({wrap=>1}, "Table ...", $tree);

    # call method 'foo' of each node object (works even when there are zero nodes
    # in the selection object, or when some nodes do not support the 'foo' method)
    $res->foo;

DESCRIPTION

This module lets you use a query language (hereby named CSel) that is similar to CSS Selector to select nodes from a tree of objects.

EXPRESSION SYNTAX

The following is a description of the CSel query expression. It is modeled after the CSS Selector syntax with some modifications (see "Differences with CSS selector").

An expression is a chain of one or more selectors separated by commas.

A selector is a chain of one or more simple selectors separated by combinators.

A combinator is one of:

* whitespace (descendant combinator): E F means an F element that is a descendant of an E element;

* > (child combinator): E > F means an F element that is a child of an E element;

* ~ (general sibling combinator): E ~ F means an F element preceded by an E sibling;

* + (adjacent sibling combinator): E + F means an F element immediately preceded by an E sibling.

A simple selector is a type selector or universal selector, followed immediately by zero or more attribute selectors or pseudo-classes in any order. The type or universal selector may be omitted if there is at least one attribute selector or pseudo-class.

Type selector

A type selector is a Perl class/package name. Example:

    My::Class

will match any My::Class object.

Universal selector

A universal selector is * and matches any class/package. Example:

    *

will match any object.

Attribute selector

An attribute selector filters objects based on the value of their attributes.
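In Perl terms, the type selector, the universal selector, and the bare [ATTR] form described next come down to isa() and can() checks; the documentation describes Any[length] as selecting objects that are isa() Any and can() length(). The following is a minimal sketch of those checks in core Perl, assuming type matching goes through isa() so that subclasses also match. The packages My::Class and My::Class::Sub and the functions matches_type and has_attr are hypothetical names for illustration only; they are not part of Data::CSel's API, and the module's own implementation may differ.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical classes, used only to illustrate matching.
package My::Class;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub title { return $_[0]{title} }

package My::Class::Sub;
our @ISA = ('My::Class');    # My::Class::Sub isa My::Class

package main;

# Hypothetical type/universal selector check: '*' matches any object;
# otherwise the object must be an instance of the named class or of a
# subclass of it (isa()).
sub matches_type {
    my ($obj, $type) = @_;
    return 0 unless blessed($obj);    # this sketch only considers objects
    return 1 if $type eq '*';
    return $obj->isa($type) ? 1 : 0;
}

# Hypothetical [ATTR] check: the object must respond to (can()) a method
# named ATTR, either defined in its class or inherited.
sub has_attr {
    my ($obj, $attr) = @_;
    return (blessed($obj) && $obj->can($attr)) ? 1 : 0;
}

my $toc = My::Class::Sub->new(title => 'TOC');
print matches_type($toc, 'My::Class'), "\n";   # 1 (subclass matches)
print matches_type($toc, '*'), "\n";           # 1
print has_attr($toc, 'title'), "\n";           # 1 (inherited method)
print has_attr($toc, 'length'), "\n";          # 0
```

A full selector engine would combine checks like these with attribute comparisons, pseudo-classes, and combinators while walking the tree; this sketch covers only a single object.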
The syntax is:

    [ATTR]
    [ATTR OP LITERAL]

[ATTR] means to only select objects that have an attribute named ATTR, for example:

    Any[length]

means to select objects of type (isa()) Any that respond to (can()) length().

Note: to select objects that do not have a specified attribute, you can use the :not pseudo-class (see "Pseudo-class"), for example:

    Any:not([length])

[ATTR OP LITERAL] means to only select objects that have an attribute named ATTR whose value matches the expression specified by operator OP and operand LITERAL.

Literal

Literals can either be a number, e.g.:

    1
    -2.3
    4.5e-6

or a boolean literal:

    true
    false

or the null (undef) literal:

    null

or a single-quoted string (only recognizes the escape sequences \\ and \'):

    'this is a string'
    'this isn\'t hard'

or a double-quoted string (currently recognizes the escape sequences \\, \", \', \$ [literal $], \t [tab character], \n [newline], \r [carriage return], \f [formfeed], \b [backspace], \a [bell], \e [escape], \0 [null], octal escapes e.g. \033, and hexadecimal escapes e.g. \x1b):

    "This is a string"
    "This isn't hard"
    "Line 1\nLine 2"

or a regex string (must be delimited by / ... /, and can be followed by zero or more of the regex modifier characters m, s, i):

    //
    /ab(c|d)/i

Operators

The following operators are supported:

* eq

  String equality using Perl's eq operator.

  Example:

      Table[title eq "TOC"]

  selects all Table objects whose title() returns "TOC".

* = (or ==)

  Numerical equality using Perl's == operator.

  Example:

      TableCell[length=3]

  selects all TableCell objects whose length() returns 3.

  To avoid a common trap, CSel switches to Perl's eq operator when the operand does not look like a number, e.g.:

      Table[title = 'foo']

  is the same as:

      Table[title eq 'foo']

* ne

  String inequality using Perl's ne operator.

  Example:

      Table[title ne "TOC"]

  selects all Table objects whose title() returns a value not equal to "TOC".

* !=

  Numerical inequality using Perl's != operator.
  Example:

      TableCell[length != 3]

  selects all TableCell objects whose length() returns a value not equal to 3.

  To avoid a common trap, CSel switches to Perl's ne operator when the operand does not look like a number, e.g.:

      Table[title != 'foo']

  is the same as:

      Table[title ne 'foo']

* gt

  String greater-than using Perl's gt operator.

  Example:

      Person[first_name gt "Albert"]

  selects all Person objects whose first_name() returns a value asciibetically greater than "Albert".

* >

  Numerical greater-than using Perl's > operator.

  Example:

      TableCell[length > 3]

  selects all TableCell objects whose length() returns a value greater than 3.

  To avoid a common trap, CSel switches to Perl's gt operator when the operand does not look like a number, e.g.:

      Person[first_name > 'Albert']

  is the same as:

      Person[first_name gt "Albert"]

* ge

  String greater-than-or-equal-to using Perl's ge operator.

  Example:

      Person[first_name ge "Albert"]

  selects all Person objects whose first_name() returns a value asciibetically greater than or equal to "Albert".

* >=

  Numerical greater-than-or-equal-to using Perl's >= operator.

  Example:

      TableCell[length >= 3]

  selects all TableCell objects whose length() returns a value greater than or equal to 3.

  To avoid a common trap, CSel switches to Perl's ge operator when the operand does not look like a number, e.g.:

      Person[first_name >= 'Albert']

  is the same as:

      Person[first_name ge "Albert"]

* lt

  String less-than using Perl's lt operator.

  Example:

      Person[first_name lt "Albert"]

  selects all Person objects whose first_name() returns a value asciibetically less than "Albert".

* <

  Numerical less-than using Perl's < operator.

  Example:

      TableCell[length < 3]

  selects all TableCell objects whose length() returns a value less than 3.
  To avoid a common trap, CSel switches to Perl's lt operator when the operand does not look like a number, e.g.:

      Person[first_name < 'Albert']

  is the same as:

      Person[first_name lt "Albert"]

* le

  String less-than-or-equal-to using Perl's le operator.

  Example:

      Person[first_name le "Albert"]

  selects all Person objects whose first_name() returns a value asciibetically less than or equal to "Albert".

* <=

  Numerical less-than-or-equal-to using Perl's <= operator.

  Example:

      TableCell[length <= 3]

  selects all TableCell objects whose length() returns a value less than or equal to 3.

  To avoid a common trap, CSel switches to Perl's le operator when the operand does not look like a number, e.g.:

      Person[first_name <= 'Albert']

  is the same as:

      Person[first_name le "Albert"]

* =~ and !~

  Select only objects whose attribute ATTR has a value matching a regular expression. The operand should be a regex literal, which must be delimited by //.

  Example:

      Person[first_name =~ /^Al/]

  selects all Person objects whose first_name() returns a value matching the regex /^Al/.

      Person[first_name =~ /^al/i]

  Same as the previous example, except that the regex is case-insensitive.

  !~ is the opposite of =~, just like in Perl: it selects objects whose attribute value does not match the regular expression.

* is and isnt

  Test truth value or definedness. The operand can be the null literal or a boolean literal.

  Example:

      DateTime[is_leap_year is true]

  will select all DateTime objects whose is_leap_year attribute has a true value.

      DateTime[is_leap_year is false]

  will select all DateTime objects whose is_leap_year attribute has a false value.

      Person[age isnt null]

  will select all Person objects whose age is defined.

Pseudo-class

A pseudo-class filters objects based on some criteria, in the form of:

    :NAME
    :NAME(ARG, ...)

Supported pseudo-classes include:

* :first

  Select only the first object from the result set.

  Example:

      Person[name =~ /^a/i]:first

  selects the first person whose name starts with the letter A.
* :last

  Select only the last object from the result set.

  Example:

      Person[name =~ /^a/i]:last

  selects the last person whose name starts with the letter A.

* :first-child

  Select only objects that are the first child of their parent.

* :last-child

  Select only objects that are the last child of their parent.

* :only-child

  Select only objects that are the only child of their parent.

* :nth-child(n)

  Select only objects that are the nth child of their parent.

* :nth-last-child(n)

  Select only objects that are the nth last child of their parent.

* :first-of-type

* :last-of-type

* :only-of-type

* :nth-of-type(n)

* :nth-last-of-type(n)

* :not(s)

* :has(s)

Differences with CSS selector

Type selector can contain double colon (::)

  Since Perl package names are separated by ::, CSel allows it in type selectors.

No equivalent for CSS class and ID selectors

  I.e.:

      E.class
      E#id

  They are not used in CSel.

Syntax of attribute selector is a bit different

  In CSel, the syntax of the attribute selector is made simpler and more regular. Some CSS operators are not supported by CSel, but CSel adds more operators from Perl. In particular, the substring matching operators like [attr^=val], [attr$=val], [attr*=val], [attr~=val], and [attr|=val] are replaced with the more flexible regex matching [attr =~ /re/].

  Strings must always be quoted, e.g.:

      p[align="middle"]
      p[align='middle']

  instead of just:

      p[align=middle]

Different pseudo-classes supported

  Some CSS pseudo-classes only make sense for a DOM or a visual browser, e.g. :link, :visited, :hover, so they are not supported.

:has(p) and :not(p) need a quoted value

  In CSel, p is a regular string literal and must be quoted.

There is no concept of CSS namespaces

  But Perl package names are already hierarchical.

FUNCTIONS

csel([ \%opts , ] $expr, @tree_nodes) => list|selection_object

  Select from tree node objects @tree_nodes using CSel expression $expr.
  Will return a list of matching node objects (or, when the wrap option is true, a Data::CSel::Selection object instead). Will die on errors (e.g. a syntax error in the expression, objects not having the required methods, etc).

  A tree node object is any regular Perl object satisfying the following criteria:

  1) it supports a parent method, which should return a single parent node object, or undef if the object is the root node;

  2) it supports a children method, which should return a list (or an arrayref) of children node objects (the list/array will be empty for a leaf node).

  Known options:

  * wrap => bool

    If set to true, instead of returning a list of matching nodes, the function will return a Data::CSel::Selection object (which wraps the result, for convenience). See the selection object's documentation for more details.

parse_csel($expr) => hash|undef

  Parse an expression. On success, will return a hash containing parsed information. On failure, will return undef.

SEE ALSO

CSS4 Selectors Specification, https://www.w3.org/TR/selectors4/.

These modules let you use XPath (or XPath-like) syntax to select nodes of a data structure: Data::DPath. Like CSS selectors, XPath is another query language to select nodes of a document. XPath specification: https://www.w3.org/TR/xpath/.

These modules let you use JSONPath syntax to select nodes of a data structure: JSON::Path. JSONPath is a query language to select nodes of a JSON document (data structure). JSONPath specification: http://goessner.net/articles/JsonPath.

These modules let you use CSS selector syntax (or a subset of it) to select nodes of an HTML document: Mojo::DOM (or DOM::Tiny), jQuery, pQuery, HTML::Selector::XPath (or via Web::Query). The last two modules can also handle XPath expressions.