Feature Structures

Introduction

A feature structure is a general-purpose data structure that identifies and groups together individual features, each of which associates a name with one or more values. Because feature structures are so general, they can represent many different kinds of information and the interrelations among various pieces of information. Their instantiation in SGML in these guidelines therefore provides a metalanguage for representing text analysis and interpretation. Moreover, this instantiation allows feature values to be of various types. It also allows restrictions to be placed on the values of particular features by means of feature system declarations, which are discussed in chapter . These restrictions provide the basis for at least partial validation of the feature-structure encodings that are used.

This chapter is organized as follows. Following this introduction, section introduces the binary feature values and shows how elementary feature structures using features with those values may be constructed. Section introduces the tags that represent libraries of features, feature structures and feature values, along with methods for pointing at features, feature structures and feature values in these libraries. Section presents the tags for symbolic, numeric, measurement, rate, and string values. Section shows how to use feature structures themselves as values, thus enabling feature structures to be recursively defined. Section demonstrates the use of multiple values for features, for encoding set, bag, and list collections of values. Section presents various methods for representing alternations (disjunctions) of features and feature values. Section presents tags for boolean, default, and uncertain values, along with methods for underspecifying feature values. Section shows how to specify various logical relations, such as negation and subsumption, between the expressed values for a feature and its actual values. Finally, section illustrates how feature structures may be linked to text elements.

This tag set is selected as described in ; in a document which uses the markup described in this chapter, the document type declaration should contain the following declaration of the entity TEI.fs, or an equivalent one. The entire document type declaration for a document using this additional tag set together with the base tag set for prose might look like this.

The overall document type declaration for this additional tag set has the following structure.

Elementary Feature Structures: Features with Binary Values

The fundamental elements of a feature structure system are f (for feature) and fs (for feature structure). The fs element has a type attribute for indicating what type of feature structure it represents, and may contain any number of f elements. An f element, in turn, has a required name attribute and any number of associated values. Feature values may be binary, numeric, symbolic (i.e. taken from a restricted set of legal values), or string-valued, or may consist of sets, lists, or bags of binary, numeric, symbolic, or string values. Specialized values may also be given which allow partial underspecification of the feature. These possible types of values are described in more detail in this and the following sections.

This section considers the special case of feature structures that contain features whose single value is one of the binary values represented by the empty elements plus and minus. The elements used to represent feature structures, features and binary values, along with their descriptions and attributes, are the following.

  fs: analyzes a collection of features and feature alternations as a structural unit. Attributes include:
    type: provides a type for a feature structure.
    feats: pointer to features.
    rel: indicates the relation of the given content to the actual content or value of the feature structure. Its legal values indicate, respectively, that the actual content is that given; that the actual content is not that given; that the actual content is subsumed by the given content; and that the actual content is not subsumed by the given content.
  f: associates a name with a value of any of several different types. Attributes include:
    name: provides a name for a feature.
    org: indicates organization of given value or values as singleton, set, bag or list. Legal values are:
      single: indicates that the given value is a singleton.
      set: indicates that the given values are organized as a set.
      bag: indicates that the given values are organized as a bag (multiset).
      list: indicates that the given values are organized as a list.
    fVal: points to the id attributes of feature values.
    rel: indicates the relation between the values that are given as the content of the feature or pointed at by the fVal attribute and the actual values of the feature. Its legal values indicate, respectively, that the given values are the actual values; that they are not the actual values; that they are a subset, subbag or sublist of the actual values; and that they are not a subset, subbag or sublist of the actual values.
  plus: provides binary plus value for a feature.
  minus: provides binary minus value for a feature.
The attributes not discussed in this section are treated in later sections: the feats and fVal attributes in section , the rel attribute in section , and the org attribute in section .

An fs element containing f elements with binary values can be straightforwardly used to encode the matrices of feature-value specifications for phonetic segments, such as the following for the English segment [s].

Using the additional tag set for feature structures, this might be encoded as follows. Note that fs elements may have a type attribute indicating the kind of feature structure in question.

The restriction of specific features to specific types of values (e.g. the restriction of the feature strident to the values plus or minus) cannot be validated by an SGML parser. To enable an application program to check that only legal values for particular features appear, one may write a feature-system declaration; see chapter .
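The kind of check such an application program might perform can be sketched in Python. This is a minimal sketch under our own assumptions, not part of the SGML markup. The classes F and FS and the function binary_violations are hypothetical names, and the feature names and values chosen for [s] are illustrative. The sketch simply reports any feature declared as binary whose value is something other than plus or minus.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BINARY_VALUES = {"plus", "minus"}  # stand-ins for the empty plus/minus elements


@dataclass
class F:
    """A feature: a name associated with one or more values."""
    name: str
    values: List[str]


@dataclass
class FS:
    """A feature structure: an optional type plus any number of features."""
    type: Optional[str] = None
    features: List[F] = field(default_factory=list)


def binary_violations(fs: FS, binary_features: set) -> List[str]:
    """List features declared binary whose values are not plus or minus."""
    problems = []
    for f in fs.features:
        if f.name in binary_features:
            for v in f.values:
                if v not in BINARY_VALUES:
                    problems.append(f"{f.name}: {v!r} is not a binary value")
    return problems


# Hypothetical matrix for [s]; the feature names are illustrative only.
s = FS("phonological segment", [
    F("consonantal", ["plus"]), F("vocalic", ["minus"]),
    F("voiced", ["minus"]), F("anterior", ["plus"]),
    F("coronal", ["plus"]), F("continuant", ["plus"]),
    F("strident", ["plus"]),
])
declared_binary = {f.name for f in s.features}
print(binary_violations(s, declared_binary))  # []
```

In a real system the set of binary features would come from the feature-system declaration rather than from the structure being checked.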

Here are the formal declarations of the fs, f, plus and minus elements.

Feature, Feature-Structure and Feature-Value Libraries

As the example in the preceding section illustrates, the direct encoding of feature structures can be verbose. Encoding large numbers of feature structures in this manner could therefore take enormous effort and produce very large files. To reduce the size and complexity of the task, one may use the feats attribute of the fs element to point to one or more of the features of that element. This indirect method of encoding feature structures presumes that the f elements are assigned unique SGML id values and are collected together in fLib elements (feature libraries). In turn, feature structures can be collected together in fsLib elements (feature-structure libraries). Finally, one may use the fVal attribute of the f element to point to its values. This indirect method of encoding feature values presumes that the value elements are assigned id attributes and are collected together in fvLib elements (feature-value libraries). The elements used to represent feature, feature-structure and feature-value libraries, along with their descriptions and attributes, are the following.

  fLib: assembles library of feature elements. Attributes include:
    type: indicates type of feature library (i.e., what kind of features it contains).
  fsLib: assembles library of feature structure elements. Attributes include:
    type: indicates type of feature-structure library (i.e., what type of feature structures it contains).
  fvLib: assembles library of feature value elements. Attributes include:
    type: indicates type of feature-value library (i.e., what type of feature values it contains).
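The two levels of indirection, feats into an fLib and fVal into an fvLib, amount to looking up identifiers in keyed collections. The Python sketch below models that lookup under stated assumptions. The dictionaries f_lib and fv_lib, the function resolve_fs, and every id in them are hypothetical. Pointer lists are treated as whitespace-separated, like SGML IDREFS values.

```python
# Hypothetical libraries keyed by SGML id; all ids are invented for this sketch.
fv_lib = {"V.P": "plus", "V.M": "minus"}        # fvLib: id -> value
f_lib = {                                       # fLib: id -> (name, fVal ids)
    "CNS.P": ("consonantal", "V.P"),
    "CNS.M": ("consonantal", "V.M"),
    "VOI.P": ("voiced", "V.P"),
    "VOI.M": ("voiced", "V.M"),
}


def resolve_fs(feats: str) -> list:
    """Expand an fs feats pointer list into (name, [values]) pairs.

    Each feature id is looked up in the fLib, then each of its fVal ids is
    looked up in the fvLib. A KeyError signals a pointer with no entry.
    """
    resolved = []
    for fid in feats.split():
        name, fval = f_lib[fid]
        resolved.append((name, [fv_lib[vid] for vid in fval.split()]))
    return resolved


print(resolve_fs("CNS.P VOI.M"))  # [('consonantal', ['plus']), ('voiced', ['minus'])]
```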

For example, suppose a feature library for phonological feature specifications is set up as follows.

Then the feature structures that represent the analysis of the phonological segments (phonemes) /t/, /d/, /s/ and /z/ can be defined as follows.

The preceding are but four of the 128 logically possible fully specified phonological segments using the seven binary features listed in the feature library. Presumably not all combinations of features correspond to phonological segments (there are no strident vowels, for example). The legal combinations, however, can be collected together in a feature-structure library, with each element being given a unique id attribute, as in the following example.
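The arithmetic behind the figure of 128 is 2^7, and building such a library can be sketched in Python. The sketch enumerates every fully specified combination and keeps those that pass a legality test. The seven feature names, the SEG ids, and the function is_legal are hypothetical. The sketch applies only the one constraint named in the text (no strident vowels), so a real fsLib would be considerably smaller.

```python
from itertools import product

# Seven hypothetical binary features standing in for the library's features.
FEATURES = ["consonantal", "vocalic", "voiced", "anterior",
            "coronal", "continuant", "strident"]

# Every fully specified combination: 2 ** 7 = 128 in all.
all_specs = [dict(zip(FEATURES, vals))
             for vals in product(["plus", "minus"], repeat=len(FEATURES))]


def is_legal(spec: dict) -> bool:
    """Illustrative constraint from the text: no strident vowels."""
    return not (spec["vocalic"] == "plus" and spec["strident"] == "plus")


# An fsLib holding only the combinations judged legal, each under a unique id.
fs_lib = {f"SEG{i:03d}": spec
          for i, spec in enumerate(s for s in all_specs if is_legal(s))}

print(len(all_specs), len(fs_lib))  # 128 96
```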

Text elements can be linked to these feature structures in any of the ways described in section . In the following example, a linkGrp element is used to link selected characters in the text Caesar seized control to their phonological representations.

Because binary feature values are so simple, there is no particular gain in pointing at those values rather than specifying them directly. However, the mechanism of using the fVal attribute on f elements is useful for representing more complex feature values, and it can be illustrated using binary values. Suppose the plus and minus elements are collected together in an fvLib, as follows. Then the feature library presented at the beginning of this section can be represented as follows.

Although fs elements are legitimate feature values (see section ), they are not allowed within fvLib elements. They should be placed in fsLib elements.

Here are the formal declarations of the fLib, fsLib and fvLib elements.

Symbolic, Numeric, Measurement, Rate and String Values

By separating out feature values as content of f elements, we are able to classify those values into types. In section , the two empty elements which represent binary values are defined. In this section, we define five more feature-value elements: the empty elements sym for expressing symbolic values, nbr for expressing numeric values, msr for expressing measurement values, and rate for expressing rate values; and the element str for expressing string values. These elements, along with their descriptions and attributes, are the following.

  sym: provides symbolic values for features. Attributes include:
    value: provides a symbolic value for a feature, one of a finite list that may be specified in a feature declaration.
    rel: indicates the relation of the given value to the actual value. Its legal values indicate, respectively, that the actual value is that given and that the actual value is not that given.
  nbr: provides a numeric value or range of values for a feature. Attributes include:
    value: provides a numeric value.
    valueTo: together with the value attribute, provides a range of numeric values.
    type: indicates whether value or range is to be understood as real or integer. One legal value specifies that the value is an integer (if a noninteger is given as the value of value, only its integer part is used); the other specifies that the value is a real number.
    rel: indicates the relation of the given value or range to the actual value or range. Its legal values indicate, respectively, that the actual value or range is that given; is not that given; is less than; is less than or equal to; is greater than; and is greater than or equal to the given value or range.
  msr: provides a measure value or range of values for a feature. Attributes include:
    unit: provides a unit for a measure feature, one of a finite list that may be specified in a feature declaration.
    value, valueTo, type and rel: as for nbr.
  rate: provides a rate value or range of values for a feature. Attributes include:
    unit: provides a unit for a rate feature, one of a finite list that may be specified in a feature declaration.
    per: provides an interval for a rate feature, one of a finite list that may be specified in a feature declaration.
    value, valueTo, type and rel: as for nbr.
  str: provides a string value for a feature. Attributes include:
    rel: indicates the relation of the given value to the actual value. Its legal values indicate, respectively, that the actual value is that given; that the actual value is not that given; that the value given is a substring of the actual value; that the value given is not a substring of the actual value; and that the actual value is less than, less than or equal to, greater than, or greater than or equal to the given value.

The sym element is to be used for the value of a feature when that feature can have any of a small, finite set of possible values, representable as character strings. For example, consider the problem of specifying the grammatical case, gender and number features of classical Greek noun forms. Assume that the case feature can take any of the five values nominative, genitive, dative, accusative and vocative. Assume also that the gender feature can take any of the three values feminine, masculine and neuter, and that the number feature can take either of the values singular and plural. The following may then be used to represent the claim that the noun form theás ('goddesses') has accusative case, feminine gender and plural number.

Note that instead of using a symbolic value for grammatical number, one could have named the feature singular or plural and given it an appropriate binary value, as in the following example. Whether one uses a binary or symbolic value in situations like this is largely a matter of taste.

An SGML validator by itself cannot determine that particular values do or do not go with particular features; in particular, it cannot distinguish between the presumably legal encodings in the preceding two examples and the presumably illegal encoding in the following example.

There are two ways of attempting to ensure that only legal combinations of feature names and values are used. First, if the total number of legal combinations is relatively small, one can simply list all of those combinations in fLib elements (possibly together with fvLib elements), and point to them using the feats attribute of the enclosing fs element. This method is suitable in the situation described above, since it requires specifying a total of only ten (5 + 3 + 2) combinations of features and values. Further, to ensure that the features are themselves combined legally into feature structures, one can put the legal feature structures inside fsLib elements. A total of 30 feature structures (5 x 3 x 2) is required to enumerate all the legal combinations of individual case, gender and number values in the preceding illustration. Of course, the legality of the markup requires that the feats attributes actually point at legally defined features, which an SGML validator by itself cannot guarantee.
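The counts of ten and thirty follow directly from enumerating the libraries. The Python sketch below builds both, assuming hypothetical ids of the form case.accusative for features and N00 to N29 for feature structures. Each fs is represented only by its feats pointer list. It also checks the property an SGML validator cannot guarantee on its own: that every feats pointer names a feature actually defined in the fLib.

```python
from itertools import product

CASES = ["nominative", "genitive", "dative", "accusative", "vocative"]
GENDERS = ["feminine", "masculine", "neuter"]
NUMBERS = ["singular", "plural"]

# fLib: one f element per legal feature-value pair, 5 + 3 + 2 = 10 in all.
# The ids ("case.accusative", ...) are hypothetical.
f_lib = {f"{name}.{val}": (name, val)
         for name, vals in (("case", CASES), ("gender", GENDERS),
                            ("number", NUMBERS))
         for val in vals}

# fsLib: one fs per legal combination, 5 x 3 x 2 = 30 in all, each stored
# as the feats pointer list it would carry.
fs_lib = {f"N{i:02d}": f"case.{c} gender.{g} number.{n}"
          for i, (c, g, n) in enumerate(product(CASES, GENDERS, NUMBERS))}


def dangling_pointers(lib: dict) -> list:
    """Return feats ids that do not name a feature defined in f_lib."""
    return sorted({fid for feats in lib.values() for fid in feats.split()
                   if fid not in f_lib})


print(len(f_lib), len(fs_lib), dangling_pointers(fs_lib))  # 10 30 []
```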

A more general method of attempting to ensure that only legal combinations of feature names and values are used is to provide a feature system declaration which includes a valRange element for each feature one uses. Here is a sample valRange element for the f name=case element described above; for further discussion of the valRange element, see chapter ; the vAlt element is discussed in section .
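Once the value ranges are declared, an application program can check an encoding against them. The Python sketch below approximates that check, assuming the declared ranges have already been read into a dictionary. VAL_RANGE and illegal_pairs are hypothetical names, and the sketch does not parse actual valRange markup. A feature with no declared range is reported as well.

```python
# Hypothetical stand-in for the ranges that valRange elements would declare.
VAL_RANGE = {
    "case": {"nominative", "genitive", "dative", "accusative", "vocative"},
    "gender": {"feminine", "masculine", "neuter"},
    "number": {"singular", "plural"},
}


def illegal_pairs(features):
    """Return (name, value) pairs that are undeclared or out of range."""
    return [(name, value) for name, value in features
            if value not in VAL_RANGE.get(name, set())]


theas = [("case", "accusative"), ("gender", "feminine"), ("number", "plural")]
print(illegal_pairs(theas))                   # []
print(illegal_pairs([("case", "feminine")]))  # [('case', 'feminine')]
```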

Similarly, to ensure that only legal combinations of features are used as the content of feature structures, one should provide fsConstraint elements for each of the types of feature structure one employs. For discussion of the fDecl and fsConstraint elements, see . Validation of the feature structures used in a document based on the feature-system declaration, however, requires that there be an application program that can use the information contained in the feature-system declaration.

Features with sym, plus, and minus values may be used to encode highly structured information such as may be obtained from precoded survey instruments. We illustrate by means of a coding scheme based on the one that is used for classifying potential printed entries in the British National Corpus. The scheme uses the following features and associated values.

A comprehensive feature library for this scheme is the following; the id specifications are those currently used in the BNC project.

An entry which is a book or periodical on world affairs, medium level, sampled from the middle, published between 1975 and 1993, and selected on a principled basis could then be assigned the following feature-structure code; this code could also be placed in a feature-structure library that contains all the possible fully-specified BNC entry classifications. This library would have a total of 1620 (3 x 9 x 3 x 5 x 2 x 2) entries.

The nbr element is to be used when the value of a feature is a number or a range of numbers. For example, suppose one wishes to encode information contained in classified advertisements for the sale or rental of real estate, such as the number of bedrooms and bathrooms in a listed property, and its advertised selling or rental price. One way of representing such information is as follows.

The information that the number of bedrooms is in the range from 3 to 5 and the monthly rent is in the range from 625.00 to 950.00 may be represented as follows, using the optional valueTo attribute.

The nbr element (like the msr and rate elements defined below) may also have a type attribute specifying whether the values of the value and valueTo attributes are to be construed as integer or real numbers.
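The interplay of value, valueTo, type and rel can be sketched as a single predicate in Python. This is a sketch under assumptions, not a definitive reading. The function nbr_satisfied and the rel keys "eq", "ne", "lt", "le", "gt" and "ge" are labels invented here, one for each relation described above, because the attribute's actual legal values are not reproduced in this section. The interpretation of "less than" and "greater than" for a range (below or above the whole range) is also our own assumption.

```python
def nbr_satisfied(actual, value, value_to=None, integer=False, rel="eq"):
    """Hypothetical check of an actual number against an nbr-style spec.

    value and value_to mirror the value and valueTo attributes; integer=True
    mirrors an integer type, under which only the integer part is used.
    The rel keys are invented labels, one per relation listed in the text.
    """
    if integer:
        value = int(value)
        value_to = int(value_to) if value_to is not None else None
    lo = value
    hi = value if value_to is None else value_to
    inside = lo <= actual <= hi
    return {
        "eq": inside,          # actual is the given value or within the range
        "ne": not inside,
        "lt": actual < lo,     # assumed: below the whole range
        "le": actual <= hi,
        "gt": actual > hi,     # assumed: above the whole range
        "ge": actual >= lo,
    }[rel]


print(nbr_satisfied(4, 3, 5))                   # True: 3 to 5 bedrooms
print(nbr_satisfied(700.00, 625.00, 950.00))    # True: rent within range
print(nbr_satisfied(3, 3.7, integer=True))      # True: 3.7 is read as 3
```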

The msr element is to be used when the value of a feature is a scalar quantity, essentially a combination of a numeric value and a symbolic value identifying the scale on which the numeric value occurs. For example, real estate listings often provide the area (in square feet or square meters) of a house or apartment and the area (in acres or hectares) of land being sold or rented. One way of representing information about such areas is as follows.

The value of the f name=monthly.rent feature in the two examples above might be more accurately analysed as a measurement rather than as a numeric value, since the amount of the rent in question is to be understood as payable in a particular currency, such as US or Canadian dollars or Italian lire. To make the currency scale explicit, the first example of this feature might be re-encoded as follows.

The unit and value attributes of the msr element are both required. If no unit is needed (for example, if omitting it would cause no confusion), the nbr element may be used to express the feature value instead.

The rate element is to be used when the value of a feature is a rate. This element has a required per attribute for expressing the interval over which the rate is measured (typically, but not necessarily, a temporal interval), and an optional unit attribute for expressing the scalar unit. For example, one can encode a wage rate of USD 8.25 per hour as follows. Note that the f name=monthly.rent element illustrated above can be re-encoded as having a rate value, with a per=month attribute, as follows.

To encode interest, inflation or tax rates, the unit attribute can be used to indicate that the value attribute is to be understood as a percentage. For example, an interest rate of 8.25% per year can be encoded in either of the following two ways.

Finally, the str element is to be used for the value of a feature when that value is a string drawn from a very large or potentially unbounded set of possible strings of characters, so that it would be impractical or impossible to use the sym element. These values are expressed not as the values of a value attribute, as in the case of symbolic, numeric, measurement and rate values, but as the content of the str element. For example, the street address of a property in a real estate listing, such as 3418 East Third Street, may be encoded as follows.

Here are the formal declarations of the sym, nbr, msr, rate and str elements.

Structured Values

Features may have structured values as well; these values are represented either by the fs element or by the fVal attribute on the f element, which can point to an fs element. Since an fs or a pointer to an fs is permitted to occur as a value of an f, recursion is possible. For example, an fs element may contain or point to an f element, which may contain or point to an fs element, which may contain or point to an f element, and so on. To illustrate the use of structured values, consider the following simple model of a personal record, consisting of a person's name, date of birth, place of birth, and sex. Each personal record is an fs type='personal record' element, consisting of the corresponding four features, three of which take structured values, as in the following example.

Now suppose that feature-structure libraries are maintained for name records and place records, and that the name record in the previous example is identified with the attribute id=Nkab027 and the place record is identified with the attribute id=txaustin. Feature-structure, rather than feature-value, libraries should be used for housing collections of feature structures. Then the preceding example could also be encoded as follows. (An identifier is also provided for the personal record.) This representation could be simplified further if a feature library is maintained for the year, month, day and sex features, so that the feats attribute may be used as follows.

Next, suppose that a feature-structure library is also maintained for personal records, and that the library also contains records for the parents of the individual identified in the previous example. Suppose that the father is identified as Pmfb009 and the mother as Parn002. Then the personal-record feature structure could be easily augmented to include pointers to the parents, as follows. If the personal records identified as Parn002 and Pmfb009 also contain information about the parents of those individuals, then from the present record, one would have access to that individual's grandparents as well.
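Reaching grandparents this way is a matter of following pointers from record to record. The Python sketch below models the personal-records fsLib as a dictionary. Pkab027, Pmfb009 and Parn002 come from the text. The grandparent ids (Pgf001 and so on), the father and mother keys, and the function ancestors are hypothetical. Records that are absent from the library simply end the chain.

```python
# Hypothetical fsLib of personal records, reduced to parent pointers.
# Pkab027, Pmfb009 and Parn002 come from the text; the rest are invented.
fs_lib = {
    "Pkab027": {"father": "Pmfb009", "mother": "Parn002"},
    "Pmfb009": {"father": "Pgf001", "mother": "Pgm001"},
    "Parn002": {"father": "Pgf002", "mother": "Pgm002"},
}


def ancestors(record_id: str, generations: int) -> list:
    """Follow father/mother pointers the given number of generations up."""
    frontier = [record_id]
    for _ in range(generations):
        frontier = [fs_lib[rid][k] for rid in frontier if rid in fs_lib
                    for k in ("father", "mother") if k in fs_lib[rid]]
    return frontier


print(ancestors("Pkab027", 1))  # ['Pmfb009', 'Parn002']
print(ancestors("Pkab027", 2))  # ['Pgf001', 'Pgm001', 'Pgf002', 'Pgm002']
```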

Assuming that personal records of the sort described in this section are being maintained in association with text files, the records can be linked to those texts in any of the ways described in chapter , provided that identifiers are added for appropriate features, as in the following illustration.

Kathleen Anne Barnett was born at on April 17, 1968 in Seton Hospital in Austin to Mr. and Mrs. Michael F. Barnett of San Saba.

Singleton, Set, Bag and List Collections of Values

In the discussion to this point, we have assumed that features have exactly one simple value. However, for many purposes, it is useful to be able to consider the values of certain features to be organized in more complex ways, for example as sets, bags (or multisets), or lists. Accordingly, we provide for four different ways in which feature values may be organized, namely as singletons, sets, bags and lists. We do so by means of an org attribute on the f element, which takes on one of the designated values single, set, bag, and list. A feature whose value is organized as a singleton is understood as having exactly one simple value. If more than one value is specified for it, we assume that only the first one is considered to be its true value. A feature whose value is organized as a set, bag or list may have any positive number of values as its content. Sets and bags are distinguished from lists in that the order in which the values are specified does not matter for the former, but does matter for the latter. Sets are distinguished from bags and lists in that repetitions of values do not count for the former but do count for the latter. SGML does not provide a way of validating that values for features organized as sets are not repeated; such validation would have to be carried out by an application program. Our method of representing set, bag and list values also does not permit such values to be directly embedded within one another. In order to embed a set within a set, for example, one must specify the embedded set as the value of a feature of a feature-structure value of the including set. This is not as hard as it sounds. The embedding of a list within a list is illustrated in the second example below.
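The distinctions among the four organizations come down to which differences between two sequences of values count. The Python sketch below expresses them as an equality test. The function name same_values is hypothetical, and values are assumed to be hashable strings. A set ignores order and repetition, a bag ignores order but keeps counts, a list keeps both, and a singleton compares only the first value.

```python
from collections import Counter


def same_values(org: str, given: list, actual: list) -> bool:
    """Compare two value sequences under a feature's org value."""
    if org == "single":
        return given[:1] == actual[:1]            # only the first value counts
    if org == "set":
        return set(given) == set(actual)          # order and repetition ignored
    if org == "bag":
        return Counter(given) == Counter(actual)  # order ignored, counts kept
    if org == "list":
        return list(given) == list(actual)        # order and repetition matter
    raise ValueError(f"unknown org value: {org!r}")


print(same_values("set", ["a", "b", "a"], ["b", "a"]))   # True
print(same_values("bag", ["a", "b", "a"], ["b", "a"]))   # False
print(same_values("list", ["a", "b"], ["b", "a"]))       # False
```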

No default value for the org attribute is declared in the DTD; however, a default value for that attribute can be declared for particular features in the feature-system declaration; see . Note that if only one value is specified for a given f element, the set, bag and list values of the org attribute are all essentially equivalent to the singleton value. Omitting the org attribute for such a feature is therefore not problematic, unless the value is the null element (see below).

To illustrate the use of the org attribute, suppose that the illustration of personal records from the previous section is extended to include pointers to an individual's siblings. Suppose also that the individual identified as fs id=Pkab027 has siblings identified as fs id=Panb005, fs id=Pmfb010 and fs id=Pzrb001 in the personal records library. Then we may extend the personal record for fs id=Pkab027 as follows.

A more elaborate illustration of the use of the org attribute is the following f name=career org=list element, which may be added to the personal record of an individual to record that individual's job career. The feature structures which constitute the value of this feature document the jobs which the individual has held, in the order in which they were held. Note that a list has been embedded within a list by means of intervening fs type='employment record' and f name=promotion.history elements.

The information contained in such features may be linked to textual references in the usual way. The f name=status.code feature has been included to show how evaluative or interpretive information can be included along with information gleaned from textual records. The example presumes that the status code values are maintained in a designated fvLib.

Features with values organized as sets, bags or lists can sometimes be used instead of features organized as singletons whose values are individual feature structures. For example, consider the following encoding of the English verb form sinks, which contains an f name=agreement element whose value is a feature structure containing f name=person and f name=number elements with symbolic values.

If one does not care about the names of the features contained within the fs type='agreement structure' element, the containing f name=agreement element can be given an org attribute with the value set, and the contained fs element, together with the person and number feature elements it contains, can be eliminated, as follows.

The encoding in the preceding example presumes that the fDecl element for the f name=agreement element would look something like the following; for further details, see .

The set, bag or list which has no members is known as the null (or empty) set, bag or list. To refer to it, the null element is provided; its description is as follows.

  null: represents the null set if org=set is specified for the feature of which it is the value; represents the null bag if org=bag is specified for the feature of which it is the value; represents the null list if org=list is specified for the feature of which it is the value; has no interpretation if org=single is specified for the feature of which it is the value.

So, for example, to indicate that the individual identified above by the fs id=Pkab027 element has no siblings, we may specify the f name=siblings element as follows.

Using the null element as the value of a feature organized as a singleton is a semantic error; however, SGML cannot flag such a use. When the null element appears as a feature value, it must be the only value.
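Because SGML cannot flag these misuses, the two rules just stated are natural candidates for an application-level check. The Python sketch below assumes a hypothetical sentinel NULL standing in for the empty null element and a hypothetical function null_problems. Each org value is passed as a plain string.

```python
NULL = object()  # hypothetical stand-in for the empty null element


def null_problems(org: str, values: list) -> list:
    """Report the misuses of null described in the text."""
    problems = []
    if any(v is NULL for v in values):
        if len(values) > 1:
            problems.append("null must be the only value")
        if org == "single":
            problems.append("null has no interpretation for org=single")
    return problems


print(null_problems("set", [NULL]))          # []
print(null_problems("single", [NULL]))       # ['null has no interpretation for org=single']
print(null_problems("list", [NULL, "x"]))    # ['null must be the only value']
```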

Here is the formal declaration of the null element.

Alternative Features and Feature Values

In this section, two methods of representing the alternation (ambiguity or uncertainty) of features and feature values are presented. The first is to be used for nonsystematic or sporadic markup of alternation of individual features or values; it makes use of the special-purpose fAlt and vAlt elements. The second is to be used for systematic markup of alternation and for the alternation of groups of features or values; it makes use of the general-purpose alt element introduced in section . The fAlt and vAlt elements have the following descriptions and attributes.

  fAlt: provides alternative features for a feature structure or other feature alternation. Attributes include:
    mutExcl: indicates whether values are mutually exclusive. Legal values are:
      Y: indicates that the values are mutually exclusive.
      N: indicates that the values are not mutually exclusive.
  vAlt: provides alternative (disjunctive) values for a feature. Attributes include:
    mutExcl: indicates whether values are mutually exclusive. Legal values are:
      Y: indicates that the values are mutually exclusive.
      N: indicates that the values are not mutually exclusive.

To illustrate the use of the fAlt element to represent the alternation of features, suppose one is uncertain whether a particular real estate advertisement describes a house with two bedrooms or with two bathrooms. This uncertainty can be represented as follows.

This representation leaves unspecified whether or not the alternation is mutually exclusive (i.e. whether having two bathrooms excludes the possibility of having two bedrooms and vice versa). To make this aspect of the alternation explicit, one can specify a value for the mutExcl attribute, as follows.

The fAlt element can also be used to represent uncertainty about whether the number of bathrooms is two or three, as follows; note that the attribute value mutExcl=Y can be inferred for the fAlt element in this example.

However, the f name=number.of.bathrooms element in this example can be factored out of the alternation, and a vAlt element used instead to represent the alternation of just the feature values, as follows.

The fAlt and vAlt elements can also be used to indicate certain alternations among values of features organized as sets, bags or lists. For example, suppose one uses an f name=extras org=set element in feature structures for real estate listings to represent items that are mentioned to enhance a property's sales value, such as whether it has a pool or a good view. Now suppose that for a particular listing, the extras include an alarm system and a fenced-in yard, and either a pool or a jacuzzi (but not both). This situation could be represented, using the vAlt element, as follows.

Now suppose the situation is as in the preceding example, except that one is also uncertain whether the property has an alarm system or a fenced-in yard, or possibly both. This can be represented as follows.

Finally, suppose that the listing specifies that the property has a finished basement, and that it also has either an alarm system and a pool or a fenced-in yard and a jacuzzi. This situation cannot be represented using the vAlt element, because the alternation holds between subsets of two values each. It can, however, be represented using the fAlt element, as follows; note that the str element with the value finished basement must be repeated.
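A small Python sketch (not TEI markup) shows the difference between what vAlt and fAlt can express here. It assumes that the value of a set-valued feature is modelled as a frozenset of strings, and that each alternation is modelled as a list of possible contributions to that set. The helpers choose_one, choose_some and possible_sets are hypothetical names.

```python
from itertools import combinations, product


def choose_one(items):
    """Mutually exclusive alternation: exactly one item is chosen."""
    return [frozenset([item]) for item in items]


def choose_some(items):
    """Non-exclusive alternation: any non-empty subset of the items."""
    return [frozenset(c) for n in range(1, len(items) + 1)
            for c in combinations(items, n)]


def possible_sets(fixed, *alternations):
    """All values of a set-valued feature: the fixed members plus one
    choice from each alternation."""
    return {frozenset(fixed).union(*choice) for choice in product(*alternations)}


# Alarm system and fenced-in yard, plus either a pool or a jacuzzi.
first = possible_sets({"alarm system", "fenced-in yard"},
                      choose_one(["pool", "jacuzzi"]))
# Alarm system or fenced-in yard (or both), plus either a pool or a jacuzzi.
second = possible_sets(set(), choose_some(["alarm system", "fenced-in yard"]),
                       choose_one(["pool", "jacuzzi"]))
# Finished basement, plus either {alarm system, pool} or {fenced-in yard, jacuzzi}:
# the alternation holds between two-member subsets, as with fAlt.
third = possible_sets({"finished basement"},
                      [frozenset({"alarm system", "pool"}),
                       frozenset({"fenced-in yard", "jacuzzi"})])
# Two independent value alternations over-generate for the third case.
independent = possible_sets({"finished basement"},
                            choose_one(["alarm system", "fenced-in yard"]),
                            choose_one(["pool", "jacuzzi"]))
print(len(first), len(second), len(third), len(independent))  # 2 6 2 4
```

The last comparison shows why the third situation needs fAlt. Treating the two choices as independent value alternations also admits combinations such as an alarm system with a jacuzzi, which the listing excludes.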

If a large number of ambiguities or uncertainties involving a relatively small number of features and values need to be represented, it is recommended that the general-purpose alt element discussed in section be used, rather than the special-purpose fAlt and vAlt elements. The use of the alt element avoids the need to explicitly represent the alternating elements more than once.

For example, suppose one has set up an fsLib element containing feature structures representing the morphological structures of classical Greek inflected words, along with collections of individual features and feature values, encoded by fLib and fvLib elements as appropriate. The following example shows how one might then represent the morphological structure of a feminine, accusative, plural noun form in classical Greek, such as theás 'goddesses', discussed in section :

Now consider the noun form theaí 'goddesses', which can be analyzed as a feminine plural noun form in either the nominative or the vocative case. We may represent this ambiguity by adding the following entries to the fsLib, fLib, and fvLib elements in the preceding example; assume that appropriate entries for unambiguous nominative and vocative case forms have already been made.

If the fvLib element is not used, and specifications for particular feature values are instead entered as the content of the f name=... elements in the fLib element, then the ambiguity can be represented as follows.

Unlike the fAlt and vAlt elements, the alt element together with the join element can be used to express alternations between sets of features. An example of such an alternation is found in certain feminine Greek noun forms ending in -as, such as peíras 'attempt(s)', which may be analyzed as having either genitive case and singular number or accusative case and plural number, as follows (again assuming the existence of other elements, and of identifier attributes, for simple features and values).
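A short Python sketch (not TEI markup) shows why alt and join are needed for peíras. The sketch is built on four assumptions, all of them illustrative:

- Features are looked up in a hypothetical library keyed by identifiers.
- A join is modelled as the conjunction (set) of its features.
- An alt of joins is modelled as a choice of exactly one join.
- The result is compared with what two separate value alternations would allow.

```python
from itertools import product

# Hypothetical feature library: identifier -> (feature name, value).
F_LIB = {
    "fem": ("gender", "feminine"),
    "gen": ("case", "genitive"),
    "acc": ("case", "accusative"),
    "sg": ("number", "singular"),
    "pl": ("number", "plural"),
}


def join(*ids):
    """A join: all of the identified features hold together."""
    return frozenset(F_LIB[i] for i in ids)


def alt_of_joins(common, *joins):
    """An alt whose members are joins: exactly one join holds,
    together with the features shared by every reading."""
    return {join(*common) | j for j in joins}


def independent_valts(common, *valts):
    """Separate value alternations: every combination of choices is allowed."""
    return {join(*common) | join(*combo) for combo in product(*valts)}


peiras = alt_of_joins(["fem"], join("gen", "sg"), join("acc", "pl"))
overgenerated = independent_valts(["fem"], ["gen", "acc"], ["sg", "pl"])
print(len(peiras), len(overgenerated))  # 2 4
```

The two extra readings from independent alternations, genitive plural and accusative singular, are exactly the analyses the alt of joins rules out.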

Here are the formal declarations of the fAlt and vAlt elements.

Boolean, Default and Uncertain Values

In this section we define four special empty elements used as feature values: the boolean value elements any and none, the dft element, and the uncertain element.

The boolean value elements are used to indicate whether the features they are associated with have values. The element any corresponds to the boolean value true (i.e., the feature it is associated with has a value; this is not the same as the binary value plus), and the element none corresponds to the boolean value false (i.e., the feature it is associated with has no value). The dft element indicates that the feature it is associated with has its default value in the feature structure in which it appears. Finally, the uncertain element may be used to indicate uncertainty about what value, if any, its associated feature has; it is equivalent to the alternation of the any and none elements. To indicate uncertainty only about which of its possible legal values a particular feature has, one should use the any element.

The descriptions of these elements are as follows: any represents a boolean true value variable; none represents a boolean false value variable; dft provides the default value for a feature; and uncertain provides the uncertainty value for a feature.

The values null and none are distinct. The former is to be used with a feature organized as a set, bag, or list to indicate that its value is the null set, bag, or list in a particular feature structure. The latter is to be used with such a feature to indicate that it has no value in a particular feature structure.

The boolean values any and none are also distinct from the binary values plus and minus. The latter pair are specific possible values for features, whereas the former pair represent ranges of possible values. For example, suppose that the valRange element for the f name=auxiliary element is declared as follows in the feature system declaration, so that either binary value is legal. Then the members of each of the following two pairs of specifications are distinct. In this situation, the any element is equivalent to the alternation of the plus and minus elements, and the none element is equivalent to the negation of that alternation.

However, if the auxiliary feature is declared to take only the plus value, then the members of the first pair of specifications below are equivalent, but those of the second pair are not; in fact, the first member of the second pair is invalid.

It is even possible to declare that a particular feature can never have a value, as follows for the feature f name=impossible. In this case, the following specifications are equivalent.

The elements any and dft are also designed to be used in conjunction with the fDecl and valDefault elements in the feature system declaration discussed in section . First, consider the any element, and suppose that the valRange element in the fDecl element for the f name=gender element is specified as follows. Then the following two representations are equivalent.

Second, consider the dft element, and suppose that the default value for the f name=gender element is specified in the valDefault element of its fDecl element as having the value sym value=feminine. Then the following three representations are equivalent; note that if an f name=... element appears without content and without a valid fVal attribute, then it is equivalent to the same element with the dft element as its content.

Using the any and dft elements, together with an fDecl element for the corresponding feature in the feature system declaration, provides a method for underspecifying the value of that feature. The any element means that the associated feature has a legal value but what value it has is not specified. The dft element means that the associated feature has the value which the encoder has declared is the normal value of the feature.
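The relations among these special values can be summarized in a minimal Python sketch (not TEI markup). It assumes that each specification is interpreted as the set of actual values it allows, with None standing for the absence of a value. The function possible_values and its arguments are hypothetical; the arguments stand in for a feature's valRange and valDefault.

```python
NO_VALUE = None  # stands for "the feature has no value"


def possible_values(spec, legal, default=None):
    """Actual values compatible with a value specification (hypothetical model).

    spec is "any", "none", "uncertain", "dft", or a specific value.
    legal lists the values allowed by the feature's valRange;
    default is the value given by its valDefault, if one is declared.
    """
    legal = set(legal)
    if spec == "any":
        return legal                      # some legal value, unspecified which
    if spec == "none":
        return {NO_VALUE}                 # no value at all
    if spec == "uncertain":
        return legal | {NO_VALUE}         # equivalent to any or none
    if spec == "dft":
        if default is None:
            raise ValueError("no default value declared")
        return {default}
    if spec not in legal:
        raise ValueError(f"{spec!r} is not a legal value")
    return {spec}


binary = ["plus", "minus"]
# With both binary values legal, any is the alternation of plus and minus.
print(possible_values("any", binary) ==
      possible_values("plus", binary) | possible_values("minus", binary))  # True
# With only plus legal, any and plus coincide.
print(possible_values("any", ["plus"]) == possible_values("plus", ["plus"]))  # True
# With no legal values, uncertain collapses to none.
print(possible_values("uncertain", []) == possible_values("none", []))  # True
```

Raising an error for a value outside the declared range mirrors the point made above that minus is invalid when only plus is declared.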

The boolean elements any and none also have specific uses within fsConstraints and fDecl elements in feature system declarations, as described in chapter . For example, the element any can appear as the value of a feature contained within an fs of a particular type which appears in the cond element of an fsConstraints element, to indicate that the feature must appear in feature structures of the designated type (i.e., that it is obligatory) and that when it does appear, it may appear with any of its legal values. Similarly, none can appear in this way to specify that the feature cannot be present in feature structures of the indicated type (i.e., that it is obligatorily absent from such feature structures). All other features that are declared to have values are understood to be optional in such feature structures.

For example, the following may appear as part of the fsConstraints of a feature system declaration to indicate that an fs type='agreement structure' must contain a legal instance of the f name=number element but must not contain a legal instance of the f name=category element.

Further constraints can be imposed on a feature structure of a particular type in the valRange elements of features which take feature structures of that type as values. For example, suppose that verb and adjective agreement in German are represented by feature structures of the following sorts, in which verb forms agree in person and number with their subjects and adjective forms agree in gender, case, and number with the nouns they modify.

In order to ensure that an fs type='agreement structure' element which appears as the value of an f name=verbAgreement element may be specified for person and number but not for gender or case, we may provide a valRange element for the verbAgreement feature as follows. Similarly, to ensure that an fs type='agreement structure' element which appears as the value of an f name=adjAgreement element may be specified for case, gender, and number but not for person, we may provide a valRange element for the adjAgreement feature as follows.
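The division of labour between fsConstraints and valRange can be sketched in Python (not TEI markup). The sketch models a feature structure as a dict and records the constraints above as plain tables. The values "3", "sg" and "dat" are invented, and the names CONSTRAINTS, ALLOWED and violations are hypothetical. Only the presence or absence of features is checked, not their values.

```python
# Hypothetical constraint for one fs type: features that must be present
# (given an "any" value in the cond) and features that must be absent ("none").
CONSTRAINTS = {
    "agreement structure": {"required": {"number"}, "forbidden": {"category"}},
}

# Hypothetical valRange restrictions on features whose values are agreement structures.
ALLOWED = {
    "verbAgreement": {"person", "number"},
    "adjAgreement": {"gender", "case", "number"},
}


def violations(feature, fs_type, fs):
    """List the ways a feature structure (a dict) breaks the constraints
    for its type or the restrictions of the feature that contains it."""
    present = set(fs)
    rule = CONSTRAINTS.get(fs_type, {})
    problems = [f"missing obligatory feature {n!r}"
                for n in sorted(rule.get("required", set()) - present)]
    problems += [f"forbidden feature {n!r} present"
                 for n in sorted(rule.get("forbidden", set()) & present)]
    allowed = ALLOWED.get(feature)
    if allowed is not None:
        problems += [f"{n!r} not allowed under {feature!r}"
                     for n in sorted(present - allowed)]
    return problems


print(violations("verbAgreement", "agreement structure",
                 {"person": "3", "number": "sg"}))  # []
print(violations("adjAgreement", "agreement structure",
                 {"person": "3", "case": "dat"}))
```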

Declarations like these, combined with the principle of subsumption discussed in , allow feature structures to be underspecified in text markup. For example, to indicate that a given adjective inflection feature (tagged f name=adjInflection) is a feature structure (tagged fs type='inflection structure') specifying plural number and any gender and case, we can omit the elements for gender and case from the fs element, as follows. When supplied as the value of a verb inflection feature (tagged f name=verbInflection), the same feature structure would be interpreted as an inflection structure specifying plural number and any person.

If an optional feature is not specified in a feature-structure value, then it is assumed to occur with the uncertain value. For further discussion, see section .

Here are the formal declarations of the any, none, dft, and uncertain elements.

Indirect Specification of Values Using the rel Attribute

The rel attribute is provided for the feature value elements sym, nbr, msr, rate, str, fs, and dft (but not for plus, minus, null, vAlt, any, none, or uncertain). This attribute specifies which of various logical relations the given value bears to the actual value of the feature. For all value elements on which the rel attribute is defined except fs, its default value is eq, which means that the actual value is equal (or identical) to the given value. Accordingly, both of the following representations are interpreted to mean that the value of the f name=case element is the sym value=genitive element.

The Not-Equals Relation

The rel attribute can also be specified as having the value ne, which means that the associated feature has a value which is not equal to the given value. For example, the value nbr rel=ne value=1 in the following example denotes any legal numeric value for the element f name=number.of.bathrooms other than 1.

If an fDecl element has been provided which defines the legal values for the associated feature, then the value ne can be given a positive interpretation. For example, suppose that the valRange element is declared in the fDecl element for the f name=case element as follows.

Suppose also that the f name=case element is declared as obligatory in a particular feature structure. Then the following specifications are equivalent in that structure. That is, when the rel attribute occurs with the value ne in the value of an obligatory feature in a feature structure, the actual value of that feature may be any of its legal values other than the specified value.

On the other hand, if the f name=case feature is declared as optional in a particular feature structure, then the following specifications are equivalent in that structure. That is, when the rel attribute has the value ne in the value of an optional feature in a feature structure, the actual value of that feature may be any of its legal values other than the specified value, or none.
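The contrast between obligatory and optional features under rel=ne can be expressed as a minimal Python sketch (not TEI markup). It assumes that the legal values come from the feature's valRange and that None stands for the absence of a value. The four-case list is a hypothetical stand-in for the declaration, which is not reproduced here.

```python
NO_VALUE = None  # the feature is absent

# Hypothetical valRange for case; the actual declaration is not reproduced here.
CASES = ["nominative", "genitive", "dative", "accusative"]


def ne_values(given, legal, optional):
    """Actual values allowed by rel=ne: any legal value other than `given`,
    plus the absence of a value when the feature is optional."""
    values = set(legal) - {given}
    if optional:
        values.add(NO_VALUE)
    return values


print(sorted(ne_values("genitive", CASES, optional=False)))
# ['accusative', 'dative', 'nominative']
print(NO_VALUE in ne_values("genitive", CASES, optional=True))  # True
```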

If the rel attribute is specified with the value ne for a nbr, msr, or rate element for which the valueTo attribute is also specified, then the actual range may be any range distinct from that given. For example, the following means that the number of bathrooms lies in a range distinct from 3 to 5 (e.g., 3 to 4, 3 to 6, 4 to 5, 4 to 6, 0 to 2, etc.).

Other Inequality Relations

For the elements nbr, msr, rate, and str, the rel attribute may also take the values lt (less than), le (less than or equal to), gt (greater than), and ge (greater than or equal to). The use of these values for the str element presumes that a particular character and string ordering (or sorting) convention is understood.
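The inequality relations can be sketched in Python (not TEI markup), assuming they are written with the codes lt, le, gt and ge. The sketch follows the definition of range ordering given with the lot-size example below, comparing (value, valueTo) pairs endpoint by endpoint. Applying the same endpoint-wise reading to lt, gt and ge is an assumption. The names holds and as_range are hypothetical.

```python
import operator

RELATIONS = {"lt": operator.lt, "le": operator.le,
             "gt": operator.gt, "ge": operator.ge}


def as_range(x):
    """Treat a single value as the degenerate range (x, x)."""
    return x if isinstance(x, tuple) else (x, x)


def holds(rel, actual, given):
    """True if `actual rel given` holds.

    Numbers and strings are compared directly; Python's code-point order
    stands in for the string-ordering convention the encoder assumes.
    Ranges are (value, valueTo) tuples compared endpoint by endpoint.
    """
    compare = RELATIONS[rel]
    if isinstance(actual, tuple) or isinstance(given, tuple):
        a, g = as_range(actual), as_range(given)
        return compare(a[0], g[0]) and compare(a[1], g[1])
    return compare(actual, given)


print(holds("lt", 4, 5), holds("gt", 70, 65))                       # True True
print(holds("le", (4, 8), (5, 10)), holds("le", (4, 12), (5, 10)))  # True False
print(holds("gt", "Smith", ""), holds("gt", "", ""))                # True False
```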

These attribute values may be used as shown in the following examples:

- The first states that the number of bedrooms is less than 5.
- The second states that an illegal speed is any speed greater than 65 miles per hour.
- The third states that a lot size lies in a range which is less than or equal to the range from 5 to 10 acres.
- The fourth states that the last name is any string greater than the empty string (i.e., any nonempty string, given normal string-ordering conventions).
- The fifth states that, for a feature whose value is a list of two strings, the first string precedes the string M and the second is the string M or any string following it.

One range is said to be less than or equal to another if both the value and valueTo attributes of the first are less than or equal to the corresponding attributes of the second.

Subsumption and Non-subsumption Relations

When the rel attribute is given the values sb or ns, the markup expresses the claim that the value given subsumes, or does not subsume, the actual value for the feature in question.

On the str element, these values specify that the string given in the str element is, or is not, a substring of the actual value of the feature. In the examples below, the first specifies that the actual feature value may be any string at all (since the empty string is a substring of every string). The second specifies that it may be any string in which the string "the" occurs as a substring, and the third that it may be any string in which "the" does not occur as a substring.

On the fs element, the attribute values sb and ns indicate that the given feature structure does or does not subsume the actual feature structure. By definition, one feature structure subsumes another if the second is identical to the first or contains all of the information in the first and more. The default value for the rel attribute of the fs element is sb. The subsumption of feature structures is illustrated by the following four fs type='agreement structure' examples. The subsumption relations among them hold whether the f name=person and f name=number elements are optional or obligatory.

The fourth example, pxnx, subsumes all four examples, since each contains at least as much information as feature structure pxnx does. Conversely, the first example, p3ns, subsumes only itself. Finally, the second and third examples, identified as p3nx and pxns, subsume themselves and the first feature structure, but not each other.
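The subsumption relations among the four examples can be checked with a minimal Python sketch (not TEI markup). The sketch assumes three things: a feature structure is modelled as a dict, an unspecified feature is simply omitted, and atomic values must match exactly. The values "3" and "sg" are illustrative, and subsumes is a hypothetical name.

```python
def subsumes(general, specific):
    """True if feature structure `general` subsumes `specific`.

    Feature structures are dicts; every feature of `general` must appear in
    `specific` with an equal atomic value or a subsumed nested structure.
    """
    for name, value in general.items():
        if name not in specific:
            return False
        other = specific[name]
        if isinstance(value, dict):
            if not (isinstance(other, dict) and subsumes(value, other)):
                return False
        elif value != other:
            return False
    return True


# Unspecified features are simply omitted (values are illustrative).
examples = {
    "p3ns": {"person": "3", "number": "sg"},
    "p3nx": {"person": "3"},
    "pxns": {"number": "sg"},
    "pxnx": {},
}
for name, fs in examples.items():
    print(name, [other for other, o in examples.items() if subsumes(fs, o)])
# p3ns ['p3ns']
# p3nx ['p3ns', 'p3nx']
# pxns ['p3ns', 'pxns']
# pxnx ['p3ns', 'p3nx', 'pxns', 'pxnx']
```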

If both person and number are obligatory features of agreement structure elements, then the last three elements in the preceding list have the same interpretation as their counterparts in the following list.

On the other hand, if both person and number are optional features of agreement structures, then those three elements have the same interpretation as their counterparts in the following list. In other words, if an optional feature is omitted from a feature-structure representation, then that feature may have any of its legal values or no value at all (the uncertain value).

The value sb was chosen as the default value for the rel attribute of the fs element because it provides the most economical means of underspecifying feature structures. One situation in which it may be preferable to specify fs rel=eq is when the feature structure has many optional features and it is known that none of them occurs.

The specification fs rel=ns is used to denote the feature structures that the specified feature structure does not subsume. This provides a handy way of saying that a certain combination of features is not present, for example the combination of third person and singular number, as in the agreement structure of the English verb form sink, understood as a present tense verb form. The following example expresses the claim that third-person and singular-number features are not both present in the agreement feature, but makes no further claim about what is present.

In most real situations, of course, one can infer, from the range of possible values for person and number, what the remaining possibilities are. Suppose, for example, that in the relevant feature system declaration, the features person and number are given the following valRange elements.

Suppose, further, that the person and number features are obligatory in feature structures of the type agreement structure. Then the element fs id=Np3ns above is equivalent to the following alternation; the features whose value is any may be omitted, since they are implied by the default value of sb for the rel attribute in the enclosing fs elements.
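The alternation that fs rel=ns stands for can be enumerated mechanically. The Python sketch below (not TEI markup) assumes hypothetical valRange values of 1, 2 and 3 for person and sg and pl for number, since the declarations themselves are not reproduced here. It lists the agreement structures that the given structure does not subsume in two cases: the obligatory case just described, and the optional case discussed in the next paragraph.

```python
from itertools import product

NO_VALUE = None
PERSON = ["1", "2", "3"]   # hypothetical valRange for person
NUMBER = ["sg", "pl"]      # hypothetical valRange for number


def subsumes(general, specific):
    """Flat feature structures: every feature of `general` matches `specific`."""
    return all(specific.get(name, NO_VALUE) == value
               for name, value in general.items())


def all_structures(optional):
    """Every fully determined agreement structure; optional features may be absent."""
    extra = [NO_VALUE] if optional else []
    structures = []
    for person, number in product(PERSON + extra, NUMBER + extra):
        fs = {}
        if person is not NO_VALUE:
            fs["person"] = person
        if number is not NO_VALUE:
            fs["number"] = number
        structures.append(fs)
    return structures


def denoted_by_ns(given, optional):
    """The structures an fs with rel=ns denotes: those `given` does not subsume."""
    return [fs for fs in all_structures(optional) if not subsumes(given, fs)]


np3ns = {"person": "3", "number": "sg"}
print(len(denoted_by_ns(np3ns, optional=False)))  # 5
print(len(denoted_by_ns(np3ns, optional=True)))   # 11
```

With obligatory features, the five remaining combinations of person and number are left. With optional features, the structures lacking one or both features are admitted as well.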

If, on the other hand, the person and number features were optional in feature structures of type agreement structure, then the interpretation of an underspecified feature structure would change. The element fs id=Np3ns given above is then equivalent to the following alternation; the features whose value is uncertain may be omitted, as they are implied by the default subsumption relation holding between the structure given and the actual structure.

Relations Holding with Sets, Bags, and Lists

The rel attribute is also provided for the f element, but is designed to be used with that element only when its org attribute (see section ) is set to set, bag, or list. When associated with the f element, the rel attribute may take on any of the following four values: eq, ne, sb, and ns. The default value is eq. Consider first the use of the rel attribute with the f element when the given value of the feature is null.

The first example states that the extras feature has the null set as its value. The second example states that the extras feature is a set which is not equal to the null set. That is, its actual value might be any non-null set. The third example states that the extras feature has as its value a set of which the null set is a subset; that is to say, any set at all, including the null set. Note that this is not equivalent to the following, which states that the extras feature has as its value a single element which is any legal value for the extras feature, including for example a str element containing the value pool.

Finally, the fourth example states that the extras feature has as its value a set of which the null set is not a subset. Since the null set is a subset of every set, the fourth example in effect claims that the extras feature has no legal value; it is thus equivalent to the following, which states directly that the extras feature has no value.
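The four relations on a set-valued f element can be sketched as simple set comparisons in Python (not TEI markup). The sketch models the feature's value as a frozenset; the candidate value good view is hypothetical. The same function applies unchanged to the single-member examples that follow.

```python
def set_relation_holds(rel, given, actual):
    """Relation between the given and actual values of a set-valued feature.

    eq: the sets are equal            ne: the sets differ
    sb: given is a subset of actual   ns: given is not a subset of actual
    """
    given, actual = frozenset(given), frozenset(actual)
    return {
        "eq": actual == given,
        "ne": actual != given,
        "sb": given <= actual,
        "ns": not given <= actual,
    }[rel]


candidates = [frozenset(), frozenset({"pool"}), frozenset({"pool", "good view"})]
for rel in ("eq", "ne", "sb", "ns"):
    print(rel, [sorted(c) for c in candidates
                if set_relation_holds(rel, set(), c)])
# eq [[]]
# ne [['pool'], ['good view', 'pool']]
# sb [[], ['pool'], ['good view', 'pool']]
# ns []
```

The empty result for ns reflects the point made above: no set fails to contain the null set.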

Consider next the use of the rel attribute with the f element when the given value of the feature is a single str element with the content pool.

The first example states that the value of the extras feature is a set consisting of a single member, namely a str element containing the value pool. The second example states that the extras feature has as its value a set which is not equal to the set consisting of this particular member. It could, however, be a two-membered set, one of whose members is this value and the other some other value. This example is thus not equivalent to the following, which states that the extras feature has as its value a set comprising a single member other than a str element with the content pool.

The third example states that the extras feature has as its value any set of which the set consisting of the single member specified is a subset (i.e., any set which contains the element str with the value pool, and possibly others). Finally, the fourth example states that the extras feature has as its value any set which does not contain this element as a member.

Varieties of Subsumption and Non-subsumption

The rel values sb and ns have different meanings depending on whether they occur on a str, fs or f element. However, the use of a common name for the value reflects a fundamental similarity in those meanings. For example, the value sb can be used in all three elements to indicate that the actual value is any string, any feature structure, or any set, bag or list, as follows. In the second example below, the rel attribute has not been specified, since it has the value sb by default on fs elements.
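The shared core of sb can be made explicit in a Python sketch (not TEI markup) that dispatches on the kind of value: substring for strings, subset for sets, and feature-structure subsumption for dicts. The modelling choices (dicts for feature structures, Python sets for set-valued features) and the names sb and ns are assumptions. Bags and lists are not covered.

```python
def sb(given, actual):
    """A single 'subsumes' test over three kinds of value (a sketch).

    strings: given is a substring of actual
    sets:    given is a subset of actual
    dicts:   given feature structure subsumes actual, recursively
    """
    if isinstance(given, str) and isinstance(actual, str):
        return given in actual
    if isinstance(given, (set, frozenset)) and isinstance(actual, (set, frozenset)):
        return given <= actual
    if isinstance(given, dict) and isinstance(actual, dict):
        return all(name in actual and _value_sb(value, actual[name])
                   for name, value in given.items())
    raise TypeError("sb is not defined for these values")


def _value_sb(given, actual):
    """Feature values inside a structure: recurse into nested structures,
    otherwise require equality."""
    if isinstance(given, dict):
        return isinstance(actual, dict) and sb(given, actual)
    return given == actual


def ns(given, actual):
    return not sb(given, actual)


# The minimal value of each kind subsumes everything of that kind.
print(sb("", "any string"), sb(frozenset(), {"pool"}), sb({}, {"person": "3"}))
# True True True
print(ns("the", "a dog"), sb({"number": "sg"}, {"person": "3", "number": "sg"}))
# True True
```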

Because the value sb is not defined for the attribute rel on the nbr, msr and rate elements, the indication that a value may be any number, measure or rate is sometimes not quite as simple. Here is one way of specifying any positive or negative integer numeric value. Typically, there will be no need to use an encoding like this one as the value of a feature, since the any element is available for that purpose. However, in setting up the feature declaration for that feature, it may be necessary to use such an encoding, precisely so as to provide an interpretation for the use of the any element as the value of that feature.

The value ns is also understood in similar ways in the different elements in which it may occur. Earlier in this section, it was shown that the following representations are equivalent under certain conditions (the id attributes and the redundant features with any values have been omitted).

The value ns has an analogous meaning when the value in question is a set rather than a feature structure. Recast in such terms, the equivalence above still holds.

Two Illustrations

In this section, we present two illustrations, based on a single text, of how to associate feature structures and their components with textual elements. Our example text is the article "Memoirs of a Dog Shrink", which appeared in the popular magazine Dogs Today in August 1991. This text has been selected for inclusion in the British National Corpus (BNC). The first illustration associates the text with a structure that represents a significant portion of the information contained in the text. The second marks up the grammatical structure of the orthographic words and certain other comparable units in the text. Here is the text, with markup provided down to the level of s elements. The n attribute values are taken from the BNC markup; the id attribute values have been added for purposes of these illustrations.

Memoirs of a Dog Shrink

Cartoonist Russell Jones takes a ramble through Peter Neville's files

Case number: 72
Name: Jessie
Breed: Collie
Problem: Light bulb phobia

Jess the collie was a laid-back sort of hound who spent most of his life stretched out on a fireside rug in his large Surrey home.

The closest he came to exercise was to open one eye every so often, if someone entered the room, or to open both eyes, smile, and wag his tail as he'd done on one occasion when confronted by a housebreaker!

This extremely lazy lifestyle was one long yawn from dawn to dusk. Only the odd bouts of involuntary twitching in his sleep reassured his owner that Jess was still safe and sound in the land of the living!

One winter night, as the mutt twitched away in front of the fire, his mind somewhere between Basingstoke and the twilight zone, a 100-watt light bulb in the standard lamp above his head suddenly exploded without warning!

According to his owner, who witnessed the spectacle, Jessie rose gracefully toward the ceiling like a space shuttle and, after lingering in mid-air for what seemed an eternity, crashed to the floor and fled the house with a speed and agility the owner found quite amazing.

Jessie did not return home for several hours. When he eventually did show up, it was obvious to all that he was a changed dog! What plodded through the front door was not the lovable, lazy hound who had once lived there but a grim-faced light bulb serial killer!

Within seconds of his return, Jessie launched a vicious attack on a table lamp, popping the bulb and wrecking the shade before charging into the lounge. There, in a frenzy of violence, he reduced the standard lamp to a table lamp in 10 seconds flat!

After a room-to-room chase lasting several minutes, during which every lamp in the house was turned to sawdust, the dog was finally caught and wrestled to the ground.

With his house plunged into darkness, Jessie's owner sought my help.

SIMPLE SOLUTION

When I first saw the dog, it was quite obvious he'd been deeply affected by the explosion and had developed a 100-watt phobia for light bulbs!

By placing his feeding bowl closer each day to a table lamp the dog gradually learned to live with his enemy. Within a couple of weeks, his killer instincts had disappeared and he was back where he belonged — twitching away peacefully on the fireside rug.


The first illustration is based on the observation that a fairly extensive medical history for the dog can be inferred from the example text. Suppose that we have a definition of a feature structure that represents a canine medical history. Then we can fill in feature values in that history from the text and prepare a linkGrp element that specifies the links between the text segments and the various features specified in the feature structure. Here is a hypothetical example of such a filled-in feature structure and associated link group. The feature values taken from the text are:

- Jessie
- Jessie
- Jess
- Surrey
- Neville
- Peter
- "ran off, then returned and destroyed every lamp in the house"
- "light bulb phobia"
- "explosion of light bulb over patient's head"
- "positive reinforcement"
- "systematically decreased distance between feeding bowl and table lamp"
- "return to baseline condition"

From this illustration, we see that links can be made not only between text and feature structure elements, but also between text and feature elements. For that matter, links between text and feature value elements can also be made.

The second illustration takes advantage of the fact that this text, like others in the BNC, has been provided with detailed grammatical markup of most of its orthographic words and certain other comparable structural units. For example, the second paragraph of the above text has been marked up as follows.

The&AT0; closest&AJS; he&PNP; came&VVD; to&PRP; exercise&NN1; was&VBD; to&TO0; open&VVI; one&CRD; eye&NN1; every so often&AV0;,&PUN; if&CJS; someone&PNI; entered&VVD; the&AT0; room&NN1;,&PUN; or&CJC; to&TO0; open&VVI; both&DT0; eyes&NN2;,&PUN; smile&VVI;,&PUN; and&CJC; wag&VVI; his&DPS; tail&NN1; as&CJS; he&PNP;'d&VHD; done&VDN; on&PRP; one&CRD; occasion&NN1; when&AVQ; confronted&VVN; by&PRP; a&AT0; housebreaker&NN1;!&PUN;

The entities that appear in this fragment may be expanded, by means of entity definitions such as the following, into pointers to feature structures that represent grammatical structure.
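To see what this implicit association amounts to, the entity-coded fragment can be split mechanically into (token, tag) pairs. This Python sketch is not part of the BNC or TEI tooling. It assumes that every tag is a three-character code of capital letters and digits closed by a semicolon, as in the fragment above; tag_pairs is a hypothetical name.

```python
import re

# A token followed by its entity-coded tag, e.g. "closest&AJS;".
TOKEN = re.compile(r"(.+?)&([A-Z0-9]{3});")


def tag_pairs(coded):
    """Split entity-coded text into (token, tag) pairs."""
    return [(token.strip(), tag) for token, tag in TOKEN.findall(coded)]


sample = ("The&AT0; closest&AJS; he&PNP; came&VVD; to&PRP; exercise&NN1; "
          "every so often&AV0;,&PUN; as&CJS; he&PNP;'d&VHD; done&VDN;")
for token, tag in tag_pairs(sample):
    # Each tag would expand to a pointer at the feature structure of that name.
    print(f"{token!r:18} -> {tag}")
```

The pairing depends only on adjacency in the character stream, so nothing in the markup itself records which token a tag belongs to. This is the drawback discussed next.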

This method of associating feature structures with textual elements has a number of drawbacks. The most important is that the association is implicit rather than explicit: it relies on the relative position of pointer and associated text. A better method is to segment the text into the units under analysis and to point to the feature structures from within the unit tags, by means of the ana attribute (see sections and ).

To provide pointers in both directions between text and structural analysis, one may supply both the text segments and the feature-structure tags with identifiers, and associate the segments with their analyses by means of a linkGrp (see section ), as follows.

First, we define a feature-structure library to represent all of the grammatical structures that are used in the BNC encoding scheme. For illustrative purposes, we cite here only the structures needed for the first six words of the sample sentence. Each feature structure in this library bears an identifier corresponding to the code supplied as the value of the ana attribute in the sample sentence. The component features of each feature structure are further specified by the feats attribute, which identifies one or more f elements in the following feature library (again, only a few of the available features are quoted here).

Next, here is a markup of the start of our sample sentence, with identifiers for each segment; see section for discussion of the phr, w, m and c elements used here.

Finally, here is a linkGrp element containing all of the link elements that associate the text segments in the example sentence with their respective grammatical structures.
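Each link pairs a segment identifier with a feature-structure identifier, so the link group can be read in either direction. A minimal Python sketch (not TEI markup) builds both indexes from a list of pairs. The link list is hypothetical: it echoes the segment identifiers and BNC codes used in this illustration, but only mds0905 and mds0906 are named in the text.

```python
from collections import defaultdict

# Hypothetical link group: (text segment id, feature-structure id) pairs.
# Only mds0905 ("to") and mds0906 ("exercise") are named in the text;
# the other segment ids are invented for illustration.
LINKS = [
    ("mds0901", "AT0"), ("mds0902", "AJS"), ("mds0903", "PNP"),
    ("mds0904", "VVD"), ("mds0905", "PRP"), ("mds0906", "NN1"),
    ("mds0908", "TO0"),
]


def index_links(links):
    """Build text-to-analysis and analysis-to-text indexes from link pairs."""
    forward, reverse = defaultdict(list), defaultdict(list)
    for segment, analysis in links:
        forward[segment].append(analysis)
        reverse[analysis].append(segment)
    return forward, reverse


forward, reverse = index_links(LINKS)
print(forward["mds0905"])  # ['PRP']
print(reverse["TO0"])      # ['mds0908']
```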

This grammatical markup represents the text as completely unambiguous. It does so even though different instances of the same textual unit (e.g., the word to) are associated with different structure elements. It also does so even though at least one sequence, namely to exercise (identified by the attribute values id=mds0905 and id=mds0906), is in fact structurally ambiguous in English. That sequence may be analyzed as a preposition followed by a singular noun (as this markup asserts) or as the infinitive marker followed by an uninflected form of a main verb.

To represent the ambiguity of words like to and exercise, and of phrases like to exercise, we may use the alt and join elements defined in sections and , as follows. First, we define alt elements for the ambiguous word classes and add these to the fsLib. Next, we change the link elements for the text elements identified by the id=mds0905 and id=mds0906 attribute values.

As the encoding now stands, the phrase to exercise has four structural analyses associated with it:

- preposition followed by noun
- preposition followed by verb
- infinitive marker followed by noun
- infinitive marker followed by verb

To narrow the choices down to the desired two, namely preposition followed by noun and infinitive marker followed by verb, we next form join elements to represent the desired sequences. We then define an alt element to express the alternation between the two join elements. Next, we add a phr element for the phrase to exercise in the encoding of the text. Finally, we add to the linkGrp element a link element connecting that phrase to the alt element that represents its two analyses.

Note that the technique of forming join elements for sequences of structure elements and associating them with textual units can also be used to provide a complete structural analysis for the complex word he'd. First, we add an id attribute for the word. Next, we form a join of the structures associated separately with the subelements he and 'd. Finally, we define a link between the complex word and the new join element.