Query language

This page documents the search expression language used to query the dependency-parsed corpora in the Drevesnik online interface. It is based on the query language of the dep_search tool developed by the University of Turku. In addition to querying the morphological and dependency annotations of the Universal Dependencies scheme, it also enables searching by the language-specific JOS morphosyntactic tags (XPOS column in Slovenian CoNLL-U treebanks).

All expression examples below are links that search through the reference SSJ dependency treebank (randomized results, short sentences).

Token specification

Querying by word forms

Tokens with a particular word form are searched for by typing the token text as-is. Examples:

The base form (lemma) is specified with the L= prefix:

Querying by morphological features

Part-of-speech categories and other morphological features can be specified in two ways, as all corpora are annotated with both the cross-linguistically standardized Universal Dependencies (UD) annotation scheme and the language-specific JOS annotation scheme. Both schemes are well documented and describe Slovenian morphology comparably well, so the choice of scheme depends mostly on the user's preference.

JOS morphosyntactic tags

JOS morphosyntactic tags (XPOS column in Slovenian CoNLL-U treebanks) can be specified using the X= prefix. Since each position in the tag represents a specific morphological feature with multiple possible values, two special operators are also supported: the dot operator (.), which matches any character, and the asterisk operator (*), which matches 0 or more repetitions of the preceding character. Some examples:
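To illustrate how the dot and asterisk operators behave, here is a minimal Python sketch using the standard re module, whose . and * follow the same description. The sample tags and the match_tags helper are assumptions made for illustration, as is the choice to match the pattern against the whole tag; the interface itself may anchor patterns differently.

```python
import re

# Hypothetical sample of JOS-style tags, used only for illustration.
tags = ["Ncmsn", "Ncfsg", "Npmsn", "Vmpr3s", "Agpmsnn"]

def match_tags(pattern, candidates):
    """Return the candidates that the pattern matches in full (an assumption)."""
    compiled = re.compile(pattern)
    return [tag for tag in candidates if compiled.fullmatch(tag)]

# The dot stands for exactly one character at a given tag position.
print(match_tags("Nc.s.", tags))

# ".*" reads as "any character, repeated 0 or more times".
print(match_tags("N.*", tags))
```

The first pattern fixes positions 1, 2 and 4 of the tag and leaves positions 3 and 5 open, while the second accepts any tag that starts with N regardless of its length.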

UD morphological features

The part-of-speech category can be specified by writing the tags as-is, while other morphological features are defined as attribute-value pairs in the form of Category=Tag.

Special operators

All of the above token specifications can be combined with the AND (&) and OR (|) operators:

Word forms, lemmas and tags can also be negated by typing the negation operator ! before a feature. Some examples:

A token can be left unspecified by typing an underscore character ('_').
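The logic of these operators can be sketched in Python as predicates over a single token. This is only an illustration of the Boolean semantics, not of how the search engine is implemented; the token dictionaries and all helper names (form, lemma, upos, feat, AND, OR, NOT, ANY) are hypothetical.

```python
# Hypothetical tokens with CoNLL-U-like fields, used only for illustration.
noun = {"form": "dela", "lemma": "delo", "upos": "NOUN",
        "feats": {"Case": "Gen", "Number": "Sing"}}
verb = {"form": "dela", "lemma": "delati", "upos": "VERB",
        "feats": {"Person": "3", "Number": "Sing"}}

def form(text):
    return lambda tok: tok["form"] == text

def lemma(text):
    return lambda tok: tok["lemma"] == text

def upos(tag):
    return lambda tok: tok["upos"] == tag

def feat(category, value):
    return lambda tok: tok["feats"].get(category) == value

def AND(*preds):
    return lambda tok: all(p(tok) for p in preds)

def OR(*preds):
    return lambda tok: any(p(tok) for p in preds)

def NOT(pred):
    return lambda tok: not pred(tok)

# The underscore places no restriction on the token.
ANY = lambda tok: True

# Word form "dela" AND part of speech NOUN.
is_noun_dela = AND(form("dela"), upos("NOUN"))
# Word form "dela" AND NOT lemma "delo".
is_other_dela = AND(form("dela"), NOT(lemma("delo")))

print(is_noun_dela(noun), is_noun_dela(verb))
print(is_other_dela(noun), is_other_dela(verb))
```

Because every specification is a yes/no test on one token, the operators compose freely: a negated lemma can sit inside a conjunction, and a disjunction can mix word forms with features.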

Dependency specification

Dependencies are expressed using the < and > operators, which mimic the "arrows" in the dependency graph.

The underscore character _ stands for any token, that is, a token on which we place no particular restrictions. Here are simple examples of basic search expressions that restrict dependency structures:

Note that the left-most token in the expression is always the target of the search and is the one highlighted in the search results (marked in green). While the queries delo > _ and _ < delo return the exact same graphs, the matched tokens differ.

The dependency type can be specified by typing it right after the dependency operator, e.g. _ <type _ or _ >type _. The | character denotes a logical or, so any of the given dependency relations will match.
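The difference between the two queries can be sketched in Python on a toy tree. The sketch assumes that A > B reads as "A governs B" and A < B as "A is governed by B"; the sentence, its annotation and both helper functions are hypothetical and only illustrate which token ends up as the target.

```python
# Hypothetical toy tree for "To je težko delo ." (1-based ids, head 0 marks the root).
sentence = [
    {"id": 1, "form": "To", "head": 4, "deprel": "nsubj"},
    {"id": 2, "form": "je", "head": 4, "deprel": "cop"},
    {"id": 3, "form": "težko", "head": 4, "deprel": "amod"},
    {"id": 4, "form": "delo", "head": 0, "deprel": "root"},
    {"id": 5, "form": ".", "head": 4, "deprel": "punct"},
]

def governors_with_form(tree, form):
    """Sketch of `form > _`: tokens with this form that have at least one dependent."""
    heads = {tok["head"] for tok in tree}
    return [tok["form"] for tok in tree
            if tok["form"] == form and tok["id"] in heads]

def dependents_of_form(tree, form):
    """Sketch of `_ < form`: tokens whose governor has this form."""
    by_id = {tok["id"]: tok for tok in tree}
    return [tok["form"] for tok in tree
            if tok["head"] in by_id and by_id[tok["head"]]["form"] == form]

print(governors_with_form(sentence, "delo"))  # targets of: delo > _
print(dependents_of_form(sentence, "delo"))   # targets of: _ < delo
```

Both functions succeed on the same tree, but the first reports the governor as the target and the second reports its dependents.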

You can specify a number of dependency restrictions at a time by chaining the operators:

Priority is marked using parentheses:

Negation is marked using the negation operator !, which can be used to negate the < and > operators as well as specific dependency types. Some examples:

The direction of the dependency relation can be specified using the operators @R and @L: @R requires the right-most token of the expression to be on the right side, and @L requires it to be on the left side.

Combining queries

Several queries can be combined with the + operator. A query of the form query1 + query2 + query3 returns all trees which independently satisfy all three queries.
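As a rough Python sketch of this behaviour, each query can be treated as an independent yes/no test on a whole tree, with + keeping only the trees that pass every test. The toy trees and the helper names (has_form, has_deprel, combine) are hypothetical.

```python
def has_form(form):
    """Tree-level test: some token has this word form."""
    return lambda tree: any(tok["form"] == form for tok in tree)

def has_deprel(deprel):
    """Tree-level test: some token carries this dependency relation."""
    return lambda tree: any(tok["deprel"] == deprel for tok in tree)

def combine(*queries):
    """Sketch of `query1 + query2 + ...`: every query must hold for the tree."""
    return lambda tree: all(query(tree) for query in queries)

# Hypothetical toy trees, reduced to the fields the tests need.
trees = {
    "a": [{"form": "delo", "deprel": "root"}, {"form": "težko", "deprel": "amod"}],
    "b": [{"form": "delo", "deprel": "root"}],
    "c": [{"form": "hiša", "deprel": "root"}, {"form": "velika", "deprel": "amod"}],
}

query = combine(has_form("delo"), has_deprel("amod"))
matches = [name for name, tree in trees.items() if query(tree)]
print(matches)
```

The queries are evaluated separately, so the tokens that satisfy one query need not be related to the tokens that satisfy another; they only have to occur in the same tree.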

Universal quantification

The -> operator introduces a condition that all matched tokens (i.e. the tokens or structures preceding the operator) must fulfill. For example:
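The idea can be sketched in Python as a check that every token selected by the left-hand side also meets the right-hand condition. This is one reading of the description above, not a statement about the engine's exact behaviour; the helper universally, the toy trees, and the decision to reject trees in which nothing is selected are all assumptions.

```python
def universally(select, condition):
    """Tree passes only if every token picked by `select` also meets `condition`."""
    def check(tree):
        matched = [tok for tok in tree if select(tok)]
        # Assumption: a tree with no selected tokens does not count as a match.
        return bool(matched) and all(condition(tok) for tok in matched)
    return check

# Hypothetical toy trees, reduced to the fields the checks need.
all_noun_subjects = [
    {"form": "otrok", "upos": "NOUN", "deprel": "nsubj"},
    {"form": "spi", "upos": "VERB", "deprel": "root"},
]
pronoun_subject = [
    {"form": "on", "upos": "PRON", "deprel": "nsubj"},
    {"form": "spi", "upos": "VERB", "deprel": "root"},
]

# "Every token that is an nsubj dependent must be a NOUN."
every_subject_is_noun = universally(
    lambda tok: tok["deprel"] == "nsubj",
    lambda tok: tok["upos"] == "NOUN",
)

print(every_subject_is_noun(all_noun_subjects))
print(every_subject_is_noun(pronoun_subject))
```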