About
STARK is a versatile tool for analyzing the structure of sentences in syntactically annotated text collections, known as treebanks. It identifies and extracts various types of syntactic structures, or 'trees', to reveal which structures occur in a language and how prominent they are in terms of frequency and other corpus-linguistic metrics.

STARK is primarily aimed at processing treebanks based on the Universal Dependencies annotation scheme, but it also accepts any other dependency treebank in the CoNLL-U format as input. The tool generates a tabular file with a frequency list of all the trees matching the user-defined parameters. The flexibility of these settings allows users to conduct a wide range of investigations on both lexicalized and delexicalized trees, from broad, bottom-up treebank analyses (e.g. extracting all noun-headed trees) to more detailed, top-down treebank querying (e.g. extracting all predicates with two objects).
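
To make the idea of a tree frequency list concrete, here is a minimal Python sketch. It is not STARK's implementation and does not use STARK's options or output format. It only counts the simplest delexicalized trees (one head with one dependent) in a tiny CoNLL-U sample. The sample sentences and the function names `read_sentences` and `count_two_node_trees` are made up for illustration.

```python
from collections import Counter

# Hypothetical two-sentence sample in CoNLL-U (10 tab-separated columns:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).
SAMPLE = """\
# text = Dogs chase cats.
1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tchase\tchase\tVERB\t_\t_\t0\troot\t_\t_
3\tcats\tcat\tNOUN\t_\t_\t2\tobj\t_\t_
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

# text = Birds sing.
1\tBirds\tbird\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tsing\tsing\tVERB\t_\t_\t0\troot\t_\t_
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""


def read_sentences(text):
    """Yield each sentence as a list of (id, upos, head, deprel) tuples."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword tokens (ID like 1-2), empty nodes (ID like 1.1),
        # and rows without a numeric HEAD.
        if not cols[0].isdigit() or not cols[6].isdigit():
            continue
        sentence.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def count_two_node_trees(text):
    """Count (head UPOS, relation, dependent UPOS) patterns."""
    counts = Counter()
    for sentence in read_sentences(text):
        upos_by_id = {tid: upos for tid, upos, _, _ in sentence}
        for _, upos, head, deprel in sentence:
            if head == 0:
                continue  # the root has no head token
            counts[(upos_by_id[head], deprel, upos)] += 1
    return counts


counts = count_two_node_trees(SAMPLE)
# Print a small tab-separated frequency list, most frequent first.
for (head_upos, deprel, dep_upos), freq in counts.most_common():
    print(f"{head_upos}\t{deprel}\t{dep_upos}\t{freq}")
```

The sketch only covers head-dependent pairs. Larger trees, lexicalized variants, and additional corpus metrics go beyond what it shows.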

STARK was developed by Kaja Dobrovoljc, Luka Krsnik and Marko Robnik Šikonja as part of the 2019 CLARIN.SI Resource and Service Development grant and the research project SPOT: A Treebank-Driven Approach to the Study of Spoken Slovenian (ARIS grant no. Z6-4617). With the support of CJVT UL, this online interface was developed to demonstrate STARK's functionalities to a wider audience. It provides a simplified set of options compared to the comprehensive command-line version of STARK, available at: https://github.com/clarinsi/STARK.

Should you have any additional questions or require assistance with the tool, please contact kaja.dobrovoljc@ff.uni-lj.si.