STARK-demo

Online service for dependency (sub)tree extraction and analysis.

Current version

The current version of the STARK toolkit is 3.1.0.

Date of last update: 24.05.2025

Availability

The source code for the STARK tool is available from the CLARIN.SI repository under the Apache 2.0 license.

About

STARK is a versatile tool for exploring the syntactic structure of language in linguistically annotated corpora, known as treebanks. It identifies and extracts a wide range of syntactic structures, or "trees", to reveal which patterns actually occur in a language and how prominent they are with respect to various statistical metrics.


STARK is primarily aimed at processing treebanks based on the Universal Dependencies annotation scheme, but it also accepts any other dependency treebank in the CoNLL-U format as input. Essentially, the tool produces a table listing all tree structures that match user-defined criteria, along with their frequencies and other corpus-linguistic statistics. Its flexible settings support a wide range of investigations on both lexicalized and delexicalized data, from broad, bottom-up analyses (e.g. identifying all noun-headed structures) to more targeted, top-down queries (e.g. finding all verbs that take two objects).
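To make the idea of a frequency table of tree structures more concrete, here is a minimal Python sketch. It is not STARK's implementation and does not use STARK's code or options: it only counts the smallest possible delexicalized trees (one head with one dependent) in a tiny hand-written CoNLL-U sample. The function names parse_conllu and count_head_dependent_pairs and the sample sentences are assumptions made for this illustration.

```python
from collections import Counter

# Hand-written CoNLL-U sample (illustrative only, not taken from a real treebank).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
SAMPLE = (
    "# text = She reads books\n"
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\treads\tread\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tbooks\tbook\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "\n"
    "# text = He writes long letters\n"
    "1\tHe\the\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\twrites\twrite\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tlong\tlong\tADJ\t_\t_\t4\tamod\t_\t_\n"
    "4\tletters\tletter\tNOUN\t_\t_\t2\tobj\t_\t_\n"
)


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        current.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if current:
        sentences.append(current)
    return sentences


def count_head_dependent_pairs(sentences):
    """Count delexicalized (head UPOS, relation, dependent UPOS) pairs."""
    counts = Counter()
    for sent in sentences:
        by_id = {tok["id"]: tok for tok in sent}
        for tok in sent:
            if tok["head"] == 0:
                continue  # the root has no head token
            head = by_id[tok["head"]]
            counts[(head["upos"], tok["deprel"], tok["upos"])] += 1
    return counts


if __name__ == "__main__":
    table = count_head_dependent_pairs(parse_conllu(SAMPLE))
    for (head, rel, dep), freq in table.most_common():
        print(f"{head} >{rel} {dep}\t{freq}")
```

The sketch only hints at the kind of output described above. The actual tool extracts trees of varying size, supports lexicalized queries, and reports further statistics beyond raw frequency.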


STARK was developed by Kaja Dobrovoljc, Luka Krsnik and Marko Robnik Šikonja as part of the research project SPOT: A Treebank-Driven Approach to the Study of Spoken Slovenian (ARIS grant no. Z6-4617) and the CLARIN.SI Resource and Service Development grants (2019, 2024). This online interface was created with support from CJVT UL to make STARK’s core functionality accessible to a broader audience. It offers a simplified set of options compared to the full-featured command-line version, which is available at https://github.com/clarinsi/STARK.