Abstract

This interview examines a recent study on Dependency Distance (length) Minimization and introduces earlier work on, and the significance of, this topic. Dependency distance, or dependency length, is taken as an insightful metric of syntactic complexity within the framework of dependency grammar (DG). According to DG, the syntactic structure of a sentence consists of nothing but dependencies between individual words, an assumption that is widely accepted not only in computational linguistics but also in theoretical linguistics. A dependency relation has the following core properties: it is a binary relation between two linguistic units; it is usually asymmetrical, with one of the two units acting as the governor and the other as the dependent; and it is classified in terms of a range of general grammatical relations, conventionally shown by a label on the arc linking the two units. Sentences unfold linearly, so a governor and its dependent may or may not be adjacent; that is, there may be different linear distances between governors and dependents. This linear distance is termed dependency distance (or dependency length), usually measured by the number of words intervening between the two, and it is believed to bear closely on parsing (processing) difficulty. In terms of DG, the syntactic parsing of a sentence proceeds on the successive input of individual words and consists in establishing, at each parsing state, a syntactic relation between the word currently being processed and a previous one. As a cognitive activity, syntactic parsing is accomplished via working memory (WM), on which different dependency distances impose different burdens: the intervening words may either strain the capacity of WM or, owing to the time decay of memory, make the retrieval of a previous word difficult. Hence, a longer dependency distance, or more intervening words, probably means greater syntactic complexity and a higher cognitive cost in processing.

Given the cognitive possibility that dependency distance correlates positively with syntactic complexity and processing difficulty, it may be assumed that human languages, which are certainly constrained by general cognitive mechanisms, should prefer structures with short dependency distances so as to place fewer demands on working memory. This tendency is termed Dependency Distance Minimization (DDM): in natural languages, a sentence should be structured in such a way as to minimize the overall dependency distance between the syntactically related words in that sentence. The DDM hypothesis is proposed as one possible linguistic universal motivated by general human cognition. Since the hypothesis is deduced from the cognitive assumption that working memory is limited in capacity and subject to time-induced forgetting, its validity must be tested empirically. Evidence in support of the preference for short dependency distances was first found in comprehension experiments on different types of relative clauses (RCs). However, because of their high cost and laboriously careful design, such experiments are usually conducted with a small number of subjects and a limited range of artificially composed linguistic material. Therefore, when it comes to language universals like DDM, large corpus-based quantitative studies may serve as a significant supplement to psychological experiments, especially in this era of big data.
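The metric sketched above is commonly formalized as the mean dependency distance (MDD) of a sentence. One widespread convention, given here as a sketch since the literature varies (some studies count intervening words, others the absolute difference of word positions), is:

$$\mathrm{MDD} = \frac{1}{n-1}\sum_{i=1}^{n-1}\bigl|\,\mathrm{pos}(g_i) - \mathrm{pos}(d_i)\,\bigr|,$$

where $n$ is the number of words in the sentence, $g_i$ and $d_i$ are the governor and dependent of the $i$-th dependency, and $\mathrm{pos}(\cdot)$ gives a word's linear position; the root word, having no governor, contributes no dependency, leaving $n-1$ terms. Under the intervening-words convention described above, each term is this absolute difference minus one.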
Verbal communication is by nature a type of human behavior, regulated to a considerable degree by human cognition. That is, there may well be cognition-shaped patterns, or universals, in language. With the development of computer science, big-data statistical analysis has become an important means of detecting patterns in various kinds of human behavior. In this sense, large-scale corpora, which give researchers easy access to big data on verbal behavior, may contribute much to scientific linguistic research that aims to detect linguistic patterns and to trace their cognitive motivations. In other words, if DDM is a general cognition-shaped tendency in language, corpus-based big-data analysis should be able to detect it. Notably, investigation into DDM demands a dependency treebank, that is, a corpus annotated with the syntactic relations between words, because dependency distance concerns the linear length of those relations. This interview briefly reviews the cognitive DDM research based on corpus data and comments on some existing problems and future directions in this field. In the past, linguistic universals were rarely considered in terms of cognitive constraints and seldom pursued through corpus-based big-data analysis. However, as expounded in this interview, research into DDM in human languages shows the value of investigating linguistic universals cognitively through statistical analysis of big language data, which strongly suggests that, to arrive at truly scientific discoveries, linguistic studies may well need to integrate efforts from multiple disciplines: cognitive science, mathematics, physics, and biology, to name just a few.
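Since corpus-based DDM studies start from a dependency treebank, the computation involved is mechanically simple. Below is a minimal, hypothetical Python sketch that computes corpus-level MDD from a treebank in the CoNLL-U format (the standard format of the Universal Dependencies treebanks, in which the first tab-separated column holds a token's position and the seventh the position of its governor). The file name is a placeholder, and the absolute-position-difference convention from the formula above is assumed.

```python
# Minimal sketch: mean dependency distance (MDD) over a CoNLL-U treebank.
# Assumes the |governor position - dependent position| convention;
# "treebank.conllu" is a placeholder path, not a real resource.

def mdd_of_treebank(path):
    total, count = 0, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip comment lines and the blank lines separating sentences.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            token_id, head = fields[0], fields[6]
            # Skip multiword-token ranges ("3-4") and empty nodes ("3.1").
            if not token_id.isdigit():
                continue
            head = int(head)
            if head == 0:            # the root has no governor
                continue
            total += abs(int(token_id) - head)
            count += 1
    return total / count if count else 0.0

print(mdd_of_treebank("treebank.conllu"))
```

Per-sentence MDD would only require resetting the two accumulators at each blank line; the corpus-level aggregate shown here is one common figure reported in treebank-based DDM studies.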