Hudson reported that nouns make up about 37% of the words in written English. Since then, many other languages have been studied in this respect, leading to the hypothesis that the proportion of nouns is an invariant across human languages. German and English differ in word formation, although both belong to the West Germanic subfamily. As for nouns, German has, on the one hand, a larger proportion of compound nouns, each of which packs more information into a single word, so its total number of nouns could be relatively smaller than in other languages; on the other hand, nominalized structures are common in German, which may raise its proportion of nouns relative to other languages. Does German conform to this proposed universal law? We try to answer this question using three large-scale corpora of German: the DWDS-Kernkorpus, which consists of 20th-century texts of different genres and contains more than 100 million words in total; the Deutsches Textarchiv (DTA), a diachronic corpus of written German containing about 150 million words from texts of the same genres as the DWDS-Kernkorpus; and the TüBa-D/Z treebank, a German newspaper corpus of more than 1.5 million words comprising 3,644 articles from the mainstream newspaper "Die Tageszeitung" published between 1989 and 1999. To make the results comparable, we adopted the classification criteria for nouns and the part-of-speech tagsets suggested by Hudson. The results show that nouns account for about 38% of the words in all three corpora of written German, which corroborates the hypothesis above. We further examined how the proportions of nouns vary across genres. Genres differ in the proportions of the noun subclasses, namely common nouns, proper nouns and pronouns: common nouns make up a larger share of informational texts, whereas imaginative texts contain a larger share of pronouns.
These genre differences are also consistent with Hudson's findings. Little previous work has addressed the diachronic development of these proportions. In this study, we additionally explored how the proportion of nouns (and of its subclasses) changed over time by analyzing texts from 1500 to 1950. While the total proportion of nouns showed no major change over this period, the balance between common nouns and pronouns shifted: the proportion of common nouns rose continuously from 14.02% at the beginning of the 16th century to about 24% in the 20th century, whereas the proportion of pronouns fell from 16.66% to 10%. To the best of our knowledge, this diachronic tendency has not been addressed before. We argue that it results from social and technical development as well as from the evolution of the language itself. In conclusion, this study corroborates the hypothesis that the distribution of nouns is an invariant across human languages. Although the overall proportion of nouns in written German remains stable, the proportions of its subclasses vary among genres and have changed considerably over time; in particular, we observed a continuous increase in common nouns and a corresponding decrease in pronouns between 1500 and 1950. This finding offers a new perspective on language evolution and quantitative linguistic research and deserves further study.
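The proportions reported above come down to counting tagged tokens per genre or per time slice and dividing by the total. The Python sketch below shows one way such counts could be computed. It assumes tokens tagged with the STTS tagset (which TüBa-D/Z uses) and a hypothetical mapping of STTS tags onto Hudson's three subclasses. The authors' exact mapping, tokenization and treatment of punctuation are not stated here, so the mapping, the function name `noun_proportions` and the toy sentences are all illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from STTS part-of-speech tags to Hudson-style noun
# subclasses. Only pronouns that stand in for a whole noun phrase are counted;
# attributive forms (e.g. PPOSAT, PDAT) and pronominal adverbs (PAV) are not.
# This is an illustrative assumption, not the authors' published scheme.
STTS_TO_SUBCLASS = {
    "NN": "common",      # common noun
    "NE": "proper",      # proper noun
    "PPER": "pronoun",   # personal pronoun
    "PRF": "pronoun",    # reflexive pronoun
    "PDS": "pronoun",    # substituting demonstrative pronoun
    "PIS": "pronoun",    # substituting indefinite pronoun
    "PPOSS": "pronoun",  # substituting possessive pronoun
    "PRELS": "pronoun",  # substituting relative pronoun
    "PWS": "pronoun",    # substituting interrogative pronoun
}

# STTS punctuation tags; this sketch excludes punctuation from the word total.
PUNCTUATION_TAGS = {"$,", "$.", "$("}

SUBCLASSES = ("common", "proper", "pronoun")


def noun_proportions(tagged_tokens):
    """Compute noun-subclass shares per group.

    `tagged_tokens` is an iterable of (group, word, tag) triples, where
    `group` may be a genre label or a time slice such as "1500-1549".
    Returns {group: {"common": ..., "proper": ..., "pronoun": ...,
    "all_nouns": ...}}, each value a share of the group's non-punctuation tokens.
    """
    totals = Counter()
    counts = defaultdict(Counter)
    for group, _word, tag in tagged_tokens:
        if tag in PUNCTUATION_TAGS:
            continue
        totals[group] += 1
        subclass = STTS_TO_SUBCLASS.get(tag)
        if subclass is not None:
            counts[group][subclass] += 1

    result = {}
    for group, total in totals.items():
        shares = {s: counts[group][s] / total for s in SUBCLASSES}
        shares["all_nouns"] = sum(shares.values())
        result[group] = shares
    return result


# Invented toy sentences, tagged by hand; not drawn from DWDS, DTA or TüBa-D/Z.
EXAMPLE_TOKENS = [
    ("imaginative", "Sie", "PPER"), ("imaginative", "sah", "VVFIN"),
    ("imaginative", "ihn", "PPER"), ("imaginative", ".", "$."),
    ("informational", "Die", "ART"), ("informational", "Regierung", "NN"),
    ("informational", "tagte", "VVFIN"), ("informational", "in", "APPR"),
    ("informational", "Berlin", "NE"), ("informational", ".", "$."),
]

if __name__ == "__main__":
    for group, shares in noun_proportions(EXAMPLE_TOKENS).items():
        print(group, {k: round(v, 3) for k, v in shares.items()})
```

Keeping the grouping key generic lets the same counting serve both the genre comparison and the diachronic one; for the latter, each token's group would be a period label derived from the date of its text. Whether punctuation, numerals or foreign-language material count toward the total is a choice this sketch makes explicitly, not one taken from the study.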
Li Yuan, Duan Tinghui, Liu Haitao. Is the Distribution of Nouns an Invariant in Human Languages? An Investigation Based on Written German Corpora. Journal of Zhejiang University (Humanities and Social Sciences), 2019, 5(6): 39-.