Project Blog
04.02.2013
The KoGloss manual is available
A detailed description in the form of a manual documents all steps of the KoGloss method. The manual has been developed for the needs of specific target groups: linguists, language teachers, translators and terminologists in higher and vocational education.
You can find the German and English versions of the KoGloss manual here:
21.06.2012
Digital analysis: researching and describing language patterns
The KoGloss method offers an opportunity for targeted research into language patterns characteristic of a specific topic, subject area or discourse. The basis of this independent research and analysis is a user-created text collection (corpus) that can be analysed with a variety of software. By searching for individual words, parts of words or word sequences, these programs can find language patterns and descriptive information in the previously collected texts. The programs differ in their complexity and available tools. The KoGloss method uses the freely available program AntConc, which is easy to use even for a linguistic novice. [read more]
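The kind of search described above can be illustrated with a short keyword-in-context (KWIC) sketch in Python. This is not how AntConc is implemented; the function name `kwic`, the sample sentence and the context width are assumptions made purely for illustration.

```python
import re

def kwic(text, pattern, width=30):
    """Return one keyword-in-context line per regex match in the text."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Hypothetical mini-corpus; a real corpus would be read from text files.
sample = ("Economic growth slowed. The growth rate of exports fell, "
          "but regrowth is expected.")

# Search for a word part: any token containing "growth".
hits = kwic(sample, r"\w*growth\w*", width=20)
for line in hits:
    print(line)
```

Each output line shows a hit in square brackets with its left and right context, which is the basic view a concordance tool provides for spotting recurring patterns.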
26.01.2012
Career-oriented seminar for German philology Bachelor and Master students at the University of Duisburg-Essen
Winter semester 2011/2012
Ulrike Haß. Economic technical language. Analysis and documentation
The seminar simulated a task that occurs in several occupations: a particular professional variety of a language has to be recorded, i.e. documented and presented in a form that supports transparent and smooth communication. To this end, the KoGloss method was introduced step by step. The participants explored the KoGloss corpus, generated typical word clusters with the help of software, and analysed these clusters linguistically in order to create glossary entries. By early March, every participant had to create eight entries.
In addition to German students, DAAD scholarship holders from Russia and Ukraine and students from the Dutch partner university in Nijmegen also took part in the seminar. The students for whom German is a foreign language were very interested in the method and could see how it might be applied to other cases and languages.
The seminar participants were particularly interested in following the entries created by the Estonian students. One could almost watch the Estonian glossary growing, and the German speakers could even understand some elements of the Estonian entries, namely the international terms.
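Generating "typical word clusters" amounts to counting recurring word sequences (n-grams). The following Python sketch shows the idea under simple assumptions; it is not the software used in the seminar, and the function name `clusters` and the sample text are hypothetical.

```python
import re
from collections import Counter

def clusters(text, n=2):
    """Count every sequence of n consecutive words in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical sample; a seminar corpus would contain many full texts.
sample = "economic growth and economic development drive economic growth"

# The most frequent two-word cluster is a candidate for a glossary entry.
top = clusters(sample, n=2).most_common(1)
print(top)
```

Frequent clusters are only candidates: whether one deserves a glossary entry still depends on the linguistic analysis described above.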
From March 2012 on it will be possible to link the Estonian, Latvian and Lithuanian entries to the German ones, which gives us a multi-lingual tool.
23.01.2012
Project milestones were reached
Two important project milestones were reached at the end of 2011:
- completion of the KoGloss corpus;
- identification of German, Estonian, Latvian and Lithuanian language constructions in economic activity and economic development related professional discourse.
KoGloss corpus
The KoGloss corpus is a multilingual comparative corpus consisting of four sub-corpora (German, Estonian, Latvian, Lithuanian). To make the sub-corpora comparable, all four were compiled according to the principles established in the earlier stages of the project:
- thematic comparability (discourse of economic activity);
- quantitative comparability (number of texts and text length);
- qualitative comparability (rather wide categories of texts based on the functional features).
There are, however, country-specific differences hidden behind this overall impression. It quickly became evident that the specific characteristics of publishing and text culture in the different countries are an obstacle to building a homogeneous text collection. In terms of their functional features, the texts of the corpus were divided into four categories: official publications, scientific publications, news press and business press. In the Estonian and Lithuanian sub-corpora the latter two are merged into a single category, press, because it is not possible to differentiate between business and news press in these countries.
In total the corpus comprises over 1000 texts with approximately 12 million characters, including spaces.
List of constructions
After initial experiments with corpus-linguistic analysis, the Estonian, German, Lithuanian and Latvian language constructions from the professional discourse on the economy and economic development were identified. Like the corpus building, the corpus analysis was carried out together with students in various seminars. The subject-specific constructions were recorded collaboratively in the Moodle workspace. The glossary entries created include information about the morphological, syntactic, semantic and pragmatic characteristics of the selected construction.
To ensure that linguistic novices can use this method in their future training and professional life, every step of working with the corpus is documented and systematised in a guide (manual) for independent language analysis.
Project meeting in Tartu, June 2011
From 27.06 to 28.06.2011 partners of the KoGloss project met in Tartu (Estonia) to review the provisionally completed project stages and to discuss further implementation of the agenda. The partners also exchanged experiences of using the KoGloss method in seminars. Summaries of these experiences will be published in the Teaching section of our website.
Initial results from the corpus compilation revealed some language-specific features with regard to text sources and text length. One difficulty was the inclusion of academic texts in the corpus: Latvian scientific publications are mostly available in print only and are not accessible on the Internet, and academic papers in Estonia tend to be published in English. Another difficulty is that text length varies by language. For example, German texts are longer than Estonian texts, because in Estonian multiple pieces of grammatical information may be bundled into a single word by attaching additional elements to the stem. Estonian sentences are therefore more compact than sentences in languages that use separate words for the same purpose. To ensure a balanced corpus, several adjustments still have to be made before the corpus analysis starts.
For the corpus analysis the free software AntConc¹ will be used. AntConc offers a variety of basic corpus query tools that make it possible to identify potential constructions. A test analysis was performed on all four sub-corpora at the meeting. It turned out that a majority of constructions in the Baltic languages corresponded to German compound nouns. It was decided to add such compounds to the glossaries where terminological equivalence is clear. In addition, the project partners considered the question of lemmatization. Since KoGloss is committed to a corpus analysis that is simple and suitable for laypeople, working without lemmatization was considered as one option. An alternative to lemmatization is the use of so-called wildcards in AntConc. The wildcard setting makes it possible to search for different inflectional forms of a word. Since the results are again language dependent, the project partners will have to find out by the next meeting which of the two options is more effective.
The glossary function of the virtual learning environment Moodle will be used for documenting and presenting the results of the corpus analysis. An important item on the meeting agenda therefore dealt with roles and rights in the Moodle Glossary. At the meeting the project partners decided on the format in which the constructions will be presented in Moodle and on the cross-linking between the entries of the four glossaries. From the beginning of the next semester it should be possible to write glossary entries.
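The wildcard idea can be sketched in Python as follows. AntConc's own wildcard syntax and settings are not reproduced here; the shell-style `*` from Python's standard `fnmatch` module stands in for it, and the function name `wildcard_search` and the German sample sentence are assumptions for illustration only.

```python
import re
from fnmatch import fnmatchcase

def wildcard_search(text, pattern):
    """Return all tokens matching a shell-style wildcard pattern."""
    tokens = re.findall(r"\w+", text.lower())
    # "*" matches any run of characters, so "wirtschaft*" covers the stem
    # plus any inflectional ending or derivational suffix.
    return [t for t in tokens if fnmatchcase(t, pattern.lower())]

# Hypothetical sample sentence, not taken from the KoGloss corpus.
sample = ("Die wirtschaftliche Entwicklung und das wirtschaftliche "
          "Wachstum der Wirtschaft")

forms = wildcard_search(sample, "wirtschaft*")
print(forms)
```

A single wildcard query retrieves several word forms without a lemmatizer, but it also matches derived words (here both the adjective and the noun). This over-matching is one reason the effectiveness of wildcards may differ from language to language.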
¹ Anthony, L. (2011). AntConc (Version 3.2.4) [Computer Software]. Tokyo, Japan: Waseda University. Available from http://www.antlab.sci.waseda.ac.jp/