Abstracts



T. Joyce and I. Srdanovic

Comparing Lexical Relationships Observed within Japanese Collocation Data and Japanese Word Association Norms

While large-scale corpora and various corpus query tools have long been recognized as essential language resources, the value of word association norms as language resources has been largely overlooked. This paper presents initial comparisons between the lexical relationships observed in Japanese collocation data, extracted from a large corpus with the Japanese version of the Sketch Engine (SkE) tool (Srdanovic et al., 2008), and the relationships found in Japanese word association sets taken from the large-scale Japanese Word Association Database (JWAD), currently under construction by Joyce (2005, 2007).

The results indicate that while some relationships are common to both resources, many lexical relationships are observed in only one of them. These findings suggest that both resources are needed to cover the diverse range of lexical relationships more adequately. Finally, the paper briefly considers implementing in electronic dictionaries the association-based word-search strategies proposed by Zock and Bilac (2004) and Zock (2006).
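The comparison can be pictured as set arithmetic over word pairs. The following is a minimal sketch, not the authors' procedure: it assumes relations are reduced to unordered word pairs, and the function name `compare_relations` and the sample pairs (English glosses) are hypothetical.

```python
def _as_unordered(pairs):
    """Treat (a, b) and (b, a) as the same relation."""
    return {frozenset(pair) for pair in pairs}


def compare_relations(collocations, associations):
    """Split word pairs into those shared by both resources and those found in only one."""
    coll, assoc = _as_unordered(collocations), _as_unordered(associations)
    return {
        "shared": coll & assoc,
        "collocation_only": coll - assoc,
        "association_only": assoc - coll,
    }


# Hypothetical pairs standing in for SkE collocations and JWAD associations.
ske_pairs = [("tea", "drink"), ("tea", "hot"), ("pour", "tea")]
jwad_pairs = [("drink", "tea"), ("green", "tea"), ("tea", "cup")]
summary = {name: len(pairs) for name, pairs in compare_relations(ske_pairs, jwad_pairs).items()}
print(summary)  # {'shared': 1, 'collocation_only': 2, 'association_only': 2}
```

Treating pairs as unordered discards direction (a verb + object collocation, for instance, is directional), which is a simplification made only for illustration.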



M. Zock and D. Schwab

Lexical access based on underspecified input

Words play a major role in language production, hence finding them is of vital importance, be it for speaking or writing. Words are stored in a dictionary, and the general belief holds that the bigger the dictionary, the better. Yet, to be truly useful, the resource should contain not only many entries and a lot of information about each of them, but also adequate means to reveal the stored information. Information access depends crucially on the organization of the data (words) and on the navigational tools. It also depends on the grouping, ranking and indexing of the data, a factor too often overlooked.

We present some preliminary results showing how an existing electronic dictionary could be enhanced to help language producers find the word they are looking for. To this end we have started to build a corpus-based 'association matrix', composed of target words and access keys (meaning elements, related concepts or words). Each intersection of a target word and an access key records the weight and type of their link, information that is subsequently used for grouping, ranking and navigation.
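A minimal sketch of how such a matrix might be queried, assuming each cell maps a (target word, access key) pair to a (weight, link type) tuple and that candidates are ranked by summed weight. The functions `rank_targets` and `group_by_link`, the link-type labels and the weights are hypothetical, not taken from the authors' system.

```python
from collections import defaultdict


def rank_targets(matrix, keys):
    """Rank target words by the summed weight of their links to the given keys.

    `matrix` maps (target, key) -> (weight, link_type); targets linked to
    none of the keys are omitted. Ties are broken alphabetically.
    """
    scores = defaultdict(float)
    for (target, key), (weight, _link) in matrix.items():
        if key in keys:
            scores[target] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def group_by_link(matrix, target, keys):
    """Group the keys matched by one target according to link type (for navigation)."""
    groups = defaultdict(list)
    for (t, key), (_weight, link) in matrix.items():
        if t == target and key in keys:
            groups[link].append(key)
    return dict(groups)


# Hypothetical matrix cells, for illustration only.
matrix = {
    ("mocha", "coffee"): (0.9, "isa"),
    ("mocha", "chocolate"): (0.6, "ingredient"),
    ("espresso", "coffee"): (0.8, "isa"),
    ("cocoa", "chocolate"): (0.7, "source_of"),
}
print(rank_targets(matrix, {"coffee", "chocolate"}))
# [('mocha', 1.5), ('espresso', 0.8), ('cocoa', 0.7)]
print(group_by_link(matrix, "mocha", {"coffee", "chocolate"}))
# {'isa': ['coffee'], 'ingredient': ['chocolate']}
```

Summing weights is only one possible ranking rule; link types are kept separate so that results can be grouped for navigation rather than shown as a flat list.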



F. Moerdijk, C. Tiberius and J. Niestadt

Accessing the ANW Dictionary

This paper describes the functional design of an interface for an online scholarly dictionary of contemporary standard Dutch, the ANW. One of the main innovations of the ANW is a twofold meaning description: definitions are accompanied by 'semagrams'. In this paper we focus on the strategies available for accessing information in the dictionary and on the role semagrams play in dictionary practice.



C. Brierley and E. Atwell

ProPOSEL: a human-oriented prosody and PoS English lexicon for machine-learning and NLP

ProPOSEL is a prosody and PoS English lexicon, purpose-built to integrate and leverage domain knowledge from several well-established lexical resources for machine learning and NLP applications. The lexicon of 104,049 separate entries is in an accessible text file format, is human- and machine-readable, and is intended for open-source distribution with the Natural Language Toolkit (NLTK). It is therefore supported by Python software tools that transform ProPOSEL into a Python dictionary, or associative array, of linguistic concepts mapped to compound lookup keys. Users can also search a subset of the lexicon and access entries by word class, phonetic transcription, syllable count and lexical stress pattern. ProPOSEL caters for a range of different cognitive aspects of the lexicon.
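The idea of compound lookup keys can be illustrated with a plain Python dictionary. This sketch does not use ProPOSEL's or NLTK's actual tools: the field layout word|pos|transcription|syllables|stress, the helper names `load_lexicon` and `search`, and the sample entries are all assumptions. Keying on (word, PoS) is what keeps homographs such as the noun and verb 'record' apart.

```python
from collections import namedtuple

Entry = namedtuple("Entry", ["phonetic", "syllables", "stress"])


def load_lexicon(lines, sep="|"):
    """Build a dict mapping the compound key (word, pos_tag) to an Entry.

    The field order word|pos|transcription|syllables|stress is an assumption
    made for this sketch, not ProPOSEL's documented file format.
    """
    lexicon = {}
    for line in lines:
        word, pos, phon, syll, stress = line.strip().split(sep)
        lexicon[(word, pos)] = Entry(phon, int(syll), stress)
    return lexicon


def search(lexicon, pos=None, syllables=None, stress=None):
    """Return the sorted compound keys of entries matching every criterion given."""
    return sorted(
        key for key, e in lexicon.items()
        if (pos is None or key[1] == pos)
        and (syllables is None or e.syllables == syllables)
        and (stress is None or e.stress == stress)
    )


# Hypothetical sample lines; transcriptions and stress codes are illustrative.
sample = [
    "record|NN|'rekɔːd|2|10",
    "record|VB|rɪ'kɔːd|2|01",
    "table|NN|'teɪbl|2|10",
]
lex = load_lexicon(sample)
print(lex[("record", "VB")].stress)        # 01
print(search(lex, pos="NN", stress="10"))  # [('record', 'NN'), ('table', 'NN')]
```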



G. Sierra

Natural Language Searching in Onomasiological Dictionaries

When consulting a dictionary, people can find the meaning of a word via its definition, which usually contains the information they need. Lexicographers produce dictionaries, and their work consists of presenting the information essential for grasping the meaning of words. However, when people need to find a word, they will often not obtain the information they are looking for. There is a gap between dictionary definitions and the information available in people's minds. This paper attempts to describe the conceptualisation people engage in to arrive at a word from its meaning. The results of an experiment show the differences between the knowledge available in people's minds and that contained in dictionary definitions.



C. Mueller-Spitzer and C. Moehrs

First ideas of user-adapted views of lexicographic data exemplified on OWID and elexiko

This paper is a project report on the lexicographic Internet portal OWID, an Online Vocabulary Information System of German being built at the Institute of German Language in Mannheim (IDS). The paper presents the contents of the portal and its technical approach. The lexical database is structured in a granular way, which makes it possible to extend the search options available to lexicographers. Against the background of current research on the use of electronic dictionaries, the OWID project is also developing first ideas for user-adapted access to, and user-adapted views of, the lexicographic data. Because the dictionaries in OWID are available online, the design and functions of the website can be changed easily (in comparison with printed dictionaries). Ideas for implementing user-adapted views of the lexicographic data are demonstrated with an example taken from elexiko, one of the dictionaries in the portal.



C. Huang, Y. Chou, C. Hotani, S. Chen and W.Y. Lin

Multilingual Conceptual Access to Lexicon based on Shared Orthography: An ontology-driven study of Chinese and Japanese

In this paper we propose a model for conceptual access to a multilingual lexicon based on shared orthography. Our proposal relies crucially on two facts: that both Chinese and Japanese conventionally use Chinese characters in their writing systems, and that Chinese orthography is anchored in a system of radical parts that encode basic concepts. Each orthographic unit, called hanzi in Chinese and kanji in Japanese, contains a radical that indicates the broad semantic class of the unit's meaning.

Our study uses the homomorphism between the Chinese hanzi and Japanese kanji systems to identify bilingual word correspondences. We use bilingual dictionaries, including WordNet, to verify the semantic relations between the cross-lingual pairs. These bilingual pairs are then mapped to an ontology constructed from the relations between the meaning of each character and the basic concept of its radical part. The conceptual structure of this radical ontology is proposed as a model for simultaneous conceptual access to both languages. A study of words containing characters composed with the '口 (mouth)' radical illustrates the proposal and the actual model. The fact that this model works for two typologically very different languages, and that it contains Generative Lexicon-like coercive links, suggests that it is conceptually robust enough to be applied to other languages.
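The first two steps (matching words written with the same characters, then indexing them by radical) might look roughly like this. It is a minimal sketch under stated assumptions: the radical table covers only the characters in the example, the word lists and function names are hypothetical, and the semantic verification against bilingual dictionaries and WordNet is not modelled.

```python
from collections import defaultdict

# Hypothetical radical table covering only the characters used below.
RADICAL_OF = {"呼": "口", "吸": "口", "唱": "口", "歌": "欠"}


def shared_orthography_pairs(zh_words, ja_words):
    """Candidate cross-lingual pairs: words written with identical characters."""
    return sorted(set(zh_words) & set(ja_words))


def index_by_radical(words, radical_of):
    """Map each radical to the candidate words containing a character with that radical."""
    index = defaultdict(list)
    for word in words:
        for radical in sorted({radical_of.get(ch, "?") for ch in word}):
            index[radical].append(word)
    return dict(index)


zh = ["呼吸", "唱歌", "味道"]  # hypothetical Chinese word list
ja = ["呼吸", "唱歌", "味"]    # hypothetical Japanese word list
pairs = shared_orthography_pairs(zh, ja)
print(pairs)                                # ['呼吸', '唱歌']
print(index_by_radical(pairs, RADICAL_OF))  # {'口': ['呼吸', '唱歌'], '欠': ['唱歌']}
```

Identical spelling is only a candidate signal: simplified Chinese and Japanese forms can differ for the same character, so a fuller system would also need a character-variant mapping.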



N. Curteanu, A. Moruz and D. Trandabat

Extracting Sense Trees from the Romanian Thesaurus by Sense Segmentation and Dependency Parsing

This paper introduces a new strategy for parsing large dictionaries (thesauri), called Dictionary Sense Segmentation & Dependency (DSSD), designed to obtain the sense tree, i.e. the hierarchy of defined meanings, of a dictionary entry. The real novelty of the approach is that, unlike 'standard' dictionary parsing, DSSD succeeds in separating the two essential processes within dictionary entry parsing: sense tree construction and sense definition parsing.

The key tools for (autonomous) sense tree building are the definition of dictionary sense marker classes, a tree-like hierarchy of these classes, and an appropriate procedure for searching for sense markers within the DSSD parsing algorithm. DSSD is inspired by SCD (Segmentation-Cohesion-Dependency) (Curteanu, 1988, 2006), a similar but more general approach that uses the same techniques and data structures for parsing (Romanian) free text. A DSSD-based parser, implemented in Java, currently builds sense trees from DTLR (Dictionarul Tezaur al Limbii Romane, Romanian Language Thesaurus) entries with 91% correctness, and offers significant potential for improving and extending the lexical-semantic analysis of the DTLR.
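The separation of sense tree construction from definition parsing can be illustrated with a stack-based sketch that looks only at sense markers. The three marker classes and their order (Roman numerals above Arabic numerals above lowercase letters) are assumptions for illustration, not the DTLR's actual marker inventory, and `build_sense_tree` is a hypothetical name.

```python
import re

# Hypothetical sense-marker classes, from outermost to innermost level.
MARKER_CLASSES = [
    re.compile(r"^[IVXLC]+\.$"),  # level 0: I., II., ...
    re.compile(r"^\d+\.$"),       # level 1: 1., 2., ...
    re.compile(r"^[a-z]\)$"),     # level 2: a), b), ...
]


def marker_level(marker):
    """Return the hierarchy level of a sense marker."""
    for level, pattern in enumerate(MARKER_CLASSES):
        if pattern.match(marker):
            return level
    raise ValueError(f"unknown sense marker: {marker!r}")


def build_sense_tree(segments):
    """Build a sense tree from (marker, definition) segments in entry order.

    Only the markers decide the tree shape; definitions are stored unparsed.
    """
    root = {"marker": None, "text": None, "children": []}
    stack = [(-1, root)]  # (level, node) pairs on the path from the root
    for marker, text in segments:
        level = marker_level(marker)
        node = {"marker": marker, "text": text, "children": []}
        while stack[-1][0] >= level:  # close senses at the same or a deeper level
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root


def outline(node, depth=0):
    """Flatten the tree into indented marker strings."""
    lines = []
    for child in node["children"]:
        lines.append("  " * depth + child["marker"])
        lines.extend(outline(child, depth + 1))
    return lines


entry = [("I.", "sense I"), ("1.", "sense I.1"), ("a)", "sense I.1.a"),
         ("b)", "sense I.1.b"), ("2.", "sense I.2"), ("II.", "sense II")]
print("\n".join(outline(build_sense_tree(entry))))
```

The printed outline nests a) and b) under 1., places 1. and 2. under I., and puts II. back at the top level, without the definition texts ever being inspected.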



S. Andreyeva

Lexical-Functional Correspondences and Their Use in the System of Machine Translation ETAP-3

ETAP-3 is a machine translation system consisting of various types of rules and dictionaries. Its dictionaries, created specifically for the NLP system, provide for every lexeme not only data about its characteristics as a separate item, but also different types of information about its syntactic and semantic links to other lexemes. The paper shows how information about certain types of semantic links between lexemes, as represented in the dictionaries, can be used in a machine translation system. It deals with correspondences between lexical-functional constructions of different types in Russian and English. A lexical-functional construction is a word combination consisting of an argument of a lexical function and the value of that lexical function for this argument.

The paper describes cases where a lexical-functional construction in one of these languages corresponds to a lexical-functional construction in the other, but the lexical functions represented by the two constructions differ. It lists the different types of correspondences and explains why they exist. It also shows how information about these correspondences can be used to improve the linguistic component of the ETAP-3 machine translation system.



K. Kanzaki, N. Tomuro and H. Isahara

The "Close-Distant" Relation of Adjectival Concepts Based on Self-Organizing Map

In this paper we aim to detect some aspects of adjectival meanings. Adjectival concepts are distributed on a SOM (Self-Organizing Map) whose feature vectors are calculated using MI (Mutual Information). From the resulting map, we form tight clusters of map nodes using cosine similarity. The number of tight clusters obtained in this way was then increased using the map nodes and a Japanese thesaurus, yielding 149 extended concept clusters. From the map, we found 8 adjectival clusters at the superordinate level, as well as some tendencies regarding similar and dissimilar clusters.
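The cosine-based clustering step can be sketched as follows. This is not the authors' procedure: it assumes the MI-weighted feature vectors of the map nodes are already computed, uses a simple greedy rule (a node joins the first cluster whose seed vector reaches a cosine threshold), and the node names, vectors and the 0.9 threshold are hypothetical. Training the SOM itself is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tight_clusters(vectors, threshold=0.9):
    """Greedy single-pass clustering of map nodes.

    A node joins the first cluster whose seed vector has cosine >= threshold
    with it; otherwise it seeds a new cluster.
    """
    clusters = []  # list of (seed_vector, [node ids])
    for node_id, vec in vectors.items():
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(node_id)
                break
        else:
            clusters.append((vec, [node_id]))
    return [members for _seed, members in clusters]


# Hypothetical MI-weighted feature vectors for four map nodes.
nodes = {
    "n1": [3.0, 0.1, 0.0],
    "n2": [2.8, 0.3, 0.0],
    "n3": [0.0, 0.2, 2.5],
    "n4": [0.1, 0.0, 2.9],
}
print(tight_clusters(nodes))  # [['n1', 'n2'], ['n3', 'n4']]
```

Because each node is compared only with cluster seeds, the result depends on node order; an agglomerative method would avoid that dependence at higher cost.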



A. Max and M. Zock

Looking up 'phrase' rephrasings via a pivot language

Rephrasing text spans is a common task when revising a text. However, traditional dictionaries often cannot provide direct assistance to writers performing this task. In this article, we describe an approach to obtaining a monolingual phrase lexicon using techniques from Statistical Machine Translation. A span to be rephrased is first translated into a pivot language and then translated back into the original language. Models for assessing fluency, meaning preservation and lexical divergence are used to rank possible rephrasings, and their relative weights can be tuned by the user to better address her needs. An evaluation shows that these models can successfully select rephrasings that are likely to be useful to a writer.
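Ranking with user-tunable weights can be sketched as a weighted log-linear combination of model scores, a common choice in statistical machine translation. The model names, the scores in (0, 1] and the weights below are hypothetical, and the sketch does not perform the pivot translation itself; it only ranks candidates that such a round trip might produce.

```python
import math


def rank_rephrasings(candidates, weights):
    """Sort candidates by a weighted sum of log model scores (highest first).

    `candidates` maps each rephrasing to {model_name: score in (0, 1]};
    `weights` maps model names to user-chosen, non-negative weights.
    """
    def total(phrase):
        return sum(w * math.log(candidates[phrase][model]) for model, w in weights.items())
    return sorted(candidates, key=total, reverse=True)


# Hypothetical back-translations of "take into account" via a pivot language.
# A higher "divergence" score means the wording differs more from the original.
candidates = {
    "take into consideration": {"fluency": 0.8, "meaning": 0.9, "divergence": 0.2},
    "consider": {"fluency": 0.7, "meaning": 0.6, "divergence": 0.9},
}
print(rank_rephrasings(candidates, {"fluency": 1, "meaning": 1, "divergence": 0}))
# ['take into consideration', 'consider']
print(rank_rephrasings(candidates, {"fluency": 1, "meaning": 1, "divergence": 2}))
# ['consider', 'take into consideration']
```

Raising the divergence weight favours rewordings that move further from the original text, at some cost in fluency and meaning preservation.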



B. Gaume, K. Duvignau, L. Prevot and Y. Desalle

Toward a cognitive organization for electronic dictionaries, the case for semantic proxemy

We compare a psycholinguistic approach to mental lexicon organization with a computational approach to the implicit lexical organization found in dictionaries. In this work, we associate dictionaries with 'small world' graphs. This multidisciplinary approach aims to show that the implicit structure of dictionaries, identified mathematically, fits the way young children categorize. These dictionary graphs might therefore be considered 'cognitive artifacts'. This shows the importance of semantic proximity in both the cognitive and the computational organization of the verb lexicon.
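Two standard properties of small-world graphs are a high average clustering coefficient and a short average path length. The sketch below computes both for a toy undirected graph; it assumes, for illustration, that an edge links two verbs when one is used to define the other, and the verbs, edges and helper names are hypothetical rather than taken from a real dictionary.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected graph as a dict mapping each node to the set of its neighbours."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of pairs of the node's neighbours that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered pairs, via BFS from each node."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical verb graph: two tight groups joined through "break" and "split".
edges = [("break", "crack"), ("break", "shatter"), ("crack", "shatter"),
         ("break", "split"), ("split", "cut"), ("cut", "slice"), ("split", "slice")]
g = build_graph(edges)
avg_c = sum(clustering_coefficient(g, n) for n in g) / len(g)
print(round(avg_c, 3), round(average_path_length(g), 2))  # 0.778 1.8
```

These numbers mean little in isolation: a graph is usually called small-world when its clustering is much higher than that of a random graph with the same numbers of nodes and edges, while its average path length remains comparably short.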



G. Kremer, A. Abel and M. Baroni

Cognitively Salient Relations for Multilingual Lexicography

Providing sets of semantically related words in the lexical entries of an electronic dictionary should help language learners quickly understand the meaning of the target words. Relational information might also improve memorisation by allowing the generation of structured vocabulary study lists. However, an open issue is which semantic relations are cognitively most salient, and should therefore be used for dictionary construction. In this paper, we present a concept description elicitation experiment conducted with German and Italian speakers. The analysis of the experimental data suggests that there is a small set of concept-class-dependent relation types that are stable across languages and robust enough to allow discrimination across broad concept domains. Our future research will focus on harvesting instantiations of these classes from corpora.



R. Rapp

The Computation of Associative Responses to Multiword Stimuli

It is shown that the behaviour of test subjects in association experiments can be simulated statistically on the basis of the co-occurrences of words in large text corpora, applying the law of association by contiguity, which is well known from psychological learning theory. In particular, this work focuses on predicting the word associations produced by subjects when presented with multiword stimuli. Results are presented for applications as diverse as crossword puzzle solving and the identification of word translations from non-parallel texts.
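A toy version of the idea, assuming contiguity is approximated by co-occurrence within a fixed window and that a multiword stimulus is handled by multiplying each candidate's co-occurrence counts with the individual stimulus words. Both choices, the window size of 2, the function names and the six-sentence corpus are assumptions for illustration; they are not claimed to be the paper's method.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often two words occur within `window` tokens of each other."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def associate(counts, stimuli, top_n=3):
    """Predict responses to a multiword stimulus.

    A candidate must co-occur with every stimulus word; its score is the
    product of those co-occurrence counts.
    """
    shared = set.intersection(*(set(counts[s]) for s in stimuli)) - set(stimuli)
    scored = []
    for word in shared:
        score = 1
        for s in stimuli:
            score *= counts[s][word]
        scored.append((word, score))
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_n]


corpus = [
    "the bank lends money",
    "the bank keeps money",
    "fish swim in the river",
    "the river bank erodes",
    "a river bank path",
    "money and the river",
]
counts = cooccurrence_counts(corpus)
print(associate(counts, ["money", "river"]))  # [('bank', 4), ('the', 3), ('and', 1)]
```

The function word 'the' ranks second because raw counts are used; replacing them with an association measure such as log-likelihood, or filtering out function words, would push such responses down.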




Last updated: 23/7/08