2nd SIGLEX-endorsed COLING Workshop on

Cognitive Aspects of the Lexicon

Enhancing the Structure and Look-up Mechanisms of Electronic Dictionaries

COLING 2010 pre-conference workshop (Beijing, August 22, 2010),
following the 7th International Conference on Cognitive Science

Invited Speaker

Eduard Hovy (Information Sciences Institute, University of Southern California): summary of the talk and extended abstract.

Motivation

Whenever we read a book, write a letter or launch a query on a search engine, we use words, the shorthand labels and concrete forms of abstract notions (concepts, ideas and more or less well specified thoughts). Yet words are not only vehicles for expressing thoughts; they are also a means of conceiving them. They are mediators between language and thought, allowing us to move quickly from one idea to another, refining, expanding or illustrating our possibly underspecified thoughts. Only words have these capabilities, which is why they are so important.

Obviously, a good dictionary should contain many entries and a lot of information associated with each one of them. Yet, the quality of a dictionary depends not only on coverage, but also on accessibility of information. Access strategies vary with the task (text understanding vs. text production) and the knowledge available at the moment of consultation (word, concept, speech sounds). Unlike readers who look for meanings, writers start from them, searching for the corresponding words. While paper dictionaries are static, permitting only limited strategies for accessing information, their electronic counterparts promise dynamic, proactive search via multiple criteria (meaning, sound, related words) and via diverse access routes. Navigation takes place in a huge conceptual lexical space, and the results are displayable in a multitude of forms (e.g. as trees, as lists, as graphs, or sorted alphabetically, by topic, by frequency).

Many lexicographers nowadays work with huge digital corpora, using language technology to build and maintain the lexicon. Yet access to the potential wealth of information in dictionaries remains limited for the common user, even though electronic media offer enormous new possibilities in terms of comfort, speed and flexibility (multiple inputs, polyform outputs). Computational resources are not subject to the same limitations as paper-bound dictionaries. The latter were limited in scope, being confined to a specific task (translation, synonyms, ...) for economic reasons, but this limitation is no longer justified.

Today we can perform all tasks via one single resource, which may comprise a dictionary, a thesaurus and even more. The goal of this workshop is to lay the groundwork for the next generation of electronic dictionaries, that is, to study the possibility of integrating the different resources, as well as to explore the feasibility of taking the user's needs, knowledge and access strategies into account.

Topics

For this workshop we invite papers including but not limited to the following topics:

Conceptual input of a dictionary user. What is in the authors' minds when they are generating a message and looking for a word? Do they start from partial definitions, i.e. underspecified input (bag of words), conceptual primitives, semantically related words, something akin to synsets, or something completely different? What does it take to bridge the gap between this input, incomplete as it may be, and the desired output (target word)?
Organizing the lexicon and indexing words. Concepts, words and multi-word expressions can be organized and indexed in many ways, depending on the task and language type. For example, in Indo-European languages words are traditionally organized in alphabetical order, whereas in Chinese they are organized by semantic radicals and stroke counts. The way words and multi-word expressions are stored and organized affects indexing and access. Since knowledge states (i.e. knowledge available when initiating search) vary greatly and in unpredictable ways, indexing must allow for multiple ways of navigation and access. Hence the question: what organizational principles allow the greatest flexibility for access?
Access, navigation and search strategies based on various entry types (modalities) and knowledge states. Words are composed of meanings, forms and sounds. Hence, access should be possible via any of these components: via meanings (bag of words), via forms, simple or compound ('hot, dog' vs. 'hot-dog'), and via sounds (syllables). Access should be possible even if input is given in an incomplete, imprecise or degraded form. Furthermore, to allow for natural and efficient access, we need to take the users' knowledge into account (search space reduction) and provide adequate navigational tools, metaphorically speaking, a map and a compass. How do existing tools address these needs, and what could be done to go further?
NLP applications. Contributors can also demonstrate how such enhanced dictionaries, once embedded in existing NLP applications, can boost performance and help solve lexical and textual-entailment problems, such as those evaluated in SEMEVAL 2007, or, more generally, generation problems encountered in the context of summarization, question-answering, interactive paraphrasing or translation.
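
As a purely illustrative aside, the idea of reaching the same entry via meaning, form or sound can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any system presented at the workshop: the three toy entries, their definition words and syllable splits, and all function names are made up for illustration, and fuzzy form matching relies on the standard library's difflib as a stand-in for whatever robust matching a real dictionary would use.

```python
from collections import defaultdict
from difflib import get_close_matches

# Toy lexicon (hypothetical data): word -> (definition words, rough syllables)
ENTRIES = {
    "hot-dog": ({"sausage", "bun", "food"}, ["hot", "dog"]),
    "compass": ({"instrument", "direction", "north"}, ["com", "pass"]),
    "map": ({"drawing", "area", "direction"}, ["map"]),
}


def build_indexes(entries):
    """Build one inverted index per access route (meaning and sound)."""
    by_meaning = defaultdict(set)
    by_syllable = defaultdict(set)
    for word, (gloss, syllables) in entries.items():
        for cue in gloss:
            by_meaning[cue].add(word)
        for syllable in syllables:
            by_syllable[syllable].add(word)
    return by_meaning, by_syllable


def lookup_by_meaning(by_meaning, cues):
    """Rank words by how many of the given cue words their definition shares."""
    scores = defaultdict(int)
    for cue in cues:
        for word in by_meaning.get(cue, ()):
            scores[word] += 1
    # Highest overlap first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))


def lookup_by_form(entries, query, cutoff=0.6):
    """Tolerate incomplete or misspelled written forms via approximate matching."""
    return get_close_matches(query, list(entries), n=3, cutoff=cutoff)


def lookup_by_sound(by_syllable, syllable):
    """Return all words containing the given syllable."""
    return sorted(by_syllable.get(syllable, ()))


by_meaning, by_syllable = build_indexes(ENTRIES)
print(lookup_by_meaning(by_meaning, ["direction", "north"]))  # ['compass', 'map']
print(lookup_by_form(ENTRIES, "compas"))   # degraded written form
print(lookup_by_sound(by_syllable, "dog"))  # access via a syllable
```

The sketch keeps one index per access route so that a user can start from whatever fragment of knowledge is available; a realistic resource would of course need far richer semantic and phonological representations.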

Aims and Target Audience

The aim of this workshop is to bring together researchers involved in the construction and application of electronic dictionaries to discuss modifications of existing resources in line with the users' needs, thereby fully exploiting the advantages of the digital form. Given the breadth of the questions, we welcome reports on work from many perspectives, including but not limited to: computational lexicography, psycholinguistics, cognitive psychology, language learning and ergonomics.

Important Date and Action

August 22, 2010	Cogalex Workshop
As soon as possible	Please register

Program

09:00-09:15	Welcome to participants
09:15	E. Hovy (invited keynote address)	Distributional Semantics and the Lexicon
10:30-11:00	Coffee break

Session 1: Semantics and Cognition
11:00	A. Das & S. Bandyopadhyay	SemanticNet-Perception of Human Pragmatics
11:30	G. E. Lebani & E. Pianta	Exploiting Lexical Resources for Therapeutic Purposes: the Case of WordNet and STaRS.sys
12:00	Y. Muramatsu, K. Uduka & K. Yamamoto	Textual Entailment Recognition Using Word Overlap, Mutual Information and Subpath Set
12:30	C. Strapparava & G. Ozbal	The Color of Emotions in Texts
13:00-14:00	Lunch break

Session 2: Lexicography
14:00	N. Béchet & M. Roche	How to Expand Dictionaries by Web-Mining Techniques
14:30	N. Curteanu, A. Moruz & D. Trandabat	An Optimal and Portable Parsing Method for Romanian, French, and German Large Dictionaries
15:00	E. Lavagnino & J. Park	Conceptual Structure of Automatically Extracted Multi-Word Terms from Domain Specific Corpora: a Case Study for Italian
15:30-16:00	Coffee break

Session 3: Word Access and Language Learning
16:00	H. Gao	Computational Lexicography: A Feature-based Approach in Designing an E-dictionary of Chinese Classifiers
16:30	S. Markantonatou, A. Fotopoulou, M. Alexopoulou & M. Mini	In Search of the 'Right' Word
17:00	M. Zock, D. Schwab & N. Rakotonanahary	Lexical Access, a Search-Problem (keynote presentation)
17:30-18:00	Wrap-up discussion
18:00	End of the workshop

Prior conferences

The 1st COGALEX workshop was co-located with COLING 2008 in Manchester (UK). The workshop proceedings are available as PDF files (4 MB) and can be downloaded from the ACL Anthology. A similar workshop, entitled 'Enhancing and using electronic dictionaries', was held at COLING-2004 (Geneva).

Related Conferences in Beijing

Besides COLING 2010, two other conferences may be of interest to workshop participants:

The 7th International Conference on Cognitive Science (ICCS) takes place from August 17 to 20, 2010, just before COLING. We hope that this unique opportunity will foster exchange between the Computational Linguistics and Cognitive Science communities. The ICCS venue is the China National Convention Center (CNCC), which is close to COLING's site, the Beijing International Convention Center (BICC), located on the other side of the China National Stadium ('Bird's Nest').
Also somewhat related is the 6th IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE'10). However, since it is scheduled for August 21 to 23, 2010, it overlaps with our workshop.

Program Committee

Slaven Bilac (Google Tokyo, Japan)
Pierrette Bouillon (ISSCO, Geneva, Switzerland)
Dan Cristea (University of Iasi, Romania)
Katrin Erk (University of Texas, USA)
Olivier Ferret (CEA LIST, France)
Thierry Fontenelle (EU Translation Centre, Luxembourg)
Sylviane Granger (Université Catholique de Louvain, Belgium)
Gregory Grefenstette (Exalead, Paris, France)
Ulrich Heid (IMS, University of Stuttgart, Germany)
Erhard Hinrichs (University of Tuebingen, Germany)
Graeme Hirst (University of Toronto, Canada)
Eduard Hovy (ISI, University of Southern California, Los Angeles, USA)
Chu-Ren Huang (Hong Kong Polytechnic University, China)
Terry Joyce (Tama University, Kanagawa-ken, Japan)
Philippe Langlais (DIRO/RALI, University of Montreal, Canada)
Marie Claude L'Homme (University of Montreal, Canada)
Verginica Mititelu (RACAI, Bucharest, Romania)
Alain Polguère (Nancy-Université & ATILF CNRS, France)
Reinhard Rapp (University of Tarragona, Spain)
Sabine Schulte im Walde (University of Stuttgart, Germany)
Gilles Sérasset (IMAG, Grenoble, France)
Serge Sharoff (University of Leeds, UK)
Anna Sinopalnikova (FIT, BUT, Brno, Czech Republic)
Carole Tiberius (Institute for Dutch Lexicology, The Netherlands)
Takenobu Tokunaga (TITECH, Tokyo, Japan)
Dan Tufis (RACAI, Bucharest, Romania)
Piek Vossen (Vrije Universiteit, Amsterdam, The Netherlands)
Yorick Wilks (Oxford Research Institute, UK)
Michael Zock (LIF-CNRS, Marseille, France)
Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France)

Workshop organizers and contact persons

Michael Zock (LIF-CNRS, Marseille, France), michael.zock@lif.univ-mrs.fr
Reinhard Rapp (University of Tarragona, Spain), reinhard.rapp@urv.cat
