Research Topics
Data Quality: Exploration, Mining and Quantitative Cleaning
My work aims to assess the quality of data, i.e., to detect and quantify the anomalies and inconsistencies that can co-exist in any data set, whether it is stored and structured in a database, kept semi-structured or unstructured in static files, or continuously broadcast as a data stream. Data quality problems include duplicates, outliers, inconsistencies, missing values, and conflicting or obsolete data. All data types (numeric, categorical, structured, semi-structured or free text, geo- and multimedia) and all application domains may be affected by these problems.
My first research objective is to detect these problems effectively, particularly in data sets so large that most current detection methods become inoperative.
My second objective is to propose strategies for automating data correction and consolidation. To this end, I adapt statistical methods, exploratory data analysis and data mining techniques, and integrate them into database engineering.
Awards
- ICIQ 2012 Best Paper Award: César Guerra-García, Ismael Caballero, Laure Berti-Équille, Mario Piattini. DAQ_UWE: A Framework for Designing Data Quality Aware Web Applications. Proc. of the 16th International Conference on Information Quality (ICIQ), Adelaide, Australia, November 2011.
- Marie Curie Outgoing International Fellowship (3 years) funded by the European Commission (selection rate: 18.8% of 445 submissions, Grant FP6-MOIF-CT-2006-041000)
- INFORSID Best Junior Researcher paper (French Conference on Information Systems) for the paper entitled “Qualité de données multi-sources et recommandation multi-critère”, L. Berti, “Prix Jeunes Chercheurs INFORSID’99”, Proc. of INFORSID, pp. 185-204, 1999.
Patents
- U.S. Patent “Detecting dependence between sources”, filed on May 14, 2009; co-inventors: Xin Luna Dong (AT&T Lab Research), Laure Berti-Équille and Divesh Srivastava (AT&T Lab Research).
- U.S. Patent “Scalable Automatic Repair for minimal change and maximal likelihood”, filed on May 25, 2011; co-inventors: Mohamed Yakout (Purdue Univ., USA), Laure Berti-Équille and Ahmed K. Elmagarmid (Qatar Computing Research Institute, Qatar).
Former Research Groups
- TEXMEX project, INRIA Rennes (2002-2005)
- SYMBIOSE Project, INRIA Rennes (2000-2002)
- LIA, University of Avignon (1999-2000)
- SIS Research Group, University of Toulon (1996-1999)
