Objective: propose a multilingual framework for processing and analyzing social media data within semantically defined application domains (e.g., through ontologies or thesauri), focusing on the multilingual border regions of the Basque Country and Béarn.
The main objective of this project is to assist decision-makers and local stakeholders across various application domains (such as tourism) in deriving insights and indicators from social media to address domain-specific requirements.
I proposed a formal methodology
to build
multidimensional datasets from social
media. Building accurate and exhaustive datasets is a recurrent
challenge in the Web and Social Media Search research field. However, most
approaches currently in use are ad hoc and therefore difficult to reuse. The methodology addresses this issue with an iterative and incremental pipeline applied to several data feeds (e.g., posts, metadata, media), incorporating both human feedback and automatic mechanisms to improve quality.
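For illustration, a minimal sketch of such an iterative and incremental collection loop, assuming hypothetical feed-access, filtering, and expert-review functions (none of which correspond to the actual implementation):

```python
# Minimal sketch of an iterative, incremental dataset-building loop.
# fetch_posts, auto_filter, and human_review are hypothetical placeholders.
from typing import Callable, Iterable

def collect_iteratively(
    fetch_posts: Callable[[list[str]], Iterable[dict]],  # wrapper around a social media feed
    seed_keywords: list[str],
    auto_filter: Callable[[dict], bool],                  # automatic relevance/quality check
    human_review: Callable[[list[dict]], list[str]],      # experts suggest new keywords
    max_iterations: int = 3,
) -> list[dict]:
    dataset: list[dict] = []
    keywords = list(seed_keywords)
    for _ in range(max_iterations):
        batch = [post for post in fetch_posts(keywords) if auto_filter(post)]
        dataset.extend(batch)
        # Human feedback refines the query vocabulary for the next increment.
        keywords.extend(k for k in human_review(batch) if k not in keywords)
    return dataset
```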
I conducted a comparative study
of rule-based, fine-tuning, and few-shot learning techniques alongside several recent large language models (LLMs)
for extracting knowledge from unstructured, multilingual, and noisy social media
texts in the tourism domain. Social media posts are challenging for Natural Language Processing (NLP) due to their multilingualism, brevity, informal language, and frequent grammatical errors,
among other factors. Additionally, I investigated a recurrent challenge faced by
researchers: determining the minimum number of training
annotations required to achieve competitive results in a specific domain. Manual
annotations are both time-consuming and costly; thus, researchers
aim to annotate as few samples as possible while still maintaining high-quality
results.
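To give a flavour of the contrast studied, the toy sketch below opposes a rule-based keyword matcher to a few-shot prompt; the categories, example posts, and the call_llm function are illustrative assumptions, not the models or prompts actually evaluated.

```python
# Toy contrast between a rule-based matcher and a few-shot prompt
# (categories, examples, and call_llm are assumptions, not the study setup).
RULES = {
    "accommodation": ["hotel", "camping", "gîte"],
    "food": ["restaurant", "pintxos", "tapas"],
}

def rule_based_topics(post: str) -> set[str]:
    text = post.lower()
    return {topic for topic, kws in RULES.items() if any(k in text for k in kws)}

FEW_SHOT_PROMPT = """Extract the tourism topic of the post (accommodation, food, heritage, none).
Post: "Superbe week-end à Biarritz, les pintxos étaient incroyables !" -> food
Post: "Nice little hotel near the old town of Bayonne." -> accommodation
Post: "{post}" ->"""

def few_shot_topic(post: str, call_llm) -> str:
    # call_llm stands in for any LLM completion function.
    return call_llm(FEW_SHOT_PROMPT.format(post=post))

print(rule_based_topics("Loved the tapas and the small hotel in Hondarribia"))
# e.g., {'food', 'accommodation'} (set order may vary)
```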
I proposed modular,
domain-adaptive indicators by
reinterpreting the theory of
proxemics (Hall, 1966) for social media. The challenge is that
most indicators for social media are domain-specific, meaning they are effective
within a specific domain of application but difficult to adapt to
others. My indicators, expressed as similarity measures, stand out due to their modularity based on proxemic dimensions and their applicability across heterogeneous entities, such as users, demographics, themes, time periods, or places.
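As a rough illustration, such an indicator could be expressed as a weighted combination of per-dimension similarities; the dimensions, formulas, and weights below are illustrative assumptions rather than the exact formulation.

```python
# Sketch of a modular, proxemics-inspired similarity between two entities
# (users, themes, places, ...). Dimensions and weights are illustrative.
import math

def thematic_sim(themes_a: set[str], themes_b: set[str]) -> float:
    # Jaccard overlap of the themes associated with the two entities.
    union = themes_a | themes_b
    return len(themes_a & themes_b) / len(union) if union else 0.0

def spatial_sim(dist_km: float, scale_km: float = 10.0) -> float:
    # Exponential decay with geographic distance.
    return math.exp(-dist_km / scale_km)

def proxemic_similarity(themes_a, themes_b, dist_km, weights=(0.5, 0.5)) -> float:
    dims = (thematic_sim(themes_a, themes_b), spatial_sim(dist_km))
    return sum(w * d for w, d in zip(weights, dims))

print(proxemic_similarity({"surf", "food"}, {"food", "heritage"}, dist_km=5.0))
```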
I proposed TextBI, an interactive dashboard designed to visualize social media indicators across multiple dimensions (e.g., spatial, temporal, thematic, personal, sentimental).
It addresses
the challenge of presenting complex information in a way that is
adaptable to various domains and easily interpretable by non-computer scientists,
such as local stakeholders (e.g., tourism offices, municipal
councils). Unlike existing Business Intelligence (BI) tools, TextBI offers
interactive visuals specifically designed for social media, featuring sentiment
and engagement overlays, multilevel timelines, thematic maps,
proxemic crosshairs, and interaction graphs.
Demonstration video of the TextBI dashboard
DA3T Project (2021 - 2022)
Digital Trace Analysis Device for the Valorization of Touristic
Territories
Objective: The project aims to develop a system for
analyzing
multidimensional mobility tracks, both outdoors in cities and indoors (e.g., in museums),
to assist local planners and decision‑makers in managing and
promoting touristic areas in Nouvelle‑Aquitaine. It is a multidisciplinary project involving
both computer scientists and geographers, focused on creating tools
and methods for extracting, processing and analyzing mobility tracks. A mobile application
named Geoluciole
was developed to collect tracks from tourists in
various touristic cities of the region.
I worked on a multi-level and
multi-aspect model for analyzing semantic
trajectories, addressing several challenges in the geomatics field.
Specifically, the model focuses on: modeling semantic trajectories with data
enrichment associated with positions or segments; defining enrichment
generically to integrate various dimensions; and structuring this enrichment
according to a hierarchical organization.
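A minimal data-structure sketch of this idea, with enrichments attached to positions or segments and organized by dimension and hierarchical level; field names and the vocabulary used are assumptions, not the model's actual schema.

```python
# Illustrative data structures for a multi-level, multi-aspect semantic trajectory.
# Field names and the dimension/level vocabulary are assumptions, not the actual model.
from dataclasses import dataclass, field

@dataclass
class Enrichment:
    dimension: str   # e.g., "spatial", "thematic", "temporal"
    level: str       # hierarchical level, e.g., "city" > "district" > "POI"
    value: str       # e.g., "Bayonne", "museum", "lunch break"

@dataclass
class Position:
    lat: float
    lon: float
    timestamp: str
    enrichments: list[Enrichment] = field(default_factory=list)

@dataclass
class Segment:
    positions: list[Position]
    enrichments: list[Enrichment] = field(default_factory=list)  # e.g., "visit", "move"

@dataclass
class SemanticTrajectory:
    traveler_id: str
    segments: list[Segment]
```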
I proposed a novel ETL (Extract, Transform, Load) platform
dedicated to processing mobility tracks. It represents the first
mobility-specific ETL system and addresses the challenge of
seamlessly analyzing heterogeneous mobility tracks coming
from various sources. More precisely, it allows geographers to
automatically integrate (e.g., process, enrich, visualize) many
mobility tracks through modular and reactive pipelines accessible
to users who are not necessarily computer scientists.
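The modular-pipeline idea can be sketched as composable stages applied to a list of track points; the stage names below are illustrative and do not correspond to the platform's actual operators.

```python
# Sketch of a modular mobility-track pipeline built from composable stages
# (stage names are illustrative placeholders).
from functools import reduce
from typing import Callable

Stage = Callable[[list[dict]], list[dict]]

def clean(tracks: list[dict]) -> list[dict]:
    # Drop points with missing coordinates.
    return [t for t in tracks if t.get("lat") is not None and t.get("lon") is not None]

def enrich(tracks: list[dict]) -> list[dict]:
    # Attach a (placeholder) nearby point of interest to each point.
    return [{**t, "poi": "unknown"} for t in tracks]

def run_pipeline(tracks: list[dict], stages: list[Stage]) -> list[dict]:
    # Apply each stage to the output of the previous one.
    return reduce(lambda acc, stage: stage(acc), stages, tracks)

result = run_pipeline(
    [{"lat": 43.49, "lon": -1.47}, {"lat": None, "lon": -1.47}],
    [clean, enrich],
)
print(result)  # one cleaned point, enriched with a placeholder POI
```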
I designed 3D visualization modules, including a customizable space-time cube, as well as semantic trajectory enrichment modules that leverage external data sources such as OpenStreetMap, Google Maps Places, and the DataTourisme ontology, complemented by semi-structured interviews.
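To illustrate the space-time cube principle (not the actual interactive module), a few lines of matplotlib can plot a track with longitude and latitude in the horizontal plane and time on the vertical axis; the coordinates below are toy data.

```python
# Illustrative space-time cube: longitude/latitude in the plane, time on the z-axis.
# (Toy data; the actual module is interactive and customizable.)
import matplotlib.pyplot as plt

lons = [-1.475, -1.474, -1.472, -1.470]
lats = [43.492, 43.493, 43.495, 43.496]
times = [0, 5, 12, 20]  # minutes since the start of the track

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot(lons, lats, times, marker="o")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_zlabel("Time (min)")
plt.show()
```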
I participated in the design of composite and
interpretable
semantic trajectory similarity measures that assist geographers in
assessing the
similarity of touristic trajectories at various granularities (e.g., macro, meso, and micro).
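In the same spirit, a composite measure could combine interpretable per-granularity components, for instance a macro-level overlap of visited cities and a meso-level overlap of visited POI categories; the components and weights below are illustrative assumptions, not the measures actually designed.

```python
# Sketch of a composite, interpretable trajectory similarity: each granularity
# contributes its own component (components and weights are illustrative).
def overlap(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def trajectory_similarity(traj_a: dict, traj_b: dict, weights=(0.4, 0.6)) -> dict:
    macro = overlap(traj_a["cities"], traj_b["cities"])       # macro: visited cities
    meso = overlap(traj_a["poi_types"], traj_b["poi_types"])  # meso: visited POI categories
    return {"macro": macro, "meso": meso,
            "overall": weights[0] * macro + weights[1] * meso}

print(trajectory_similarity(
    {"cities": {"Bayonne", "Biarritz"}, "poi_types": {"museum", "beach"}},
    {"cities": {"Biarritz"}, "poi_types": {"beach", "restaurant"}},
))
```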