Discovering Latent Information from Noisy Sources in the Cultural Heritage Domain

Fabrizio Scarrone

Today, many publicly available data sources exist in the cultural heritage domain, such as online museum catalogues, Wikipedia, and social media. Yet the data is heterogeneous and complex: diverse, multi-modal, sparse, and noisy. In particular, the availability of social media (such as Twitter messages) is both a boon and a curse in this domain: social media messages can provide information not available otherwise, yet they are short, noisy, and often contain many grammatical and linguistic errors. The key claim of this research is that the accessibility of public information related to the cultural heritage domain can be improved with tools that signal to the various classes of users (such as the public, local governments, and researchers) the entities that make up the domain and the relationships among them. To achieve this goal, I focus on developing novel algorithms, techniques, and tools that leverage multi-modal, sparse, and noisy data from multiple public sources to integrate and enrich the information available to the public in the cultural heritage domain. In particular, this research aims to develop novel models that take advantage of multi-modal features extracted by deep neural models to improve performance on various underlying tasks.
