4th Multimodal Learning and Applications Workshop

In conjunction with CVPR 2021.

VIRTUAL - June 19th 2021 (Morning)

4th Multimodal Learning and Applications Workshop (MULA 2021)

The exploitation of big data over the last few years has led to a major step forward in many applications of computer vision. However, most of the tasks tackled so far involve the visual modality only, mainly because of the unbalanced number of labelled samples available across modalities (e.g., there are many huge labelled datasets for images but far fewer for audio or IMU-based classification). This imbalance results in a huge gap in performance when algorithms are trained separately.

Recently, a few works have started to exploit the synchronization of multimodal streams (e.g., audio/video, RGB/depth, RGB/Lidar, visual/text, text/audio) to transfer semantic information from one modality to another, reaching surprising results. Interesting applications have also been proposed in a self-supervised fashion, where correspondences between multiple modalities are learned without the need for manual labelling, resulting in a more powerful set of features than those learned by processing the modalities separately. Other works have shown that particular training paradigms allow neural networks to perform well when one of the modalities is missing due to sensor failure or unfavorable environmental conditions. These topics have attracted a lot of interest in the computer vision community in recent years.

Information fusion from multiple sensors is also a topic of major interest in industry: the exponential growth of companies working on automotive, drone vision, surveillance, or robotics is just one example. Many companies are trying to automate processes by using a large variety of control signals from different sources. The aim of this workshop is to generate momentum around this topic of growing interest and to encourage interdisciplinary interaction and collaboration between the computer vision, multimedia, remote sensing, and robotics communities, serving as a forum for research groups from academia and industry.

We expect contributions involving, but not limited to, image, video, audio, depth, IR, IMU, laser, text, drawings, and synthetic data. Position papers with feasibility studies and cross-modality issues with a strong applied focus are also encouraged. Multimodal data analysis is a very important bridge among vision, multimedia, remote sensing, and robotics; we therefore expect a positive response from these communities.

Potential topics include, but are not limited to:

  • Multimodal learning
  • Cross-modal learning
  • Self-supervised learning for multimodal data
  • Multimodal data generation and sensors
  • Unsupervised learning on multimodal data
  • Cross-modal adaptation
  • Multimodal data fusion and data representation
  • Multimodal transfer learning
  • Multimodal scene understanding
  • Vision and Language
  • Vision and Sound
  • Multimodal applications (e.g., drone vision, autonomous driving, industrial inspection)


Papers will be limited to 8 pages according to the CVPR format (cf. the main conference author guidelines). All papers will be reviewed by at least two reviewers under a double-blind policy. Papers will be selected based on relevance, significance and novelty of results, technical merit, and clarity of presentation. Papers will be published in the CVPR 2021 proceedings.

All papers should be submitted using the CMT website at https://cmt3.research.microsoft.com/MULA2021.

Important Dates

  • Deadline for submission: March 10th, 2021 - 23:59 Pacific Standard Time
  • ---EXTENDED---
  • Firm Deadline for submission: March 14th, 2021 - 23:59 Pacific Standard Time
  • Notification of acceptance: April 8th, 2021
  • Camera Ready submission deadline: April 19th, 2021
  • Workshop date: June 19th, 2021 (Morning)



Invited Speakers



Michael Ying Yang

University of Twente, Netherlands

Pietro Morerio

Istituto Italiano di Tecnologia, Italy

Paolo Rota

Università di Trento, Italy

Bodo Rosenhahn

Institut für Informationsverarbeitung, Leibniz-Universität Hannover, Germany

Vittorio Murino

Istituto Italiano di Tecnologia & Università di Verona, Italy & Huawei Technologies, Ireland

Program Committee



Old Editions


For additional info please contact us here