2nd Multimodal Learning and Applications Workshop

In conjunction with CVPR 2019.

Long Beach, CA - June 16th 2019 (Morning)

Room: Seaside 7

2nd Multimodal Learning and Applications Workshop (MULA 2019)

The exploitation of big data in the last few years has led to a big step forward in many applications of computer vision. However, most of the tasks tackled so far involve mainly the visual modality, because the number of labelled samples is unbalanced across modalities (e.g., there are many huge labelled datasets for images but far fewer for audio or IMU-based classification). This results in a huge gap in performance when algorithms are trained separately.

This workshop aims to bring together the communities of machine learning and multimodal data fusion. We expect contributions involving video, audio, depth, IR, IMU, laser, text, drawings, synthetic data, etc. Position papers with feasibility studies and cross-modality issues with a highly applicative flair are also encouraged; we therefore expect a positive response from both academic and industrial communities.

This is an open call for papers, soliciting original contributions considering recent findings in theory, methodologies, and applications in the field of multimodal machine learning. Potential topics include, but are not limited to:

  • Multimodal learning
  • Cross-modal learning
  • Self-supervised learning for multimodal data
  • Multimodal data generation and sensors
  • Unsupervised learning on multimodal data
  • Cross-modal adaptation
  • Multimodal data fusion
  • Multimodal transfer learning
  • Multimodal applications (e.g., drone vision, autonomous driving, industrial inspection)
  • Machine Learning studies of unusual modalities

Material and information about the 1st edition @ ECCV 2018 in Munich can be found here.


Papers will be limited to 8 pages according to the CVPR format (cf. the main conference author guidelines). All papers will be reviewed by at least two reviewers under a double-blind policy. Papers will be selected based on relevance, significance and novelty of results, technical merit, and clarity of presentation. Papers will be published in the CVPR 2019 proceedings.

All papers should be submitted using the CMT website.

Important Dates

  • Deadline for submission: March 10th, 2019 - 23:59 Pacific Standard Time
  • ---EXTENDED---
  • Firm Deadline for submission: March 15th, 2019 - 23:59 Pacific Standard Time
  • Notification of acceptance: April 3rd, 2019
  • Camera Ready submission deadline: April 10th, 2019
  • ---EXTENDED---
  • Camera Ready submission deadline: April 12th, 2019
  • Workshop date: June 16th, 2019 (Morning)



08:20 - Initial remarks and workshop introduction
08:30 - WiFi and Vision Multimodal Learning for Accurate and Robust Device-Free Human Activity Recognition - Han Zou; Jianfei Yang; Hari Prasanna Das; Huihan Liu; Yuxun Zhou; Costas Spanos.

08:50 - Invited Speaker: Kristen Grauman - Disentangling Object Sounds in Video

09:40 - Two Stream 3D Semantic Scene Completion - Martin Garbade; Yueh-Tung Chen; Johann Sawatzky; Jürgen Gall.
10:00 - Co-compressing and Unifying Deep CNN Models for Efficient Human Face and Speaker Recognition - Timmy S. T. Wan; Jia-Hong Lee; Yi-Ming Chan; Chu-Song Chen.

10:20 - Coffee Break

10:30 - Invited Speaker: Andrew Owens - Learning Sight from Sound

11:20 - Spotlight session (3-minute presentation for each poster)

  • Cut Interruption Detection in Laser Cutting Machines: A Machine Learning Approach - Giorgio Santolini; Paolo Rota; Paolo Bosetti.
  • A Large-scale Attribute Dataset for Zero-shot Learning - Bo Zhao; Yanwei Fu; Rui Liang; Jiahong Wu; Yonggang Wang; Yizhou Wang.
  • Learning Common Representation from RGB and Depth Images - Giorgio Giannone; Boris Chidlovskii.
  • Unsupervised Domain Adaptation for Multispectral Pedestrian Detection - Dayan Guan; Xing Luo; Yanpeng Cao; Jianxin Yang; Yan-long Cao; George Vosselman; Michael Ying Yang.
  • Natural Language Guided Visual Relationship Detection - Wentong Liao; Bodo Rosenhahn; Lin Shuai; Michael Ying Yang.
  • Cross-stream Selective Networks for Action Recognition - Bowen Pan; Jiankai Sun; Wuwei Lin; Limin Wang; Weiyao Lin.

12:00 - Poster Session - Pacific Arena Ballroom, Slots 157-166.

    Presentation Instructions

    Posters: Authors of all accepted papers will present their work in the poster session. Please use the CVPR poster template. More instructions can be found in the 'News and Updates' section of the CVPR website. Check your Paper ID to find your poster board.

    Oral talks: Please review the program above to check whether your paper was accepted for oral presentation. Each presentation will last 15 minutes (plus 5 minutes for questions).

    Spotlights: All other accepted papers will be presented in a 3-minute spotlight, right before the poster session. Please send your slides (PDF) to mula.workshop@gmail.com by June 15th.

    Invited Speakers

    Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Scientist in Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on visual recognition and search. Before joining UT-Austin in 2007, she received her Ph.D. at MIT. She is an Alfred P. Sloan Research Fellow, a Microsoft Research New Faculty Fellow, and a recipient of NSF CAREER and ONR Young Investigator awards, the PAMI Young Researcher Award in 2013, the 2013 Computers and Thought Award from the International Joint Conference on Artificial Intelligence (IJCAI), the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2013, and the Helmholtz Prize (computer vision test of time award) in 2017. She and her collaborators were recognized with the CVPR Best Student Paper Award in 2008 for their work on hashing algorithms for large-scale image retrieval, the Marr Prize at ICCV in 2011 for their work on modeling relative visual attributes, the ACCV Best Application Paper Award in 2016 for their work on automatic cinematography for 360 degree video, and a Best Paper Honorable Mention at CHI in 2017 for work on crowds and visual question answering.

    Andrew Owens is a postdoctoral scholar at UC Berkeley. He received a Ph.D. in computer science from MIT in 2016. He is a recipient of a CVPR Best Paper Honorable Mention Award (2011), a Microsoft Research Ph.D. Fellowship (2015), and an NDSEG Graduate Fellowship (2011). He will begin as an assistant professor at The University of Michigan in the spring of 2020.

    Organizers


    Pietro Morerio

    Istituto Italiano di Tecnologia, Italy

    Paolo Rota

    Università di Trento, Italy

    Michael Ying Yang

    University of Twente, Netherlands

    Bodo Rosenhahn

    Institut für Informationsverarbeitung, Leibniz-Universität Hannover, Germany

    Vittorio Murino

    Istituto Italiano di Tecnologia, Italy

    Program Committee

    • Andrea Zunino, Istituto Italiano di Tecnologia, Italy
    • Asako Kanezaki, AIST, Japan
    • Christian Heipke, Leibniz Universität Hannover, Germany
    • Christoph Reinders, Leibniz Universität Hannover, Germany
    • Devis Tuia, Wageningen University and Research, Netherlands
    • Elisa Ricci, University of Perugia, Italy
    • Gui-Song Xia, Wuhan University, China
    • Hai Huang, Bundeswehr University Munich, Germany
    • Hanno Ackermann, Leibniz University Hannover, Germany
    • Jacopo Cavazza, Istituto Italiano di Tecnologia, Italy
    • Markus Gerke, TU Braunschweig, Germany
    • Riccardo Volpi, Istituto Italiano di Tecnologia, Italy
    • Valérie Gouet-Brunet, LASTIG/IGN, France
    • Vladimir Kniaz, GosNIIAS, Russia
    • Weiyao Lin, Shanghai Jiao Tong University, China
    • Wentong Liao, Leibniz University Hannover, Germany
    • Yanpeng Cao, ZJU, China
    • Yasuyuki Matsushita, Osaka University, Japan


    For additional info please contact us here