Call for Papers
Gabriel Kreiman, Harvard University, USA
Raia Hadsell, Google DeepMind, UK
Lior Wolf, Facebook AI Research
Hugo Larochelle, Université de Sherbrooke, Canada
Sanja Fidler, University of Toronto, Canada
Xiaogang Wang, Chinese University of Hong Kong
Kate Saenko, Boston University, USA
Maja Pantic, Imperial College London, UK
Title: How studying brains can help us get a deeper vision
Abstract: The enormous success of deep vision approaches is predicated upon and mimics only a small fraction of the biological circuitry involved in visual processing. I will argue that we are merely scratching the surface in terms of using insights from biological vision and that we can take inspiration from biology to translate neural algorithms into computational code. I will provide two examples that concern the roles of recurrent and top-down connections, both of which are ubiquitous throughout biological hardware and understudied in computational approaches. These connections can implement visual routines that extend vision beyond isolated object labeling abilities and solve challenging problems including object search in cluttered real-world scenes and pattern completion from minimal information.
Title: Deep RL for Navigation in Complex Environments
Abstract: Deep reinforcement learning has rapidly grown as a research field with far-reaching potential. However, many of the key results have been shown in simple domains on a single task (e.g., Atari games). As the field matures, we are beginning to look to more sophisticated learning systems in order to solve more complex tasks in visually rich environments. I will describe some recent research from DeepMind that allows end-to-end learning in challenging visual environments.
Title: Unsupervised Cross-Domain Mapping
Abstract: In this talk I will present very practical methods for translating images between domains, as well as theoretical insights into the feasibility of this mapping. I will describe methods that employ an auxiliary function as well as completely unsupervised techniques. From the theoretical perspective, I will draw links to the problem of unsupervised domain adaptation and give a concrete definition of "semantics".
Title: Optimization as a Model for Few-Shot Learning
Abstract: Though deep neural networks have shown great success in the large-data domain, they generally perform poorly on few-shot learning tasks, where a model has to quickly generalize after seeing very few examples from each class. The general belief is that gradient-based optimization in high-capacity models requires many iterative steps over many examples to perform well. Here, we propose an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network in the few-shot regime. The parametrization of our model allows it to learn appropriate parameter updates specifically for the scenario where a set number of updates will be made, while also learning a general initialization of the learner network that allows for quick convergence of training. We demonstrate that this meta-learning model is competitive with deep metric-learning techniques for few-shot learning.
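The update rule described in this abstract, where an LSTM-like cell emits per-parameter gates in place of a fixed learning rate, can be illustrated in a few lines. The PyTorch sketch below is a hypothetical simplification for illustration only, not the authors' implementation; the class name MetaLearnerCell and the two-feature gate inputs are our own choices.

import torch
import torch.nn as nn

class MetaLearnerCell(nn.Module):
    # Emits per-parameter input/forget gates so the learner update becomes
    # theta_t = f * theta_{t-1} + i * (-grad): a gated, learned analogue of
    # gradient descent (i acts like a learning rate, f like weight decay).
    def __init__(self, hidden=20):
        super().__init__()
        self.i_gate = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.f_gate = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, theta, grad, loss):
        # Condition each gate on that parameter's gradient and the shared loss.
        feats = torch.stack([grad, loss.expand_as(grad)], dim=-1)
        i = torch.sigmoid(self.i_gate(feats)).squeeze(-1)
        f = torch.sigmoid(self.f_gate(feats)).squeeze(-1)
        return f * theta + i * (-grad)

# Unroll the cell for a fixed number of few-shot updates; in meta-training one
# would backpropagate a held-out loss through this unrolled loop to train the
# gates (and a learned initialization of theta).
cell = MetaLearnerCell()
theta = torch.randn(10, requires_grad=True)  # flattened learner parameters
for _ in range(5):                           # the "set number of updates"
    loss = (theta ** 2).sum()                # stand-in for a few-shot loss
    grad, = torch.autograd.grad(loss, theta, create_graph=True)
    theta = cell(theta, grad, loss.detach())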
Title: Tubelet-based Video Object Detection
Title: Deep models for activity detection and description
Invited talk: Unsupervised Cross-Domain Mapping
Invited talk: Optimization as a Model for Few-Shot Learning
Invited talk: How studying brains can help us get a deeper vision
Invited talk: TBD
Invited talk: Deep RL for Navigation in Complex Environments
Invited talk: Deep models for activity detection and description
Afternoon break / poster session
Invited talk: Tubelet-based Video Object Detection
Invited talk: TBD
Concurrence-Aware Long Short-Term Sub-Memories for Person-Person Action Recognition
|Xiangbo Shu, Jinhui Tang, Guojun Qi, Yan Song, Zechao Li, Liyan Zhang|
Crowd-11: A Dataset for Fine Grained Crowd Behaviour Analysis
|Camille Dupont, Luis Tobias, Bertrand Luvison|
Temporal Domain Neural Encoder for Video Representation Learning
|Hao Hu, Zhaowen Wang, Joon-Young Lee, Zhe Lin, Guo-Jun Qi|
Recurrent Memory Addressing for describing videos
|Arnav Jain, Abhinav Agarwalla, Kumar Agrawal, Pabitra Mitra|
Temporally Steered Gaussian Attention for Video Understanding
|Shagan Sah, Thang Nguyen, Miguel Dominguez, Felipe Petroski Such, Raymond Ptucha|
SANet: Structure-Aware Network for Visual Tracking
|Heng Fan, Haibin Ling|
Fixation Prediction in Videos using Unsupervised Hierarchical Features
|Tzujui Wang, Hamed Rezazadegan Tavakoli, Jorma Laaksonen|
Learning Latent Temporal Connectionism of Deep Residual Visual Abstractions for Identifying Surgical Tools in Laparoscopy Procedures
|Kaustuv Mishra, Rachana Sathish, Debdoot Sheet|
Kernalised Multi-resolution Convnet for Visual Tracking
|Di Wu, Wenbin Zou, Xia Li, Yong Zhao|
Description of the workshop
The goal of the DeepVision Workshop is to accelerate the study of deep learning algorithms for computer vision problems. With the increasing accessibility of digital photography and the advances in storage devices over the last decade, we have seen explosive growth in the amount of available visual data and equally explosive growth in the computational capacity for image understanding. Instead of hand-crafting features, recent advances in deep learning suggest an emerging approach to learning useful representations for many computer vision tasks.
- The submission site is https://cmt3.research.microsoft.com/DV2
- The maximum paper length is 8 pages (including references) using the CVPR main conference format.
- Submissions will be rejected without review if they contain more than 8 pages or violate the double-blind policy.
- All accepted papers will be included in the CVPR 2017 workshop proceedings and made available on IEEE Xplore.
- Paper Submission: March 31st, 2017
- Supplemental Material Submission: March 31st, 2017
- Author Notification: April 21st, 2017
- Camera Ready: May 15th, 2017
- Jose M. Alvarez, Data61 (CSIRO), Australia
- Nathan Silberman, Google Research, USA
- Dhruv Batra, Facebook AI Research / Georgia Tech, USA
- Yann LeCun, Facebook AI Research / NYU, USA
Temporal Deep Learning
Videos contain valuable temporal information that can be exploited to achieve better performance. Exploiting temporal information is of great importance in computer vision applications such as object tracking and recognition, and scene analysis and understanding. Employing temporal information in such applications remains a challenge for deep learning based techniques. Although some advances have been made in this direction, mainly involving 3D convolutions, motion-based input features, or deep temporal models such as RNNs and LSTMs, significant further advances are expected in this field.
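One common instance of the RNN-LSTM pattern mentioned above, per-frame CNN features fed to an LSTM for a video-level prediction, is sketched below. This is a minimal, hypothetical PyTorch illustration; the module name FrameLSTMClassifier and all layer sizes are ours, not drawn from any particular paper.

import torch
import torch.nn as nn

class FrameLSTMClassifier(nn.Module):
    def __init__(self, feat_dim=256, hidden=128, num_classes=10):
        super().__init__()
        # Tiny per-frame 2D CNN standing in for a real backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, video):                  # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))  # fold time into the batch
        feats = feats.view(b, t, -1)           # (B, T, feat_dim)
        out, _ = self.lstm(feats)              # temporal modeling over frames
        return self.head(out[:, -1])           # classify from the last step

# Usage: 2 clips of 8 RGB frames at 64x64 -> (2, 10) class logits.
logits = FrameLSTMClassifier()(torch.randn(2, 8, 3, 64, 64))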
- Maja Pantic, Imperial College London, UK
- Trevor Darrell, University of California, Berkeley, USA
- Xiaogang Wang, Chinese University of Hong Kong
- Kamal Nasrollahi
- Sergio Escalera
- Ajmal Mian
- Gholamreza Anbarjafari
- Thomas B. Moeslund