Call for Papers
|Time|Speaker / Session|
|---|---|
|9:20|Kevin Murphy (Google)|
|10:30|Josef Sivic (INRIA)|
|11:05|Adriana Romero (Facebook AI)|
|11:40|Olga Russakovsky (Princeton)|
|14:00|Vittorio Ferrari (Google)|
|14:35|Chris Re (Stanford)|
|15:10|Devi Parikh (Georgia Tech and Facebook AI)|
|15:45|Afternoon Break & Poster Session|
Kevin Murphy (Google, USA)
Devi Parikh (Georgia Tech and Facebook AI, USA)
Chris Re (Stanford, USA)
Adriana Romero (Facebook AI, USA)
Josef Sivic (INRIA, France)
Olga Russakovsky (Princeton, USA)
Vittorio Ferrari (Google, Switzerland)
Title: Generative models for images
Abstract: In this talk, I summarize two recent generative models for images that we have developed. The first is a conditional model of color images given an input gray image. The basic idea is to use a conditional autoregressive model to generate multiple, diverse low-resolution color images, and then to upsample them and use them to colorize the high-resolution gray image. For details, see "PixColor: Pixel Recursive Colorization", BMVC 2017. The second is a latent variable model of (color) images and attributes. The basic idea is to use a joint VAE to generate images at differing levels of abstraction, conditioned on attributes with differing degrees of missing information. For details, see "Generative Models of Visually Grounded Imagination", ICLR 2018.
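The two-stage colorization pipeline described above lends itself to a short illustration. The following is a minimal sketch, not the authors' PixColor implementation: the `LowResColorizer` module, the two-channel chroma output, and the plain bilinear upsampling are simplifying assumptions standing in for the conditional autoregressive model in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowResColorizer(nn.Module):
    """Predicts a coarse chroma map from a low-resolution grayscale input.
    PixColor samples this autoregressively; a plain conv net stands in
    here to keep the sketch runnable."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # 2 chroma channels
        )

    def forward(self, gray_lowres):
        return self.net(gray_lowres)

def colorize(gray_hires, lowres_model, size=28):
    # Stage 1: generate coarse chroma at low resolution.
    gray_lo = F.interpolate(gray_hires, size=(size, size),
                            mode='bilinear', align_corners=False)
    chroma_lo = lowres_model(gray_lo)
    # Stage 2: upsample the chroma and attach the high-res luminance.
    chroma_hi = F.interpolate(chroma_lo, size=gray_hires.shape[-2:],
                              mode='bilinear', align_corners=False)
    return torch.cat([gray_hires, chroma_hi], dim=1)  # YUV-like output

gray = torch.rand(1, 1, 224, 224)
print(colorize(gray, LowResColorizer()).shape)  # torch.Size([1, 3, 224, 224])
```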
Title: Embodied Question Answering
Abstract: Embodied Question Answering is a new AI task where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?"). In order to answer, the agent must first intelligently navigate to explore the environment, gather information through first-person (egocentric) vision, and then answer the question ("orange"). EmbodiedQA requires a range of AI skills: language understanding, visual recognition, active perception, goal-driven navigation, commonsense reasoning, long-term memory, and grounding language into actions. I will present a dataset of questions and answers in simulated indoor environments, evaluation metrics, and a hierarchical model trained with imitation and reinforcement learning.
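To make the perceive-navigate-answer loop concrete, here is a toy sketch of one EmbodiedQA episode. The environment interface (`env_step`), the action set, and the `ToyAgent` policy are illustrative assumptions, not the actual EmbodiedQA API; a real agent runs a learned navigation policy over egocentric frames and decodes the answer from its visual memory.

```python
import random

ACTIONS = ["forward", "turn-left", "turn-right", "stop"]

class ToyAgent:
    def act(self, observation, question):
        # A real agent uses a learned policy; here we move at random,
        # and the episode also ends at max_steps.
        return random.choice(ACTIONS)

    def answer(self, memory, question):
        # A real agent decodes an answer from its visual memory.
        return "orange"

def run_episode(env_step, agent, question, max_steps=50):
    memory = []
    obs = env_step(None)  # initial egocentric observation
    for _ in range(max_steps):
        action = agent.act(obs, question)
        if action == "stop":
            break
        obs = env_step(action)
        memory.append(obs)
    return agent.answer(memory, question)

def dummy_env_step(action):
    # Stand-in environment: returns a fake egocentric "frame".
    return {"frame": [[0.0] * 4] * 4, "last_action": action}

print(run_episode(dummy_env_step, ToyAgent(), "What color is the car?"))
```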
Title: Software 2.0 and Snorkel: Beyond hand-labeled data
Abstract: In the last several years, deep learning models have simultaneously become more performant and more readily available as easy-to-use, commodity tools; however, their deployment in practice is bottlenecked by the need for large, hand-labeled training sets. This talk describes Snorkel, a system that focuses on this emerging training-data bottleneck in the software 2.0 stack. In Snorkel, instead of tediously hand-labeling individual data items, a user implicitly defines large training sets by writing simple programs, called labeling functions, that label subsets of data points. This allows users to build high-quality models despite the fact that these labeling functions have varying quality, coverage, and specificity, and are correlated in unknown ways. A key technical challenge in Snorkel is to estimate the quality of and correlations among these labeling functions without hand-labeled data. This talk will explain a theory of learning without labeled data, along with a host of recent applications in natural language processing, structured data problems, and computer vision. It will also briefly discuss recent extensions of these core ideas to automatically generating data augmentations, synthesizing training data, and learning from multi-task supervision. Snorkel is open source on GitHub; technical blog posts and tutorials are available at Snorkel.Stanford.edu.
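As a concrete illustration of labeling functions, here is a minimal, self-contained sketch for a toy spam task. The function names and label scheme are made up for illustration and are not Snorkel's API; a plain majority vote stands in for Snorkel's generative label model, which, as the abstract notes, additionally estimates per-function accuracies and correlations without ground-truth labels.

```python
import re
from collections import Counter

ABSTAIN, SPAM, HAM = -1, 1, 0

# Each labeling function labels a subset of examples and abstains elsewhere.
def lf_contains_link(text):
    return SPAM if re.search(r"https?://", text) else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_reply(text):
    return HAM if len(text.split()) < 5 else ABSTAIN

def weak_label(text, lfs):
    """Majority vote over non-abstaining labeling functions: a crude
    stand-in for Snorkel's label model."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

lfs = [lf_contains_link, lf_mentions_prize, lf_short_reply]
print(weak_label("Claim your prize at http://x.co", lfs))  # 1 (SPAM)
```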
Title: Graph Attention Networks
Abstract: In recent years, deep learning has achieved remarkable results in many computer vision, speech, and natural language processing problems. However, many interesting tasks involve data that cannot be represented in a grid-like structure and instead lies in an irregular domain. This is the case for 3D meshes, social networks, biological networks, and brain connectomes. Such data can usually be represented in the form of graphs. In this talk, I will present our recent work on Graph Attention Networks (GATs). I will start by reviewing early approaches to leveraging neural networks for processing graph-structured data, with special emphasis on graph convolutions, highlighting potential issues and motivating our work. Then, I will introduce GATs, a novel neural network architecture that leverages masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. Finally, I will discuss the results we obtained on well-established transductive and inductive benchmarks, and show a recent application of our model to mesh-based parcellation of the cerebral cortex.
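The masked self-attention at the heart of a GAT layer is compact enough to sketch. Below is a minimal single-head layer in PyTorch, written from the published description rather than the authors' code: attention logits are computed for every node pair, non-edges are masked out before the softmax so each node attends only to its neighborhood, and outputs are attention-weighted sums of transformed neighbor features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention mechanism

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) adjacency with self-loops.
        z = self.W(h)                                    # (N, out_dim)
        N = z.size(0)
        zi = z.unsqueeze(1).expand(N, N, -1)             # repeat as "query"
        zj = z.unsqueeze(0).expand(N, N, -1)             # repeat as "key"
        e = F.leaky_relu(self.a(torch.cat([zi, zj], dim=-1)).squeeze(-1), 0.2)
        # Mask non-edges so attention stays within the graph neighborhood.
        e = e.masked_fill(adj == 0, float('-inf'))
        alpha = torch.softmax(e, dim=1)                  # attention coefficients
        return F.elu(alpha @ z)                          # aggregated features

h = torch.rand(4, 8)            # 4 nodes, 8 features each
adj = torch.eye(4)              # self-loops
adj[0, 1] = adj[1, 0] = 1       # one undirected edge
print(GATLayer(8, 16)(h, adj).shape)  # torch.Size([4, 16])
```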
Title: Weakly supervised learning for visual recognition
Abstract: The current successes in visual recognition are, in large part, due to a combination of learnable visual representations, supervised machine learning techniques, and large-scale, carefully annotated image collections. In this talk, I will argue that in order to build machines that understand the changing visual world around us, the next challenges lie in developing visual representations that generalize to never-before-seen conditions and are learnable in a weakly supervised manner, i.e., from noisy and only partially annotated data. I will show examples of our work in this direction, with applications in understanding narrated instructional videos, visual localization across changing conditions, and finding visual correspondences.
Title: Knowledge transfer and human-machine collaboration for training visual models
Abstract: Object class detection and segmentation are challenging tasks that typically require tedious and time-consuming manual annotation for training. In this talk, I will present three techniques we recently developed for reducing this effort. In the first part, I will explore a knowledge transfer scenario: training object detectors for target classes with only image-level labels, helped by a set of source classes with bounding-box annotations. In the second and third parts, I will consider human-machine collaboration scenarios: annotating bounding boxes for one object class, and annotating the class label and approximate segmentation of every object and background region in an image.
- Jose M. Alvarez, Data61 (CSIRO), Australia
- Nathan Silberman, 4Catalyzer, USA
- Dhruv Batra, Facebook AI Research / Georgia Tech, USA
- Yann LeCun, Facebook AI Research / NYU, USA
Description of the workshop
Most of the major advances in deep learning have come from supervised learning. Despite these successes, supervised learning algorithms suffer from a major limitation: they require massive amounts of carefully, and typically expensively, annotated data. This workshop will emphasize future directions beyond supervised learning, such as reinforcement learning and weakly supervised learning. Such approaches require far less supervision and allow computers to learn beyond mimicking what is explicitly encoded in a large-scale set of annotations. We encourage researchers to formulate innovative learning theories, feature representations, and end-to-end vision systems based on deep learning. We also encourage new theories and processes for dealing with large-scale image datasets through deep learning architectures. We are soliciting original contributions that address a wide range of theoretical and practical issues including, but not limited to:
- Large scale image and video understanding with limited annotations:
- Video classification
- Object recognition
- Object tracking
- Scene understanding
- Industrial and medical applications
- Theoretical foundations of unsupervised learning
- Unsupervised feature learning and feature selection
- Deep learning in mobile platforms and embedded systems
- Advancements in semi-supervised learning and transfer learning algorithms
- Inference and optimization
- Applications of unsupervised learning
- Deep learning for robotics
- Lifelong learning
- Reinforcement learning
As the main difference from previous years, papers for this edition of the workshop are meant to be extended abstracts presenting current, preliminary, or novel results, in order to encourage discussion during the workshop.