The goal of this work is to perform transfer learning among numerous tasks and to discover visual relationships among them. Specifically, while we might intuitively guess that the depth of an image and its surface normals are related, this work goes a step further and discovers beneficial relationships among 26 tasks in terms of task transferability; many of them are not obvious. This matters when the annotation budget for a target task is insufficient: a representation learned from a 'cheaper' task can be combined with a small target-task dataset to reach performance on par with fully supervised training on a large dataset.
The basis of the approach is to compute an affinity matrix among tasks based on how easily the solution for one task can be reused for another. This imposes no human intuition about task relationships: task transferability is determined in a fully computational manner, from the measured quality of each transfer operation.
The task taxonomy (i.e. **taskonomy**) is a computationally found directed hypergraph that captures the notion of task transferability over any given task dictionary. It is computed using a four-step process depicted in the figure below: ![Process overview. The steps involved in creating the taxonomy.](http://taskonomy.stanford.edu/img/svg/Process.svg)
- In stage I (**Task-specific modeling**), a task-specific network is trained for each task in a fully supervised manner. Each network is composed of an encoder (a modified ResNet-50) and either a fully convolutional decoder for pixel-to-pixel tasks or 2-3 fully connected layers for low-dimensional tasks. The dataset consists of 4 million images of indoor scenes from about 600 buildings; every image has an annotation for every task.
- In stage II (**Transfer modeling**), all feasible transfers between sources and targets are trained (transfers from multiple source tasks to a single target are also considered). Specifically, after the task-specific networks from stage I are trained, each encoder's weights are frozen (the network is used only to extract representations), and the encoder's representation is used to train a small readout network (similar to a stage I decoder) on a new target task for which ground truth is available. In total, about 3,000 transfers are trained.
- In stage III (**Taxonomy solver**), the task affinities acquired from the transfer functions' performance are normalized. This is needed because different tasks live in different output spaces and their losses have different scales, so raw transfer losses are not directly comparable. Normalization is done with an ordinal scheme, the Analytic Hierarchy Process: for each target, two sources are compared by the fraction of test images on which one transfers better than the other (details in Section 3.3 of the paper; a sketch follows this list). The result is an affinity matrix: a fully normalized complete graph that quantifies every pairwise transfer (i.e. task dependency).
- In stage IV (**Computed taxonomy**), a hypergraph that can predict the performance of any transfer policy and optimize for the best one is synthesized. This is cast as a subgraph-selection problem, solved with a Binary Integer Program in which tasks are nodes and transfers are edges (a toy version is also sketched below). The resulting solution devises a connectivity that solves all target tasks and maximizes their collective performance, while using only the available source tasks under user-specified constraints (e.g. a supervision budget).
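To make the stage III normalization concrete, here is a minimal NumPy sketch of the Analytic Hierarchy Process idea, assuming a win-rate matrix `wins` has already been computed for one target task (entry `(i, j)` is the fraction of held-out images on which source `i` transfers better than source `j`); the function name, clipping constant, and toy values are illustrative, not taken from the paper's code.

```python
import numpy as np

def ahp_affinities(wins: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Ordinal normalization via the Analytic Hierarchy Process.

    wins[i, j] = fraction of test images on which transferring from
    source i beats transferring from source j, for one fixed target.
    Returns one normalized affinity score per source task.
    """
    w = np.clip(wins, eps, 1.0 - eps)   # keep the elementwise ratios finite
    ratios = w / w.T                    # pairwise "how much better" matrix
    vals, vecs = np.linalg.eig(ratios)
    principal = np.abs(vecs[:, np.argmax(vals.real)].real)
    return principal / principal.sum()  # normalized principal eigenvector

# Toy example with 3 candidate sources for one target task:
wins = np.array([[0.5, 0.8, 0.6],
                 [0.2, 0.5, 0.3],
                 [0.4, 0.7, 0.5]])
print(ahp_affinities(wins))  # source 0 wins most often -> highest affinity
```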
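And a toy version of the stage IV subgraph selection, posed as a binary integer program with `scipy.optimize.milp`; the instance, the one-transfer-per-target constraint, and the source budget are a deliberate simplification of the paper's full formulation (which also handles higher-order, multi-source transfers).

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical toy instance: 3 source tasks, 2 targets, one candidate
# transfer edge per (source, target) pair, with AHP affinities as gains.
sources = ["autoencoding", "edges", "normals"]
targets = ["depth", "reshading"]
edges = [(s, t) for s in range(3) for t in range(2)]  # 6 edges
gain = np.array([0.2, 0.1, 0.3, 0.2, 0.6, 0.5])       # affinity per edge
source_cost = np.array([1.0, 1.0, 1.0])               # cost per trained source
budget = 2.0                                          # train at most 2 sources

n_e, n_s = len(edges), len(sources)
c = np.concatenate([-gain, np.zeros(n_s)])  # milp minimizes, so negate gains

rows, lo, hi = [], [], []
# 1) each target task is solved by exactly one incoming transfer
for t in range(len(targets)):
    row = np.zeros(n_e + n_s)
    for k, (_, tt) in enumerate(edges):
        if tt == t:
            row[k] = 1.0
    rows.append(row); lo.append(1.0); hi.append(1.0)
# 2) an edge may be used only if its source is trained: x_e - x_s <= 0
for k, (s, _) in enumerate(edges):
    row = np.zeros(n_e + n_s)
    row[k] = 1.0
    row[n_e + s] = -1.0
    rows.append(row); lo.append(-np.inf); hi.append(0.0)
# 3) total supervision budget over the selected source tasks
row = np.zeros(n_e + n_s)
row[n_e:] = source_cost
rows.append(row); lo.append(-np.inf); hi.append(budget)

res = milp(c=c,
           constraints=LinearConstraint(np.array(rows), lo, hi),
           integrality=np.ones(n_e + n_s),  # all variables are binary
           bounds=Bounds(0, 1))
chosen = [(sources[s], targets[t])
          for k, (s, t) in enumerate(edges) if res.x[k] > 0.5]
print(chosen)  # here: 'normals' is picked as the source for both targets
```

Maximizing total affinity while charging a cost for each trained source is what lets the budget constraint trade supervision effort against collective target performance.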
So, if you want to train a network on an unseen task, you can obtain pretrained weights for the existing tasks from the [project page](https://github.com/StanfordVL/taskonomy/tree/master/taskbank), train readout functions against each task (as well as combinations of multiple sources), build an affinity matrix to see where your task stands relative to the others, and, through the subgraph-selection procedure, observe which tasks have a favourable influence on yours. Consequently, you can train your task with much less data by utilizing representations from existing tasks that are visually related to yours. Magnificent!
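If you want to try that recipe, here is a hedged PyTorch sketch of the stage II-style transfer step: freeze a pretrained encoder and fit only a small readout head on your limited target data. The `encoder` below is a tiny stand-in for an actual task-bank encoder (in practice, a modified ResNet-50 loaded from the project page), and the head, loss, and data are placeholders for your own task.

```python
import torch
import torch.nn as nn

# Placeholder encoder standing in for a frozen taskonomy encoder.
encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(4), nn.Flatten())

for p in encoder.parameters():  # stage II rule: the encoder only extracts
    p.requires_grad = False     # representations; its weights stay fixed
encoder.eval()

# Small readout network (2-3 layers, like the paper's transfer functions).
readout = nn.Sequential(nn.Linear(8 * 4 * 4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(readout.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy target-task data: needing only a handful of labels is the point.
images = torch.randn(16, 3, 32, 32)
labels = torch.randn(16, 1)

for step in range(100):
    with torch.no_grad():       # frozen encoder: just extract features
        feats = encoder(images)
    loss = loss_fn(readout(feats), labels)
    opt.zero_grad()
    loss.backward()             # gradients flow only into the readout head
    opt.step()
```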