pix2code: Generating Code from a Graphical User Interface Screenshot on ShortScience.org


pix2code: Generating Code from a Graphical User Interface Screenshot
Beltramelli, Tony
arXiv e-Print archive - 2017 via Local Bibsonomy
Keywords: dblp


Summary by Léo Paillier 7 years ago

The paper generates code from a screenshot of a graphical user interface (GUI).

_Code:_ [Demo](https://youtu.be/pqKeXkhFA3I) and [code](https://github.com/tonybeltramelli/pix2code) to come.

## Inner-workings:

The authors decompose the problem into three sub-problems:

1.  a computer vision problem of understanding the given scene and inferring the objects present, their identities, positions, and poses.
2.  a language modeling problem of understanding computer code and generating syntactically and semantically correct samples.
3.  a combination of the two previous solutions: the latent variables inferred from scene understanding are used to generate textual descriptions (here, code) of the objects they represent.
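
To make the third step concrete, here is a minimal sketch of a greedy generation loop: the image is encoded once, and tokens are then produced one at a time from the image features and the tokens emitted so far. The names `generate_tokens`, `toy_predictor`, `START` and `END` are assumptions made for illustration. The stub predictor stands in for the trained networks and is not the paper's model.

```python
from typing import Callable, List, Sequence

# Hypothetical boundary markers; the real vocabulary may use other names.
START, END = "<START>", "<END>"


def generate_tokens(
    image_features: Sequence[float],
    predict_next: Callable[[Sequence[float], List[str]], str],
    max_len: int = 50,
) -> List[str]:
    """Greedily emit tokens until END is predicted or max_len is reached."""
    tokens = [START]
    for _ in range(max_len):
        # The predictor sees the (fixed) image features and the growing context.
        nxt = predict_next(image_features, tokens)
        if nxt == END:
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the START marker


def toy_predictor(image_features: Sequence[float], context: List[str]) -> str:
    """Stand-in for the trained networks: replays a fixed script."""
    script = ["row", "{", "button", "}", END]
    return script[len(context) - 1]
```

Calling `generate_tokens([0.0], toy_predictor)` walks through the script and stops at the end marker. A real system would replace `toy_predictor` with the combined vision and language model.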

They also introduce a domain-specific language (DSL) that describes GUIs, for modeling purposes.
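
As a rough illustration of why a small DSL helps, the sketch below parses a toy brace-delimited layout language and compiles it to HTML. The grammar, the element names (`row`, `button`, `label`) and the templates are all hypothetical and are not the paper's actual DSL. The point is only that a network can predict a short token vocabulary while a deterministic compiler produces the platform-specific code.

```python
from typing import List, Tuple

# Hypothetical mapping from toy DSL element names to HTML templates.
TEMPLATES = {
    "row": '<div class="row">{}</div>',
    "button": "<button>{}</button>",
    "label": "<span>{}</span>",
}


def parse(tokens: List[str], pos: int = 0) -> Tuple[list, int]:
    """Parse a flat token list into nested (name, children) nodes."""
    nodes = []
    while pos < len(tokens) and tokens[pos] != "}":
        name = tokens[pos]
        pos += 1
        children: list = []
        if pos < len(tokens) and tokens[pos] == "{":
            children, pos = parse(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != "}":
                raise ValueError("unbalanced braces")
            pos += 1  # consume the closing brace
        nodes.append((name, children))
    return nodes, pos


def render(nodes: list) -> str:
    """Recursively fill each element's template with its rendered children."""
    return "".join(
        TEMPLATES[name].format(render(children)) for name, children in nodes
    )


def compile_dsl(source: str) -> str:
    """Compile whitespace-separated toy DSL source into an HTML string."""
    tokens = source.split()
    nodes, pos = parse(tokens)
    if pos != len(tokens):
        raise ValueError("unexpected closing brace")
    return render(nodes)
```

For example, `compile_dsl("row { button label }")` returns `<div class="row"><button></button><span></span></div>`.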

## Architecture:

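*   Vision model: a standard AlexNet-like convolutional network that encodes the screenshot.
*   Language model: the tokens of the DSL vocabulary are one-hot encoded and fed into an LSTM.
*   Combined model: also an LSTM, which takes the outputs of the vision and language models and predicts the next token.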
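
The one-hot input to the language model can be sketched as follows. This is a minimal, assumption-laden example: the helper names, the `<PAD>` token and the window size are hypothetical, and the toy token sequence is not taken from the paper. It builds a vocabulary, one-hot encodes tokens, and turns a token sequence into (context, next-token) training pairs of the kind an LSTM would consume.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

PAD = "<PAD>"  # hypothetical padding token for contexts shorter than the window


def build_vocab(sequences: Iterable[Sequence[str]]) -> Dict[str, int]:
    """Assign an index to every distinct token, plus the padding token."""
    tokens = {PAD}
    for seq in sequences:
        tokens.update(seq)
    return {tok: i for i, tok in enumerate(sorted(tokens))}


def one_hot(token: str, vocab: Dict[str, int]) -> List[int]:
    """Return a vector of zeros with a single 1 at the token's index."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec


def training_pairs(
    seq: Sequence[str], vocab: Dict[str, int], window: int = 3
) -> List[Tuple[List[List[int]], int]]:
    """Pair each token's index with the one-hot vectors of the `window` tokens before it."""
    padded = [PAD] * window + list(seq)
    pairs = []
    for i, target in enumerate(seq):
        context = [one_hot(t, vocab) for t in padded[i : i + window]]
        pairs.append((context, vocab[target]))
    return pairs
```

With `seq = ["row", "{", "button", "}"]` and `window=2`, the first pair has a context made only of padding vectors and `row` as its target. Each later pair shifts the window one token to the right.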

[![screen shot 2017-06-16 at 11 34 28 am](https://user-images.githubusercontent.com/17261080/27221124-c9cadcc6-5287-11e7-9d38-c4234af92912.png)](https://user-images.githubusercontent.com/17261080/27221124-c9cadcc6-5287-11e7-9d38-c4234af92912.png)

## Results:

Clearly not ready for any serious use, but the results are promising!  
[![screen shot 2017-06-16 at 11 57 45 am](https://user-images.githubusercontent.com/17261080/27222031-0bf8e7de-528b-11e7-896f-cdb410f928c3.png)](https://user-images.githubusercontent.com/17261080/27222031-0bf8e7de-528b-11e7-896f-cdb410f928c3.png)
