# Instance_Insertion **Repository Path**: mirrors_NVlabs/Instance_Insertion ## Basic Information - **Project Name**: Instance_Insertion - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-08-18 - **Last Updated**: 2026-05-23 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Context-aware Synthesis and Placement of Object Instances Please find the technique details in the [paper](https://papers.nips.cc/paper/8240-context-aware-synthesis-and-placement-of-object-instances.pdf) ## License Copyright (C) 2018 NVIDIA Corporation. All rights reserved. Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode). ## Network Architecture The network contains two major modules, a "where" module (the first figure) to determine the fesiable location of the object, and a "what" module (the second figure) to generate a proper shape. The two modules are jointly trained, where the blue dashed arrows indicate the linkage of them.                     ## Dataset - Download and place [cityscape dataset](https://www.cityscapes-dataset.com/) at "db_root" in the options.py ## How to run the code - Check `options.py` and specify your own path accordingly. - Run `main.py`, it will save results for pairs of different random vectors, i.e., (z_appr1, z_spatial1), (z_appr2, z_spatial1), and (z_appr1, z_spatial2) **All code tested on Ubuntu 16.04, pytorch 0.3.1, and opencv 3.4.0** ## Explanation of code details `options.py` - `db_root`: as explained above - `target_class`: person or car - `image_sizex_small`: image width when training where module - `image_sizey_small`: image height when training where module - `image_sizex_big`: image width when training what module - `image_sizey_big`: image height when training what module - `compact_sizex`: image width of generated object - `compact_sizey`: image height of generated object - `embed_dim_small`: dim of output of an encoder in where module - `embed_dim_big`: dim of output of an encoder in what module `main.py` - Training part starts from line 56 - Between line 56 and 161, it loads training images and check whether it is okay to proceed. We pick 2 seg maps at random. Image 1) `b_real_seg_small` or `b_real_seg_big` corresponds to x+ in where and what. It is contains at least one object (variable "has_ins"), then proceed (line 94). Then, check whether there is at least one proper object that are not too small or too narrow (line 120). Image 2) `b_cond_seg_small` or `b_cond_seg_big` corresponds to x in where and what. It is just a random image. - Forward starts at line 161 - Log at line 186 - Save images at line 203 `model.py` - Define networks in line 44. Networks are actually defined in networks.py - Define optimizers in line 114 - Set inputs from line 152-240 We transform a box using A into x+ to prepare real examples, which is done by `stn_fix`. - Reparameterize function for VAE in line 241 - Computing edges in line 249-266 - Helper functions in line 268-286 - Forward where supervised in line 288-315 - Forward where/what unsupervised in line 316-374 - Forward what supervised in line 375-399 - Backward for each discriminator in line 401-463 - Backward for generation parts in line 465-539 `coord_loss`: make sure that the whole compact instance is transformed. `stn_theta_loss`: preventing to predict too small objects or flipped objects For other losses you can understand what it is by its name.