# CVPR2018-papers **Repository Path**: kaluo_zZ/CVPR2018-papers ## Basic Information - **Project Name**: CVPR2018-papers - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-06-29 - **Last Updated**: 2021-06-29 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # CVPR2018-papers 1. [Transductive Unbiased Embedding for Zero-Shot Learning](http://arxiv.org/abs/1803.11320v1) 2. [Frustum PointNets for 3D Object Detection from RGB-D Data](http://arxiv.org/abs/1711.08488v2) 3. [Enhancing the Spatial Resolution of Stereo Images using a Parallax Prior](http://vclab.kaist.ac.kr/cvpr2018/CVPR2018_Stereo_SR.pdf) 4. [DiverseNet: When One Right Answer Is Not Enough](http://cs.bath.ac.uk/~nc537/papers/cvpr18_diversenet.pdf) 5. SSNet: Scale Selection Network for Online 3D Action Prediction 6. Very Large-Scale Global SfM by Distributed Motion Averaging 7. [PAD-Net: Multi-Tasks Guided Prediciton-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing](https://arxiv.org/abs/1805.04409) 8. Dynamic Feature Learning for Partial Face Recognition 9. [Context-aware Deep Feature Compression for High-speed Visual Tracking](http://arxiv.org/abs/1803.10537v1) 10. [Between-class Learning for Image Classification](http://arxiv.org/abs/1711.10284v2) 11. [DVQA: Understanding Data Visualizations via Question Answering](https://arxiv.org/abs/1801.08163) 12. Human Appearance Transfer 13. [Learning to Segment Every Thing](http://arxiv.org/abs/1711.10370v2) 14. Globally Optimal Inlier Set Maximization for Atlanta Frame Estimation 15. Re-weighted Adversarial Adaptation Network for Unsupervised Domain Adaptation 16. [Learning to Compare: Relation Network for Few-Shot Learning](http://arxiv.org/abs/1711.06025v2) 17. 
[Arbitrary Style Transfer with Deep Feature Reshuffle](http://arxiv.org/abs/1805.04103v2) 18. Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks 19. [Robust Video Content Alignment and Compensation for Rain Removal in a CNN Framework](http://arxiv.org/abs/1803.10433v1) 20. [Guided Proofreading of Automatic Segmentations for Connectomics](http://arxiv.org/abs/1704.00848v1) 21. Deep PhaseNet for Video Frame Interpolation 22. [Context-aware Synthesis for Video Frame Interpolation](http://arxiv.org/abs/1803.10967v1) 23. Lean Multiclass Crowdsourcing 24. Unsupervised Deep Generative Adversarial Hashing Network 25. [R-FCN-3000 at 30fps: Decoupling Detection and Classification](http://arxiv.org/abs/1712.01802v1) 26. [Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge](http://arxiv.org/abs/1708.02711v1) 27. [Gated Fusion Network for Single Image Dehazing](http://arxiv.org/abs/1804.00213v1) 28. [Learning a Complete Image Indexing Pipeline](http://arxiv.org/abs/1712.04480v1) 29. Mask-guided Contrastive Attention Model for Person Re-Identification 30. Learning Pose Specific Representations by Predicting different Views 31. [Deep Mutual Learning](http://arxiv.org/abs/1706.00384v1) 32. Improving Occlusion and Hard Negative Handling for Single-Stage Object Detectors 33. Defense against adversarial attacks using guided denoiser 34. Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking 35. Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships 36. [Decorrelated Batch Normalization](http://arxiv.org/abs/1804.08450v1) 37. [On the Duality Between Retinex and Image Dehazing](http://arxiv.org/abs/1712.02754v2) 38. [CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes](http://arxiv.org/abs/1802.10062v4) 39. [The Perception-Distortion Tradeoff](http://arxiv.org/abs/1711.06077v2) 40. 
Image Blind Denoising With Generative Adversarial Network Based Noise Modeling 41. [Distort-and-Recover: Color Enhancement using Deep Reinforcement Learning](http://arxiv.org/abs/1804.04450v2) 42. A Low Power, High Throughput, Fully Event-Based Stereo System 43. [Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present](http://arxiv.org/abs/1803.11439v2) 44. [End-to-end Flow Correlation Tracking with Spatial-temporal Attention](http://arxiv.org/abs/1711.01124v4) 45. Exploiting Transitivity for Learning Person Re-identification Models on a Budget 46. Imagination-IQA: No-reference Image Quality Assessment via Adversarial Learning 47. Egocentric Activity Recognition on a Budget 48. [Person Transfer GAN to Bridge Domain Gap for Person Re-Identification](http://arxiv.org/abs/1711.08565v1) 49. Duplex Generative Adversarial Network for Unsupervised Domain Adaptation 50. Fine-grained Video Captioning for Sports Narrative 51. High Performance Visual Tracking with Siamese Region Proposal Network 52. Adversarially Occluded Samples for Person Re-identification 53. MatNet: Modular Attention Network for Referring Expression Comprehension 54. [Low-Latency Video Semantic Segmentation](http://arxiv.org/abs/1804.00389v1) 55. MapNet: An Allocentric Spatial Memory for Mapping Environments 56. [Fast End-to-End Trainable Guided Filter](http://arxiv.org/abs/1803.05619v1) 57. [Partial Transfer Learning with Selective Adversarial Networks](http://arxiv.org/abs/1707.07901v1) 58. [Reconstruction Network for Video Captioning](http://arxiv.org/abs/1803.11438v1) 59. [Improving Landmark Localization with Semi-Supervised Learning](http://arxiv.org/abs/1709.01591v5) 60. Unsupervised Person Image Synthesis in Arbitrary Poses 61. Efficient Large-scale Approximate Nearest Neighbor Search on OpenCL FPGA 62. Deep End-to-End Time-of-Flight Imaging 63. Augmenting Crowd-Sourced 3D Reconstructions using Semantic Detections 64. 
DocUNet: Document Image Unwarping via A Stacked U-Net 65. Geometry Aware Optimization for Deep Learning: The Good Practice 66. Learning to Detect Features in Texture Images 67. [LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation](http://arxiv.org/abs/1805.07036v1) 68. [Spatially-Adaptive Filter Units for Deep Neural Networks](http://arxiv.org/abs/1711.11473v2) 69. [Revisiting Video Saliency: A Large-scale Benchmark and a New Model](http://arxiv.org/abs/1801.07424v2) 70. [Real-World Repetition Estimation by Div, Grad and Curl](http://arxiv.org/abs/1802.09971v1) 71. Learning Visual Knowledge Memory Networks for Visual Question Answering 72. Attention-aware Compositional Network for Person Re-Identification 73. [Sim2Real View Invariant Visual Servoing by Recurrent Control](http://arxiv.org/abs/1712.07642v1) 74. Time-resolved Light Transport Decomposition for Thermal Photometric Stereo 75. Trapping Light for Time of Flight 76. [A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth, and Optical Flow Estimation](http://arxiv.org/abs/1804.01306v1) 77. [Global versus Localized Generative Adversarial Nets](http://arxiv.org/abs/1711.06020v2) 78. [Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions](http://arxiv.org/abs/1711.08141v2) 79. Learning a Toolchain for Image Restoration 80. [CNN based Learning using Reflection and Retinex Models for Intrinsic Image Decomposition](http://arxiv.org/abs/1712.01056v2) 81. Feature Quantization for Defending Against Distortion of Images 82. [A Minimalist Approach to Type-Agnostic Detection of Quadrics in Point Clouds](http://arxiv.org/abs/1803.07191v1) 83. [Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation](http://arxiv.org/abs/1803.04907v1) 84. [Aperture Supervision for Monocular Depth Estimation](http://arxiv.org/abs/1711.07933v2) 85. Divide and Conquer for Full-Resolution Light Field Deblurring 86. 
[Multi-shot Pedestrian Re-identification via Sequential Decision Making](http://arxiv.org/abs/1712.07257v2) 87. Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features 88. Depth-Aware Stereo Video Retargeting 89. Multistage Adversarial Losses for Pose-Based Human Image Synthesis 90. [Multi-Content GAN for Few-Shot Font Style Transfer](http://arxiv.org/abs/1712.00516v1) 91. Multi-Cue Correlation Filters for Robust Visual Tracking 92. [A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects](http://arxiv.org/abs/1709.05437v2) 93. [Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering](http://arxiv.org/abs/1707.07998v3) 94. Improving Color Reproduction Accuracy in the Camera Imaging Pipeline 95. [Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks](http://arxiv.org/abs/1801.03454v2) 96. [Sketch-a-Classifier: Sketch-based Photo Classifier Generation](http://arxiv.org/abs/1804.11182v1) 97. [Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks](http://arxiv.org/abs/1706.00046v3) 98. [TOM-Net: Learning Transparent Object Matting from a Single Image](http://arxiv.org/abs/1803.04636v3) 99. Estimation of Camera Locations in Highly Corrupted Scenarios: All About the Base, No Shape Trouble 100. [Direction-aware Spatial Context Features for Shadow Detection](http://arxiv.org/abs/1712.04142v2) 101. [Neural Motifs: Scene Graph Parsing with Global Context](http://arxiv.org/abs/1711.06640v2) 102. [Object Referring in Videos with Language and Human Gaze](http://arxiv.org/abs/1801.01582v2) 103. [Learning Transferable Architectures for Scalable Image Recognition](http://arxiv.org/abs/1707.07012v4) 104. [View Extrapolation of Human Body from a Single Image](http://arxiv.org/abs/1804.04213v1) 105. [Probabilistic Plant Modeling via Multi-View Image-to-Image Translation](http://arxiv.org/abs/1804.09404v1) 106. 
[Learning a Discriminative Prior for Blind Image Deblurring](http://arxiv.org/abs/1803.03363v2) 107. Optimal Structured Light a la Carte 108. [Revisiting Deep Intrinsic Image Decompositions](http://arxiv.org/abs/1701.02965v4) 109. GAGAN: Geometry Aware Generative Adverserial Networks 110. Learning Multi-grid Generative ConvNets by Minimal Contrastive Divergence 111. [Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization](http://arxiv.org/abs/1712.01034v2) 112. [Diversity Regularized Spatiotemporal Attention for Video-based Person Re-identification](http://arxiv.org/abs/1803.09882v1) 113. [Variational Autoencoders for Deforming 3D Mesh Models](http://arxiv.org/abs/1709.04307v3) 114. [Rotation Averaging and Strong Duality](http://arxiv.org/abs/1705.01362v2) 115. 3D Hand Pose Estimation: From Current Achievements to Future Goals 116. [Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions](http://arxiv.org/abs/1707.09092v3) 117. A Robust Generative Framework for Generalized Zero-Shot Learning 118. Two can play this Game: Visual Dialog with Discriminative Visual Question Generation and Visual Question Answering 119. Rotation-sensitive Regression for Oriented Scene Text Detection 120. [Adversarial Feature Augmentation for Unsupervised Domain Adaptation](http://arxiv.org/abs/1711.08561v2) 121. [Deep Regression Forests for Age Estimation](http://arxiv.org/abs/1712.07195v1) 122. [FOTS: Fast Oriented Text Spotting with a Unified Network](http://arxiv.org/abs/1801.01671v2) 123. SoS-RSC: A Sum-of-Squares Polynomial Approach to Robustifying Subspace Clustering Algorithms 124. [Efficient Subpixel Refinement with Symbolic Linear Predictors](http://arxiv.org/abs/1804.10750v1) 125. Self-Supervised Feature Learning by Learning to Spot Artifacts 126. [PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation](http://arxiv.org/abs/1711.10871v1) 127. 
[Scale-recurrent Network for Deep Image Deblurring](http://arxiv.org/abs/1802.01770v1) 128. Multi-Cell Classification by Convolutional Dictionary Learning with Class Proportion Priors 129. [Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks](http://arxiv.org/abs/1803.10892v1) 130. On the convergence of PatchMatch and its variants 131. Clinical Skin Lesion Diagnosis using Representations Inspired by Dermatologist Criteria 132. PoTion: Pose MoTion Representation for Action Recognition 133. [Zigzag Learning for Weakly Supervised Object Detection](http://arxiv.org/abs/1804.09466v1) 134. [VITAL: VIsual Tracking via Adversarial Learning](http://arxiv.org/abs/1804.04273v1) 135. Crowd Counting with Deep Negative Correlation Learning 136. [Multi-Label Zero-Shot Learning with Structured Knowledge Graphs](http://arxiv.org/abs/1711.06526v1) 137. Learning a Discriminative Filter Bank within a CNN for Fine-grained Recognition 138. [A Closer Look at Spatiotemporal Convolutions for Action Recognition](http://arxiv.org/abs/1711.11248v3) 139. [Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification](http://arxiv.org/abs/1803.09937v1) 140. End-to-End Deep Kronecker-Product Matching for Person Re-identification 141. Consensus Maximization for Semantic Region Correspondences 142. [SBNet: Sparse Block’s Network for Fast Inference](http://arxiv.org/abs/1801.02108v1) 143. [Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints](http://arxiv.org/abs/1706.00699v2) 144. Group Consistent Similarity Learning via Deep CRFs for Person Re-Identification 145. Now You Shake Me: Towards Automatic 4D Cinema 146. Defocus Blur Detection via Multi-Stream Bottom-Top-Bottom Fully Convolutional Network 147. Interpret Neural Networks by Identifying Critical Data Routing Paths 148. Deep Reinforcement Learning of Region Proposal Networks for Object Detection 149. 
[Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics](http://arxiv.org/abs/1705.07115v3) 150. Finding It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Video" 151. Semantic Visual Localization 152. [DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks](http://arxiv.org/abs/1711.07064v4) 153. Composing Two Objects of Interest for Flying Camera Photography 154. Kernelized Subspace Pooling for Deep Local Descriptors 155. [Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks](http://arxiv.org/abs/1709.07592v3) 156. [Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks](http://arxiv.org/abs/1801.09335v1) 157. Deep Lesion Graph in the Wild: Relationship Learning and Organization of Significant Radiology Image Findings in a Diverse Large-scale Lesion Database 158. An Efficient and Provable Approach for Mixture Proportion Estimation Using Linear Independence Assumption 159. Eliminating Background-bias for Robust Person Re-identification 160. Geometry-Aware Network for Non-Rigid Shape Prediction from a Single View 161. High-order tensor regularization with application to attribute ranking 162. [Taskonomy: Disentangling Task Transfer Learning](http://arxiv.org/abs/1804.08328v1) 163. [BlockDrop: Dynamic Inference Paths in Residual Networks](http://arxiv.org/abs/1711.08393v3) 164. [Attend and Interact: Higher-Order Object Interactions for Video Understanding](http://arxiv.org/abs/1711.06330v2) 165. Bilateral Ordinal Relevance Multi-instance Regression for Facial Action Unit Intensity Estimation 166. CarFusion: Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles 167. [Transferable Joint Attribute-Identity Deep Learning for Unsupervised Person Re-Identification](http://arxiv.org/abs/1803.09786v1) 168. 
Large Scale Fine-Grained Categorization and the Effectiveness of Domain-Specific Transfer Learning 169. [BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning](http://arxiv.org/abs/1711.06959v1) 170. Improved Human Pose Estimation through Adversarial Data Augmentation 171. [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](http://arxiv.org/abs/1707.01083v2) 172. SINT++: Robust Visual Tracking via Adversarial Hard Positive Generation 173. [Structured Uncertainty Prediction Networks](http://arxiv.org/abs/1802.07079v2) 174. Geometry-Guided CNN for Self-supervised Video Representation learning 175. Low-Shot Recognition with Imprinted Weights 176. [Self-supervised Learning of Geometrically Stable Features Through Probabilistic Introspection](http://arxiv.org/abs/1804.01552v1) 177. Disentangling Structure and Aesthetics for Content-aware Image Completion 178. A Volumetric Descriptive Network for 3D Object Synthesis 179. Interpretable Convolutional Neural Networks 180. Single Image Dehazing via Conditional Generative Adversarial Network 181. Neural Inverse Kinematics for Unsupervised Motion Retargetting 182. Environment Upgrade Reinforcement Learning for Non-differentiable Multi-stage Pipelines 183. [Teaching Categories to Human Learners with Visual Explanations](http://arxiv.org/abs/1802.06924v1) 184. [Facelet-Bank for Fast Portrait Manipulation](http://arxiv.org/abs/1803.05576v3) 185. [Convolutional Sequence to Sequence Model for Human Dynamics](http://arxiv.org/abs/1805.00655v1) 186. [Human Semantic Parsing for Person Re-identification](http://arxiv.org/abs/1804.00216v1) 187. [Latent RANSAC](http://arxiv.org/abs/1802.07045v1) 188. LiDAR-Video Driving Dataset: Learning Driving Policies Effectively 189. [Actor and Observer: Joint Modeling of First and Third-Person Videos](http://arxiv.org/abs/1804.09627v1) 190. Controllable Video Generation with Sparse Trajectories 191. 
[What have we learned from deep representations for action recognition?](http://arxiv.org/abs/1801.01415v1) 192. [Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning](http://arxiv.org/abs/1804.00100v2) 193. Language-Based Image Editing with Recurrent attentive Models 194. [Graph-Cut RANSAC](http://arxiv.org/abs/1706.00984v2) 195. [Optimizing Filter Size in Convolutional Neural Networks for Facial Action Unit Recognition](http://arxiv.org/abs/1707.08630v2) 196. [Memory Based Online Learning of Deep Representations from Video Streams](http://arxiv.org/abs/1711.07368v1) 197. [Deep Layer Aggregation](http://arxiv.org/abs/1707.06484v2) 198. [Learning Convolutional Networks for Content-weighted Image Compression](http://arxiv.org/abs/1703.10553v2) 199. [Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250Hz](http://arxiv.org/abs/1712.02859v2) 200. [Efficient, sparse representation of manifold distance matrices for classical scaling](http://arxiv.org/abs/1705.10887v2) 201. [Visual to Sound: Generating Natural Sound for Videos in the Wild](http://arxiv.org/abs/1712.01393v1) 202. A Prior-Less Method for Multi-Face Tracking in Unconstrained Videos 203. [Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks](http://arxiv.org/abs/1804.06039v1) 204. Self-calibrating polarising radiometric calibration 205. Pix3D: Dataset and Methods for 3D Object Modeling from a Single Image 206. Learning to Promote Saliency Detectors 207. Pose Transferrable Person Re-Identification 208. [Hashing as Tie-Aware Learning to Rank](http://arxiv.org/abs/1705.08562v3) 209. Baseline Desensitizing In Translation Averaging 210. [Conditional Image-to-Image Translation](http://arxiv.org/abs/1805.00251v1) 211. [Blind Predicting Similar Quality Map for Image Quality Assessment](http://arxiv.org/abs/1805.08493v1) 212. 
[Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?](http://arxiv.org/abs/1711.09577v2) 213. CNN Driven Sparse Multi-Level B-spline Image Registration 214. Through-Wall Human Pose Estimation Using Radio Signals 215. [xUnit: Learning a Spatial Activation Function for Efficient Image Restoration](http://arxiv.org/abs/1711.06445v3) 216. CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization 217. FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds 218. Weakly Supervised Coupled Networks for Visual Sentiment Analysis 219. [Ring loss: Convex Feature Normalization for Face Recognition](http://arxiv.org/abs/1803.00130v1) 220. [Fast Spectral Ranking for Similarity Search](http://arxiv.org/abs/1703.06935v3) 221. [PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning](http://arxiv.org/abs/1711.05769v2) 222. [AMNet: Memorability Estimation with Attention](http://arxiv.org/abs/1804.03115v1) 223. Webly Supervised Learning Meets Zero-shot Learning: A Hybrid Approach for Fine-grained Classification 224. [End-to-End Learning of Motion Representation for Video Understanding](http://arxiv.org/abs/1804.00413v1) 225. [Smooth Neighbors on Teacher Graphs for Semi-supervised Learning](http://arxiv.org/abs/1711.00258v2) 226. SeedNet : Automatic Seed Generation with Deep Reinforcement Learning for Robust Interactive Segmentation 227. Deep Spatio-Temporal Random Fields for Efficient Video Segmentation 228. Perturbative Neural Networks: Rethinking Convolution in CNNs 229. SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks 230. [Neural 3D Mesh Renderer](http://arxiv.org/abs/1711.07566v1) 231. Deep Parametric Continuous Convolutional Neural Networks 232. [Visual Question Reasoning on General Dependency Tree](http://arxiv.org/abs/1804.00105v1) 233. Non-local Neural Networks 234. Light field intrinsics with a deep encoder-decoder network 235. 
[Feature Space Transfer for Data Augmentation](http://arxiv.org/abs/1801.04356v2) 236. [Motion Segmentation by Exploiting Complementary Geometric Models](http://arxiv.org/abs/1804.02142v1) 237. Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation 238. Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation 239. [Towards a Mathematical Understanding of the Difficulty in Learning with Feedforward Neural Networks](http://arxiv.org/abs/1611.05827v3) 240. [Few-Shot Image Recognition by Predicting Parameters from Activations](http://arxiv.org/abs/1706.03466v3) 241. Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation 242. CLEAR: Cumulative LEARning for One-Shot One-Class Image Recognition 243. [Pose-Robust Face Recognition via Deep Residual Equivariant Mapping](http://arxiv.org/abs/1803.00839v1) 244. [Deep Cross-media Knowledge Transfer](http://arxiv.org/abs/1803.03777v1) 245. [Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs](http://arxiv.org/abs/1711.09869v2) 246. [A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos](http://arxiv.org/abs/1802.08722v3) 247. Recurrent Slice Networks for 3D Segmentation on Point Clouds 248. Dimensionalitys Blessing: Detecting the distributions underlying images 249. [Augmented Skeleton Space Transfer for Depth-based Hand Pose Estimation](http://arxiv.org/abs/1805.04497v1) 250. [Robust Classification with Convolutional Prototype Learning](http://arxiv.org/abs/1805.03438v1) 251. [DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation](http://arxiv.org/abs/1712.06679v2) 252. ICE-BA: Efficient, Consistent and Efficient Bundle Adjustment for Visual-Inertial SLAM 253. [Grounding Referring Expressions in Images by Variational Context](http://arxiv.org/abs/1712.01892v2) 254. 
[Pseudo-Mask Augmented Object Detection](http://arxiv.org/abs/1803.05858v2) 255. [Improvements to context based self-supervised learning](http://arxiv.org/abs/1711.06379v3) 256. [Left-Right Comparative Recurrent Model for Stereo Matching](http://arxiv.org/abs/1804.00796v1) 257. [Learning deep structured active contours end-to-end](http://arxiv.org/abs/1803.06329v1) 258. [Efficient and Deep Person Re-Identification using Multi-Level Similarity](http://arxiv.org/abs/1803.11353v2) 259. [Learning Intrinsic Image Decomposition from Watching the World](http://arxiv.org/abs/1804.00582v1) 260. Learning to Understand Image Blur 261. Gaze Prediction in Dynamic $360^\circ$ Immersive Videos 262. Emotional Attention: A Study of Image Sentiment and Visual Attention 263. [Single View Stereo Matching](http://arxiv.org/abs/1803.02612v2) 264. [Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs](http://arxiv.org/abs/1803.08035v2) 265. [Video Representation Learning Using Discriminative Pooling](http://arxiv.org/abs/1803.10628v2) 266. Probabilistic Joint Face-Skull Modelling for Facial Reconstruction 267. Indoor RGB-D Compass from a Single Line and Plane 268. pOSE: Pseudo Object Space Error for Initialization-Free Bundle Adjustment 269. Generative Adversarial Learning Towards Fast Weakly Supervised Detection 270. Seeing Temporal Modulation of Lights from Standard Cameras 271. [Shape from Shading through Shape Evolution](http://arxiv.org/abs/1712.02961v1) 272. [Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries](http://arxiv.org/abs/1711.06370v1) 273. Neural Style Transfer via Meta Networks 274. [UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition](http://arxiv.org/abs/1712.04695v1) 275. [Cascaded Pyramid Network for Multi-Person Pose Estimation](http://arxiv.org/abs/1711.07319v2) 276. [Detect-and-Track: Efficient Pose Estimation in Videos](http://arxiv.org/abs/1712.09184v2) 277. 
SobolevFusion: 3D Reconstruction of Scenes Undergoing Free Non-rigid Motion 278. [NAG: Network for Adversary Generation](http://arxiv.org/abs/1712.03390v2) 279. Inferring Co-Attention in Social Scene Videos 280. Unsupervised Learning of Single View Depth Estimation and Visual Odometry with Deep Feature Reconstruction 281. [Egocentric Basketball Motion Planning from a Single First-Person Image](http://arxiv.org/abs/1803.01413v1) 282. [Geometric robustness of deep networks: analysis and improvement](http://arxiv.org/abs/1711.09115v1) 283. Pose-Guided Photorealistic Face Rotation 284. [Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation](http://arxiv.org/abs/1712.00080v1) 285. [Importance Weighted Adversarial Nets for Partial Domain Adaptation](http://arxiv.org/abs/1803.09210v2) 286. [Towards High Performance Video Object Detection](http://arxiv.org/abs/1711.11577v1) 287. SurfConv: Bridging 3D and 2D Convolution for RGBD Images 288. [People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting](http://arxiv.org/abs/1711.05586v1) 289. Fully Convolutional Adaptation Networks for Semantic Segmentation 290. Towards Pose Invariant Face Recognition in the Wild 291. Interactive Image Segmentation with Latent Diversity 292. [Label Denoising Adversarial Network (LDAN) for Inverse Lighting of Face Images](http://arxiv.org/abs/1709.01993v1) 293. [Detecting and Recognizing Human-Object Interactions](http://arxiv.org/abs/1704.07333v3) 294. [Deep Image Prior](http://arxiv.org/abs/1711.10925v3) 295. [2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning](http://arxiv.org/abs/1802.09232v2) 296. Direct Shape Regression Networks for End-to-End Face Alignment 297. [Disentangling Features in 3D Face Shapes for Joint Face Reconstruction and Recognition](http://arxiv.org/abs/1803.11366v1) 298. Scale-Transferrable Object Detection 299. 
[Learning by Asking Questions](http://arxiv.org/abs/1712.01238v1) 300. [3D Pose Estimation and 3D Model Retrieval for Objects in the Wild](http://arxiv.org/abs/1803.11493v1) 301. Deep Progressive Reinforcement Learning for Skeleton-based Action Recognition 302. [Future Person Localization in First-Person Videos](http://arxiv.org/abs/1711.11217v2) 303. 3D-RCNN: Instance-level 3D Scene Understanding via Render-and-Compare 304. Manifold Learning in Quotient Spaces 305. [Image Correction via Deep Reciprocating HDR Transformation](http://arxiv.org/abs/1804.04371v1) 306. Focus Manipulation Detection via Photometric Histogram Analysis 307. [Density Adaptive Point Set Registration](http://arxiv.org/abs/1804.01495v1) 308. Multi-view Harmonized Bilinear Network for 3D Object Recognition 309. [SeGAN: Segmenting and Generating the Invisible](http://arxiv.org/abs/1703.10239v3) 310. [VizWiz Grand Challenge: Answering Visual Questions from Blind People](http://arxiv.org/abs/1802.08218v4) 311. Sparse, Smart Contours to Represent and Edit Images 312. Generative Non-Rigid Shape Completion with Graph Convolutional Autoencoders 313. The power of ensembles for active learning in image classification 314. [OLÉ: Orthogonal Low-rank Embedding, A Plug and Play Geometric Loss for Deep Learning](http://arxiv.org/abs/1712.01727v1) 315. [Learning Compositional Visual Concepts with Mutual Consistency](http://arxiv.org/abs/1711.06148v2) 316. [Adversarial Complementary Learning for Weakly Supervised Object Localization](http://arxiv.org/abs/1804.06962v1) 317. [Analytical Modeling of Vanishing Points and Curves in Catadioptric Cameras](http://arxiv.org/abs/1804.09460v1) 318. Exploit the Unknown Gradually:~ One-Shot Video-Based Person Re-Identification by Stepwise Learning 319. [Learning to Sketch with Shortcut Cycle Consistency](http://arxiv.org/abs/1805.00247v1) 320. [Domain Adaptive Faster R-CNN for Object Detection in the Wild](http://arxiv.org/abs/1803.03243v1) 321. 
Attentive Generative Adversarial Network for Raindrop Removal from A Single Image 322. [Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN](http://arxiv.org/abs/1803.04831v3) 323. Making Convolutional Networks Recurrent for Visual Sequence Learning 324. Multi-Task Adversarial Network for Disentangled Feature Learning 325. Fight ill-posedness with ill-posedness: Single-shot variational depth super-resolution from shading 326. [Zero-Shot Sketch-Image Hashing](http://arxiv.org/abs/1803.02284v1) 327. [Learning to Localize Sound Source in Visual Scenes](http://arxiv.org/abs/1803.03849v1) 328. [Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation](http://arxiv.org/abs/1803.11365v1) 329. [Semi-parametric Image Synthesis](http://arxiv.org/abs/1804.10992v1) 330. [Multi-scale Location-aware Kernel Representation for Object Detection](http://arxiv.org/abs/1804.00428v1) 331. W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection 332. [Generative Modeling using the Sliced Wasserstein Distance](http://arxiv.org/abs/1803.11188v1) 333. [MX-LSTM: mixing tracklets and vislets to jointly forecast trajectories and head poses](http://arxiv.org/abs/1805.00652v1) 334. [Dynamic Video Segmentation Network](http://arxiv.org/abs/1804.00931v1) 335. [Learning a Discriminative Feature Network for Semantic Segmentation](http://arxiv.org/abs/1804.09337v1) 336. Video Person Re-identification with Competitive Snippet-similarity Aggregation and Co-attentive Snippet Embedding 337. [Curve Reconstruction via the Global Statistics of Natural Curves](http://arxiv.org/abs/1711.03172v2) 338. [Single-Shot Refinement Neural Network for Object Detection](http://arxiv.org/abs/1711.06897v3) 339. [Density-aware Single Image De-raining using a Multi-stream Dense Network](http://arxiv.org/abs/1802.07412v1) 340. Learning Answer Embeddings for Visual Question Answering 341. 
[Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification](http://arxiv.org/abs/1711.09550v1) 342. [Translating and Segmenting Multimodal Medical Volumes with Cycle- and Shape-Consistency Generative Adversarial Network](http://arxiv.org/abs/1802.09655v1) 343. Learning from the Deep: A Revised Underwater Image Formation Model 344. Mean-Variance Loss for Deep Age Estimation from a Face 345. [Disentangled Person Image Generation](http://arxiv.org/abs/1712.02621v2) 346. [Deep Sparse Coding for Invariant Multimodal Halle Berry Neurons](http://arxiv.org/abs/1711.07998v1) 347. DeepMVS: Learning Multi-View Stereopsis 348. [Embodied Question Answering](http://arxiv.org/abs/1711.11543v2) 349. [Deflecting Adversarial Attacks with Pixel Deflection](http://arxiv.org/abs/1801.08926v3) 350. Dynamic-Structured Semantic Propagation Network 351. Integrated facial landmark localization and super-resolution of real-world very low resolution faces in arbitrary poses with GANs 352. A Two-Step Disentanglement Method 353. [Towards Effective Low-bitwidth Convolutional Neural Networks](http://arxiv.org/abs/1711.00205v2) 354. [Natural and Effective Obfuscation by Head Inpainting](http://arxiv.org/abs/1711.09001v5) 355. Learning-Compression" algorithms for neural net pruning" 356. Salient Object Detection Driven by Fixation Prediction 357. [Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective](http://arxiv.org/abs/1803.00233v2) 358. Uncalibrated Photometric Stereo under Natural Illumination 359. Learning Monocular 3D Human Pose estimation on weakly-supervised Multi-view Images 360. [An Unsupervised Learning Model for Deformable Medical Image Registration](http://arxiv.org/abs/1802.02604v3) 361. Learning Deep Correspondence through Prior and Posterior Feature Constancy 362. [Anticipating Traffic Accidents with Adaptive Loss and Large-scale Incident DB](http://arxiv.org/abs/1804.02675v1) 363. 
[A2-RL: Aesthetics Aware Reinforcement Learning for Image Cropping](http://arxiv.org/abs/1709.04595v3) 364. Learned Shape-Tailored Descriptors for Segmentation 365. One-shot Action Localization by Sequence Matching Network 366. Robust Physical-World Attacks on Deep Learning Visual Classification 367. What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets 368. Bidirectional Retrieval Made Simple 369. Reward Learning by Instruction 370. [MegaDepth: Learning Single-View Depth Prediction from Internet Photos](http://arxiv.org/abs/1804.00607v1) 371. Cross-Dataset Adaptation for Visual Question Answering 372. Interpretable Video Captioning via Trajectory Structured Localization 373. [MoCoGAN: Decomposing Motion and Content for Video Generation](http://arxiv.org/abs/1707.04993v2) 374. Left/Right Asymmetric Layer Skippable Networks 375. [Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation](http://arxiv.org/abs/1803.10464v2) 376. [Unsupervised Discovery of Object Landmarks as Structural Representations](http://arxiv.org/abs/1804.04412v1) 377. Learning Deep Descriptors with Scale-Aware Triplet Networks 378. [Robust Depth Estimation from Auto Bracketed Images](http://arxiv.org/abs/1803.07702v1) 379. Aligning Infinite-Dimensional Covariance Matrices in Reproducing Kernel Hilbert Spaces for Domain Adaptation 380. Local and Global Optimization Techniques in Graph-based Clustering 381. [Learning from Millions of 3D Scans for Large-scale 3D Face Recognition](http://arxiv.org/abs/1711.05942v2) 382. [CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation](http://arxiv.org/abs/1804.01967v1) 383. Image Collection Pop-up: 3D Reconstruction and Clustering of Rigid and Non-Rigid Categories 384. [Ordinal Depth Supervision for 3D Human Pose Estimation](http://arxiv.org/abs/1805.04095v1) 385. Learning to Hash by Discrepancy Minimization 386.
MapNet: Geometry-Aware Learning of Maps for Camera Localization 387. [Im2Struct: Recovering 3D Shape Structure from a Single RGB Image](http://arxiv.org/abs/1804.05469v1) 388. [A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking](http://arxiv.org/abs/1711.10378v2) 389. Analytic Expressions for Probabilistic Moments of PL-DNN with Gaussian Input 390. Cross-Domain Self-supervised Multi-task Feature Learning Using Synthetic Game Imagery 391. Coding Kendall's Shape Trajectories for 3D Action Recognition 392. Camera Pose Estimation with Unknown Principal Point 393. [Learning Spatial-Aware Regressions for Visual Tracking](http://arxiv.org/abs/1706.07457v2) 394. The Easy, The Medium and The Hard: Adapting Across Varied Domain Shifts 395. [Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation](http://arxiv.org/abs/1705.01314v4) 396. A Hybrid L1-L0 Layer Decomposition Model for Tone Mapping 397. [LIME: Live Intrinsic Material Estimation](http://arxiv.org/abs/1801.01075v2) 398. Learning Representations for Single Cells in Microscopy Images 399. Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning 400. [clcNet: Improving the Efficiency of Convolutional Neural Network using Channel Local Convolutions](http://arxiv.org/abs/1712.06145v3) 401. Spanning Patches: Deep Patch Selection for Fast Multi-View Stereo 402. LAMV: Learning to align and match videos with kernelized temporal layers 403. Single Image Reflection Separation with Perceptual Losses 404. [Structure from Recurrent Motion: From Rigidity to Recurrency](http://arxiv.org/abs/1804.06510v1) 405. [Customized Image Narrative Generation via Interactive Visual Question Generation and Answering](http://arxiv.org/abs/1805.00460v1) 406. [Relation Networks for Object Detection](http://arxiv.org/abs/1711.11575v1) 407. An End-to-End TextSpotter with Explicit Alignment and Attention 408.
[Photometric Stereo in Participating Media Considering Shape-Dependent Forward Scatter](http://arxiv.org/abs/1804.02836v2) 409. [Sliced Wasserstein Distance for Learning Gaussian Mixture Models](http://arxiv.org/abs/1711.05376v2) 410. Generative Adversarial Image Synthesis with Decision Tree Latent Controller 411. [Disentangling 3D Pose in A Dendritic CNN for Unconstrained 2D Face Alignment](http://arxiv.org/abs/1802.06713v3) 412. Learning Multi-Instance Enriched Image Representation via Non-Greedy Simultaneous L1-Norm Minimization and Maximization 413. [Separating Self-Expression and Visual Content in Hashtag Supervision](http://arxiv.org/abs/1711.09825v1) 414. [Residual Dense Network for Image Super-Resolution](http://arxiv.org/abs/1802.08797v2) 415. Hand PointNet: 3D Hand Pose Estimation using Point Sets 416. Human-centric Indoor Scene Synthesis Using Stochastic Grammar 417. Learning Facial Action Units from Web Images with Scalable Weakly Supervised Clustering 418. [Occlusion Aware Unsupervised Learning of Optical Flow](http://arxiv.org/abs/1711.05890v2) 419. Domain Generalization with Adversarial Feature Learning 420. A Hierarchical Generative Model for Eye Image Synthesis and Eye Gaze Estimation 421. [PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image](http://arxiv.org/abs/1804.06278v1) 422. Deep Learning under Privileged Information Using Heteroscedastic Dropout 423. [Frame-Recurrent Video Super-Resolution](http://arxiv.org/abs/1801.04590v4) 424. [Nonlocal Low-Rank Tensor Factor Analysis for Image Restoration](http://arxiv.org/abs/1803.06795v1) 425. Content-Sensitive Supervoxels via Uniform Tessellations on Video Manifolds 426. Planar Shape Detection at Structural Scales 427. [Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking](http://arxiv.org/abs/1803.11285v1) 428. Learning to Parse Wireframes in Images of Man-Made Environments 429. Harmonious Attention Network for Person Re-Identification 430.
[Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks](http://arxiv.org/abs/1711.06753v4) 431. Every Smile is Unique: Landmark-guided Diverse Smile Generation 432. Multi-Scale Weighted Nuclear Norm Image Restoration 433. [FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis](http://arxiv.org/abs/1706.05206v2) 434. Lightweight Probabilistic Deep Networks 435. [Learning Depth from Monocular Videos using Direct Methods](http://arxiv.org/abs/1712.00175v1) 436. [Thoracic Disease Identification and Localization with Limited Supervision](http://arxiv.org/abs/1711.06373v5) 437. [SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation](http://arxiv.org/abs/1711.08588v1) 438. [Memory Matching Networks for One-Shot Image Recognition](http://arxiv.org/abs/1804.08281v1) 439. [Compressed Video Action Recognition](http://arxiv.org/abs/1712.00636v2) 440. [FFNet: Video Fast-Forwarding via Reinforcement Learning](http://arxiv.org/abs/1805.02792v1) 441. Representing and Learning High Dimensional Data with the Optimal Transport Map from a Probabilistic Viewpoint 442. [ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans](http://arxiv.org/abs/1712.10215v2) 443. Fully Convolutional Attention Network for Multimodal Reasoning 444. Lions and Tigers and Bears: Capturing Non-Rigid, 3D, Articulated Shape from Images 445. [Recurrent Pixel Embedding for Instance Grouping](http://arxiv.org/abs/1712.08273v1) 446. Name-removed-for-review: A Multi-camera HD Dataset for Dense Unscripted Pedestrian Detection 447. [SGAN: An Alternative Training of Generative Adversarial Networks](http://arxiv.org/abs/1712.02330v1) 448. [Learning Markov Clustering Networks for Scene Text Detection](http://arxiv.org/abs/1805.08365v1) 449. Occlusion-Aware Rolling Shutter Rectification of 3D Scenes 450. [Beyond Gröbner Bases: Basis Selection for Minimal Solvers](http://arxiv.org/abs/1803.04360v1) 451. 
[Improving Object Localization with Fitness NMS and Bounded IoU Loss](http://arxiv.org/abs/1711.00164v3) 452. Generative Adversarial Perturbations 453. Deep Photo Enhancer: Unsupervised Learning of Image Enhancement from Photographs with GANs 454. [Eye In-Painting with Exemplar Generative Adversarial Networks](http://arxiv.org/abs/1712.03999v1) 455. Encoder-Decoder Alignment for Zero-Pair Image-to-Image Translation 456. [Learning Structure and Strength of CNN Filters for Small Sample Size Training](http://arxiv.org/abs/1803.11405v1) 457. [Path Aggregation Network for Instance Segmentation](http://arxiv.org/abs/1803.01534v1) 458. Learning Superpixels with Segmentation-Aware Affinity Loss 459. [Data Distillation: Towards Omni-Supervised Learning](http://arxiv.org/abs/1712.04440v1) 460. Deep Diffeomorphic Transformer Networks 461. [CodeSLAM --- Learning a Compact, Optimisable Representation for Dense Visual SLAM](http://arxiv.org/abs/1804.00874v1) 462. [Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points](http://arxiv.org/abs/1802.07898v2) 463. [Learning Latent Super-Events to Detect Multiple Activities in Videos](http://arxiv.org/abs/1712.01938v2) 464. [MegDet: A Large Mini-Batch Object Detector](http://arxiv.org/abs/1711.07240v4) 465. [Lose The Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion](http://arxiv.org/abs/1711.10388v2) 466. Unsupervised Domain Adaptation with Similarity-Based Classifier 467. [Visual Feature Attribution using Wasserstein GANs](http://arxiv.org/abs/1711.08998v2) 468. Tell Me Where To Look: Guided Attention Inference Network 469. [Towards Open-Set Identity Preserving Face Synthesis](http://arxiv.org/abs/1803.11182v1) 470. [Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination](http://arxiv.org/abs/1805.01978v1) 471. Multi-Evidence Fusion and Filtering for Weakly Supervised Object Recognition, Detection and Segmentation 472. 
Deep Material-aware Cross-spectral Stereo Matching 473. MakeupGAN: Makeup Transfer via Cycle-Consistent Adversarial Networks 474. M3: Multimodal Memory Modelling for Video Captioning 475. [Fooling Vision and Language Models Despite Localization and Attention Mechanism](http://arxiv.org/abs/1709.08693v2) 476. [Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies](http://arxiv.org/abs/1801.01615v1) 477. [Jointly Localizing and Describing Events for Dense Video Captioning](http://arxiv.org/abs/1804.08274v1) 478. The Best of Both Worlds: Combining CNNs and Geometric Constraints for Hierarchical Motion Segmentation 479. [End-to-end learning of keypoint detector and descriptor for pose invariant 3D matching](http://arxiv.org/abs/1802.07869v2) 480. [LDMNet: Low Dimensional Manifold Regularized Neural Networks](http://arxiv.org/abs/1711.06246v1) 481. [3D Human Pose Estimation in the Wild by Adversarial Learning](http://arxiv.org/abs/1803.09722v2) 482. Fast Video Object Segmentation by Reference-Guided Mask Propagation 483. [End-to-End Dense Video Captioning with Masked Transformer](http://arxiv.org/abs/1804.00819v1) 484. [Towards dense object tracking in a 2D honeybee hive](http://arxiv.org/abs/1712.08324v1) 485. [Appearance-and-Relation Networks for Video Classification](http://arxiv.org/abs/1711.09125v2) 486. StarGAN: Unified Generative Adversarial Networks for Controllable Multi-Domain Image-to-Image Translation 487. Answer with Grounding Snippets: Focal Visual-Text Attention for Visual Question Answering 488. GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB 489. Weakly Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer 490. [ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information](http://arxiv.org/abs/1704.02694v2) 491. [Structured Set Matching Networks for One-Shot Part Labeling](http://arxiv.org/abs/1712.01867v2) 492. 
[Real-Time Seamless Single Shot 6D Object Pose Prediction](http://arxiv.org/abs/1711.08848v4) 493. [Triplet-Center Loss for Multi-View 3D Object Retrieval](http://arxiv.org/abs/1803.06189v1) 494. [Pixels, voxels, and views: A study of shape representations for single view 3D object shape prediction](http://arxiv.org/abs/1804.06032v1) 495. Show Me a Story: Towards Coherent Neural Story Illustration 496. [DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map](http://arxiv.org/abs/1805.04949v1) 497. [Missing Slice Recovery for Tensors Using a Low-rank Model in Embedded Space](http://arxiv.org/abs/1804.01736v1) 498. [3D Semantic Segmentation with Submanifold Sparse Convolutional Networks](http://arxiv.org/abs/1711.10275v1) 499. [Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition](http://arxiv.org/abs/1712.05134v2) 500. [Link and code: Fast indexing with graphs and compact regression codes](http://arxiv.org/abs/1804.09996v2) 501. [Two-Stream Convolutional Networks for Dynamic Texture Synthesis](http://arxiv.org/abs/1706.06982v4) 502. [Weakly Supervised Action Localization by Sparse Temporal Pooling Network](http://arxiv.org/abs/1712.05080v2) 503. [Viewpoint-aware Video Summarization](http://arxiv.org/abs/1804.02843v2) 504. 4D Human Body Correspondences from Panoramic Depth Maps 505. [Tighter Lifting-Free Convex Relaxations for Quadratic Matching Problems](http://arxiv.org/abs/1711.10733v1) 506. Discovering Point Lights with Intensity Distance Fields 507. [The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks](http://arxiv.org/abs/1705.08790v2) 508. [Geometry-aware Deep Network for Single-Image Novel View Synthesis](http://arxiv.org/abs/1804.06008v1) 509. Temporal Deformable Residual Networks for Action Segmentation in Videos 510. [Seeing Small Faces from Robust Anchor's Perspective](http://arxiv.org/abs/1802.09058v1) 511. 
[Matryoshka Networks: Predicting 3D Geometry via Nested Shape Layers](http://arxiv.org/abs/1804.10975v1) 512. On the Importance of Label Quality for Semantic Segmentation 513. [AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions](http://arxiv.org/abs/1705.08421v4) 514. [First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations](http://arxiv.org/abs/1704.02463v2) 515. [Learning Deep Sketch Abstraction](http://arxiv.org/abs/1804.04804v1) 516. [Non-Linear Temporal Subspace Representations for Activity Recognition](http://arxiv.org/abs/1803.11064v1) 517. A Biresolution Spectral framework for Product Quantization 518. Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatio-temporal Patterns 519. Feature Super-Resolution: Make Machine See More Clearly 520. Finding Tiny Faces in the Wild with Generative Adversarial Network 521. DoubleFusion: Real-time Capture of Human Performance with Inner Body Shape from a Single Depth Sensor 522. [Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective](http://arxiv.org/abs/1803.10910v1) 523. [Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction](http://arxiv.org/abs/1801.03910v2) 524. Recognize Actions by Disentangling Components of Dynamics 525. [Who Let The Dogs Out? Modeling Dog Behavior From Visual Data](http://arxiv.org/abs/1803.10827v2) 526. [Alive Caricature from 2D to 3D](http://arxiv.org/abs/1803.06802v3) 527. [Learning Steerable Filters for Rotation Equivariant CNNs](http://arxiv.org/abs/1711.07289v3) 528. From source to target and back: Symmetric Bi-Directional Adaptive GAN 529. Monocular Relative Depth Perception with Web Stereo Data Supervision 530. [Correlation Tracking via Joint Discrimination and Reliability Learning](http://arxiv.org/abs/1804.08965v1) 531. [Boosting Domain Adaptation by Discovering Latent Domains](http://arxiv.org/abs/1805.01386v1) 532. 
HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization 533. [Learning from Noisy Web Data with Category-level Supervision](http://arxiv.org/abs/1803.03857v2) 534. Embodied Real-World Active Perception 535. [Boosting Self-Supervised Learning via Knowledge Transfer](http://arxiv.org/abs/1805.00385v1) 536. [Video Captioning via Hierarchical Reinforcement Learning](http://arxiv.org/abs/1711.11135v3) 537. Weakly Supervised Phrase Localization with Multi-Scale Anchored Transformer Network 538. Progressively Complementarity-aware Fusion Network for RGB-D Salient Object Detection 539. [Wide Compression: Tensor Ring Nets](http://arxiv.org/abs/1802.09052v1) 540. Demo2Vec: Reasoning Object Affordances from Online Videos 541. A High-Quality Denoising Dataset for Smartphone Cameras 542. Collaborative and Adversarial Network for Unsupervised domain adaptation 543. [End-to-end weakly-supervised semantic alignment](http://arxiv.org/abs/1712.06861v2) 544. [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](http://arxiv.org/abs/1712.05877v1) 545. [Feature Selective Networks for Object Detection](http://arxiv.org/abs/1711.08879v1) 546. Unsupervised Learning of Depth and Egomotion from Monocular Video Using 3D Geometric Constraints 547. A Common Framework for Interactive Texture Transfer 548. Depth and Transient Imaging with Compressive SPAD Array Cameras 549. PointGrid: A Deep Network for 3D Shape Understanding 550. A Network Architecture for Point Cloud Classification via Automatic Depth Images Generation 551. Optimizing Local Feature Descriptors for Nearest Neighbor Matching 552. 4DFAB: A Large Scale 4D Database for Facial Expression Analysis and Biometric Applications 553. [Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains](http://arxiv.org/abs/1803.06641v1) 554. [Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network](http://arxiv.org/abs/1802.09178v2) 555. 
[Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal](http://arxiv.org/abs/1712.02478v1) 556. [What do Deep Networks Like to See?](http://arxiv.org/abs/1803.08337v1) 557. [On the Robustness of Semantic Segmentation Models to Adversarial Attacks](http://arxiv.org/abs/1711.09856v2) 558. [SketchMate: Deep Hashing for Million-Scale Human Sketch Retrieval](http://arxiv.org/abs/1804.01401v1) 559. Progressive Attention Guided Recurrent Network for Salient Object Detection 560. [IQA: Visual Question Answering in Interactive Environments](http://arxiv.org/abs/1712.03316v2) 561. [Boosting Adversarial Attacks with Momentum](http://arxiv.org/abs/1710.06081v3) 562. [Conditional Probability Models for Deep Image Compression](http://arxiv.org/abs/1801.04260v1) 563. [Cascade R-CNN: Delving into High Quality Object Detection](http://arxiv.org/abs/1712.00726v1) 564. [Scalable and Effective Deep CCA via Soft Decorrelation](http://arxiv.org/abs/1707.09669v2) 565. [Discriminability objective for training descriptive captions](http://arxiv.org/abs/1803.04376v1) 566. Going from Image to Video Saliency: Augmenting Image Salience with Dynamic Attentional Push 567. [Recurrent Scene Parsing with Perspective Understanding in the Loop](http://arxiv.org/abs/1705.07238v2) 568. [Semantic Video Segmentation by Gated Recurrent Flow Propagation](http://arxiv.org/abs/1612.08871v2) 569. [FlipDial: A Generative Model for Two-Way Visual Dialogue](http://arxiv.org/abs/1802.03803v2) 570. [Context Encoding for Semantic Segmentation](http://arxiv.org/abs/1803.08904v1) 571. Deep Marching Cubes: Learning Explicit Surface Representations 572. [Rethinking Feature Distribution for Loss Functions in Image Classification](http://arxiv.org/abs/1803.02988v1) 573. Optical Flow Guided Feature: A Motion Representation for Video Action Recognition 574. 
[Multimodal Explanations: Justifying Decisions and Pointing to the Evidence](http://arxiv.org/abs/1802.08129v1) 575. [HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification](http://arxiv.org/abs/1803.07913v1) 576. Imagine it for me: Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts 577. Co-Occurrence Template Matching 578. Defense against Universal Adversarial Perturbations 579. [PPFNet: Global Context Aware Local Features for Robust 3D Point Matching](http://arxiv.org/abs/1802.02669v2) 580. [Dynamic Zoom-in Network for Fast Object Detection in Large Images](http://arxiv.org/abs/1711.05187v2) 581. [Objects as context for detecting their semantic parts](http://arxiv.org/abs/1703.09529v3) 582. [Spline Error Weighting for Robust Visual-Inertial Fusion](http://arxiv.org/abs/1804.04820v1) 583. GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation 584. Where and Why Are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks 585. Robust Facial Landmark Detection via a Fully-Convolutional Local-Global Context Network 586. Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net 587. [CondenseNet: An Efficient DenseNet using Learned Group Convolutions](http://arxiv.org/abs/1711.09224v1) 588. [Burst Denoising with Kernel Prediction Networks](http://arxiv.org/abs/1712.02327v2) 589. [Leveraging Unlabeled Data for Crowd Counting by Learning to Rank](http://arxiv.org/abs/1803.03095v1) 590. [Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation](http://arxiv.org/abs/1709.04518v4) 591. Classifier Learning with Prior Probabilities for Facial Action Unit Recognition 592. Active Fixation Control to Predict Saccade Sequences 593. Reflection Removal for Large-Scale 3D Point Clouds 594. Mesoscopic Facial Geometry inference Using Deep Neural Networks 595. 
[VITON: An Image-based Virtual Try-on Network](http://arxiv.org/abs/1711.08447v3) 596. [Beyond the Pixel-Wise Loss for Topology-Aware Delineation](http://arxiv.org/abs/1712.02190v1) 597. HashGAN: Deep Learning to Hash with Pair Conditional Wasserstein GAN 598. A Globally Optimal Solution to the Non-Minimal Relative Pose Problem 599. [Learning distributions of shape trajectories from longitudinal datasets: a hierarchical model on a manifold of diffeomorphisms](http://arxiv.org/abs/1803.10119v1) 600. [Multispectral Image Intrinsic Decomposition via Low Rank Constraint](http://arxiv.org/abs/1802.08793v1) 601. [Dynamic Graph Generation Network: Generating Relational Knowledge from Diagrams](http://arxiv.org/abs/1711.09528v1) 602. Alternating-Stereo VINS: Observability Analysis and Performance Evaluation 603. [Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View](http://arxiv.org/abs/1712.04569v1) 604. [Style Aggregated Network for Facial Landmark Detection](http://arxiv.org/abs/1803.04108v4) 605. [VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection](http://arxiv.org/abs/1711.06396v1) 606. Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors 607. Deep Adversarial Subspace Clustering 608. [Compassionately Conservative Balanced Cuts for Image Segmentation](http://arxiv.org/abs/1803.09903v1) 609. [Deformable GANs for Pose-based Human Image Generation](http://arxiv.org/abs/1801.00055v2) 610. [Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration](http://arxiv.org/abs/1805.03857v2) 611. [The iNaturalist Species Classification and Detection Dataset](http://arxiv.org/abs/1707.06642v2) 612. Categorizing Concepts with Basic Level for Vision-to-Language 613. InverseFaceNet: Deep Monocular Inverse Face Rendering at over 250 Hz 614. Textbook Question Answering under Teacher Guidance with Memory Networks 615. 
[Learning to Find Good Correspondences](http://arxiv.org/abs/1711.05971v2) 616. Hyperparameter Optimization for Tracking with Continuous Deep Q-Learning 617. [Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data](http://arxiv.org/abs/1803.05137v1) 618. Weakly Supervised Facial Action Unit Recognition through Adversarial Training 619. [Knowledge Aided Consistency for Weakly Supervised Phrase Grounding](http://arxiv.org/abs/1803.03879v1) 620. Neighbors Do Help: Deeply Exploiting Local Structures of Point Clouds 621. [The Unreasonable Effectiveness of Deep Features as a Perceptual Metric](http://arxiv.org/abs/1801.03924v2) 622. [Dense 3D Regression for Hand Pose Estimation](http://arxiv.org/abs/1711.08996v1) 623. [Detail-Preserving Pooling in Deep Networks](http://arxiv.org/abs/1804.04076v1) 624. Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation 625. Reinforcement Cutting-Agent Learning for Video Object Segmentation 626. [SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis](http://arxiv.org/abs/1801.02753v2) 627. Wrapped Gaussian Process Regression on Riemannian Manifolds 628. Document Enhancement using Visibility Detection 629. Learning Discriminative Evaluation Metrics for Image Captioning 630. GraphBit: Bitwise Interaction Mining via Deep Reinforcement Learning 631. [Learning Intelligent Dialogs for Bounding Box Annotation](http://arxiv.org/abs/1712.08087v2) 632. [Efficient Diverse Ensemble for Discriminative Co-Tracking](http://arxiv.org/abs/1711.06564v1) 633. Recovering Realistic Texture in Image Super-resolution by Spatial Feature Modulation 634. [Mining on Manifolds: Metric Learning without Labels](http://arxiv.org/abs/1803.11095v1) 635. [Revisiting knowledge transfer for training object class detectors](http://arxiv.org/abs/1708.06128v3) 636. [GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose](http://arxiv.org/abs/1803.02276v2) 637. 
[Differential Attention for Visual Question Answering](http://arxiv.org/abs/1804.00298v2) 638. A PID Controller Approach for Stochastic Optimization of Deep Networks 639. Bootstrapping the Performance of Webly Supervised Semantic Segmentation 640. [Iterative Learning with Open-set Noisy Labels](http://arxiv.org/abs/1804.00092v1) 641. A Papier-Mâché Approach to Learning 3D Surface Generation 642. Extreme 3D Face Reconstruction: Looking Past Occlusions 643. High-speed Tracking with Multi-kernel Correlation Filters 644. Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification 645. [Separating Style and Content for Generalized Style Transfer](http://arxiv.org/abs/1711.06454v5) 646. [Learning Dual Convolutional Neural Networks for Low-Level Vision](http://arxiv.org/abs/1805.05020v1) 647. [Wasserstein Introspective Neural Networks](http://arxiv.org/abs/1711.08875v5) 648. [Deep Semantic Face Deblurring](http://arxiv.org/abs/1803.03345v2) 649. [InLoc: Indoor Visual Localization with Dense Matching and View Synthesis](http://arxiv.org/abs/1803.10368v2) 650. Temporal Hallucinating for Action Recognition with Few Still Images 651. [Deep Texture Manifold for Ground Terrain Recognition](http://arxiv.org/abs/1803.10896v2) 652. [Discriminative Learning of Latent Features for Zero-Shot Recognition](http://arxiv.org/abs/1803.06731v1) 653. Neural Sign Language Translation 654. GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints 655. [Repulsion Loss: Detecting Pedestrians in a Crowd](http://arxiv.org/abs/1711.07752v2) 656. Pulling Actions out of Context: Explicit Separation for Effective Combination 657. Deep Group-shuffling Random Walk for Person Re-identification 658. DenseASPP: Densely Connected Networks for Semantic Segmentation 659. [A Variational U-Net for Conditional Appearance and Shape Generation](http://arxiv.org/abs/1804.04694v1) 660. 
Universal Denoising Networks: A Novel CNN-based Network Architecture for Image Denoising 661. Automatic 3D Indoor Scene Modeling from Single Panorama 662. [Five-point Fundamental Matrix Estimation for Uncalibrated Cameras](http://arxiv.org/abs/1803.00260v1) 663. [PU-Net: Point Cloud Upsampling Network](http://arxiv.org/abs/1801.06761v2) 664. [Generative Image Inpainting with Contextual Attention](http://arxiv.org/abs/1801.07892v2) 665. [Im2Flow: Motion Hallucination from Static Images for Action Recognition](http://arxiv.org/abs/1712.04109v1) 666. Tagging Like Humans: Diverse and Distinct Image Annotation 667. [TextureGAN: Controlling Deep Image Synthesis with Texture Patches](http://arxiv.org/abs/1706.02823v3) 668. ISTA-Net: Interpretable Optimization-Inspired Deep Network for Image Compressive Sensing 669. [Optimizing Video Object Detection via a Scale-Time Lattice](http://arxiv.org/abs/1804.05472v1) 670. [Context Embedding Networks](http://arxiv.org/abs/1710.01691v3) 671. Motion-Guided Cascaded Refinement Network for Video Object Segmentation 672. [RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints](http://arxiv.org/abs/1603.06208v4) 673. Conditional Generative Adversarial Network for Structured Domain Adaptation 674. Large-scale Distance Metric Learning with Uncertainty 675. [Hierarchical Novelty Detection for Visual Object Recognition](http://arxiv.org/abs/1804.00722v1) 676. Deeper Look at Power Normalizations 677. [Disentangling Factors of Variation by Mixing Them](http://arxiv.org/abs/1711.07410v2) 678. [Beyond Holistic Object Recognition: Enriching Image Understanding with Part States](http://arxiv.org/abs/1612.07310v1) 679. [LSTM Pose Machines](http://arxiv.org/abs/1712.06316v4) 680. [End-to-end Recovery of Human Shape and Pose](http://arxiv.org/abs/1712.06584v1) 681. [Geometric Multi-Model Fitting with a Convex Relaxation Algorithm](http://arxiv.org/abs/1706.01553v1) 682.
[Revisiting Salient Object Detection: Simultaneous Detection, Ranking, and Subitizing of Multiple Salient Objects](http://arxiv.org/abs/1803.05082v2) 683. Modulated Convolutional Networks 684. [High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs](http://arxiv.org/abs/1711.11585v1) 685. [Learning Compressible 360° Video Isomers](http://arxiv.org/abs/1712.04083v1) 686. Easy Identification from Better Constraints: Multi-Shot Person Re-Identification from Reference Constraints 687. [TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays](http://arxiv.org/abs/1801.04334v1) 688. Good View Hunting: Learning Photo Composition from 1 Million View Pairs 689. Visual Relationship Learning with a Factorization-based Prior 690. Min-Entropy Latent Model for Weakly Supervised Object Detection 691. [Boundary Flow: A Siamese Network that Predicts Boundary Motion without Training on Motion](http://arxiv.org/abs/1702.08646v3) 692. SfSNet: Learning Shape, Reflectance and Illuminance of Faces 'in the wild' 693. Facial Expression Recognition by De-expression Residue Learning 694. Empirical study of the topology and geometry of deep networks 695. Learning Globally Optimized Object Detector via Policy Gradient 696. Learning from Synthetic Data: Semantic Segmentation using Generative Adversarial Networks 697. [Recurrent Residual Module for Fast Inference in Videos](http://arxiv.org/abs/1802.09723v1) 698. Viewpoint-aware Attentive Multi-view Inference for Vehicle Re-identification 699. Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing 700. Deep Adversarial Metric Learning 701. [Learning Deep Models for Face Anti-Spoofing: Binary or Auxiliary Supervision](http://arxiv.org/abs/1803.11097v1) 702. [Art of singular vectors and universal adversarial perturbations](http://arxiv.org/abs/1709.03582v2) 703. Free supervision from video games 704.
Unifying Identification and Context Learning for Person Recognition 705. DensePose: Multi-Person Dense Human Pose Estimation In The Wild 706. End-to-end Convolutional Semantic Embeddings 707. [Convolutional Image Captioning](http://arxiv.org/abs/1711.09151v1) 708. [Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis](http://arxiv.org/abs/1801.05091v1) 709. [Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++](http://arxiv.org/abs/1803.09693v1) 710. [Nonlinear 3D Face Morphable Model](http://arxiv.org/abs/1804.03786v1) 711. [OATM: Occlusion Aware Template Matching by Consensus Set Maximization](http://arxiv.org/abs/1804.02638v1) 712. [Multi-Image Semantic Matching by Mining Consistent Features](http://arxiv.org/abs/1711.07641v2) 713. Explicit Loss-Error-Aware Quantization for Deep Neural Networks 714. Modeling Facial Geometry using Compositional VAEs 715. Encoding Crowd Interaction with Deep Neural Network for Pedestrian Trajectory Prediction 716. [DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion](http://arxiv.org/abs/1709.04577v2) 717. Attentional ShapeContextNet for Point Cloud Recognition 718. [Weakly Supervised Instance Segmentation using Class Peak Response](http://arxiv.org/abs/1804.00880v1) 719. Fast and Robust Estimation for Unit-Norm Constrained Linear Fitting Problems 720. [Maximum Classifier Discrepancy for Unsupervised Domain Adaptation](http://arxiv.org/abs/1712.02560v4) 721. [Multi-Level Factorisation Net for Person Re-Identification](http://arxiv.org/abs/1803.09132v2) 722. [Video Based Reconstruction of 3D People Models](http://arxiv.org/abs/1803.04758v3) 723. Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer 724. [Logo Synthesis and Manipulation with Clustered Generative Adversarial Networks](http://arxiv.org/abs/1712.04407v1) 725. 
[Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering](http://arxiv.org/abs/1804.00775v1) 726. Image Super-resolution via Dual-state Recurrent Neural Networks 727. [Excitation Backprop for RNNs](http://arxiv.org/abs/1711.06778v3) 728. [Image Generation from Scene Graphs](http://arxiv.org/abs/1804.01622v1) 729. [Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking](http://arxiv.org/abs/1803.08679v1) 730. Image Restoration by Estimating Frequency Distribution of Local Patches 731. [Learning to Adapt Structured Output Space for Semantic Segmentation](http://arxiv.org/abs/1802.10349v1) 732. Deep Spatial Feature Reconstruction for Partial Person Re-identification 733. Tight Nonconvex Relaxation of MAP Inference 734. Multiple Granularity Group Interaction Prediction 735. Accurate and Diverse Sampling of Sequences based on a "Best of Many" Sample Objective 736. [Learning Rich Features for Image Manipulation Detection](http://arxiv.org/abs/1805.04953v1) 737. DA-GAN: Instance-level Image Translation by Deep Attention Generative Adversarial Network 738. A Benchmark for Articulated Human Pose Estimation and Tracking 739. [Preserving Semantic Relations for Zero-Shot Learning](http://arxiv.org/abs/1803.03049v1) 740. Geometry-Aware Scene Text Detection with Instance Transformation Network 741. [CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise](http://arxiv.org/abs/1711.07131v2) 742. [Joint Cuts and Matching of Partitions in One Graph](http://arxiv.org/abs/1711.09584v1) 743. Fast and Accurate Online Video Object Segmentation via Tracking Parts 744. Learning Nested Structures in Deep Neural Networks 745. [Practical Block-wise Neural Network Architecture Generation](http://arxiv.org/abs/1708.05552v3) 746. [AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation](http://arxiv.org/abs/1803.01599v1) 747.
Modifying Non-Local Variations Across Multiple Views 748. [Connecting Pixels to Privacy and Utility: Automatic Redaction of Private Information in Images](http://arxiv.org/abs/1712.01066v1) 749. Divide and Grow: Capturing Huge Diversity in Crowd Images with Incrementally Growing CNN 750. [When will you do what? - Anticipating Temporal Occurrences of Activities](http://arxiv.org/abs/1804.00892v1) 751. [Visual Question Answering with Memory-Augmented Networks](http://arxiv.org/abs/1707.04968v2) 752. [Stochastic Variational Inference with Gradient Linearization](http://arxiv.org/abs/1803.10586v1) 753. Human Pose Estimation with Parsing Induced Learner 754. [3D Registration of Curves and Surfaces using Local Differential Information](http://arxiv.org/abs/1804.00637v1) 755. [Deformation Aware Image Compression](http://arxiv.org/abs/1804.04593v1) 756. PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos 757. [MovieGraphs: Towards Understanding Human-Centric Situations from Videos](http://arxiv.org/abs/1712.06761v2) 758. Hybrid Camera Pose Estimation 759. [Fast Monte-Carlo Localization on Aerial Vehicles using Approximate Continuous Belief Representations](http://arxiv.org/abs/1712.05507v3) 760. [PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume](http://arxiv.org/abs/1709.02371v2) 761. Hierarchical Recurrent Attention Networks for Structured Online Maps 762. [Learning Less is More - 6D Camera Localization via 3D Surface Regression](http://arxiv.org/abs/1711.10228v2) 763. [Visual Question Generation as Dual Task of Visual Question Answering](http://arxiv.org/abs/1709.07192v1) 764. 3D Object Detection with Latent Support Surfaces 765. [An Analysis of Scale Invariance in Object Detection - SNIP](http://arxiv.org/abs/1711.08189v1) 766. [3D Semantic Trajectory Reconstruction from 3D Pixel Continuum](http://arxiv.org/abs/1712.01359v1) 767. KIPPI: KInetic Polygonal Partitioning of Images 768. 
[COCO-Stuff: Thing and Stuff Classes in Context](http://arxiv.org/abs/1612.03716v4) 769. [Joint Optimization Framework for Learning with Noisy Labels](http://arxiv.org/abs/1803.11364v1) 770. [Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks](http://arxiv.org/abs/1703.10114v1) 771. Deep Cost-Sensitive and Order-Preserving Feature Learning for Cross-Population Age Estimation 772. [Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments](http://arxiv.org/abs/1711.07280v3) 773. [Deep Back-Projection Networks For Super-Resolution](http://arxiv.org/abs/1803.02735v1) 774. [Generating a Fusion Image: One's Identity and Another's Shape](http://arxiv.org/abs/1804.07455v1) 775. [V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map](http://arxiv.org/abs/1711.07399v2) 776. [Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty](http://arxiv.org/abs/1711.09026v1) 777. [Cross-modal Deep Variational Hand Pose Estimation](http://arxiv.org/abs/1803.11404v1) 778. [Learning to Estimate 3D Human Pose and Shape from a Single Color Image](http://arxiv.org/abs/1805.04092v1) 779. Video Rain Removal By Multiscale Convolutional Sparse Coding 780. Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning 781. Learning 3D Shape Completion from Point Clouds with Weak Supervision 782. [SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels](http://arxiv.org/abs/1711.08920v1) 783. Salience Guided Depth Calibration for Perceptually Optimized Compressive Light Field 3D Display 784. Weakly-supervised Deep Convolutional Neural Network Learning for Facial Action Unit Intensity Estimation 785. Rolling Shutter and Radial Distortion are Features for High Frame Rate Multi-camera Tracking 786.
Robust Hough Transform Based 3D Reconstruction from Circular Light Fields 787. [Feedback-prop: Convolutional Neural Network Inference under Partial Evidence](http://arxiv.org/abs/1710.08049v2) 788. [Learning Strict Identity Mappings in Deep Residual Networks](http://arxiv.org/abs/1804.01661v3) 789. [Residual Parameter Transfer for Deep Domain Adaptation](http://arxiv.org/abs/1711.07714v1) 790. [Exploring Disentangled Feature Representation Beyond Face Identification](http://arxiv.org/abs/1804.03487v1) 791. [SPLATNet: Sparse Lattice Networks for Point Cloud Processing](http://arxiv.org/abs/1802.08275v4) 792. Unsupervised Training for 3D Morphable Model Regression 793. A Bi-directional Message Passing Model for Salient Object Detection 794. Learning to See in the Dark 795. Erase or Fill? Deep Joint Recurrent Rain Removal and Reconstruction in Videos 796. [Finding beans in burgers: Deep semantic-visual embedding with localization](http://arxiv.org/abs/1804.01720v2) 797. [Referring Relationships](http://arxiv.org/abs/1803.10362v2) 798. [Adversarially Learned One-Class Classifier for Novelty Detection](http://arxiv.org/abs/1802.09088v1) 799. Surface Networks 800. [Efficient parametrization of multi-domain deep neural networks](http://arxiv.org/abs/1803.10082v1) 801. Recognizing Human Actions as Evolution of Pose Estimation Maps 802. Soccer on Your Tabletop 803. CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization 804. Gesture Recognition: Focus on the Hands 805. Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF 806. [Real-world Anomaly Detection in Surveillance Videos](http://arxiv.org/abs/1801.04264v2) 807. [Learning a Single Convolutional Super-Resolution Network for Multiple Degradations](http://arxiv.org/abs/1712.06116v1) 808. [Iterative Visual Reasoning Beyond Convolutions](http://arxiv.org/abs/1803.11189v1) 809. [Guide Me: Interacting with Deep Networks](http://arxiv.org/abs/1803.11544v1) 810. 
[PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection](http://arxiv.org/abs/1708.06433v2) 811. [Future Frame Prediction for Anomaly Detection - A New Baseline](http://arxiv.org/abs/1712.09867v3) 812. Structure Preserving Video Prediction 813. [Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Networks](http://arxiv.org/abs/1712.01928v2) 814. Captioning Images with Style Transfer from Unaligned Text Corpora 815. Anatomical Priors in Convolutional Networks for Unsupervised Biomedical Segmentation 816. [Illuminant Spectra-based Source Separation Using Flash Photography](http://arxiv.org/abs/1704.05564v2) 817. 3D Human Pose Reconstruction and Action Classification in Robot Assisted Therapy of Children with Autism 818. [Discrete-Continuous ADMM for Transductive Inference in Higher-Order MRFs](http://arxiv.org/abs/1705.05020v5) 819. [Classification Driven Dynamic Image Enhancement](http://arxiv.org/abs/1710.07558v3) 820. [Feature Generating Networks for Zero-Shot Learning](http://arxiv.org/abs/1712.00981v2) 821. Beyond Trade-off: Accelerate FCN-based Face Detection with Higher Accuracy 822. MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition 823. [Unsupervised Learning and Segmentation of Complex Activities from Video](http://arxiv.org/abs/1803.09490v1) 824. [Sparse Photometric 3D Face Reconstruction Guided by Morphable Models](http://arxiv.org/abs/1711.10870v1) 825. LSTM stack-based Neural Multi-sequence Alignment TeCHnique (NeuMATCH) 826. Inverse Composition Discriminative Optimization for Point Cloud Registration 827. Inference in Higher Order MRF-MAP Problems with Small and Large Cliques 828. Look at Boundary: A Boundary-Aware Face Alignment Algorithm 829. [LEGO: Learning Edge with Geometry all at Once by Watching Videos](http://arxiv.org/abs/1803.05648v2) 830. [CosFace: Large Margin Cosine Loss for Deep Face Recognition](http://arxiv.org/abs/1801.09414v2) 831.
[Learning Semantic Concepts and Order for Image and Sentence Matching](http://arxiv.org/abs/1712.02036v1) 832. [Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks](http://arxiv.org/abs/1709.00507v2) 833. [Low-shot learning with large-scale diffusion](http://arxiv.org/abs/1706.02332v2) 834. [Multimodal Visual Concept Learning with Weakly Supervised Techniques](http://arxiv.org/abs/1712.00796v3) 835. Cross-View Image Synthesis using Conditional Generative Adversarial Nets 836. Pixel-Wise Metric Learning for Blazingly Fast Video Object Segmentation 837. PieAPP: Perceptual Image-Error Assessment through Pairwise Preference 838. Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos 839. CRRN: Multi-Scale Guided Concurrent Reflection Removal Network 840. [Stereoscopic Neural Style Transfer](http://arxiv.org/abs/1802.10591v2) 841. Low-shot Learning from Imaginary Data 842. Fast, Simple, and Effective Resource-Constrained Structure Learning of Deep Networks 843. [Unsupervised Sparse Dirichlet-Net for Hyperspectral Image Super-Resolution](http://arxiv.org/abs/1804.05042v1) 844. Visual Grounding via Accumulated Attention 845. [Event-based Vision meets Deep Learning on Steering Prediction for Self-driving Cars](http://arxiv.org/abs/1804.01310v1) 846. Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes 847. [Actor and Action Video Segmentation from a Sentence](http://arxiv.org/abs/1803.07485v1) 848. [AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks](http://arxiv.org/abs/1711.10485v1) 849. CartoonGAN: Generative Adversarial Networks for Photo Cartoonization 850. RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials 851. Tracking Multiple Objects Outside the Line of Sight using Speckle Imaging 852. [Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation](http://arxiv.org/abs/1803.11029v1) 853.
[Densely Connected Pyramid Dehazing Network](http://arxiv.org/abs/1803.08396v1) 854. Matching Adversarial Networks 855. Automatic Map Inference from Aerial Images 856. Polarimetric Dense Monocular SLAM 857. Learning Attribute Representations with Localization for Flexible Fashion Search 858. [Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval](http://arxiv.org/abs/1804.01223v1) 859. Unsupervised CCA 860. Analyzing Filters Toward Efficient ConvNet 861. Good Appearance Features for Multi-Target Multi-Camera Tracking 862. [Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning](http://arxiv.org/abs/1711.07613v1) 863. [Efficient Optimization for Rank-based Loss Functions](http://arxiv.org/abs/1604.08269v3) 864. [ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing](http://arxiv.org/abs/1803.01837v1) 865. [A Perceptual Measure for Deep Single Image Camera Calibration](http://arxiv.org/abs/1712.01259v3) 866. [Radially-Distorted Conjugate Translations](http://arxiv.org/abs/1711.11339v2) 867. Multi-task Learning by Maximizing Statistical Dependence 868. [Creating Capsule Wardrobes from Fashion Images](http://arxiv.org/abs/1712.02662v2) 869. Towards Human-Machine Cooperation: Evolving Active Learning with Self-supervised Process for Object Detection 870. [Synthesizing Images of Humans in Unseen Poses](http://arxiv.org/abs/1804.07739v1) 871. [Learning to Act Properly: Predicting and Explaining Affordances from Images](http://arxiv.org/abs/1712.07576v1) 872. [Pyramid Stereo Matching Network](http://arxiv.org/abs/1803.08669v1) 873. [Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene](http://arxiv.org/abs/1712.01812v2) 874. A General Two-Step Quantization Approach for Low-bit Neural Networks with High Accuracy 875. GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition 876. 
[Convolutional Neural Networks with Alternately Updated Clique](http://arxiv.org/abs/1802.10419v3) 877. Squeeze-and-Excitation Networks 878. [NISP: Pruning Networks using Neuron Importance Score Propagation](http://arxiv.org/abs/1711.05908v3) 879. [Audio to Body Dynamics](http://arxiv.org/abs/1712.09382v1) 880. ID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis 881. Deep Learning of Graph Matching 882. [Neural Baby Talk](http://arxiv.org/abs/1803.09845v1) 883. [Efficient Video Object Segmentation via Network Modulation](http://arxiv.org/abs/1802.01218v1) 884. [Regularizing Deep Networks by Modeling and Predicting Label Structure](http://arxiv.org/abs/1804.02009v1) 885. [Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation](http://arxiv.org/abs/1805.04574v1) 886. Face Detector Adaptation without Negative Transfer or Catastrophic Forgetting 887. [Motion-Appearance Co-Memory Networks for Video Question Answering](http://arxiv.org/abs/1803.10906v1) 888. [Compare and Contrast: Learning Prominent Visual Differences](http://arxiv.org/abs/1804.00112v2) 889. Tangent Convolutions for Dense Prediction in 3D 890. [Single-Shot Object Detection with Enriched Semantics](http://arxiv.org/abs/1712.00433v2) 891. [Generating Synthetic X-ray Images of a Person from the Surface Geometry](http://arxiv.org/abs/1805.00553v2) 892. [Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering](http://arxiv.org/abs/1712.00377v1) 893. [Edit Probability for Scene Text Recognition](http://arxiv.org/abs/1805.03384v1) 894. [MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features](http://arxiv.org/abs/1712.04837v1) 895. [Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment](http://arxiv.org/abs/1803.10699v1) 896. Texture Mapping for 3D Reconstruction with RGB-D Sensor 897. 
[Multi-Agent Diverse Generative Adversarial Networks](http://arxiv.org/abs/1704.02906v2) 898. [Towards Universal Representation for Unseen Action Recognition](http://arxiv.org/abs/1803.08460v1) 899. [Zero-Shot Kernel Learning](http://arxiv.org/abs/1802.01279v1) 900. [DOTA: A Large-scale Dataset for Object Detection in Aerial Images](http://arxiv.org/abs/1711.10398v2) 901. [Multi-Frame Quality Enhancement for Compressed Video](http://arxiv.org/abs/1803.04680v4) 902. From Lifestyle VLOGs to Everyday Interactions 903. Occluded Pedestrian Detection through Guided Attention in CNNs 904. [Decoupled Networks](http://arxiv.org/abs/1804.08071v1) 905. Deep Cocktail Networks: Multi-source Unsupervised Domain Adaptation with Category Shift 906. Partially Shared Multi-Task Convolutional Neural Network with Local Constraint for Face Attribute Learning 907. Joint Pose and Expression Modeling for Facial Expression Recognition 908. [Unsupervised Textual Grounding: Linking Words to Image Concepts](http://arxiv.org/abs/1803.11185v1) 909. Interleaved Structured Sparse Convolutional Neural Networks 910. [Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models](http://arxiv.org/abs/1711.06420v1) 911. [ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes](http://arxiv.org/abs/1711.11556v2) 912. [Image to Image Translation for Domain Adaptation](http://arxiv.org/abs/1712.00479v1) 913. A Face to Face Neural Conversation Model 914. [Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification](http://arxiv.org/abs/1711.07027v3) 915. [FSRNet: End-to-End Learning Face Super-Resolution with Facial Priors](http://arxiv.org/abs/1711.10703v1) 916. [SO-Net: Self-Organizing Network for Point Cloud Analysis](http://arxiv.org/abs/1803.04249v4) 917. [MoNet: Moments Embedding Network](http://arxiv.org/abs/1802.07303v2) 918.
Coupled End-to-end Transfer Learning with Generalized Fisher Information 919. Inferring Light Fields from Shadows 920. [LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image](http://arxiv.org/abs/1803.08999v1) 921. Multi-Level Fusion based 3D Object Detection from Monocular Images 922. Single-Image Depth Estimation Based on Fourier Domain Analysis 923. Flow Guided Recurrent Neural Encoder for Video Salient Object Detection 924. Super-Resolving Very Low-Resolution Face Images with Supplementary Attributes 925. [Seeing Voices and Hearing Faces: Cross-modal biometric matching](http://arxiv.org/abs/1804.00326v2) 926. [Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images](http://arxiv.org/abs/1712.03904v2) 927. [Fast and Accurate Single Image Super-Resolution via Information Distillation Network](http://arxiv.org/abs/1803.09454v1) 928. Learning and Using the Arrow of Time 929. [Rethinking the Faster R-CNN Architecture for Temporal Action Localization](http://arxiv.org/abs/1804.07667v1) 930. Deeply Learned Filter Response Functions for Hyperspectral Reconstruction 931. Fusing Crowd Density Maps and Visual Object Trackers for People Tracking in Crowd Scenes 932. Intrinsic Image Transformation via Scale Space Decomposition 933. Deep Ordinal Regression Network for Monocular Depth Estimation 934. [Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation](http://arxiv.org/abs/1802.08948v2) 935. Functional Map of the World 936. [CSGNet: Neural Shape Parser for Constructive Solid Geometry](http://arxiv.org/abs/1712.08290v2) 937. [Instance Embedding Transfer to Unsupervised Video Object Segmentation](http://arxiv.org/abs/1801.00908v2) 938. Statistical Tomography of Microscopic Life 939. Point-wise Convolutional Neural Networks 940. PIXOR: Real-time 3D Object Detection from Point Clouds 941. HydraNets: Specialized Dynamic Architectures for Efficient Inference 942.
[Deep Depth Completion of a Single RGB-D Image](http://arxiv.org/abs/1803.09326v2) 943. [Learning to Extract a Video Sequence from a Single Motion-Blurred Image](http://arxiv.org/abs/1804.04065v1) 944. A Fast Resection-Intersection Method for the Known Rotation Problem 945. [iVQA: Inverse Visual Question Answering](http://arxiv.org/abs/1710.03370v2) 946. Crowd Counting via Adversarial Cross-Scale Consistency Pursuit 947. Trust your Model: Light Field Depth Estimation with inline Occlusion Handling 948. [PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition](http://arxiv.org/abs/1804.03492v3) 949. [A Memory Network Approach for Story-based Temporal Summarization of 360° Videos](http://arxiv.org/abs/1805.02838v1) 950. [Tags2Parts: Discovering Semantic Regions from Shape Tags](http://arxiv.org/abs/1708.06673v3) 951. Jerk-Aware Video Acceleration Magnification 952. A Robust Method for Strong Rolling Shutter Effects Correction Using Lines with Automatic Feature Selection 953. [Mobile Video Object Detection with Temporally-Aware Feature Maps](http://arxiv.org/abs/1711.06368v2) 954. VirtualHome: Simulating Household Activities via Programs 955. MoNet: Deep Motion Exploitation for Video Object Segmentation 956. Detect globally, refine locally: A novel approach to saliency detection 957. EPINET: A Fully-Convolutional Neural Network for Light Field Depth Estimation by Using Epipolar Geometry 958. [Learning Face Age Progression: A Pyramid Architecture of GANs](http://arxiv.org/abs/1711.10352v1) 959. Normalized Cut Loss for Weakly Supervised CNN Segmentation 960. Reconstructing Thin Structures of Manifold Surfaces by Integrating Spatial Curves 961. [Dynamic Few-Shot Visual Learning without Forgetting](http://arxiv.org/abs/1804.09458v1) 962. [Camera Style Adaptation for Person Re-identification](http://arxiv.org/abs/1711.10295v2) 963. [In-Place Activated BatchNorm for Memory-Optimized Training of DNNs](http://arxiv.org/abs/1712.02616v2) 964. 
[NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning](http://arxiv.org/abs/1805.06875v1) 965. Resource Aware Person Re-identification across Multiple Resolutions 966. [Zero-Shot Super-Resolution using Deep Internal Learning](http://arxiv.org/abs/1712.06087v1) 967. [Analysis of Hand Segmentation in the Wild](http://arxiv.org/abs/1803.03317v2) 968. [Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination](http://arxiv.org/abs/1703.09913v2) 969. Face Aging with Identity-Preserved Conditional Generative Adversarial Networks 970. [Deep Extreme Cut: From Extreme Points to Object Segmentation](http://arxiv.org/abs/1711.09081v2) 971. Person Re-identification with Cascaded Pairwise Convolutions 972. Distributable Consistent Multi-Graph Matching 973. [A Twofold Siamese Network for Real-Time Object Tracking](http://arxiv.org/abs/1802.08817v1) 974. [AON: Towards Arbitrarily-Oriented Text Recognition](http://arxiv.org/abs/1711.04226v2) 975. Deep Cauchy Hashing for Hamming Space Retrieval 976. Non-blind Deblurring: Handling Kernel Uncertainty with CNNs 977. Referring Image Segmentation via Recurrent Refinement Networks 978. Deep Density Clustering of Unconstrained Faces 979. A Constrained Deep Neural Network for Ordinal Regression