去年10月份参加的ICCV,这次主要关注的几个方向:
– Efficient DNN Inference
– 3D Vision, especially 6DOF pose estimation
– Fundamental improvements in DNN representation learning
– Applications on Human/Face
Efficient DNN Inference
Low-Power Computer Vision Workshop: 这个Workshop组织了一个竞赛叫Low-Power Image Recognition Challenge (LPIRC)。定义指标里不仅仅考量模型的精度,也同时考量模型运行时的能耗。这个Workshop里有部分报告来自竞赛的优胜队伍,分享一些技术上的技巧,也有一些来自业界和学界的报告。
Prof. Soonhoi Ha 讲了 Software-Hardware Co-Design,期间也分享了一些提高比赛分数的技巧。
另一个印象比较有印象的 来自 Qualcomm 的 Edwin Park 的报告。他们做在芯片中的Vision算法对能耗特别敏感,而且是一个always-on的应用场景。
我感觉考虑能耗还是需要和硬件结合起来做,软件层面有些改进是直接提高分数的,比如更好的loss。但是涉及到网络结构和inference方式的改进基本上都需要权衡速度和精度。虽然目前这个Workshop中最好的方法从数值上看结果还不够好,但这个方向还是很有意义的。只是考虑到目前硬件以及深度学习编译器的发展,可能评测的方式得有所改进。
主会逛poster也看到了一些相关的文章:
- Proximal Mean-Field for Neural Network Quantization; Thalaiyasingam Ajanthan, Puneet K. Dokania, Richard Hartley, Philip H. S. Torr
- Improved Techniques for Training Adaptive Deep Networks; Hao Li, Hong Zhang, Xiaojuan Qi, Ruigang Yang, Gao Huang,
- Adaptative Inference Cost With Convolutional Neural Mixture Models; Adria Ruiz, Jakob Verbeek (每一层训练有多个不同的版本,运行的时候通过上一层的输出来决定下一层用哪个)
- Data-Free Quantization Through Weight Equalization and Bias Correction; Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling (直接量化模型,不用重新训练也不用数据做校准,对网络本身有些假设,整体上合理)
- Searching for MobileNetV3 ; Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam (Relu6)
- Multinomial Distribution Learning for Effective Neural Architecture Search; Xiawu Zheng, Rongrong Ji, Lang Tang, Baochang Zhang, Jianzhuang Liu, Qi Tian
- Bit-Flip Attack: Crushing Neural Network With Progressive Bit Search; Adnan Siraj Rakin, Zhezhi He, Deliang Fan (用来attack一个网络,只需要翻动很少的bit就可以成功)
3D Vision, especially 6DOF pose estimation
这个方向上主要是听了一个Workshop on Recovering 6D Object Pose。比较完整的听完了Eric Brachmann的Talk。他的思路是大体按照传统方法分步做物体的姿态估计,但是利用deep learning把中间的一些部分弄成可微的,然后用训练得到的组件替换之前的传统方法。很合理的思路,看起来效果也不错。他的slides也是公开的。
后来 Matthias Nießner 也讲了他最近的SCAN2CAD,其实是他们CVPR 2019的工作。这个方向我觉得很好,不是单纯的分割。分割出来其实还是要做语义上的分析,而且实际应用中有CAD模型的场景并不少见,是个很好方向。
主会逛poster看到的一些相关的文章:
- C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion; David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedaldim
- Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation; Kiru Park, Timothy Patten, Markus Vincze (提出了一种Normalized Coordinates的表示,用来统一表示Canonical Pose的物体)
- RIO: 3D Object Instance Re-Localization in Changing Indoor Environments; Johanna Wald, Armen Avetisyan, Nassir Navab, Federico Tombari, Matthias Niessner (已经有物体的点云,在场景点云中重定位)
- GP2C: Geometric Projection Parameter Consensus for Joint 3D Pose and Focal Length Estimation in the Wild; Alexander Grabner, Peter M. Roth, Vincent Lepetit
- Neural Turtle Graphics for Modeling City Road Layouts; Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler
- Closed-Form Optimal Two-View Triangulation Based on Angular Errors; Seong Hun Lee, Javier Civera
Xu Chen, Jie Song, Otmar Hilliges, Monocular Neural Image Based Rendering With Continuous View Control 这篇的生成新视角的效果很好,现场看到的图分辨率很高也很真实)
Fundamental improvements in DNN representation learning 相关的一些文章:
- Anchor Loss: Modulating Loss Scale Based on Prediction Difficulty; Serim Ryou, Seong-Gyun Jeong, Pietro Perona
- Transformable Bottleneck Networks; Kyle Olszewski, Sergey Tulyakov, Oliver Woodford, Hao Li, Linjie Luo
- EvalNorm: Estimating Batch Normalization Statistics for Evaluation; Saurabh Singh, Abhinav Shrivastava
- CARAFE: Content-Aware ReAssembly of FEatures; Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin (Learning-based Up-Sampling Kernel)
- Linearized Multi-Sampling for Differentiable Image Transformation; Wei Jiang, Weiwei Sun, Andrea Tagliasacchi, Eduard Trulls, Kwang Moo Yi
- Differentiable Kernel Evolution; Yu Liu, Jihao Liu, Ailing Zeng, Xiaogang Wang
- ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks; Xiaohan Ding, Yuchen Guo, Guiguang Ding, Jungong Han
- A Comprehensive Overhaul of Feature Distillation; Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi (通过一组transformer来对齐特征,有点像CCA)
Applications on Human/Face 相关的一些文章:
- Learnable Triangulation of Human Pose; Karim Iskakov, Egor Burkov, Victor Lempitsky, Yury Malkov ( 多相机重建3d skeleton 效果好 )
- Probabilistic Face Embeddings; Yichun Shi, Anil K. Jain
- Occlusion Robust Face Recognition Based on Mask Learning With Pairwise Differential Siamese Network; Lingxue Song, Dihong Gong, Zhifeng Li, Changsong Liu, Wei Liu
- Tracking Without Bells and Whistles; Philipp Bergmann, Tim Meinhardt, Laura Leal-Taixe (没有显式的检测+跨帧关联检测框,而是简单的检测+检测框回归预测新的检测框,效果好应该是得益于检测模型最近的发展)
- Video Face Clustering with Unknown Number of Clusters; Makarand Tapaswi, Marc T. Law, Sanja Fidler
- Gaze360: Physically Unconstrained Gaze Estimation in the Wild; Petr Kellnhofer, Adria Recasens, Simon Stent, Wojciech Matusik, Antonio Torralba (Antonio有好几个Gaze相关的数据集了)
第一次参加ICCV。和CVPR相比,会议安排日程相对宽松,还预留了半天休息的时间。Poster区域被围绕在参展的厂商的展台中间。可能是参会的人比较多,感觉Poster区域还是太小了,通道里人挤人。在ICCV参展的厂商明显比CVPR少,而且没有自动驾驶的小车也没有大卡车。