Available Projects – Fall 2018

Reconstruction of 3D objects with radial imaging

Synopsis:
Radial imaging systems capture a scene from a large number of viewpoints within a single image, using a camera and a curved mirror. These systems can recover scene properties such as geometry, reflectance, and texture [1].

In this project, you will implement a system capable of 3D object reconstruction. You will start with 3D texture reconstruction, i.e., reconstruction of flat objects with small height variations (textile, tree bark, paintings, etc.). As in traditional stereo systems, you will match image features along epipolar lines. The difference, however, is that the epipolar lines for such a system are radial. Hence, ambiguities occur only for edges oriented along radial lines in the image.

For the reconstruction of macroscopic 3D objects, you will use a conical mirror with an increased field of view. In addition, you will identify and discard correspondences in specular and texture-less regions, and interpolate their depth from neighboring pixels with valid matches.
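The radial epipolar geometry above can be sketched in code: resampling the image onto a polar grid around the mirror's center of symmetry turns radial epipolar lines into rows, so matching reduces to a 1D search along each row. A minimal illustration (all function names and parameters are hypothetical; nearest-neighbor sampling and block matching are used for brevity):

```python
import numpy as np

def to_polar(img, center, n_radii, n_angles):
    """Resample an image onto a polar grid around `center` so that
    radial epipolar lines map to rows (constant angle)."""
    cy, cx = center
    radii = np.linspace(0, min(cy, cx) - 1, n_radii)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    # Nearest-neighbor sampling for simplicity.
    ys = (cy + radii[None, :] * np.sin(angles[:, None])).round().astype(int)
    xs = (cx + radii[None, :] * np.cos(angles[:, None])).round().astype(int)
    return img[ys, xs]  # shape: (n_angles, n_radii)

def match_along_row(row_a, row_b, patch=3):
    """1D block matching along one polar row (one epipolar line):
    for each position in row_a, find the best match in row_b."""
    n = len(row_a)
    disparities = np.zeros(n, dtype=int)
    for i in range(patch, n - patch):
        ref = row_a[i - patch:i + patch + 1]
        costs = [np.sum((row_b[j - patch:j + patch + 1] - ref) ** 2)
                 for j in range(patch, n - patch)]
        disparities[i] = (int(np.argmin(costs)) + patch) - i
    return disparities
```

In a real prototype the polar warp would use sub-pixel interpolation and the 1D matching would be replaced by a robust cost with a smoothness prior.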

References:
[1] Sujit Kuthirummal and Shree K. Nayar. “Multiview radial catadioptric imaging for scene capture.” ACM Transactions on Graphics (TOG), 25(3), 2006.

Deliverables: Report and running prototype.

Prerequisites:
– basic knowledge of computer vision
– coding skills in Matlab and Python or C/C++

Level: BS/MS semester project

Type of work: 50% implementation and 50% research

Supervisor: Marjan Shahpaski (firstname.lastname@epfl.ch)


Simultaneous Geometric and Radiometric Calibration of a Projector-Camera Pair

Synopsis:
3D sensing is gaining momentum with the increase of computational power, and with the possibility to display and fabricate the results with high fidelity. Structured light (SL) systems are among the most commonly used for 3D object scanning because they can be built by using off-the-shelf hardware components. However, they do require a geometric and, in certain cases, a radiometric calibration of the projector-camera pair.

We therefore devised a novel method that allows for simultaneous geometric and radiometric calibration of a projector-camera pair. It is simple, efficient and user friendly. We prewarp and align a specially designed projection pattern onto a printed pattern of different colorimetric properties. After capturing the patterns in several orientations, we perform geometric calibration by estimating the corner locations of the two patterns in different color channels. We perform radiometric calibration of the projector by using the information contained inside the projected squares. For more details, please see the paper and presentation found at http://ivrl.epfl.ch/research/grc
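As a toy illustration of the radiometric part, assuming the projector approximately follows a gamma law (a common simplification, not necessarily the exact model used in the paper), the response can be fitted from paired projected/captured intensities, e.g. the mean camera response inside each projected square:

```python
import numpy as np

def fit_gamma(projected, captured):
    """Fit captured ~ (projected / projected_max) ** gamma by least
    squares in the log domain. Inputs are paired intensity samples,
    e.g. per-square mean camera responses for a ramp of projected
    gray levels."""
    x = np.asarray(projected, float) / np.max(projected)
    y = np.asarray(captured, float) / np.max(captured)
    mask = (x > 0) & (y > 0)  # log is undefined at zero
    lx, ly = np.log(x[mask]), np.log(y[mask])
    # log y = gamma * log x  ->  one-parameter least squares
    return np.sum(lx * ly) / np.sum(lx ** 2)
```

A full radiometric calibration would fit each color channel separately and handle camera nonlinearity as well.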

In this project, you will implement the presented method [1] in a robust and efficient manner. First, you will test our Matlab implementation and improve it in terms of robustness, speed and ease of use. Then, you will transfer it into a Python or C/C++ executable that can be easily distributed to other users. Finally, you will experiment with additional features such as on-the-fly calibration as the images are captured, user guidance (feedback), etc.

References:
[1] Shahpaski, Marjan, et al. “Simultaneous Geometric and Radiometric Calibration of a Projector-Camera Pair.” IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017).

Deliverables: Report and running prototype.

Prerequisites:
– basic knowledge of computer vision
– coding skills in Matlab and Python or C/C++

Level: BS/MS semester project

Type of work: 90% implementation and 10% research

Supervisor: Marjan Shahpaski (firstname.lastname@epfl.ch)


Visualization of generative models

Synopsis:
Deep generative adversarial models are very attractive for generating samples of a distribution. The quality of the results is usually assessed by visualizing a few (10-100) generated samples, which is not very informative about global distribution artefacts such as mode collapse.

The goal of this project is to propose and experiment with a better visual representation of generative models and their distributions for comparison, for example by using the t-SNE dimensionality reduction technique.

In this project, you will:
Implement a framework for visualizing a GAN model/distribution.
Compare various GAN optimization techniques using your implementation.
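A minimal sketch of the visualization idea, using scikit-learn's t-SNE on flattened samples from the real data and from a generator (function names are hypothetical; in practice one would typically embed feature activations rather than raw pixels):

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_samples(real, generated, perplexity=30):
    """Embed real and generated samples jointly into 2D with t-SNE so
    that distribution-level artefacts become visible: a mode-collapsed
    generator shows up as a tight cluster covering only part of the
    real data's embedding."""
    X = np.vstack([real.reshape(len(real), -1),
                   generated.reshape(len(generated), -1)])
    labels = np.array([0] * len(real) + [1] * len(generated))
    emb = TSNE(n_components=2, perplexity=perplexity,
               init="random", random_state=0).fit_transform(X)
    return emb, labels  # scatter-plot emb colored by labels to compare
```

The returned embedding can be scatter-plotted with one color per source to compare several GAN variants side by side.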

References:
https://github.com/LynnHo/DCGAN-LSGAN-WGAN-WGAN-GP-Tensorflow
https://lvdmaaten.github.io/tsne/

Pezzotti et al. “Approximated and User Steerable tSNE for Progressive Visual Analytics”, IEEE TVCG, 2017.

Deliverables: A generic framework for t-SNE visualization of the GANs

Prerequisites: Knowledge and experience in deep learning.

Level: Semester project for Master’s students.

Type of work: 60% implementation and 40% research.

Number of Students: 1

Supervised by: Siavash Bigdeli (siavash.bigdeli@epfl.ch)


Semi/un-supervised learning for material estimation using IR/NIR data

Synopsis:
End-to-end learning is a very intuitive approach in cases where we do have a large dataset with annotations.

Although such annotations are available for some conventional tasks, such as object detection, other problems such as object material estimation suffer from a lack of annotations.

In this project, you will:
Capture and produce a paired RGB and IR/NIR dataset, and learn a meaningful representation of object materials in an unsupervised fashion.

References:
https://ivrl.epfl.ch/research/infrared/dataset

Deliverables: A CNN representing object surface materials using RGB+IR/NIR data (potentially we would need a new dataset of paired RGB+IR/NIR images for this)

Prerequisites: Knowledge and experience in deep learning.

Level: Semester project for Master’s students.

Type of work: 60% implementation and 40% research.

Number of Students: 1

Supervised by: Siavash Bigdeli (siavash.bigdeli@epfl.ch)


Deep Differentiable Renderers

Synopsis:
Differentiable renderers (DR) can be used to backpropagate image-space error (gradients) to the 3D representation (mesh or point cloud). In practice, these renderers are applied to 3D reconstruction and novel view synthesis.

In this project, we will:
Port existing renderers to common deep learning environments such as Tensorflow to be used as a part of a neural net.
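The core idea, image-space gradients flowing back to the geometry, can be illustrated with a toy differentiable renderer that splats a single 2D point as a Gaussian; the gradient of the image loss with respect to the point position is then available in closed form. This is only a sketch of the principle, not OpenDR's or Kato et al.'s formulation:

```python
import numpy as np

def render_point(p, H=16, W=16, sigma=1.5):
    """Render a 2D point as a Gaussian splat. Because the splat is a
    smooth function of p, the image is differentiable w.r.t. the point
    position (unlike hard z-buffer rasterization)."""
    ys, xs = np.mgrid[0:H, 0:W]
    return np.exp(-((xs - p[0]) ** 2 + (ys - p[1]) ** 2) / (2 * sigma ** 2))

def grad_loss_wrt_point(p, target, sigma=1.5):
    """Analytic gradient of the L2 image loss w.r.t. p, i.e. the
    quantity a differentiable renderer backpropagates to the geometry."""
    H, W = target.shape
    img = render_point(p, H, W, sigma)
    ys, xs = np.mgrid[0:H, 0:W]
    resid = img - target                      # dL/dimg (up to factor 2)
    dimg_dx = img * (xs - p[0]) / sigma ** 2  # d(splat)/d(p_x)
    dimg_dy = img * (ys - p[1]) / sigma ** 2
    return np.array([2 * np.sum(resid * dimg_dx),
                     2 * np.sum(resid * dimg_dy)])
```

In Tensorflow the same derivative would be obtained automatically once the splatting is expressed with differentiable ops, which is exactly what the porting task is about.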

References:
H. Kato et al. “Neural 3D mesh renderer” (https://arxiv.org/pdf/1711.07566.pdf)
M. M. Loper and M. J. Black, “OpenDR: An approximate differentiable renderer”. ECCV 2014

Deliverables: A simple and clean implementation of a DR in Tensorflow, with visual demonstration in a 3D reconstruction or synthesis application.

Prerequisites: Knowledge and experience in deep learning and/or in GPU programming

Level: Semester project for Master’s students.

Type of work: 80% implementation and 20% research.

Number of Students: 1

Supervised by: Siavash Bigdeli (siavash.bigdeli@epfl.ch)


Blind image restoration

Synopsis:
Most image restoration techniques rely on the major assumption that the image degradation model is known a priori. This is rarely the case in practice, and therefore many novel and advanced restoration methods fail in real-life scenarios.

In this project, you will:
Leverage available deep learning tools to develop and evaluate robust restoration methods for the task of noise- and/or kernel-blind image restoration.
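For the noise-blind setting, a natural first step is to estimate the unknown noise level itself. A classic robust estimator in the spirit of Donoho's MAD rule, shown here on simple pixel differences, can serve as a baseline (a sketch; names are hypothetical):

```python
import numpy as np

def estimate_noise_sigma(img):
    """Robust noise-level estimate: horizontal pixel differences of a
    mostly smooth image are dominated by noise, and the median absolute
    deviation is insensitive to the few edge outliers. For i.i.d.
    Gaussian noise, MAD / 0.6745 ~ sigma."""
    d = np.diff(img, axis=1) / np.sqrt(2)  # difference of two N(0, s^2) samples, rescaled
    return np.median(np.abs(d)) / 0.6745
```

Such an estimate can be fed to a non-blind restoration network, or used to sanity-check a learned noise-level predictor.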

References:
M. Jin, S. Roth, and P. Favaro, “Noise-blind image deblurring”, in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, 2017
Plötz, Tobias, and Stefan Roth. “Benchmarking denoising algorithms with real photographs.” arXiv preprint arXiv:1707.01313 (2017).

Deliverables: A restoration method for blind image deblurring/denoising and evaluation on “real” image datasets.

Prerequisites: Knowledge and experience in deep learning or image restoration.

Level: Semester project for Master’s students.

Type of work: 60% implementation and 40% research.

Number of Students: 2

Supervised by: Siavash Bigdeli (siavash.bigdeli@epfl.ch)


Unsupervised semantic co-segmentation

Description: In this project you will develop an algorithm to refine coarse localization masks into fine-grained segmentation masks. The initial masks localize regions of semantic significance and are obtained in an unsupervised fashion from a pre-trained CNN, e.g. VGG-19. Using the CNN and filtering techniques, you will implement and evaluate the mask refinement procedure.

Tasks:

  1. Understand the literature and the state of the art
  2. Experiment with various algorithmic approaches
  3. Tune parameters to obtain good performance
  4. Quantitatively evaluate your solution on standard benchmarks
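The mask refinement step can be sketched with the guided filter of reference [3]: the coarse mask is filtered so that the output is locally a linear function of the guide image, which snaps mask boundaries to image edges. A minimal NumPy version for a grayscale guide (parameter values are hypothetical):

```python
import numpy as np

def box(x, r):
    """(2r+1) x (2r+1) mean filter via an integral image, edge-padded."""
    xp = np.pad(x, r, mode="edge")
    ii = np.pad(xp, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    k = 2 * r + 1
    H, W = x.shape
    return (ii[k:k + H, k:k + W] - ii[:H, k:k + W]
            - ii[k:k + H, :W] + ii[:H, :W]) / k ** 2

def guided_filter(I, p, r=4, eps=1e-3):
    """Refine coarse mask p using guide image I (He et al., ECCV 2010):
    per window, fit p ~ a * I + b, then average the coefficients."""
    mI, mp = box(I, r), box(p, r)
    var = box(I * I, r) - mI * mI
    a = (box(I * p, r) - mI * mp) / (var + eps)
    b = mp - a * mI
    return box(a, r) * I + box(b, r)
```

In the project this would run on the CNN localization maps, with the input photograph (or a feature map) as the guide.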


References:
[1] Selvaraju, Ramprasaath R., et al. “Grad-cam: Visual explanations from deep networks via gradient-based localization”, 2016.
[2] http://www.inf.ufrgs.br/~eslgastal/DomainTransform/
[3] http://kaiminghe.com/eccv10/index.html

Deliverables: Code and written report

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Pytorch (preferred) or TensorFlow

Type of work: 40% research, 60% development and testing

Level: Master

Supervisor: Edo Collins (firstname.lastname@epfl.ch)


Model me

Description: How accurately can we obtain a 3D model of a new object using a set of photos of it and no priors? What about a human body: can it be good enough to be deceptive? In this project, you will find out.

References: Video Based Reconstruction of 3D People Models (https://arxiv.org/abs/1803.04758v2)

Your competition: https://www.3dflow.net/3df-zephyr-pro-3d-models-from-photos/

http://ccwu.me/vsfm/

(and all their publications can be found on https://www.3dflow.net/technology/)

Deliverables: An algorithm for building 3D models from a set of object photos, with its accuracy evaluation.

Prerequisites: Motivation and autonomy, graphics or image processing background (programming and math always welcome).

Type of work: 50% research, 50% development and testing

Level: BS or MS

Number of students: 1-2

Supervisor(s): Majed El Helou, Ruofan Zhou


Now you see me

Description: The goal is to estimate a person’s pose in a 2D photo. You will simulate synthetic 3D scenes with a virtual person that can move around the scene. For different poses of this person, you will create a set of 2D “photos” (simple projections of your 3D scene). These can then be used as a dataset for pose estimation from 2D images.
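Generating the 2D “photos” amounts to projecting the virtual person's 3D joint positions through a pinhole camera; rotating the person about the vertical axis yields different viewpoints. A minimal sketch with hypothetical camera intrinsics:

```python
import numpy as np

def rotate_y(X, theta):
    """Rotate 3D points (N, 3) about the vertical (y) axis, e.g. to
    place the virtual person at a new orientation in the scene."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return X @ R.T

def project_points(X, f=500.0, cx=320.0, cy=240.0):
    """Project 3D joint positions (N, 3), given in camera coordinates
    with z > 0, to 2D pixels with a pinhole camera (hypothetical focal
    length f and principal point (cx, cy))."""
    X = np.asarray(X, float)
    u = f * X[:, 0] / X[:, 2] + cx
    v = f * X[:, 1] / X[:, 2] + cy
    return np.stack([u, v], axis=1)
```

Looping over poses and rotation angles, and saving the projected joints alongside the 3D ground truth, yields exactly the kind of dataset the project asks for.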

References: Chen, Ching-Hang, and Deva Ramanan. “3D human pose estimation = 2D pose estimation + matching.” CVPR 2017.
Belagiannis, Vasileios, and Andrew Zisserman. “Recurrent human pose estimation.” IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017.

Deliverables: A dataset of simulated 2D captures with their ground truth poses. A pose estimation algorithm with its performance evaluation.

Prerequisites: Motivation and autonomy, graphics or image processing background (programming and math always welcome).

Type of work: 50% research, 50% development and testing

Level: BS or MS

Number of students: 1

Supervisor(s): Majed El Helou, Fayez Lahoud, Ruofan Zhou


Shadow removal

Description: In this project you will create a small dataset of color and NIR (near infrared) images of scenes with and without shadow. Then you will develop your own shadow removal algorithm, improving on the state of the art. One suggested approach is to compute (or learn) a tensor the size of the image that transforms shadow regions to normal lighting; for every shadow pixel, the transformation is the average of the transformations between that pixel and the X lit pixels most similar to it. However, you are free to develop your own approach.
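The suggested approach can be sketched as follows, using NIR similarity to find lit pixels that resemble each shadow pixel (choosing NIR as the similarity cue is an assumption for illustration, and per-channel ratios stand in for the transformation tensor):

```python
import numpy as np

def relight_shadows(rgb, nir, mask, k=5):
    """For each shadow pixel (mask == True), find the k lit pixels with
    the most similar NIR intensity (NIR is less affected by shadows),
    and apply the average per-channel lit/shadow ratio as the
    shadow-to-lit transformation."""
    H, W, _ = rgb.shape
    flat_rgb = rgb.reshape(-1, 3).astype(float)
    flat_nir = nir.ravel()
    flat_mask = mask.ravel().astype(bool)
    lit_rgb = flat_rgb[~flat_mask]
    lit_nir = flat_nir[~flat_mask]
    out = flat_rgb.copy()
    for i in np.flatnonzero(flat_mask):
        idx = np.argsort(np.abs(lit_nir - flat_nir[i]))[:k]
        ratio = np.mean(lit_rgb[idx] / np.maximum(flat_rgb[i], 1e-6), axis=0)
        out[i] = flat_rgb[i] * ratio
    return out.reshape(H, W, 3)
```

A real implementation would add spatial terms to the similarity, vectorize the search, and smooth the resulting transformation tensor.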

References: R. Guo, Q. Dai, and D. Hoiem, “Single-Image shadow detection and removal using paired regions.” IEEE Computer Vision and Pattern Recognition CVPR, pages 2033-2040, 2011.
N. Salamati, A. Germain and S. Süsstrunk, Removing Shadows from Images Using Color and Near-infrared, Proc. IEEE International Conference on Image Processing (ICIP), 2011.

Deliverables: The dataset. A shadow removal algorithm with realistic results and its quantitative evaluation on your dataset.

Prerequisites: Motivation and autonomy, image processing or computational photography background (programming and math always welcome).

Type of work: 50% research, 50% development and testing

Level: BS or MS

Number of students: 1-2

Supervisor(s): Majed El Helou, Ruofan Zhou


Violence Detection in Movies

Description: The goal of this project is to label violent scenes in a given movie using convolutional neural networks. By defining violence in terms of certain objects (e.g. guns) and actions (e.g. punching), the task of violent scene detection is sub-categorized into their detection. First, the input video is segmented into scenes; then salient frames and actions are detected in each scene. The dataset will be collected from several image databases and search engines.
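The scene segmentation step can be bootstrapped with a classic histogram-based shot boundary detector before any CNN is involved (a simple baseline sketch; the threshold value is hypothetical):

```python
import numpy as np

def shot_boundaries(frames, bins=16, threshold=0.5):
    """Detect cuts by thresholding the half-L1 distance between
    intensity histograms of consecutive frames (values in [0, 1]).
    Returns the indices where a new shot starts."""
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0.0, 1.0))
        hists.append(h / max(h.sum(), 1))
    cuts = []
    for t in range(1, len(hists)):
        if 0.5 * np.abs(hists[t] - hists[t - 1]).sum() > threshold:
            cuts.append(t)
    return cuts
```

The detected shots would then be grouped into scenes, within which salient frames are selected for the CNN classifier.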

Tasks:

  1. Understand the literature and the state of the art
  2. Define the violence labels and create a training set.
  3. Build CNN model and train the network.
  4. Validate the model and improve the accuracy.


References:
Demarty, C.H., Ionescu, B., Jiang, Y.G., Quang, V.L., Schedl, M., Penet, C.: Benchmarking violent scenes detection in movies. In: 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE (2014)

Mu G., Cao H., Jin Q. (2016) Violent Scene Detection Using Convolutional Neural Networks and Deep Audio Features

Dai, Q., Zhao, R.W., Wu, Z., Wang, X., Gu, Z., Wu, W., Jiang, Y.G.: Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning

Deliverables: At the end of the semester, the student should provide a framework that automatically detects violent scenes.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow

Type of work: 40% research, 60% development and testing

Level: Master

Supervisor: Sami Arpa (sami.arpa@epfl.ch)


Genre Detection in Movies

Description: The goal of this project is to infer the genre of each scene in a given movie using convolutional neural networks. By defining each genre in terms of certain objects (e.g. a spaceship for science fiction) and actions (e.g. laughing for comedy), the task of genre detection is sub-categorized into their detection. First, the input video is segmented into scenes; then salient frames and actions are detected in each scene. The dataset will be collected from several image databases and search engines.

Tasks:

  1. Understand the literature and the state of the art
  2. Define the genre labels and create a training set.
  3. Build CNN model and train the network.
  4. Validate the model and improve the accuracy.

References:
Sivaraman, K. S., and Gautam Somappa. “MovieScope: Movie trailer classification using Deep Neural Networks.” University of Virginia (2016).

Simões, Gabriel S., et al. “Movie genre classification with convolutional neural networks.” Neural Networks (IJCNN), 2016 International Joint Conference on. IEEE, 2016.

Deliverables: At the end of the semester, the student should provide a framework that automatically detects genre of each scene of a given movie.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow

Type of work: 40% research, 60% development and testing

Level: Master

Supervisor: Sami Arpa (sami.arpa@epfl.ch)


Knowledge transfer between RGB only and RGB + other modalities

Synopsis:
Deep neural networks are mostly trained on large RGB datasets. This limits their application for different modalities (depth, NIR, thermal, etc.) where other types of information are available. We want to make use of additional data such as depth and NIR to improve the accuracy of conventional deep learning approaches.

The problem with multimodal data is the lack of large datasets comparable to those available for RGB alone. The proposed technique should therefore not suffer from overfitting, and should benefit from the large RGB-only datasets.

In this project you will:
– Train networks on RGB and RGB+ data in a supervised or unsupervised fashion (baselines).
– Improve the RGB-only network using the additional modalities available, such as NIR, depth and thermal channels (datasets for NIR and thermal are available).
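One hedged sketch of the transfer idea from [1]: a teacher trained on abundant RGB data produces mid-level features for the RGB half of each image pair, and a student that only sees the other modality is trained to regress those features, so the paired data needs no labels. Linear maps stand in for CNNs to keep the sketch tiny:

```python
import numpy as np

def train_student(rgb, other, teacher_W, steps=2000, lr=0.1):
    """Supervision transfer: fit a linear 'student' on the paired
    modality (e.g. NIR or depth) to regress the 'teacher' features
    computed from RGB, by plain gradient descent on the L2 loss."""
    targets = rgb @ teacher_W.T          # teacher features on paired RGB
    W = np.zeros((teacher_W.shape[0], other.shape[1]))
    n, d_out = targets.shape
    losses = []
    for _ in range(steps):
        resid = other @ W.T - targets    # student error on paired data
        losses.append(np.mean(resid ** 2))
        W -= lr * (2.0 / (n * d_out)) * resid.T @ other
    return W, losses
```

With deep networks the same loop matches intermediate feature maps instead of linear features, and the student is then fine-tuned on whatever small labeled multimodal set exists.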

References:
[1] Cross Modal Distillation for Supervision Transfer https://arxiv.org/pdf/1507.00448.pdf

Deliverables:
A neural net that receives RGB+ input to improve the accuracy of a conventional net trained on RGB alone

Prerequisites:
Knowledge and experience in deep learning
RGB+NIR and RGB+Thermal datasets are available

Level: MS semester or thesis project

Type of work: 60% implementation and 40% research.

Supervisor: Fayez Lahoud, Siavash Bigdeli


Visual feedback for tone training

Synopsis:
The sound of language can be divided into consonants, vowels and tones. As many as 70% of the world’s languages use tones to convey word meaning. – Moira Yip, Tone.
However, Latin- and most Germanic-based languages use tone only to convey emotion, combined with stress and rhythm changes. This creates difficulties in learning tonal languages when it comes to speaking and listening. In this project, we aim to create a tool (application) that helps people train their tones by providing clear and constructive feedback based on an understanding of where the user’s difficulties lie. We will first focus on Mandarin Chinese, as it has a simple tonal system comprising only 4 different contour tones.
We already have a setup that extracts the pitch from a sound recording. We propose to build a tool that:

  • Takes a user recording and a reference recording and compares them, providing constructive feedback that helps the user improve
  • Studies the user’s history of successes/mistakes to adjust the learning schedule
  • Potentially helps tackle minimal-pair training for listening and speaking (e.g. shuang vs. shuan, zuo vs. cuo vs. suo, …)


In this project you will:
Design and implement a distance measure that computes the difference between a user’s tones and the correct tonal pronunciation. Study the patterns of mistakes made by users to identify their weaknesses and propose adjustments to their learning schedule.
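One plausible choice for such a distance measure (an assumption, not a prescribed design) is dynamic time warping over the two pitch contours, which tolerates differences in speaking rate, e.g. a learner holding a tone longer than the reference:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two pitch contours
    (1D arrays of pitch values, e.g. in Hz). Standard O(n*m) dynamic
    program with unit steps; lower means closer tonal shape."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Normalizing the pitch (e.g. to semitones relative to the speaker's mean) before computing the distance would make the measure speaker-independent.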

References:
[1] Tone visualization: http://www.sinosplice.com/life/archives/2008/01/21/seeing-the-tones-of-mandarin-chinese-with-praat

Deliverables:
Functional tool (Optional: Mobile application)

Prerequisites:
Comfortable with machine learning, speech processing
Knowledge of a tonal language could help you on the way

Level: Bachelor or Master

Type of work: 60% implementation, 40% research

Supervisor: Fayez Lahoud, Ruofan Zhou


Tracking and mapping in infrared video

Synopsis:
The goal of this project is to localize and map an area using infrared (thermal) video sequences. In low-visibility conditions, such as darkness or smoke, visual methods are not helpful because most of the visible-light imagery becomes black. Frames from a thermal camera can instead be used to build a model that helps people/robots navigate such areas.
There has been a lot of previous work on visual mapping; this project aims to study the literature, reproduce the state of the art, and then apply this knowledge to the problem of thermal mapping.

In this project you will:

  • Understand the previous literature and the state of the art
  • Implement a few existing methods
  • Transfer that knowledge to a thermal model

References:
[1] Parallel Tracking and Mapping for Small AR Workspaces. G. Klein, D. Murray. ISMAR 2007
[2] Real-Time 6-DOF Monocular Visual SLAM in a Large-scale Environment. H. Lim, J. Lim, H. Jin Kim. ICRA 2014.
[3] RGB-T SLAM: A flexible SLAM framework by combining appearance and thermal information. L. Chen, L. Sun, T. Yang, L. Fan, K. Huang and Z. Xuanyuan. ICRA 2017.

Deliverables: A set of implemented visual SLAM methods (at most 3) + a thermal SLAM

Prerequisites: Experience in tracking and localization, potentially usage of deep learning

Level: Master

Type of work: 70% implementation, 30% research

Supervisor: Fayez Lahoud


Faces of persons for counterfeit prevention (TAKEN)

Synopsis:
EPFL and startup company Innoview Sàrl have developed counterfeit prevention features. The current project aims at improving the faces of persons that are displayed in original documents such as ID cards or passports. For more information, contact the supervisors.

Deliverables: Report and running prototype

Prerequisites:
– coding skills in Matlab or Java.
– basic knowledge in 3D graphics

Level: BS, MS semester or Master project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, romain.rossier@innoview.ch, tel 078 664 36 44
Prof. hon. Roger D. Hersch, INM034, rd.hersch@epfl.ch, cell: 077 406 27 09


Recovery of a fluorescent hidden watermark on an Android smartphone (TAKEN)

Synopsis:
A watermark printed with daylight fluorescent inks and hidden in a color background can be recovered under blue light. The goal is to adapt an already developed Android software package that acquires images embedding hidden watermarks. This requires on-the-fly image acquisition, real-time sharpness evaluation, focusing, and appropriate thresholding to recover the watermark.
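The real-time sharpness evaluation could, for instance, use the classic variance-of-Laplacian focus measure, keeping only frames whose score is high (a hedged sketch; the existing package may use a different criterion):

```python
import numpy as np

def sharpness(img):
    """Variance-of-Laplacian focus measure: high when the frame is in
    focus, low when blurred. The acquisition loop can keep only frames
    that score above those seen recently."""
    lap = (-4 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap.var()
```

On the phone, the same 3x3 Laplacian can run on a downscaled preview frame to stay real time.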

Reference:
R. Rossier, R.D. Hersch, Hiding patterns with daylight fluorescent inks. Proc. IS&T/SID’s 19th Color Imaging Conference, San Jose, CA, USA, November 7-11, 2011, pp. 223-228, see http://lsp.epfl.ch/colorpublications

Deliverables: Report and running prototype (Matlab and/or Android).

Prerequisites:
– basic knowledge of image processing / computer vision
– coding skills in Matlab (and possibly Java Android)

Level: BS, MS semester or master project

Supervisor:
Dr Romain Rossier, Innoview Sàrl, romain.rossier@innoview.ch, tel 078 664 36 44

Prof. hon. Roger D. Hersch, INM034, rd.hersch@epfl.ch, cell: 077 406 27 09



Recovery of a tiled watermark on an Android smartphone

Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden in a grayscale or color image. The current project aims at carrying out the same recovery, but for a watermark that is tiled.
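A plausible first step (an assumption for illustration, not necessarily Innoview's method) is to estimate the tiling period from the image's autocorrelation, so that the tiles can be registered and averaged to boost the watermark signal before recovery:

```python
import numpy as np

def tile_period(img, min_period=4):
    """Estimate the horizontal repetition period of a tiled pattern
    from the peak of the autocorrelation of the column means."""
    x = img - img.mean()
    col = x.mean(axis=0)                  # average out vertical content
    n = len(col)
    ac = np.correlate(col, col, mode="full")[n - 1:]  # lags 0 .. n-1
    lags = np.arange(n)
    valid = (lags >= min_period) & (lags <= n // 2)
    return int(lags[valid][np.argmax(ac[valid])])
```

The vertical period can be estimated the same way on row means; a 2D autocorrelation handles tilted acquisitions.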

Deliverables: Report and running prototype (Matlab and/or Android).

Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab and/or Java Android

Level: BS or MS semester project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, romain.rossier@innoview.ch, tel 078 664 36 44
Prof. Roger D. Hersch, INM034, rd.hersch@epfl.ch, cell: 077 406 27 09



Interactive recovery of watermarks on an Android smartphone

Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark by superposing a revealer on top of a base image obtained by camera acquisition. The project aims at making this software interactive in order to show the revealed watermark to the user.

Deliverables: Report and running prototype (Matlab and/or Android).

Prerequisites:
– knowledge of image processing
– basic coding skills in Matlab and Java Android

Level: BS or MS semester project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, romain.rossier@innoview.ch, tel 078 664 36 44
Prof. Roger D. Hersch, INM034, rd.hersch@epfl.ch, cell: 077 406 27 09


Natural Face Editing with Deep Learning

Natural face editing is a fun application (consider FaceApp [1] for example): you can add/remove glasses, add/remove makeup, add/remove a smile, etc. Traditional face editing methods often require a number of sophisticated and task-specific algorithms to be applied one after the other, a process that is tedious, fragile, and computationally intensive. Recent network architectures (for example, CycleGAN [3]) are able to perform image-to-image translation directly. In this project, we want to build a framework for natural face editing based on these architectures, and potentially improve their visual results.

In this project you will:
– Explore how to collect/create a large dataset for deep learning tasks
– Research and experiment with different network architectures
– Research how to evaluate the visual results

References:
[1] FaceApp
[2] Zhixin Shu , Ersin Yumer, Sunil Hadap, Kalyan Sunkavalli, Eli Shechtman, Dimitris Samaras: Neural Face Editing with Intrinsic Image Disentangling
[3] Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Deliverables: code and report

Prerequisites: experience in some deep learning framework, image processing

Level: master

Supervisor: Ruofan Zhou (firstname.lastname@epfl.ch)


Self-identifying corners for automatic joint calibration of a projector-camera pair

Synopsis:
3D sensing is gaining momentum with the increase of computational power, and with the possibility to display and fabricate the results with high fidelity. Structured light (SL) systems are among the most commonly used for 3D object scanning because they can be built by using off-the-shelf hardware components. However, they do require a geometric calibration of the projector-camera pair.

Geometric camera calibration methods rely on capturing images of a flat checkerboard pattern in different orientations and identifying the corners in the captured images [1]. Furthermore, for methods where corner detection is performed automatically, all corners must be visible in order to determine the orientation of the pattern. Joint projector-camera calibration is performed in a similar manner, by capturing a printed and a projected pattern in the same image [2, 3].

In this project, we want to investigate pattern designs that allow us to uniquely identify all corners in the pattern by inspecting their surrounding squares. To do this, we will look into pattern squares that carry identification information, similar to QR codes [4]. We will also investigate whether we can leverage the different color channels of the projector and the camera to propose new designs. The project will include hands-on work with a projector-camera pair.
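A toy version of such self-identifying squares, in the spirit of [4]: each square's cells encode a corner id in binary, so every corner can be identified even when part of the pattern is occluded (the cell layout and sizes are hypothetical):

```python
import numpy as np

def make_marker(marker_id, cells=3, cell_px=8):
    """Render a square whose cells encode `marker_id` in binary:
    bit k of the id sets cell k white (1) or black (0)."""
    bits = [(marker_id >> k) & 1 for k in range(cells * cells)]
    grid = np.array(bits, float).reshape(cells, cells)
    return np.kron(grid, np.ones((cell_px, cell_px)))

def read_marker(img, cells=3):
    """Recover the id by sampling the centre of each cell and
    thresholding at 0.5 (assumes the square is already rectified)."""
    cell_px = img.shape[0] // cells
    marker_id = 0
    for k in range(cells * cells):
        r, c = divmod(k, cells)
        v = img[r * cell_px + cell_px // 2, c * cell_px + cell_px // 2]
        marker_id |= int(v > 0.5) << k
    return marker_id
```

A practical design would reserve some bits for error detection and for disambiguating the four possible square rotations.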

References:
[1] Zhang, Zhengyou. “A flexible new technique for camera calibration.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.
[2] Ashdown, Mark, and Yoichi Sato. “Steerable projector calibration.” IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2005.
[3] Audet, Samuel, and Masatoshi Okutomi. “A user-friendly method to geometrically calibrate projector-camera systems.” IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2009.
[4] Fiala, Mark, and Chang Shu. “Fully automatic camera calibration using self-identifying calibration targets.” Technical Report, 2005.

Deliverables: Report and running prototype.

Prerequisites:
– knowledge of image processing / computer vision / OpenCV
– coding skills in Matlab and C/C++

Level: MS semester project

Type of work: 40% research and 60% implementation

Supervisor: Marjan Shahpaski (firstname.lastname@epfl.ch)