Available Projects – Fall 2018

Reconstruction of 3D objects with radial imaging

Synopsis:
Radial imaging systems capture a scene from a large number of viewpoints within a single image, using a camera and a curved mirror. These systems can recover scene properties such as geometry, reflectance, and texture [1].

In this project, you will implement a system capable of 3D object reconstruction. You will start with 3D texture reconstruction, i.e., reconstruction of flat objects with small height variations (textile, tree bark, paintings, etc.). As in traditional stereo systems, you will match image features along epipolar lines. The difference, however, is that the epipolar lines for such a system are radial. Hence, ambiguities occur only for edges oriented along radial lines in the image.

For the reconstruction of macroscopic 3D objects, you will use a conical mirror with an increased field of view. In addition, you will identify and discard correspondences in specular and texture-less regions, and interpolate their depth from neighboring pixels with valid matches.
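The radial epipolar geometry above can be sketched in code: resampling the image onto a polar grid around the mirror's center of symmetry turns radial epipolar lines into rows, so matching reduces to a 1D search along each row. A minimal illustration (all function names and parameters are hypothetical; nearest-neighbor sampling and block matching are used for brevity):

```python
import numpy as np

def to_polar(img, center, n_radii, n_angles):
    """Resample an image onto a polar grid around `center` so that
    radial epipolar lines map to rows (constant angle)."""
    cy, cx = center
    radii = np.linspace(0, min(cy, cx) - 1, n_radii)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    # Nearest-neighbor sampling for simplicity.
    ys = (cy + radii[None, :] * np.sin(angles[:, None])).round().astype(int)
    xs = (cx + radii[None, :] * np.cos(angles[:, None])).round().astype(int)
    return img[ys, xs]  # shape: (n_angles, n_radii)

def match_along_row(row_a, row_b, patch=3):
    """1D block matching along one polar row (one epipolar line):
    for each position in row_a, find the best match in row_b."""
    n = len(row_a)
    disparities = np.zeros(n, dtype=int)
    for i in range(patch, n - patch):
        ref = row_a[i - patch:i + patch + 1]
        costs = [np.sum((row_b[j - patch:j + patch + 1] - ref) ** 2)
                 for j in range(patch, n - patch)]
        disparities[i] = (int(np.argmin(costs)) + patch) - i
    return disparities
```

In a real prototype the polar warp would use sub-pixel interpolation and the 1D matching would be replaced by a robust cost with a smoothness prior.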

References:
[1] Sujit Kuthirummal and Shree K. Nayar. “Multiview radial catadioptric imaging for scene capture.” ACM Transactions on Graphics (TOG), 25(3), 2006.

Deliverables: Report and running prototype.

Prerequisites:
– basic knowledge of computer vision
– coding skills in Matlab and Python or C/C++

Level: BS/MS semester project

Type of work: 50% implementation and 50% research

Supervisor: Marjan Shahpaski (firstname.lastname@epfl.ch)


Simultaneous Geometric and Radiometric Calibration of a Projector-Camera Pair

Synopsis:
3D sensing is gaining momentum with the increase of computational power, and with the possibility to display and fabricate the results with high fidelity. Structured light (SL) systems are among the most commonly used for 3D object scanning because they can be built by using off-the-shelf hardware components. However, they do require a geometric and, in certain cases, a radiometric calibration of the projector-camera pair.

We therefore devised a novel method that allows for simultaneous geometric and radiometric calibration of a projector-camera pair. It is simple, efficient and user friendly. We prewarp and align a specially designed projection pattern onto a printed pattern of different colorimetric properties. After capturing the patterns in several orientations, we perform geometric calibration by estimating the corner locations of the two patterns in different color channels. We perform radiometric calibration of the projector by using the information contained inside the projected squares. For more details, please see the paper and presentation found at http://ivrl.epfl.ch/research/grc
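As a toy illustration of the radiometric part, assuming the projector approximately follows a gamma law (a common simplification, not necessarily the exact model used in the paper), the response can be fitted from paired projected/captured intensities, e.g. the mean camera response inside each projected square:

```python
import numpy as np

def fit_gamma(projected, captured):
    """Fit captured ~ (projected / projected_max) ** gamma by least
    squares in the log domain. Inputs are paired intensity samples,
    e.g. per-square mean camera responses for a ramp of projected
    gray levels."""
    x = np.asarray(projected, float) / np.max(projected)
    y = np.asarray(captured, float) / np.max(captured)
    mask = (x > 0) & (y > 0)  # log is undefined at zero
    lx, ly = np.log(x[mask]), np.log(y[mask])
    # log y = gamma * log x  ->  one-parameter least squares
    return np.sum(lx * ly) / np.sum(lx ** 2)
```

A full radiometric calibration would fit each color channel separately and handle camera nonlinearity as well.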

In this project, you will implement the presented method [1] in a robust and efficient manner. First, you will test our Matlab implementation and improve it in terms of robustness, speed and ease of use. Then, you will transfer it into a Python or C/C++ executable that can be easily distributed to other users. Finally, you will experiment with additional features such as on-the-fly calibration as the images are captured, user guidance (feedback), etc.

References:
[1] Shahpaski, Marjan, et al. “Simultaneous Geometric and Radiometric Calibration of a Projector-Camera Pair.” IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017).

Deliverables: Report and running prototype.

Prerequisites:
– basic knowledge of computer vision
– coding skills in Matlab and Python or C/C++

Level: BS/MS semester project

Type of work: 90% implementation and 10% research

Supervisor: Marjan Shahpaski (firstname.lastname@epfl.ch)


Visualization of generative models

Synopsis:
Deep generative adversarial models are very attractive for generating samples of a distribution. The quality of the results is usually assessed by visualizing a few (10-100) generated samples, which is not very informative about global distribution artefacts such as mode collapse.

The goal of this project is to propose and experiment with a better visual representation of generative models and their distributions for comparison, for example by using the t-SNE dimensionality reduction technique.

In this project, you will:
Implement a framework for visualizing a GAN model/distribution.
Compare various GAN optimization techniques using your implementation.
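A minimal sketch of the visualization idea, using scikit-learn's t-SNE on flattened samples from the real data and from a generator (function names are hypothetical; in practice one would typically embed feature activations rather than raw pixels):

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_samples(real, generated, perplexity=30):
    """Embed real and generated samples jointly into 2D with t-SNE so
    that distribution-level artefacts become visible: a mode-collapsed
    generator shows up as a tight cluster covering only part of the
    real data's embedding."""
    X = np.vstack([real.reshape(len(real), -1),
                   generated.reshape(len(generated), -1)])
    labels = np.array([0] * len(real) + [1] * len(generated))
    emb = TSNE(n_components=2, perplexity=perplexity,
               init="random", random_state=0).fit_transform(X)
    return emb, labels  # scatter-plot emb colored by labels to compare
```

The returned embedding can be scatter-plotted with one color per source to compare several GAN variants side by side.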

References:
https://github.com/LynnHo/DCGAN-LSGAN-WGAN-WGAN-GP-Tensorflow
https://lvdmaaten.github.io/tsne/

Pezzotti et al. “Approximated and User Steerable tSNE for Progressive Visual Analytics”, IEEE TVCG, 2017.

Deliverables: A generic framework for t-SNE visualization of the GANs

Prerequisites: Knowledge and experience in deep learning.

Level: Semester project for Master’s students.

Type of work: 60% implementation and 40% research.

Number of Students: 1

Supervised by: Siavash Bigdeli (siavash.bigdeli@epfl.ch)


Semi/un-supervised learning for material estimation using IR/NIR data

Synopsis:
End-to-end learning is a very intuitive approach in cases where we do have a large dataset with annotations.

Although such annotations are available for some conventional tasks, such as object detection, other problems such as object material estimation suffer from a lack of annotations.

In this project, you will:
Capture and produce a paired RGB and IR/NIR dataset, and learn a meaningful representation of object materials in an unsupervised fashion.

References:
https://ivrl.epfl.ch/research/infrared/dataset

Deliverables: A CNN representing object surface materials using RGB+IR/NIR data (potentially we would need a new dataset of paired RGB+IR/NIR images for this)

Prerequisites: Knowledge and experience in deep learning.

Level: Semester project for Master’s students.

Type of work: 60% implementation and 40% research.

Number of Students: 1

Supervised by: Siavash Bigdeli (siavash.bigdeli@epfl.ch)


Deep Differentiable Renderers

Synopsis:
Differentiable renderers (DR) can be used to backpropagate image-space error (gradients) to the 3D representation (mesh or point cloud). In practice, these renderers are applied to 3D reconstruction and novel view synthesis.

In this project, we will:
Port existing renderers to common deep learning environments such as Tensorflow to be used as a part of a neural net.
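The core idea, image-space gradients flowing back to the geometry, can be illustrated with a toy differentiable renderer that splats a single 2D point as a Gaussian; the gradient of the image loss with respect to the point position is then available in closed form. This is only a sketch of the principle, not OpenDR's or Kato et al.'s formulation:

```python
import numpy as np

def render_point(p, H=16, W=16, sigma=1.5):
    """Render a 2D point as a Gaussian splat. Because the splat is a
    smooth function of p, the image is differentiable w.r.t. the point
    position (unlike hard z-buffer rasterization)."""
    ys, xs = np.mgrid[0:H, 0:W]
    return np.exp(-((xs - p[0]) ** 2 + (ys - p[1]) ** 2) / (2 * sigma ** 2))

def grad_loss_wrt_point(p, target, sigma=1.5):
    """Analytic gradient of the L2 image loss w.r.t. p, i.e. the
    quantity a differentiable renderer backpropagates to the geometry."""
    H, W = target.shape
    img = render_point(p, H, W, sigma)
    ys, xs = np.mgrid[0:H, 0:W]
    resid = img - target                      # dL/dimg (up to factor 2)
    dimg_dx = img * (xs - p[0]) / sigma ** 2  # d(splat)/d(p_x)
    dimg_dy = img * (ys - p[1]) / sigma ** 2
    return np.array([2 * np.sum(resid * dimg_dx),
                     2 * np.sum(resid * dimg_dy)])
```

In Tensorflow the same derivative would be obtained automatically once the splatting is expressed with differentiable ops, which is exactly what the porting task is about.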

References:
H. Kato et al. “Neural 3D mesh renderer” (https://arxiv.org/pdf/1711.07566.pdf)
M. M. Loper and M. J. Black, “OpenDR: An approximate differentiable renderer”. ECCV 2014

Deliverables: A simple and clean implementation of a DR in Tensorflow, with visual demonstration in a 3D reconstruction or synthesis application.

Prerequisites: Knowledge and experience in deep learning and/or in GPU programming

Level: Semester project for Master’s students.

Type of work: 80% implementation and 20% research.

Number of Students: 1

Supervised by: Siavash Bigdeli (siavash.bigdeli@epfl.ch)


Blind image restoration

Synopsis:
Most image restoration techniques rely on the major assumption that the image degradation model is known a priori. This is rarely the case in practice, and therefore many novel and advanced restoration methods fail in real-life scenarios.

In this project, you will:
Leverage available deep learning tools to develop and evaluate robust restoration methods for the task of noise- and/or kernel-blind image restoration.
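For the noise-blind setting, a natural first step is to estimate the unknown noise level itself. A classic robust estimator in the spirit of Donoho's MAD rule, shown here on simple pixel differences, can serve as a baseline (a sketch; names are hypothetical):

```python
import numpy as np

def estimate_noise_sigma(img):
    """Robust noise-level estimate: horizontal pixel differences of a
    mostly smooth image are dominated by noise, and the median absolute
    deviation is insensitive to the few edge outliers. For i.i.d.
    Gaussian noise, MAD / 0.6745 ~ sigma."""
    d = np.diff(img, axis=1) / np.sqrt(2)  # difference of two N(0, s^2) samples, rescaled
    return np.median(np.abs(d)) / 0.6745
```

Such an estimate can be fed to a non-blind restoration network, or used to sanity-check a learned noise-level predictor.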

References:
M. Jin, S. Roth, and P. Favaro, “Noise-blind image deblurring”, in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, 2017
Plötz, Tobias, and Stefan Roth. “Benchmarking denoising algorithms with real photographs.” arXiv preprint arXiv:1707.01313 (2017).

Deliverables: A restoration method for blind image deblurring/denoising and evaluation on “real” image datasets.

Prerequisites: Knowledge and experience in deep learning or image restoration.

Level: Semester project for Master’s students.

Type of work: 60% implementation and 40% research.

Number of Students: 2

Supervised by: Siavash Bigdeli (siavash.bigdeli@epfl.ch)


Unsupervised semantic co-segmentation

Description: In this project you will develop an algorithm to refine coarse localization masks into fine-grained segmentation masks. The initial masks localize regions of semantic significance and are obtained in an unsupervised fashion from a pre-trained CNN, e.g. VGG-19. Using the CNN and filtering techniques, you will implement and evaluate the mask refinement procedure.

Tasks:

  1. Understand the literature and the state of the art
  2. Experiment with various algorithmic approaches
  3. Tune parameters to obtain good performance
  4. Quantitatively evaluate your solution on standard benchmarks
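The mask refinement step can be sketched with the guided filter of reference [3]: the coarse mask is filtered so that the output is locally a linear function of the guide image, which snaps mask boundaries to image edges. A minimal NumPy version for a grayscale guide (parameter values are hypothetical):

```python
import numpy as np

def box(x, r):
    """(2r+1) x (2r+1) mean filter via an integral image, edge-padded."""
    xp = np.pad(x, r, mode="edge")
    ii = np.pad(xp, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    k = 2 * r + 1
    H, W = x.shape
    return (ii[k:k + H, k:k + W] - ii[:H, k:k + W]
            - ii[k:k + H, :W] + ii[:H, :W]) / k ** 2

def guided_filter(I, p, r=4, eps=1e-3):
    """Refine coarse mask p using guide image I (He et al., ECCV 2010):
    per window, fit p ~ a * I + b, then average the coefficients."""
    mI, mp = box(I, r), box(p, r)
    var = box(I * I, r) - mI * mI
    a = (box(I * p, r) - mI * mp) / (var + eps)
    b = mp - a * mI
    return box(a, r) * I + box(b, r)
```

In the project this would run on the CNN localization maps, with the input photograph (or a feature map) as the guide.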


References:
[1] Selvaraju, Ramprasaath R., et al. “Grad-cam: Visual explanations from deep networks via gradient-based localization”, 2016.
[2] http://www.inf.ufrgs.br/~eslgastal/DomainTransform/
[3] http://kaiminghe.com/eccv10/index.html

Deliverables: Code and written report

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Pytorch (preferred) or TensorFlow

Type of work: 40% research, 60% development and testing

Level: Master

Supervisor: Edo Collins (firstname.lastname@epfl.ch)


Model me

Description: How accurately can we obtain a 3D model of a new object using a set of photos of it and no priors? What about a human body: can it be good enough to be deceptive? In this project, you will find out.

References: Video Based Reconstruction of 3D People Models (https://arxiv.org/abs/1803.04758v2)

Your competition: https://www.3dflow.net/3df-zephyr-pro-3d-models-from-photos/

http://ccwu.me/vsfm/

(and all their publications can be found on https://www.3dflow.net/technology/)

Deliverables: An algorithm for building 3D models from a set of object photos, with its accuracy evaluation.

Prerequisites: Motivation and autonomy, graphics or image processing background (programming and math always welcome).

Type of work: 50% research, 50% development and testing

Level: BS or MS

Number of students: 1-2

Supervisor(s): Majed El Helou, Ruofan Zhou


Now you see me

Description: The goal is to estimate a person’s pose in a 2D photo. You will simulate synthetic 3D scenes with a virtual person that can move around the scene. For different poses of this person, you will create a set of 2D “photos” (simple projections of your 3D scene). These can then be used as a dataset for pose estimation from 2D images.
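Generating the 2D “photos” amounts to projecting the virtual person's 3D joint positions through a pinhole camera; rotating the person about the vertical axis yields different viewpoints. A minimal sketch with hypothetical camera intrinsics:

```python
import numpy as np

def rotate_y(X, theta):
    """Rotate 3D points (N, 3) about the vertical (y) axis, e.g. to
    place the virtual person at a new orientation in the scene."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return X @ R.T

def project_points(X, f=500.0, cx=320.0, cy=240.0):
    """Project 3D joint positions (N, 3), given in camera coordinates
    with z > 0, to 2D pixels with a pinhole camera (hypothetical focal
    length f and principal point (cx, cy))."""
    X = np.asarray(X, float)
    u = f * X[:, 0] / X[:, 2] + cx
    v = f * X[:, 1] / X[:, 2] + cy
    return np.stack([u, v], axis=1)
```

Looping over poses and rotation angles, and saving the projected joints alongside the 3D ground truth, yields exactly the kind of dataset the project asks for.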

References: Chen, Ching-Hang, and Deva Ramanan. “3D human pose estimation = 2D pose estimation + matching.” CVPR 2017.
Belagiannis, Vasileios, and Andrew Zisserman. “Recurrent human pose estimation.” IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017.

Deliverables: A dataset of simulated 2D captures with their ground truth poses. A pose estimation algorithm with its performance evaluation.

Prerequisites: Motivation and autonomy, graphics or image processing background (programming and math always welcome).

Type of work: 50% research, 50% development and testing

Level: BS or MS

Number of students: 1

Supervisor(s): Majed El Helou, Fayez Lahoud, Ruofan Zhou


Shadow removal

Description: In this project you will create a small dataset of color and NIR (near infrared) images of scenes with and without shadow. Then you will develop your own shadow removal algorithm, improving on the state of the art. One suggested approach is to compute (or learn) a tensor the size of the image that transforms shadow regions to normal lighting; for every shadow pixel, the transformation is the average of the transformations between that pixel and the X lit pixels most similar to it. However, you are free to develop your own approach.
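The suggested approach can be sketched as follows, using NIR similarity to find lit pixels that resemble each shadow pixel (choosing NIR as the similarity cue is an assumption for illustration, and per-channel ratios stand in for the transformation tensor):

```python
import numpy as np

def relight_shadows(rgb, nir, mask, k=5):
    """For each shadow pixel (mask == True), find the k lit pixels with
    the most similar NIR intensity (NIR is less affected by shadows),
    and apply the average per-channel lit/shadow ratio as the
    shadow-to-lit transformation."""
    H, W, _ = rgb.shape
    flat_rgb = rgb.reshape(-1, 3).astype(float)
    flat_nir = nir.ravel()
    flat_mask = mask.ravel().astype(bool)
    lit_rgb = flat_rgb[~flat_mask]
    lit_nir = flat_nir[~flat_mask]
    out = flat_rgb.copy()
    for i in np.flatnonzero(flat_mask):
        idx = np.argsort(np.abs(lit_nir - flat_nir[i]))[:k]
        ratio = np.mean(lit_rgb[idx] / np.maximum(flat_rgb[i], 1e-6), axis=0)
        out[i] = flat_rgb[i] * ratio
    return out.reshape(H, W, 3)
```

A real implementation would add spatial terms to the similarity, vectorize the search, and smooth the resulting transformation tensor.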

References: R. Guo, Q. Dai, and D. Hoiem, “Single-Image shadow detection and removal using paired regions.” IEEE Computer Vision and Pattern Recognition CVPR, pages 2033-2040, 2011.
N. Salamati, A. Germain and S. Süsstrunk, Removing Shadows from Images Using Color and Near-infrared, Proc. IEEE International Conference on Image Processing (ICIP), 2011.

Deliverables: The dataset. A shadow removal algorithm with realistic results and its quantitative evaluation on your dataset.

Prerequisites: Motivation and autonomy, image processing or computational photography background (programming and math always welcome).

Type of work: 50% research, 50% development and testing

Level: BS or MS

Number of students: 1-2

Supervisor(s): Majed El Helou, Ruofan Zhou


Violence Detection in Movies

Description: The goal of this project is to label violent scenes in a given movie using convolutional neural networks. By defining violence in terms of certain objects (e.g. guns) and actions (e.g. punching), the task of violent scene detection is sub-categorized into their detection. First, the input video is segmented into scenes; then salient frames and actions are detected in each scene. The dataset will be collected from several image databases and search engines.
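The scene segmentation step can be bootstrapped with a classic histogram-based shot boundary detector before any CNN is involved (a simple baseline sketch; the threshold value is hypothetical):

```python
import numpy as np

def shot_boundaries(frames, bins=16, threshold=0.5):
    """Detect cuts by thresholding the half-L1 distance between
    intensity histograms of consecutive frames (values in [0, 1]).
    Returns the indices where a new shot starts."""
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0.0, 1.0))
        hists.append(h / max(h.sum(), 1))
    cuts = []
    for t in range(1, len(hists)):
        if 0.5 * np.abs(hists[t] - hists[t - 1]).sum() > threshold:
            cuts.append(t)
    return cuts
```

The detected shots would then be grouped into scenes, within which salient frames are selected for the CNN classifier.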

Tasks:

  1. Understand the literature and the state of the art
  2. Define the violence labels and create a training set.
  3. Build CNN model and train the network.
  4. Validate the model and improve the accuracy.


References:
Demarty, C.H., Ionescu, B., Jiang, Y.G., Quang, V.L., Schedl, M., Penet, C.: Benchmarking violent scenes detection in movies. In: 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE (2014)

Mu G., Cao H., Jin Q. (2016) Violent Scene Detection Using Convolutional Neural Networks and Deep Audio Features

Dai, Q., Zhao, R.W., Wu, Z., Wang, X., Gu, Z., Wu, W., Jiang, Y.G.: Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning

Deliverables: At the end of the semester, the student should provide a framework that automatically detects violent scenes.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow

Type of work: 40% research, 60% development and testing

Level: Master

Supervisor: Sami Arpa (sami.arpa@epfl.ch)


Genre Detection in Movies

Description: The goal of this project is to infer the genre of each scene in a given movie using convolutional neural networks. By defining each genre in terms of certain objects (e.g. a spaceship for science fiction) and actions (e.g. laughing for comedy), the task of genre detection is sub-categorized into their detection. First, the input video is segmented into scenes; then salient frames and actions are detected in each scene. The dataset will be collected from several image databases and search engines.

Tasks:

  1. Understand the literature and the state of the art
  2. Define the genre labels and create a training set.
  3. Build CNN model and train the network.
  4. Validate the model and improve the accuracy.

References:
Sivaraman, K. S., and Gautam Somappa. “MovieScope: Movie trailer classification using Deep Neural Networks.” University of Virginia (2016).

Simões, Gabriel S., et al. “Movie genre classification with convolutional neural networks.” Neural Networks (IJCNN), 2016 International Joint Conference on. IEEE, 2016.

Deliverables: At the end of the semester, the student should provide a framework that automatically detects genre of each scene of a given movie.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow

Type of work: 40% research, 60% development and testing

Level: Master

Supervisor: Sami Arpa (sami.arpa@epfl.ch)


Knowledge transfer between RGB only and RGB + other modalities

Synopsis:
Deep neural networks are mostly trained on large RGB datasets. This limits their application for different modalities (depth, NIR, thermal, etc.) where other types of information are available. We want to make use of additional data such as depth and NIR to improve the accuracy of conventional deep learning approaches.

The problem with multimodal data is the lack of large datasets comparable to those available for RGB alone. The proposed technique should therefore not suffer from overfitting, and should benefit from the large RGB-only datasets.

In this project you will:
– Train networks on RGB and RGB+ data in a supervised or unsupervised fashion (baselines).
– Improve the RGB-only network using the additional modalities available, such as NIR, depth and thermal channels (datasets for NIR and thermal are available).
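One hedged sketch of the transfer idea from [1]: a teacher trained on abundant RGB data produces mid-level features for the RGB half of each image pair, and a student that only sees the other modality is trained to regress those features, so the paired data needs no labels. Linear maps stand in for CNNs to keep the sketch tiny:

```python
import numpy as np

def train_student(rgb, other, teacher_W, steps=2000, lr=0.1):
    """Supervision transfer: fit a linear 'student' on the paired
    modality (e.g. NIR or depth) to regress the 'teacher' features
    computed from RGB, by plain gradient descent on the L2 loss."""
    targets = rgb @ teacher_W.T          # teacher features on paired RGB
    W = np.zeros((teacher_W.shape[0], other.shape[1]))
    n, d_out = targets.shape
    losses = []
    for _ in range(steps):
        resid = other @ W.T - targets    # student error on paired data
        losses.append(np.mean(resid ** 2))
        W -= lr * (2.0 / (n * d_out)) * resid.T @ other
    return W, losses
```

With deep networks the same loop matches intermediate feature maps instead of linear features, and the student is then fine-tuned on whatever small labeled multimodal set exists.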

References:
[1] Cross Modal Distillation for Supervision Transfer https://arxiv.org/pdf/1507.00448.pdf

Deliverables:
A neural net that receives RGB+ input to improve the accuracy of a conventional net trained on RGB alone

Prerequisites:
Knowledge and experience in deep learning
RGB+NIR and RGB+Thermal datasets are available

Level: MS semester or thesis project

Type of work: 60% implementation and 40% research.

Supervisor: Fayez Lahoud, Siavash Bigdeli


Visual feedback for tone training

Synopsis:
The sound of language can be divided into consonants, vowels and tones. As many as 70% of the world’s languages use tones to convey word meaning. – Moira Yip, Tone.
However, Latin- and most Germanic-based languages use tone only to convey emotion, combined with stress and rhythm changes. This creates difficulties in learning tonal languages when it comes to speaking and listening. In this project, we aim to create a tool (application) that helps people train their tones by providing clear and constructive feedback based on an understanding of where the user’s difficulties lie. We will first focus on Mandarin Chinese, as it has a simple tonal system comprising only 4 different contour tones.
We already have a setup that extracts the pitch from a sound recording. We propose to build a tool that:

  • Takes a user recording and a reference recording and compares them, providing constructive feedback that helps the user improve
  • Studies the user’s history of successes/mistakes to adjust the learning schedule
  • Potentially helps tackle minimal-pair training for listening and speaking (e.g. shuang vs. shuan, zuo vs. cuo vs. suo, …)


In this project you will:
Design and implement a distance measure that computes the difference between a user’s tones and the correct tonal pronunciation. Study the patterns of mistakes made by users to identify their weaknesses and propose adjustments to their learning schedule.
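One plausible choice for such a distance measure (an assumption, not a prescribed design) is dynamic time warping over the two pitch contours, which tolerates differences in speaking rate, e.g. a learner holding a tone longer than the reference:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two pitch contours
    (1D arrays of pitch values, e.g. in Hz). Standard O(n*m) dynamic
    program with unit steps; lower means closer tonal shape."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Normalizing the pitch (e.g. to semitones relative to the speaker's mean) before computing the distance would make the measure speaker-independent.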

References:
[1] Tone visualization: http://www.sinosplice.com/life/archives/2008/01/21/seeing-the-tones-of-mandarin-chinese-with-praat

Deliverables:
Functional tool (Optional: Mobile application)

Prerequisites:
Comfortable with machine learning, speech processing
Knowledge of a tonal language could help you on the way

Level: Bachelor or Master

Type of work: 60% implementation, 40% research

Supervisor: Fayez Lahoud, Ruofan Zhou


Tracking and mapping in infrared video

Synopsis:
The goal of this project is to localize and map an area using infrared (thermal) video sequences. In low-visibility conditions, such as darkness or smoke, visual methods are not helpful because most of the visible-light imagery becomes black. Frames from a thermal camera can instead be used to build a model that helps people/robots navigate such areas.
There has been a lot of previous work on visual mapping; this project aims to study the literature, reproduce the state of the art, and then apply this knowledge to the problem of thermal mapping.

In this project you will:

  • Understand the previous literature and the state of the art
  • Implement a few existing methods
  • Transfer that knowledge to a thermal model

References:
[1] Parallel Tracking and Mapping for Small AR Workspaces. G. Klein, D. Murray. ISMAR 2007
[2] Real-Time 6-DOF Monocular Visual SLAM in a Large-scale Environment. H. Lim, J. Lim, H. Jin Kim. ICRA 2014.
[3] RGB-T SLAM: A flexible SLAM framework by combining appearance and thermal information. L. Chen, L. Sun, T. Yang, L. Fan, K. Huang and Z. Xuanyuan. ICRA 2017.

Deliverables: A set of implemented visual SLAM methods (at most 3) + a thermal SLAM

Prerequisites: Experience in tracking and localization, potentially usage of deep learning

Level: Master

Type of work: 70% implementation, 30% research

Supervisor: Fayez Lahoud


Faces of persons for counterfeit prevention (TAKEN)

Synopsis:
EPFL and startup company Innoview Sàrl have developed counterfeit prevention features. The current project aims at improving the faces of persons that are displayed in original documents such as ID cards or passports. For more information, contact the supervisors.

Deliverables: Report and running prototype

Prerequisites:
– coding skills in Matlab or Java.
– basic knowledge in 3D graphics

Level: BS, MS semester or Master project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, romain.rossier@innoview.ch, tel 078 664 36 44
Prof. hon. Roger D. Hersch, INM034, rd.hersch@epfl.ch, cell: 077 406 27 09


Recovery of a fluorescent hidden watermark on an Android smartphone (TAKEN)

Synopsis:
A watermark printed with daylight fluorescent inks and hidden in a color background can be recovered under blue light. The goal is to adapt an already developed Android software package that acquires images embedding hidden watermarks. This requires on-the-fly image acquisition, real-time sharpness evaluation, focusing, and appropriate thresholding to recover the watermark.
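The real-time sharpness evaluation could, for instance, use the classic variance-of-Laplacian focus measure, keeping only frames whose score is high (a hedged sketch; the existing package may use a different criterion):

```python
import numpy as np

def sharpness(img):
    """Variance-of-Laplacian focus measure: high when the frame is in
    focus, low when blurred. The acquisition loop can keep only frames
    that score above those seen recently."""
    lap = (-4 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap.var()
```

On the phone, the same 3x3 Laplacian can run on a downscaled preview frame to stay real time.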

Reference:
R. Rossier, R.D. Hersch, Hiding patterns with daylight fluorescent inks. Proc. IS&T/SID’s 19th Color Imaging Conference, San Jose, CA, USA, November 7-11, 2011, pp. 223-228, see http://lsp.epfl.ch/colorpublications

Deliverables: Report and running prototype (Matlab and/or Android).

Prerequisites:
– basic knowledge of image processing / computer vision
– coding skills in Matlab (and possibly Java Android)

Level: BS, MS semester or master project

Supervisor:
Dr Romain Rossier, Innoview Sàrl, romain.rossier@innoview.ch, tel 078 664 36 44

Prof. hon. Roger D. Hersch, INM034, rd.hersch@epfl.ch, cell: 077 406 27 09



Recovery of a tiled watermark on an Android smartphone

Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden in a grayscale or color image. The current project aims at carrying out the same recovery, but for a watermark that is tiled.
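A plausible first step (an assumption for illustration, not necessarily Innoview's method) is to estimate the tiling period from the image's autocorrelation, so that the tiles can be registered and averaged to boost the watermark signal before recovery:

```python
import numpy as np

def tile_period(img, min_period=4):
    """Estimate the horizontal repetition period of a tiled pattern
    from the peak of the autocorrelation of the column means."""
    x = img - img.mean()
    col = x.mean(axis=0)                  # average out vertical content
    n = len(col)
    ac = np.correlate(col, col, mode="full")[n - 1:]  # lags 0 .. n-1
    lags = np.arange(n)
    valid = (lags >= min_period) & (lags <= n // 2)
    return int(lags[valid][np.argmax(ac[valid])])
```

The vertical period can be estimated the same way on row means; a 2D autocorrelation handles tilted acquisitions.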

Deliverables: Report and running prototype (Matlab and/or Android).

Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab and/or Java Android

Level: BS or MS semester project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, romain.rossier@innoview.ch, tel 078 664 36 44
Prof. Roger D. Hersch, INM034, rd.hersch@epfl.ch, cell: 077 406 27 09



Interactive recovery of watermarks on an Android smartphone

Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark by superposing a revealer on top of a base image obtained by camera acquisition. The project aims at making this software interactive in order to show the revealed watermark to the user.

Deliverables: Report and running prototype (Matlab and/or Android).

Prerequisites:
– knowledge of image processing
– basic coding skills in Matlab and Java Android

Level: BS or MS semester project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, romain.rossier@innoview.ch, tel 078 664 36 44
Prof. Roger D. Hersch, INM034, rd.hersch@epfl.ch, cell: 077 406 27 09


Natural Face Editing with Deep Learning

Natural face editing is a fun application (consider FaceApp [1] for example): you can add/remove glasses, add/remove makeup, add/remove a smile, etc. Traditional face editing methods often require a number of sophisticated and task-specific algorithms to be applied one after the other, a process that is tedious, fragile, and computationally intensive. Recent network architectures (for example, CycleGAN [3]) are able to perform image-to-image translation directly. In this project, we want to build a framework for natural face editing based on these architectures, and potentially improve their visual results.

In this project you will:
– Explore how to collect/create a large dataset for deep learning tasks
– Research and experiment with different network architectures
– Research how to evaluate the visual results

References:
[1] FaceApp
[2] Zhixin Shu , Ersin Yumer, Sunil Hadap, Kalyan Sunkavalli, Eli Shechtman, Dimitris Samaras: Neural Face Editing with Intrinsic Image Disentangling
[3] Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Deliverables: code and report

Prerequisites: experience in some deep learning framework, image processing

Level: master

Supervisor: Ruofan Zhou (firstname.lastname@epfl.ch)


Self-identifying corners for automatic joint calibration of a projector-camera pair

Synopsis:
3D sensing is gaining momentum with the increase of computational power, and with the possibility to display and fabricate the results with high fidelity. Structured light (SL) systems are among the most commonly used for 3D object scanning because they can be built by using off-the-shelf hardware components. However, they do require a geometric calibration of the projector-camera pair.

Geometric camera calibration methods rely on capturing images of a flat checkerboard pattern in different orientations and identifying the corners in the captured images [1]. Furthermore, for methods where corner detection is performed automatically, all corners must be visible in order to determine the orientation of the pattern. Joint projector-camera calibration is performed in a similar manner, by capturing a printed and a projected pattern in the same image [2, 3].

In this project, we want to investigate pattern designs that allow us to uniquely identify all corners in the pattern by inspecting their surrounding squares. To do this, we will look into pattern squares that carry identification information, similar to QR codes [4]. We will also investigate whether we can leverage the different color channels of the projector and the camera to propose new designs. The project will include hands-on work with a projector-camera pair.
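A toy version of such self-identifying squares, in the spirit of [4]: each square's cells encode a corner id in binary, so every corner can be identified even when part of the pattern is occluded (the cell layout and sizes are hypothetical):

```python
import numpy as np

def make_marker(marker_id, cells=3, cell_px=8):
    """Render a square whose cells encode `marker_id` in binary:
    bit k of the id sets cell k white (1) or black (0)."""
    bits = [(marker_id >> k) & 1 for k in range(cells * cells)]
    grid = np.array(bits, float).reshape(cells, cells)
    return np.kron(grid, np.ones((cell_px, cell_px)))

def read_marker(img, cells=3):
    """Recover the id by sampling the centre of each cell and
    thresholding at 0.5 (assumes the square is already rectified)."""
    cell_px = img.shape[0] // cells
    marker_id = 0
    for k in range(cells * cells):
        r, c = divmod(k, cells)
        v = img[r * cell_px + cell_px // 2, c * cell_px + cell_px // 2]
        marker_id |= int(v > 0.5) << k
    return marker_id
```

A practical design would reserve some bits for error detection and for disambiguating the four possible square rotations.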

References:
[1] Zhang, Zhengyou. “A flexible new technique for camera calibration.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.
[2] Ashdown, Mark, and Yoichi Sato. “Steerable projector calibration.” IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2005.
[3] Audet, Samuel, and Masatoshi Okutomi. “A user-friendly method to geometrically calibrate projector-camera systems.” IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2009.
[4] Fiala, Mark, and Chang Shu. “Fully automatic camera calibration using self-identifying calibration targets.” Technical Report, 2005.

Deliverables: Report and running prototype.

Prerequisites:
– knowledge of image processing / computer vision / OpenCV
– coding skills in Matlab and C/C++

Level: MS semester project

Type of work: 40% research and 60% implementation

Supervisor: Marjan Shahpaski (firstname.lastname@epfl.ch)