Available Projects – Spring 2020

Robust Binary Network

Description:

This project will combine model robustness with parameter binarization. We will investigate the robustness of a specific kind of network in which all parameters are binary. Training binary networks in a non-adversarial setting has been well studied in recent years [1, 2]. For non-binary networks, Projected Gradient Descent (PGD) [3] is a straightforward but empirically effective method to obtain robust models. In this project, we will study the robustness properties of binary networks and further design algorithms to train robust binary networks. (Full description is available in this document.)
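The two ingredients the project combines can be summarized in a short PyTorch sketch, assuming an image classifier with inputs in [0, 1]: BinaryConnect-style weight binarization with a straight-through estimator, as in [1, 2], and an L-infinity PGD attack, as in [3]. The class and function names below are illustrative, not part of an existing codebase.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    """Binarize weights with sign() in the forward pass; pass gradients
    straight through in the backward pass (straight-through estimator)."""
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

class BinaryLinear(nn.Linear):
    """A linear layer whose real-valued weights are binarized on the fly."""
    def forward(self, x):
        return F.linear(x, BinarizeSTE.apply(self.weight), self.bias)

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: ascend the loss and project back into the eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

Adversarial training of the binary network would then minimize the loss on pgd_attack(model, x, y) instead of x at every step, which is exactly where the interaction between binarization and robustness becomes interesting.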

References:

[1] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. BinaryConnect: Training deep neural networks with binary weights during propagations. NIPS 2015.

[2] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. 2016.

[3] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR 2018.

Deliverables:

Report. Reproducible code. Possible paper submission.

Prerequisites:

Mathematical foundations (calculus, linear algebra). Optimization (gradient descent, primal-dual method). Deep learning.

Level:

MS semester project. (Spring 2020)

Type of work:

20% literature review, 50% research, 30% development and testing.

Supervisor: Chen Liu


Adversarial Training and Loss Landscape

Description:

Modern deep learning systems are known to be vulnerable to adversarial attacks: small, carefully designed adversarial perturbations can make a state-of-the-art model predict the wrong label with very high confidence. Adversarial training based on the Fast Gradient Sign Method (FGSM) [1] or Projected Gradient Descent (PGD) [2] is an effective way to obtain models that are robust against such attacks. In this project, we will study the validity and strength of FGSM-based and PGD-based adversarial training. Furthermore, we will examine the loss landscape of the training objective under normal training, FGSM-based training, and PGD-based training. (Full description is available in this document.)
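As a concrete starting point, here is a minimal PyTorch sketch of the FGSM attack of [1], assuming a classifier model and a batch (x, y) with pixel values in [0, 1]; PGD [2] simply iterates this step with a projection onto the perturbation ball.

import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8/255):
    """Single-step attack: perturb x by eps in the direction of the loss gradient sign."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

FGSM-based adversarial training replaces each clean batch with its FGSM counterpart during training; the loss-landscape part of the project would then visualize the training objective around the minima reached by normal, FGSM-based, and PGD-based training.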

References:

[1] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. ICLR 2015.

[2] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR 2018.

Deliverables:

Report. Reproducible code. Visualization of loss landscape studied.

Prerequisites:

Mathematical foundations (calculus, linear algebra). Gradient descent. Deep learning.

Level:

BS semester project. (Spring 2020)

Type of work:

20% literature review, 50% research, 30% development and testing.

Supervisor: Chen Liu


Joint unsupervised video registration and fusion

Description:
This project targets joint registration and fusion of two different imaging modalities, namely RGB and infrared (IR). While RGB is related to human perception, IR is related to heat. We present a challenging dataset of RGB and IR video pairs. In this context, video registration aims to align two videos of the same scene, and video fusion is the process of bringing all the essential information from the two videos into a single video. The main goal of this task is therefore to fuse the IR and RGB videos of the same scene. Classical methods cannot achieve satisfactory performance on this task, since the camera pair moves slightly during video capture. Also, labelling the dataset is costly and impractical. As a result, we target an unsupervised deep-learning-based solution to this problem. Interested students are encouraged to have a look at related works, which include [1], [2], and [3].
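To make the fusion goal concrete, here is a toy baseline in NumPy/OpenCV for two already-registered frames: each image is split into a smooth base layer and a detail layer, the bases are averaged, and the stronger detail is kept at each pixel. This is only an illustration of what fusion means here; the project targets a learned, unsupervised approach, and the names below are placeholders.

import cv2
import numpy as np

def fuse_registered_frames(rgb, ir, ksize=31):
    """rgb: aligned color frame (uint8, HxWx3); ir: aligned infrared frame (uint8, HxW)."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY).astype(np.float32)
    ir = ir.astype(np.float32)
    base_gray = cv2.blur(gray, (ksize, ksize))   # smooth base layers
    base_ir = cv2.blur(ir, (ksize, ksize))
    detail_gray, detail_ir = gray - base_gray, ir - base_ir
    # average the base layers, keep the stronger detail at each pixel
    detail = np.where(np.abs(detail_gray) > np.abs(detail_ir), detail_gray, detail_ir)
    fused = 0.5 * (base_gray + base_ir) + detail
    return np.clip(fused, 0, 255).astype(np.uint8)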

References:

[1] An Unsupervised Learning Model for Deformable Medical Image Registration, CVPR, 2018

[2] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV, 2017

[3] Fast and Efficient Zero-Learning Image Fusion, arXiv, 2019

Deliverables:
Report and running prototype.

Prerequisites:
Basic image processing knowledge and deep learning.

Level:
MS semester project. (Spring 2020)

Type of work:
50% research, 50% development and testing.

Supervisor:
Hakki Can Karaimer


Unsupervised deep-learning-based video registration

Description:
In the context of computer vision and image processing, video registration refers to aligning two videos of the same scene. This project targets registration of two different imaging modalities, namely RGB and infrared (IR). While RGB is related to human perception, IR is related to heat. We present a challenging dataset of RGB and IR video pairs. Classical methods cannot achieve satisfactory performance on this task, since the camera pair moves slightly during video capture. Also, labelling the dataset is costly and impractical. Therefore, we target an unsupervised deep-learning-based solution to this problem. Interested students are encouraged to have a look at related works, which include [1] and [2].
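A minimal PyTorch sketch of the unsupervised idea of [1], under the assumption that a small CNN (reg_net, a placeholder) predicts a dense displacement field: the moving IR frame is warped towards the fixed RGB frame, and the loss combines a similarity term with a smoothness penalty, so no ground-truth alignment is needed.

import torch
import torch.nn.functional as F

def identity_grid(n, h, w, device):
    """Sampling grid of normalized coordinates in [-1, 1], shape (n, h, w, 2)."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=device),
                            torch.linspace(-1, 1, w, device=device))
    return torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)

def registration_loss(reg_net, fixed, moving, smooth_weight=0.1):
    n, _, h, w = fixed.shape
    flow = reg_net(torch.cat([fixed, moving], dim=1))          # (n, 2, h, w) displacement field
    grid = identity_grid(n, h, w, fixed.device) + flow.permute(0, 2, 3, 1)
    warped = F.grid_sample(moving, grid, align_corners=True)   # moving frame warped onto fixed
    similarity = F.l1_loss(warped, fixed)  # for RGB-IR pairs, NCC or mutual information is more suitable
    smoothness = (flow[:, :, 1:] - flow[:, :, :-1]).abs().mean() + \
                 (flow[:, :, :, 1:] - flow[:, :, :, :-1]).abs().mean()
    return similarity + smooth_weight * smoothness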

References:

[1] An Unsupervised Learning Model for Deformable Medical Image Registration, CVPR, 2018

[2] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV, 2017

Deliverables:
Report and running prototype.

Prerequisites:
Basic image processing knowledge and deep learning.

Level:
BS semester project. (Spring 2020)

Type of work:
50% research, 50% development and testing.

Supervisor:
Hakki Can Karaimer


Deep Learning Image Restoration

Description:
Image restoration with machine learning has attracted a lot of research interest with the recent advancements in deep convolutional networks. A multitude of different methods have been proposed for super-resolution, deblurring, denoising, etc. Each has limitations and drawbacks that make it a poor candidate for generalization and real-world use. In this project, you will inspect a set of state-of-the-art image restoration models and analyze them in a thorough empirical study. The focus will be on joint super-resolution and denoising, and you can (optionally) implement an extension idea that we can propose to you.

Example methods to run:
https://github.com/cszn/SRMD
https://github.com/cszn/IRCNN
https://github.com/cszn/DPSR
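For the empirical comparison across methods, a simple distortion metric is needed; a minimal PSNR helper in NumPy is sketched below (images assumed to be in [0, 255]), and SSIM or perceptual metrics can be added on top, e.g. from scikit-image.

import numpy as np

def psnr(restored, reference, peak=255.0):
    """Peak signal-to-noise ratio in dB between a restored image and its ground truth."""
    mse = np.mean((restored.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)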

Deliverables:
Reproducible codes for all experiments. Experimental results with proper visualization to draw insights from your research findings.

Prerequisites:
Basic image processing knowledge, some familiarity with deep learning.

Level:
MS semester project.

Type of work:
60% implementation, 40% research.

Supervisors:
Majed El Helou, Ruofan Zhou.


Caption-Aware Image Retargeting

Description:
Context-aware image retargeting aims to arbitrarily adjust an image's aspect ratio while preserving visually salient features. To this end, the salient regions in an image are first estimated, and the image is then retargeted with the saliency map according to a desired aspect ratio. Traditional methods for this task, including those based on deep convolutional neural networks (CNNs), estimate saliency from the image alone. Meanwhile, an image caption that describes the image in natural language could be helpful for estimating the salient regions. In this project, we will propose a novel framework for caption-aware image retargeting that estimates image saliency through the corresponding caption and uses it for retargeting. In experiments, we will investigate how much the image retargeting performance is boosted by leveraging the image caption.
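As a trivial point of comparison for how a saliency map enters the retargeting step, the sketch below (NumPy, illustrative only) picks the crop of a target aspect ratio that retains the most saliency mass; the project's caption-driven saliency estimation and learned retargeting would replace both the saliency source and this naive cropping step.

import numpy as np

def best_saliency_crop(saliency, target_ratio):
    """saliency: (H, W) non-negative map; target_ratio: output width / height.
    Returns (x, y, width, height) of the crop, assuming only the width shrinks."""
    h, w = saliency.shape
    crop_w = min(w, int(round(h * target_ratio)))
    column_mass = saliency.sum(axis=0)
    window_mass = np.convolve(column_mass, np.ones(crop_w), mode="valid")
    x0 = int(np.argmax(window_mass))      # left edge of the most salient window
    return x0, 0, crop_w, h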

References:
[1] Goferman et al., “Context-Aware Saliency Detection,” TPAMI, 2012
[2] Cho et al., “Weakly- and Self-Supervised Learning for Content-Aware Deep Image Retargeting,” ICCV, 2017
[3] Wang et al., “Learning to Detect Salient Objects with Image-level Supervision,” CVPR, 2017
[4] Zeng et al., “Multi-source Weak Supervision for Saliency Detection,” CVPR, 2019

Deliverables: Report, running prototype, and research paper if possible.

Prerequisites: Experience in computer vision, machine learning, and especially deep learning.

Level: MS semester project (potentially BS).

Type of Work: 50% research, 50% development and testing.

Supervisor: Seungryong Kim


Eye Tracking for Saliency Estimation in Comics

Description: 

Visual saliency refers to a part of a scene that captures our attention. Current approaches for saliency estimation use eye tracking data on natural images to construct ground truth. In this project, however, we will perform eye tracking on comics pages instead of natural images, and we will then use the collected data to estimate saliency in the comics domain. You will work on an eye tracking experiment with mobile eye tracking glasses.

Tasks:
– Understand the key points of an eye tracking experiment and our setup.

– Conduct an eye tracking experiment according to given instructions. 

– Perform a detailed analysis of the collected data by producing heatmaps, scanpaths, and histograms (a minimal heatmap sketch is given after this list).

– Evaluate a state-of-the-art saliency estimation model on the collected data and compare the results with existing results on natural images.
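For the analysis step, fixations can be turned into a heatmap roughly as follows (NumPy + SciPy sketch); the coordinates are assumed to be in page pixels, and the Gaussian width should reflect the accuracy of the eye tracking glasses.

import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_heatmap(fixations, height, width, sigma=30):
    """fixations: iterable of (x, y) pixel coordinates of fixations on the comics page."""
    heatmap = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        if 0 <= int(y) < height and 0 <= int(x) < width:
            heatmap[int(y), int(x)] += 1.0           # accumulate fixation counts
    heatmap = gaussian_filter(heatmap, sigma=sigma)  # spread each fixation into a Gaussian blob
    return heatmap / heatmap.max() if heatmap.max() > 0 else heatmap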

Deliverables: At the end of the semester, the student should provide the collected data and a report of the work.

Type of work: 20% research, 80% development and testing

References:

[1] A. Borji and L. Itti, “Cat2000: A large scale fixation dataset for boosting saliency research,” CVPR 2015 Workshop on “Future of Datasets”, 2015.

[2] Kai Kunze, Yuzuko Utsumi, Yuki Shiga, Koichi Kise, and Andreas Bulling, “I know what you are reading: Recognition of document types using mobile eye tracking,” Proceedings of the 2013 International Symposium on Wearable Computers, September 8-12, 2013, Zurich, Switzerland.

[3] K. Khetarpal and E. Jain, “A preliminary benchmark of four saliency algorithms on comic art,” 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Seattle, WA.

Level: BS semester project

Supervisor: Bahar Aydemir (bahar.aydemir@epfl.ch)


Metadata Prediction from Advertisement Videos

Description: In this project, you will work on a tool for the automatic generation of metadata from commercial videos. Based on a database of ~80000 commercials and using our Deep Learning model for automatic video pattern detection, you will develop a solution to automatically generate relevant keywords for any given video.

Tasks:
– Understand the literature and our framework.

– Perform in depth statistical analysis of the database of commercials.
– Implement a metadata prediction solution on top of our Deep Learning model.
– Develop a testing solution and test the model on a case study.

Deliverables: At the end of the semester, the student should provide a framework for automatic metadata prediction.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow. Experience in statistical analysis.

Type of work: 50% research, 50% development and testing

References:
– Harper, F. Maxwell, and Joseph A. Konstan. “The MovieLens datasets: History and context.” ACM Transactions on Interactive Intelligent Systems (TiiS) 5.4 (2016): 19.

– Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. “Large-scale Video Classification with Convolutional Neural Networks.” The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1725-1732.

Level: Master

Supervisor: Sami Arpa (sami.arpa@epfl.ch)


Deep Learning Based Model for Movie Performance Prediction

Description: In this project, you will work on the development of machine learning models for predicting movie performance on streaming platforms. Based on our database of ~20000 movie trailers and ~9000 movie scripts/transcripts, and using our Deep Learning model for automatic video and text genre detection, you will develop a solution to predict the performance of movies on streaming platforms by combining video/script analysis with features extracted from network representations of the Internet Movie Database.

Tasks:
– Understand the literature and our framework.

– Perform in depth statistical analysis of our movie database.
– Implement and test different machine learning approaches for movie performance prediction.
– Test the model on case studies.

Deliverables: At the end of the semester, the student should have implemented and tested machine learning models for movie performance prediction.

Prerequisites: Experience in deep learning and machine learning, experience in Python. Experience in statistical analysis.

Type of work: 50% research, 50% development and testing

References:
– M. Ghiassi, David Lio, Brian Moon, Pre-production forecasting of movie revenues with a dynamic artificial neural network, Expert Systems with Applications, Volume 42, Issue 6, 2015, Pages 3176-3193

– Simonoff, J. S. and Sparrow, I. R. Predicting movie grosses: Winners and losers, blockbusters and sleepers. In Chance, 2000.

Level: Master

Supervisor: Sami Arpa (sami.arpa@epfl.ch)


Deep Learning Based Movie Recommendation System

Description: In this project, you will work on the recommendation system of Sofy.tv. Sofy.tv uses a recommendation system based on the recipes of the movies. These recipes are found through our multi-channel deep learning system. The goal of this project is to improve the recommendations by finding the best fits for the user's taste.

Tasks:
– Understand the literature and our framework
– Revise our taste clustering system
– Revise our matchmaking system between the users and films.
– Test the revised model.

Deliverables: At the end of the semester, the student should provide an enhanced framework for the recommendation.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow. Basic experience in web programming.

Type of work: 50% research, 50% development and testing.

Level: Master

Supervisor: Sami Arpa (sami.arpa@epfl.ch)


Estimating image depths in new domains

Description: In this project, you will research the existing literature on weakly-supervised and unsupervised depth estimation and build a model for estimating depth maps in comic images. A myriad of depth estimation techniques have been applied to real-world images and found to work considerably well. However, it is challenging to achieve the same results in other image domains such as comics. A possible solution to this problem is domain adaptation, where a model pretrained on a natural image dataset is transferred to a comics dataset. A better solution is to develop a weakly supervised technique for depth estimation in the comics domain. A good starting point is [3].

In this project, you will propose a framework for translating natural images to the comics domain [1-2] and a depth estimation network for the comics domain trained in a weakly-supervised manner. You may contact any of the supervisors at any time should you want to discuss the idea further.
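One possible form of the weak supervision, sketched in PyTorch below: depth predicted by a model pretrained on natural images serves as a pseudo ground truth for the same images after translation to the comics domain [1-2]. natural_depth_net, translator, and comic_depth_net are placeholders for networks the student would choose or train, not existing models.

import torch
import torch.nn.functional as F

def weakly_supervised_depth_step(natural_batch, natural_depth_net, translator,
                                 comic_depth_net, optimizer):
    """One training step for a comics-domain depth network using pseudo labels."""
    with torch.no_grad():
        pseudo_depth = natural_depth_net(natural_batch)   # depth from the natural-image model
        comic_batch = translator(natural_batch)           # natural -> comics translation
    pred_depth = comic_depth_net(comic_batch)
    loss = F.l1_loss(pred_depth, pseudo_depth)            # transfer depth across the domain gap
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()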

Tasks:
– Understand the literature and our framework.

– Implement an existing state-of-the-art (SOTA) depth estimation model trained on natural images.
– Develop a method to translate the natural images to comics images using the depth maps of the natural images.
– Compare the performance of the existing SOTA on natural images, generated images, and comics images.

Deliverables: At the end of the semester, the student should provide a framework for the depth estimation in comic domain along with a project report based on this work.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow. Experience in statistical analysis.

Type of work: 50% research, 50% development and testing

References:
[1] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”, in IEEE International Conference on Computer Vision (ICCV), 2017. 
[2] Youssef A. Mejjati, Christian Richardt, James Tompkin, Darren Cosker, and Kwang In Kim, “Unsupervised Attention-guided Image-to-Image Translation”, in Advances in  Neural Information Processing Systems (NIPS), 2018.

[3] Andrea Pilzer, Dan Xu, Mihai Marian Puscas, Elisa Ricci and Nicu Sebe, “Unsupervised Adversarial Depth Estimation using Cycled Generative Networks”, in Proceedings of the 6th International Conference on 3D Vision (3DV 2018). IEEE, 2018.
[4] Ziyu Zhang, Alexander G. Schwing, Sanja Fidler, and Raquel Urtasun. “Monocular Object Instance Segmentation and Depth Ordering with CNNs”, in IEEE International Conference on Computer Vision (ICCV), 2015. 

Level: Master

Supervisor: Deblina Bhattacharjee (deblina.bhattacharjee@epfl.ch), Seungryong Kim (seungryong.kim@epfl.ch)


Unsupervised Saliency-guided Image-to-Image Translation
 
Description: Visual saliency refers to a part of a scene that captures our attention. Conventionally, saliency detection techniques, including approaches based on deep convolutional neural networks (CNNs), have been developed on natural images. However, these existing techniques perform considerably worse on other imaging domains such as cartoons, artwork, and comics, due to the lack of annotated saliency maps for these images.
In this project, we will propose a framework to translate natural images and their corresponding saliency maps to the comics domain using a saliency-guided image-to-image translation method. We will then use the translated data to learn saliency detection networks for the comics domain in a weakly-supervised manner.
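One simple way saliency can guide the translation is sketched below in PyTorch: the cycle-reconstruction loss is weighted so that salient regions of the natural image must be preserved when mapped to the comics domain and back, in the spirit of the attention-guided translation of [2] and CyCADA [3]. G_nat2com and G_com2nat are placeholder generators, and the adversarial terms of a full translation model are omitted.

import torch

def saliency_weighted_cycle_loss(natural, saliency, G_nat2com, G_com2nat, boost=4.0):
    """natural: (n, 3, h, w) images; saliency: (n, 1, h, w) maps in [0, 1]."""
    comic = G_nat2com(natural)                 # translate to the comics domain
    reconstructed = G_com2nat(comic)           # translate back to the natural domain
    weights = 1.0 + boost * saliency           # salient pixels are penalized more heavily
    return (weights * (reconstructed - natural).abs()).mean()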

Tasks:
– Understand the literature and the state of the art
– Implement state-of-the-art image-to-image translation algorithms
– Develop a method to translate images to the comics domain with saliency guidance
– Compare the performances of existing saliency algorithms on natural images, generated images, and comics images
 
References:
[1] Khetarpal, Khimya & Jain, Eakta. A preliminary benchmark of four saliency algorithms on comic art. (2016).
[2] Mejjati, Y. A., Richardt, C., Tompkin, J., Cosker, D., & Kim, K. I. Unsupervised Attention-guided Image-to-Image Translation. In Advances in Neural Information Processing Systems (2018).
[3] Hoffman, J., Tzeng, E., Park, T., Zhu, J. Y., Isola, P., Saenko, K., & Darrell, T. CyCADA: Cycle-consistent adversarial domain adaptation. (2017).

Deliverables: At the end of the semester, the student should provide a framework that produces the translated images, and a report of the work.

Prerequisites: Experience in machine learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow

Type of work: 60% research, 40% development and testing

Level: MS semester project

Supervisor: Bahar Aydemir (bahar.aydemir@epfl.ch), Seungryong Kim (seungryong.kim@epfl.ch)


Instance-level image segmentation

Synopsis: Traditional methods for instance-level image segmentation provide limited ability to deal with other imaging domains, such as comics, due to the lack of annotated data in these domains. In this project, we will implement the state-of-the-art methods for this task and apply them to comics datasets. In addition, we will propose a weakly- or un-supervised instance-level image segmentation method that leverages a domain adaptation technique.
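A minimal torchvision sketch for the first step, running an off-the-shelf instance segmentation model (Mask R-CNN pretrained on COCO) on a comics page to see where it breaks down; the file path is just an example, and the domain-adaptation part of the project starts from these failure cases.

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()

image = to_tensor(Image.open("comic_page.png").convert("RGB"))   # example input path
with torch.no_grad():
    prediction = model([image])[0]   # dict with 'boxes', 'labels', 'scores', 'masks'
kept = prediction["scores"] > 0.5
print(f"{int(kept.sum())} instances detected above 0.5 confidence")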
 
References:
[1] P. O. Pinheiro, R. Collobert, and P. Dollár, “Learning to segment object candidates,” NIPS, 2015.
[2] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning Deep Features for Discriminative Localization,” CVPR, 2016.
[3] A. Rozantsev, M. Salzmann, and P. Fua, “Residual parameter transfer for deep domain adaptation,” CoRR, 2017.

Deliverables: Report and reproducible implementations.

Prerequisites: Experience with deep learning in PyTorch or another framework, computer vision.

Level: MS semester project

Type of work: 60% research, 40% implementation

Supervisors: Ihsan Utlu, Seungryong Kim


Saliency Prediction for Architectural Scenes

Description:
Visual saliency prediction aims to estimate which regions of a scene attract human attention. Most existing saliency predictors are developed and evaluated on conventional photographs, whereas architectural scenes are increasingly presented as 360-degree images viewed in virtual reality (VR). In this project, we will study saliency prediction for VR architectural scenes: starting from recent saliency models for 360-degree images [1-5], we will evaluate how well they transfer to architectural content and investigate how they can be adapted to it.

References:
[1] M. Assens Reina, X. Giro-i-Nieto, K. McGuinness, and N. E. O’Connor, “SaltiNet: Scan-path prediction on 360 degree images using saliency volumes,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2331–2338.
[2] R. Monroy, S. Lutz, T. Chalasani, and A. Smolic, “SalNet360: Saliency Maps for omni-directional images with CNN,” Signal Process. Image Commun., 2018.
[3] M. Startsev and M. Dorr, “360-aware saliency estimation with conventional image saliency predictors,” Signal Process. Image Commun., vol. 69, pp. 43–52, 2018.
[4] Y. Zhu, G. Zhai, and X. Min, “The prediction of head and eye movement for 360 degree images,” Salient360 Vis. Atten. Model. 360° Images, vol. 69, pp. 15–25, Nov. 2018.
[5] J. Ling, K. Zhang, Y. Zhang, D. Yang, and Z. Chen, “A saliency prediction model on 360 degree images using color dictionary based sparse representation,” Salient360 Vis. Atten. Model. 360° Images, vol. 69, pp. 60–68, Nov. 2018.

Deliverables: Reproducible codes for saliency prediction in VR architectural scenes, Report on the experimental results with proper visualization and insights from your research findings.

Prerequisites: Familiarity with deep learning, experience in Python, experience in statistical analysis.

Level: MS semester project (potentially BS).

Type of Work: 50% research, 50% development and testing.

Supervisor: Seungryong Kim (seungryong.kim@epfl.ch), Bahar Aydemir (bahar.aydemir@epfl.ch), and Caroline Karmann (caroline.karmann@epfl.ch) from LIPID (ENAC)


Microscopy Imaging

Description:
Microscopy imaging is crucial for medical research. Multiple imaging techniques are available, each with different advantages for capturing different material properties at a microscopic scale. The goal of this project is to collect a microscopy dataset. You will then perform a set of evaluation experiments to analyze the performance of different algorithms on your dataset.

Deliverables:
Microscopy image dataset, and benchmarking on a set of available algorithms.

Prerequisites:
Basic image processing knowledge. Deep learning experience could be a plus.

Level:
MS semester project (potentially BS).

Type of work:
60% implementation, 40% research.

Supervisors:
Majed El Helou, Ruofan Zhou.


Recovery of a watermark hidden within dispersed line segments

Description: Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden in a grayscale image that uses line halftones to display simple graphical elements such as a logo. The software has now been extended to hide the watermark within dispersed line segments. The goal is to adapt this software to run on an Android smartphone, and to tune and optimize the available parameters.

Deliverables: Report and running prototype (Matlab and/or Android).

Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab and/or Java Android

Level: BS or MS semester project or possibly master project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, romain.rossier@innoview.ch, tel 078 664 36 44
Prof. Roger D. Hersch, INM034, rd.hersch@epfl.ch, cell: 077 406 27 09


Recovery of watermarks on a curved surface by an Android smartphone

Description: Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark by superposing a software revealer on top of a base image obtained by camera acquisition. The project aims at extending this software to recover watermarks printed on a curved surface, such as the label of a wine bottle.

Deliverables: Report and running prototype (Matlab and/or Android).

Prerequisites:
– knowledge of image processing
– basic coding skills in Matlab and Java Android

Level: BS or MS semester project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, romain.rossier@innoview.ch, tel 078 664 36 44
Prof. Roger D. Hersch, INM034, rd.hersch@epfl.ch, cell: 077 406 27 09


Epson Printer Driver for synthesizing hidden codes

Description: Startup company Innoview Sàrl has developed software to recover, with a smartphone, a hidden watermark printed on a desktop Epson printer. Special Epson P50 printer driver software enables printing the hidden watermark. The Epson P50 printer has now been replaced by newer types of Epson printers that require modified driver software. The project consists in understanding the previous driver software and modifying it so as to be able to drive the new Epson printers. Reverse engineering may be necessary to obtain some of the new, undocumented driver codes.

Deliverables: Report and running prototype (C, C++ or Matlab).

Prerequisites:
– knowledge of image processing
– basic coding skills in C, C++ or Matlab

Level: BS or MS semester project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, romain.rossier@innoview.ch, tel 078 664 36 44
Prof. Roger D. Hersch, INM034, rd.hersch@epfl.ch, cell: 077 406 27 09


Reconstruction of 3D objects with planar mirrors

Synopsis:
3D reconstruction of objects has traditionally been done with a set of two or more cameras. We would like to instead replace the additional cameras with mirrors, allowing us to capture a scene from several viewpoints within a single image by using a single camera.

In this project, you will implement a system capable of 3D object reconstruction. You will start by designing and building a box composed of four mirrors that will later be placed in front of the camera. The reflections from this box of mirrors will provide four side views in addition to the central view [1]. The central view will be projected in the center of the camera image, and the additional views will be located on the sides of the camera image. As in traditional stereo systems, you will calibrate the setup, rectify the side, top, and bottom views, and match image features along epipolar lines across the views [2]. Finally, you will perform triangulation to reconstruct the 3D scene.
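A minimal OpenCV sketch of the final step, assuming the central view and one mirror view have already been calibrated and matched; note that a mirror view behaves like a virtual camera with flipped handedness, which the calibration has to account for [2].

import cv2
import numpy as np

def triangulate(P_center, P_mirror, pts_center, pts_mirror):
    """P_*: 3x4 projection matrices; pts_*: (N, 2) matched pixel coordinates."""
    points_h = cv2.triangulatePoints(P_center, P_mirror,
                                     pts_center.T.astype(np.float64),
                                     pts_mirror.T.astype(np.float64))  # 4 x N homogeneous
    return (points_h[:3] / points_h[3]).T                              # (N, 3) Euclidean points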

References:
[1] Joshua Gluckman and Shree K. Nayar. “Rectified catadioptric stereo sensors.” IEEE transactions on pattern analysis and machine intelligence (2002).
[2] Joshua Gluckman and Shree K. Nayar. “Catadioptric stereo using planar mirrors.” International Journal of Computer Vision (2001).

Deliverables: Report and running prototype.

Prerequisites:
 – knowledge of computer vision
– coding skills in Matlab and Python or C/C++

Level: MS semester project

Type of work: 50% implementation and 50% research

Supervisor: Marjan Shahpaski (firstname.lastname@epfl.ch)


Simulation of a radial imaging system with Mitsuba for BSDF capture

Synopsis: Radial imaging systems capture a scene from a large number of viewpoints within a single image, using a curved mirror and a camera. These systems can recover scene properties such as scene geometry, surface texture, and BSDF [1].

In this project, you will implement a system that simulates an image captured by such a setup. You will build a simple 3D scene that contains the curved mirror and the camera in Blender. Then you will modify the Mitsuba physically based renderer [2][3] to be able to work with empirical (measured) BSDFs.

To validate that the system works correctly, you will assign a measured BSDF to an object in the 3D scene. When the object is illuminated from a known incident direction, the outgoing light's intensity will vary according to the object's BSDF. A discrete set of outgoing angles will then be captured by the camera, after the rays undergo multiple reflections from the curved mirror's walls. The set of outgoing angles, together with the light intensity at each angle, will let us reconstruct the object's BSDF.

References:
[1] Sujit Kuthirummal and Shree K. Nayar. “Multiview radial catadioptric imaging for scene capture.” ACM Transactions on Graphics (2006).

[2] http://www.mitsuba-renderer.org
[3] https://github.com/mitsuba-renderer/mitsuba

Deliverables: Report and running prototype.

Prerequisites:
– knowledge of computer vision

– coding skills in Matlab and Python or C/C++

Level: BS/MS semester project

Type of work: 50% implementation and 50% research

Supervisor: Marjan Shahpaski (firstname.lastname@epfl.ch)


Temporally Coherent Loss for Video Super-Resolution

Description: In video super-resolution, the spatio-temporal coherence among frames can be exploited for accurate prediction of the high-resolution frames. Although many state-of-the-art super-resolution methods utilize temporal information in their input to boost their performance, they still favor simple single-frame norms for the loss function, which may lead to undesirable flickering and artifacts in the results. In this project, you will combine state-of-the-art video super-resolution networks with a temporally coherent loss, to see whether it can help remove temporal artifacts and improve the perceptual quality of the video.
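A simple form of the temporal term to be studied, sketched in PyTorch: consecutive super-resolved frames are penalized for differing, optionally after motion compensation. warp is a placeholder for optical-flow-based warping; without it, the term reduces to a plain frame-difference penalty.

import torch
import torch.nn.functional as F

def temporal_coherence_loss(sr_frames, flows=None, warp=None):
    """sr_frames: list of super-resolved frames, each (n, c, h, w), ordered in time."""
    loss = 0.0
    for t in range(1, len(sr_frames)):
        previous = sr_frames[t - 1]
        if warp is not None and flows is not None:
            previous = warp(previous, flows[t - 1])   # motion-compensate the previous frame
        loss = loss + F.l1_loss(sr_frames[t], previous)
    return loss / (len(sr_frames) - 1)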

Tasks:

  • Literature review on convolutional neural network based methods for video super-resolution
  • Collect datasets
  • Literature review on video quality assessment
  • Implement and evaluate temporally coherent loss for video super-resolution
  • (optional) implement and evaluate temporally coherent GAN for video super-resolution

Deliverables: Report and implementation of deep convolutional networks

Prerequisites: experience/interests in convolutional neural networks, experience/interests in image and video processing

References:

  1. Younghyun Jo, et al, “Deep Video Super-Resolution Using Dynamic Upsampling Filters Without Explicit Motion Compensation”, CVPR2018
  2. Ce Liu et al, “A Bayesian Approach to Adaptive Video Super-Resolution”, CVPR 2011
  3. Wang, Zhou et al, “Video quality assessment based on structural distortion measurement”. Signal processing: Image communication, 2014

Level: MS semester project

Type of work: 50% implementation, 50% research

Supervisor: Ruofan Zhou (ruofan.zhou@epfl.ch)


Text Image Downsampling and Super-Resolution via CNNs

Description: When compressing or downsampling text images, it is important to preserve the readability of the text. Conversely, when applying super-resolution (the inverse operation of downsampling) to text images, it is also important to increase readability rather than merely the sharpness of the image. The goal of this project is to experiment with different network architectures and loss functions for text image downsampling and super-resolution, and to evaluate the results according to readability.
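A small SRCNN-style baseline, sketched in PyTorch, from which the architecture and loss experiments could start; the text-specific part of the project, readability-oriented losses and evaluation with an OCR engine, would be built on top of such a model.

import torch.nn as nn
import torch.nn.functional as F

class TextSRNet(nn.Module):
    """Bicubic upscaling followed by a small convolutional refinement."""
    def __init__(self, channels=1, scale=2):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 9, padding=4), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, 5, padding=2),
        )

    def forward(self, lr_image):
        upsampled = F.interpolate(lr_image, scale_factor=self.scale,
                                  mode="bicubic", align_corners=False)
        return upsampled + self.body(upsampled)   # residual refinement of the bicubic upscale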

Tasks:

  • Literature review on convolutional neural network based methods for image compression and super-resolution
  • Collect datasets and data augmentation
  • Implement some network architectures and loss functions
  • Literature review on character/text recognition
  • Evaluation on the implemented networks

Deliverables: Report and implementation of deep convolutional networks

Prerequisites: experience/interests in convolutional neural networks, experience/interests in image processing

References:

  1. Zhang, Haochen, Dong Liu, and Zhiwei Xiong, “CNN-based text image super-resolution tailored for OCR”, IEEE Visual Communications and Image Processing, 2017
  2. Peyrard, Clement, et al. “ICDAR2015 competition on text image super-resolution”, International Conference on Document Analysis and Recognition, 2015
  3. Howard, Paul G. “Lossless and lossy compression of text images by soft pattern matching”, Proceedings of the Data Compression Conference (DCC), 1996

Level: MS semester project (potentially BS)

Type of work: 60% implementation, 40% research

Supervisor: Ruofan Zhou (ruofan.zhou@epfl.ch)


Longitudinal chromatic aberration assessment tool

Description: The goal of this project is to evaluate longitudinal chromatic aberration from a single photo in the presence of lateral chromatic aberration. You will acquire a set of photos of printed edges placed at different depths in the scene. The first objective is to remove the lateral chromatic aberration across the image, so that its full width can be used for assessing longitudinal chromatic aberration, which you will work on in the second part of the project. Your final results should then be compared against prior work to evaluate how well they match.

Tasks:
Review the literature: PSF estimation protocols, lens assessment, and chromatic aberration.

Capture a dataset of edge images.
Remove lateral chromatic aberration across the image.
Evaluate longitudinal chromatic aberration from a single image where the lateral chromatic aberration was corrected.
Evaluate the results on different lenses/cameras and compare to prior work.

References: http://www.imatest.com/docs/sfr_chromatic/

https://www.dxomark.com/dxomark-lens-camera-sensor-testing-protocol/

Deliverables: Report, dataset, implementation codes for lateral chromatic aberration removal and for single-image longitudinal chromatic aberration assessment.

Prerequisites: Comfortable reading and writing MATLAB code, strong mathematical signal processing background, experience with hardware and (professional) image acquisition techniques.

Type of work: 80% research, 20% development and testing

Level: BS

Supervisor: Majed El Helou