Spring 2018

Next Big Hit?

Synopsis:
Are the most popular songs true works of art, achievable only by the most talented among us, or is there a pattern that can be engineered?

In this project, you will:
Study the hypothesis that song popularity can be predicted. The objective is not to predict the popularity of any given song accurately, but to use hypothesis-testing statistics to evaluate a machine learning algorithm that you design for popularity prediction. The first part will be to extract song features from the audio, or even from the metadata. The second part will be to study statistical measures specific to hypothesis testing, in order to confirm or refute the hypothesis based on your prediction algorithm. The project will be concluded with an interesting twist that relies on the features extracted in the first phase.
PS: depending on progress, the project can be extended to video analysis as well.
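
To make the two phases concrete, here is a minimal sketch of one possible pipeline (assumptions: Python with librosa, scikit-learn and scipy; the file list 'paths' and the binary 'popular' labels are hypothetical placeholders for your dataset; the binomial test against chance assumes balanced classes):

    import numpy as np
    import librosa
    from scipy import stats
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    def song_features(path):
        # Mean MFCCs as a crude per-song descriptor (placeholder feature set)
        y, sr = librosa.load(path, mono=True, duration=60)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

    X = np.array([song_features(p) for p in paths])  # paths: hypothetical
    y = np.array(popular)                            # popular: hypothetical 0/1 labels

    # Cross-validated predictions, then a binomial test of H0: accuracy = chance
    pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
    k = int((pred == y).sum())
    print(stats.binomtest(k, n=len(y), p=0.5).pvalue)  # scipy >= 1.7

A permutation test over shuffled labels is a more robust alternative when the popularity classes are unbalanced.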

References:
http://www.hpl.hp.com/techreports/2005/HPL-2005-149.pdf
https://pdfs.semanticscholar.org/1c6a/a0f196e2a68d3c93fd10b85f084811a87b02.pdf

Deliverables: A popularity prediction algorithm, a statistics-based answer to the hypothesis, and an analysis of the results, including the effect of the different features.

Prerequisites: Familiarity with signal processing and some machine learning (web crawling is a plus but not required); these can be divided across the two students.

Level: MS semester project

Type of work: 25% implementation, 75% research.

Number of students: 2 (but you can apply individually)

Supervised by: Majed El Helou (majed dot elhelou at epfl), Frederike Dümbgen, Ruofan Zhuo


Exploring Beautiful Scenes by Location (Part II)

Synopsis:
Have you ever seen pictures of magnificent scenes online and really wanted to mark them on your bucket list of places to visit, but never knew where the photos were taken? Or did you ever want to browse photos geographically? In this project, you will build on top of an already-developed website that offers these features, among others. Your goal will be to design an intuitive front end and, above all, to develop suitable gamification that engages users and encourages them to upload photos.

In this project, you will:
Improve the design of the front end of the website.
Create gamification on top of the tool to encourage user engagement.

Deliverables: Functional user-friendly web tool with suitable gamification.

Prerequisites: The back end was developed using Scala and the Play Framework; full documentation is provided for all the code.

Level: MS semester project

Type of work: 75% implementation, 25% research.

Number of students: 1

Supervised by: Majed El Helou (majed dot elhelou at epfl), Ruofan Zhuo


Violence Detection in Movies

Description: The goal of this project is to label violent scenes in a given movie using convolutional neural networks. By defining violence concretely, the task of violent scene detection is sub-categorized into the detection of certain objects (e.g., a gun) and actions (e.g., punching). First, the input video is segmented into scenes; then salient frames and actions are detected in each scene. The dataset will be collected from several image databases and search engines.

 

Tasks:

  1. Understand the literature and the state of the art.
  2. Define the violence labels and create a training set.
  3. Build a CNN model and train the network.
  4. Validate the model and improve its accuracy.
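
For steps 3 and 4, a minimal Keras starting point could look as follows (a sketch only: the input size is an assumption, and 'train_frames'/'train_labels' are hypothetical arrays of salient frames with binary violence labels):

    from tensorflow.keras import layers, models

    # Small frame-level CNN: violent vs. non-violent (binary output)
    model = models.Sequential([
        layers.Input(shape=(224, 224, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_frames, train_labels, validation_split=0.2, epochs=10)

Scene-level labels can then be obtained by aggregating the frame-level predictions, e.g. by taking the maximum over a scene.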

References:

Demarty, C.H., Ionescu, B., Jiang, Y.G., Quang, V.L., Schedl, M., Penet, C.: Benchmarking violent scenes detection in movies. In: 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE (2014)

 

Mu, G., Cao, H., Jin, Q.: Violent Scene Detection Using Convolutional Neural Networks and Deep Audio Features (2016)

 

Dai, Q., Zhao, R.W., Wu, Z., Wang, X., Gu, Z., Wu, W., Jiang, Y.G.: Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning

 

Deliverables: At the end of the semester, the student should provide a framework that automatically detects violent scenes.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow

Type of work: 40% research, 60% development and testing

Level: Master

Supervisor: Sami Arpa ([email protected])


Genre Detection in Movies

Description: The goal of this project is to infer the genre of each scene in a given movie using convolutional neural networks. By defining each genre concretely, the task of genre detection is sub-categorized into the detection of certain objects (e.g., a spaceship for science fiction) and actions (e.g., laughing for comedy). First, the input video is segmented into scenes; then salient frames and actions are detected in each scene. The dataset will be collected from several image databases and search engines.

Tasks:

  1. Understand the literature and the state of the art.
  2. Define the genre labels and create a training set.
  3. Build a CNN model and train the network.
  4. Validate the model and improve its accuracy.
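
For step 3, one option is transfer learning on salient frames rather than training from scratch (a sketch assuming TensorFlow/Keras; the genre list is a hypothetical example):

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import ResNet50

    NUM_GENRES = 6  # assumption: e.g. comedy, drama, sci-fi, horror, action, romance

    # Pretrained ImageNet backbone used as a fixed feature extractor for frames
    backbone = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                        input_shape=(224, 224, 3))
    backbone.trainable = False

    model = models.Sequential([
        backbone,
        layers.Dense(256, activation="relu"),
        layers.Dense(NUM_GENRES, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # Scene-level genre can be obtained by averaging the frame predictions.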

References:

Sivaraman, K. S., and Gautam Somappa. “MovieScope: Movie trailer classification using Deep Neural Networks.” University of Virginia (2016).

 

Simões, Gabriel S., et al. “Movie genre classification with convolutional neural networks.” Neural Networks (IJCNN), 2016 International Joint Conference on. IEEE, 2016.

 

Deliverables: At the end of the semester, the student should provide a framework that automatically detects the genre of each scene of a given movie.

Prerequisites: Experience in deep learning and computer vision, experience in Python, experience in Keras, Theano, or TensorFlow

Type of work: 40% research, 60% development and testing

Level: Master

Supervisor: Sami Arpa ([email protected])


Rectification of a security image on an Android smartphone

Description:

Document authentication by a smartphone requires rectifying the perspective distortion of the captured image. Such a rectification method has been implemented on a PC in Matlab, and very good accuracies have been obtained. The project now consists in implementing the proposed rectification method on an Android smartphone, by embedding it into the real-time software for the recovery of hidden watermarks.
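
For reference, the core rectification step is illustrated below with OpenCV in Python (a sketch only; the existing implementation is in Matlab, and the four corner points are assumed to come from a document-corner detector):

    import cv2
    import numpy as np

    def rectify(image, corners, out_w=800, out_h=600):
        # corners: four detected document corners, ordered
        # top-left, top-right, bottom-right, bottom-left
        src = np.float32(corners)
        dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
        H = cv2.getPerspectiveTransform(src, dst)
        return cv2.warpPerspective(image, H, (out_w, out_h))

On Android, the same operations are available through OpenCV's Java bindings (Imgproc.getPerspectiveTransform and Imgproc.warpPerspective).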

Deliverables: Report and running prototype (Android).

Prerequisites:
– knowledge of image processing / computer vision

– basic coding skills in Java on Android

Supervisors:

Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44

Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09


Synthesis of images incorporating hidden information

Description:
Prototype algorithms have been developed in Matlab to hide information in grayscale or color images (“visible images”). The project aims at creating an efficient software library for synthesizing such images on demand, starting from a text or XML description containing:

  • the name and location of the visible image
  • the information to be hidden into the visible image
  • the parameters for the hiding system

In addition, the limitations of the embedding system for grayscale and color images should be determined. If time allows, consider adding a user interface.
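
To illustrate the intended shape of the library API (a sketch in Python for brevity, though the deliverable is Java or C++; the XML field names are invented, and the naive least-significant-bit embedding is only a stand-in for the actual hiding algorithm from the Matlab prototypes):

    import xml.etree.ElementTree as ET
    import numpy as np
    from PIL import Image

    def load_job(xml_path):
        # Hypothetical job description; the real schema is to be designed
        job = ET.parse(xml_path).getroot()
        return (job.findtext("visibleImage"),
                job.findtext("hiddenMessage"),
                job.find("parameters"))

    def embed(image_path, message):
        # Stand-in for the real hiding system: naive LSB embedding of the bytes
        img = np.array(Image.open(image_path).convert("L"))
        bits = np.unpackbits(np.frombuffer(message.encode(), dtype=np.uint8))
        flat = img.flatten()
        flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
        return Image.fromarray(flat.reshape(img.shape))

    path, msg, params = load_job("job.xml")
    embed(path, msg).save("out.png")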

Deliverables: Report and running prototype (Java or C++).

Prerequisites:
– knowledge of image processing

– coding skills in Java and/or C++

Supervisors:

Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44

Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09


Generating Digital Diagrams from Sketches

As engineers, we often find ourselves spending a non-trivial amount of time converting sketched diagrams and figures into digital format. To solve this problem, we aim to convert hand-drawn sketches (including shapes, text, arrows, etc.) into digital form (for example, a PPT slide). In this project, the student will combine techniques such as optical character recognition [1] and shape recognition algorithms [2], along with their own developments, to build an effective program for converting hand-drawn sketches into digital diagrams.

In this project you will:

  1. Research and experiment on OCR and shape recognition algorithms
  2. Research on image alignment
  3. Explore the APIs of diagramming software
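
A minimal sketch of the shape recognition step with OpenCV in Python (the area threshold and the vertex-count heuristic are assumptions; OCR via pytesseract is one option among several):

    import cv2

    def detect_shapes(gray):
        # Binarize the sketch, then classify each contour by its vertex count
        _, thresh = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        shapes = []
        for c in contours:
            if cv2.contourArea(c) < 100:  # skip specks (assumed threshold)
                continue
            approx = cv2.approxPolyDP(c, 0.04 * cv2.arcLength(c, True), True)
            n = len(approx)
            label = {3: "triangle", 4: "rectangle"}.get(
                n, "ellipse" if n > 6 else "polygon")
            shapes.append((label, cv2.boundingRect(c)))
        return shapes

    # Text regions can then be cropped and passed to an OCR engine, e.g.
    # import pytesseract; text = pytesseract.image_to_string(crop)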

References:
[1] https://github.com/garnele007/SwiftOCR
[2] T. Hammond, D. Logsdon, J. Peschel, J. Johnston, P. Taele, A. Wolin, and B. Paulson, “A sketch recognition interface that recognizes hundreds of shapes in course-of-action diagrams.”

Deliverables: code and report

Prerequisites: image processing, machine learning, Python/C++/Matlab

Level: Bachelor/Master project

Supervisor: Ruofan Zhou ([email protected])

 


A human-friendly programming language for art historians

Synopsis: Image processing applications are important for humanities scholars of visual culture, such as art historians. However, they largely use stock user-interface applications like Photoshop or GIMP. These have severe limits for humanities scholars: they are usually based on single images rather than collections, and they offer processes made for enhancing photographs, not for studying images. We would like to make a very basic programming language, with limited functionality, that might be used by humanities scholars or school children: something on a similar level to Sonic Pi [1] for music, but with less functionality. It might well be based on the Halide language for computational photography [2], which has also been used to teach computational photography at MIT [3]; or it might be built on Python [4] or Java [5]. This high-level domain-specific language will have quite limited functionality, but will bring important computing capacity to humanities research and education.

If successful, this will be used in teaching for humanities scholars, and result in a paper.

Tasks:
1 – Initial designs, drawing up wish-list of suitable image processing functions
2 – Choice of existing toolbox to draw upon
3 – First prototype of language examples
4 – Write first parser and example code
5 – Draw up an example tutorial
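
To make tasks 3 and 4 concrete, here is one possible look for the language together with a minimal interpreter (a sketch only: the command names are invented; the image operations come from scikit-image [4]):

    from skimage import io, filters, color

    # Hypothetical mini-language, one command per line, e.g.:
    #     load painting.jpg
    #     gray
    #     blur 2.5
    #     save out.png
    def run(script):
        img = None
        for line in script.strip().splitlines():
            if not line.strip():
                continue
            cmd, *args = line.split()
            if cmd == "load":
                img = io.imread(args[0])
            elif cmd == "gray":
                img = color.rgb2gray(img)
            elif cmd == "blur":
                img = filters.gaussian(img, sigma=float(args[0]))
            elif cmd == "save":
                io.imsave(args[0], img)
            else:
                raise ValueError("unknown command: " + cmd)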

References:
[1] Sam Aaron, http://sonic-pi.net/
[2] Halide Language http://halide-lang.org/#gettingstarted
[3] Frédo Durand, MIT 6.815/6.865 Digital & Computational Photography https://stellar.mit.edu/S/course/6/sp15/6.815/index.html
[4] Scikit Image Python toolbox http://scikit-image.org/
[5] ImageJ https://imagej.nih.gov/ij/

Deliverables: Report, basic project website, codebase, tutorial(s)

Prerequisites: Command of a suitable language, preferably Python and/or C++. If you have ever written a parser before, even better.

Level: Bachelor or (preferably) Master semester project, or Master's thesis project

Type of work: 40% research, 60% implementation

Supervisor: Leonardo Impett ([email protected])


Multi-spectral image and video dataset

Synopsis:

This project aims at creating a dataset of multi-spectral (infrared/near-infrared/color) images. For that, we will set up a rig of multiple cameras and shoot. The first phase of the project will be to identify interesting classes of images (for example daylight, night, fog…) and to gather images for each class.

The second phase is about building an application on top of this set. Some of the options are:

  • Registration: transforming the different images into one coordinate system to help later tasks such as merging them together, or tracking objects in the color video using the infrared video (see the registration sketch below)
  • Image fusion: merging multiple images to create a result that contains more information than each one alone
  • Time warping: IR cameras usually have a lower frame rate than color cameras. How can the streams from both cameras be synchronized so that the viewer does not perceive any temporal registration issues (the objects in the color image have moved, but they are still in the same location in the infrared)?
  • Infrared colorization: transferring the color properties from the color image to the other ones

In this project you will:

1 – Capture a set of multi-spectral images under different conditions
2 – Build an application using the captured images as described above
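
For the registration option, a minimal sketch with OpenCV's ORB features (assuming roughly planar scenes, so that a single homography applies; note that plain ORB matching often struggles across spectra, so [3] is a better starting point for the cross-spectral case):

    import cv2
    import numpy as np

    def register(ir, rgb):
        # Match features between infrared and color frames, then warp the
        # IR image into the color camera's coordinate system
        orb = cv2.ORB_create(2000)
        k1, d1 = orb.detectAndCompute(ir, None)
        k2, d2 = orb.detectAndCompute(rgb, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
        src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        return cv2.warpPerspective(ir, H, (rgb.shape[1], rgb.shape[0]))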

References:

[1] Image fusion example: https://pdfs.semanticscholar.org/6d53/fe0516d9785739fce67de994839e5547e9b2.pdf
[2] Colorization example: https://arxiv.org/pdf/1604.02245.pdf
[3] Registration example: https://pervasive.uni-klu.ac.at/BR/pubs/2014/Yahyanejad_ISPRS2014.pdf

Deliverables: Dataset + Application

Prerequisites: Any programming language; curiosity and an interest in photography

Level: Preferably a Bachelor semester project

Type of work: 70% implementation, 30% research

Supervisor: Fayez Lahoud ([email protected])


Recovering reflectance spectra from camera responses

Synopsis:
In this project, we will design and implement a system that can recover the reflectance spectra of printed colors. To that end, we will use a standard RGB or RGBN camera and a spectrophotometer for assembling the training and testing datasets. Based on the training data, we will determine a mapping between the low-dimensional output of the camera and the high-dimensional output of the spectrophotometer. For more information, please refer to [1].
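
The mapping can start as simple regularized least squares (a sketch; the dimensions are assumptions, e.g. spectra sampled at 31 wavelengths, and an RGBN camera would give four responses instead of three):

    import numpy as np

    # R: (N, 3) camera responses, S: (N, 31) measured reflectance spectra
    def fit_mapping(R, S, lam=1e-3):
        # Ridge-regularized linear map W minimizing ||R W - S||^2 + lam ||W||^2
        A = R.T @ R + lam * np.eye(R.shape[1])
        return np.linalg.solve(A, R.T @ S)

    def recover(W, response):
        # Estimate a full spectrum from a single camera response
        return response @ W

Polynomial expansion of the camera responses before the fit is a common way to improve the accuracy of such linear mappings.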

References:
[1] Valero, Eva M., et al. “Recovering spectral data from natural scenes with an RGB digital camera and colored filters.” Color Research & Application, 2007.

Deliverables: Report and running code.

Prerequisites:
– knowledge of image processing
– coding skills in Matlab or Python

Level: Bachelor semester project

Type of work: 25% research and 75% implementation

Supervisor: Marjan Shahpaski ([email protected])


Client-server system for authentication by smartphones

Synopsis:
Thanks to their incorporated camera, smartphones are often used for the authentication of individuals, documents or products. The goal of this project is to create a framework, with libraries both on the smartphone and on a Windows server, that allows the smartphone to send information to the PC server, and the server to receive that information, validate it, and reply with a positive or negative acknowledgement.

The student will survey the different means of communicating over the Internet between a smartphone and a PC (communication by sockets, HTTP server, JSON, etc.) and develop the most suitable implementation. A demonstration environment, consisting of a server that validates a series of alphanumeric codes transmitted from the smartphone and displayed on a web page, will complete the project.

If time allows, also try to ensure that the communication is encrypted.
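
To fix ideas on the protocol, here is a minimal validation server sketched in Python with the standard library (the real server is to be written in Java; the JSON format and the demo codes are assumptions):

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    VALID_CODES = {"A1B2C3", "D4E5F6"}  # hypothetical demo codes

    class AuthHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read the JSON payload sent by the smartphone, e.g. {"code": "A1B2C3"}
            length = int(self.headers["Content-Length"])
            payload = json.loads(self.rfile.read(length))
            ok = payload.get("code") in VALID_CODES
            body = json.dumps({"authenticated": ok}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)  # positive or negative acknowledgement

    HTTPServer(("", 8000), AuthHandler).serve_forever()

On the smartphone side, the equivalent request is a plain HTTP POST, which Java on Android supports via HttpURLConnection.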

Deliverables: Report and running prototype (smartphone under Android and PC under Windows).

Prerequisites:
– coding skills: Java on Android and Java on PC.

Level: BS or MS semester project

Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44

Prof. hon. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09