Estimating Subjective Image Appeal


Rare is Interesting: Connecting Spatio-Temporal Behavior Patterns with Subjective Image Appeal


The 2nd ACM International Workshop on Geotagging and its Applications in Multimedia at the 21st ACM International Conference on Multimedia, Barcelona, Spain, October 21-25, 2013.

We analyze the behavior patterns and photographic habits of the Nokia Mobile Data Challenge (NMDC) participants using GPS and timestamp data. We show that these patterns and habits can be used to estimate image appeal ratings of geotagged Flickr images.

To do this, we summarize the behavior patterns of the individual NMDC participants into rare and repeating events using GPS coordinates and timestamps. Based on the time and location information from these events, we then retrieve geotagged images and their “view” and “favorite” counts from Flickr. The appeal of an image is calculated as the ratio of favorite count to view count. We analyze how rare and repeating events relate to the appeal of the downloaded Flickr images and find that appeal ratings are higher both for events during which the NMDC participants took pictures and for rare events. We thus design new event-based features to rate and rank the geotagged Flickr images. We measure the ranking performance of our algorithm using the Flickr appeal ratings as ground truth. We show that our event-based features outperform the visual-only features previously used for image appeal rating, obtaining a Spearman’s correlation coefficient of 0.47.


The NMDC Dataset


The NMDC dataset includes smartphone usage information and various sensor data collected in Switzerland over approximately two years. In our analyses, we use over 10 million GPS data points (time, location) belonging to 166 NMDC participants. The participants took a photograph at 17’000 of these GPS data points. In Figure 1, the GPS density map of the participants and the GPS positions of the collected photographs are overlaid on a map of Switzerland.




Figure 1: Map of over 10 million GPS coordinates (in red) and the 17’000 coordinates of photographs taken by all participants (in blue circles). Base map from Google Maps.

Time- and Location-Based Events

In order to represent the NMDC participants’ behavior patterns, we use the GPS data points to extract time- and location-based events. We define the following types of events:


• Event: We define an event as a limited spatial and temporal interval in which the GPS data points of an NMDC participant do not change significantly. We extract over 90’000 events from the 10 million GPS data points.


• Rare and Repeating Events: An event is labeled “rare” if fewer than ten events of the same participant occur at close-by GPS coordinates; otherwise it is labeled “repeating”.


• Photographically Interesting and Uninteresting Events: An event is labeled “photographically interesting” (PI) if the participant took a photograph during that event, and “photographically uninteresting” (PU) otherwise. We detect approximately 3’700 PI events and 86’300 PU events in the NMDC dataset.
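The event definitions above can be sketched as a single pass over a participant’s time-ordered GPS trace. This is an illustrative reconstruction, not the paper’s actual pipeline: the 100 m distance and 10-minute duration thresholds and all helper names are assumptions, while the cutoff of ten nearby events for the “rare” label comes from the definition above.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS coordinates."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def extract_events(points, max_dist_m=100.0, min_duration_s=600.0):
    """Group a time-ordered trace [(t, lat, lon), ...] into events: maximal
    runs where the position stays within max_dist_m of the run's first
    point, kept only if they last at least min_duration_s."""
    events, start = [], 0
    for i in range(1, len(points) + 1):
        ended = (i == len(points)
                 or haversine_m(points[start][1], points[start][2],
                                points[i][1], points[i][2]) > max_dist_m)
        if ended:
            t0, t1 = points[start][0], points[i - 1][0]
            if t1 - t0 >= min_duration_s:
                lat = sum(p[1] for p in points[start:i]) / (i - start)
                lon = sum(p[2] for p in points[start:i]) / (i - start)
                events.append({"start": t0, "end": t1, "lat": lat, "lon": lon})
            start = i
    return events

def label_rare(events, radius_m=100.0, repeat_cutoff=10):
    """An event is 'rare' if fewer than repeat_cutoff events of the same
    participant fall within radius_m of its center, else 'repeating'."""
    for e in events:
        n = sum(1 for o in events
                if haversine_m(e["lat"], e["lon"],
                               o["lat"], o["lon"]) <= radius_m)
        e["rare"] = n < repeat_cutoff
    return events
```

The PI/PU label then only requires intersecting each event’s time interval with the participant’s photograph timestamps.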



Figure 2: Effect of (left) the days of the week and (right) event repetition on the photograph collection.

The Flickr Dataset

We retrieve over 36’000 geotagged photographs from Flickr. This dataset is formed by downloading all geotagged photographs taken within a 100 m radius of each event center and within the two-year NMDC data collection period. Along with the images, we also retrieve the following Flickr image statistics:


• View Count: The number of people who viewed the image (all retrieved images were viewed more than 20 times).

• Favorite Count: The number of people who added the image to their favorite images list.


These values can be considered outputs of a psychophysical test. We treat the ratio of favorite count to view count as the “image appeal rating” of an image.
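To make the retrieval and rating concrete, the sketch below shows how such a per-event query could be issued against the public Flickr REST API (flickr.photos.search accepts lat/lon and a radius in kilometers, and can return view counts via the views extra), and how the appeal rating follows from the counts. This is an assumption about how the retrieval could be wired up, not the crawler used in the paper; in practice favorite counts require a separate flickr.photos.getFavorites call per photo.

```python
import json
import urllib.parse
import urllib.request

FLICKR_REST = "https://api.flickr.com/services/rest/"

def build_search_params(api_key, lat, lon, min_date, max_date, radius_km=0.1):
    """flickr.photos.search parameters: geotagged photos within radius_km
    (0.1 km = 100 m) of an event center, taken inside the collection period."""
    return {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "lat": f"{lat:.6f}",
        "lon": f"{lon:.6f}",
        "radius": str(radius_km),   # Flickr's radius unit is kilometers
        "min_taken_date": min_date,
        "max_taken_date": max_date,
        "has_geo": "1",
        "extras": "views",          # return the view count with each photo
        "format": "json",
        "nojsoncallback": "1",
    }

def search_photos(api_key, lat, lon, min_date, max_date):
    """Fetch one page of search results (network call; needs a valid key)."""
    url = FLICKR_REST + "?" + urllib.parse.urlencode(
        build_search_params(api_key, lat, lon, min_date, max_date))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["photos"]["photo"]

def appeal_rating(favorite_count, view_count):
    """Image appeal rating: ratio of favorite count to view count.
    Only images viewed more than 20 times are kept in the dataset."""
    return favorite_count / view_count
```

Restricting the dataset to images with more than 20 views keeps the denominator of `appeal_rating` meaningful.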

Figure 3: Average image appeal ratings (left) for PI and PU events and (right) for different event repetition counts.

Evaluation and Results

In our evaluations, we randomly divide the 36’000 Flickr images into equal-sized training and test partitions. We then create 32 regression trees, randomly selecting (with replacement) 80% of the training examples for each tree. The input of a regression tree is a feature vector, and the ground truth is the image appeal rating. We train regression trees under three setups: visual features only, event-based features only, and both feature sets combined. To estimate the final appeal rating of a test image, we pass its extracted features through all trees and average the regression outputs. Since appeal rating prediction is a ranking problem rather than a curve-fitting problem, we evaluate the performance of our method with Spearman’s rank correlation test.
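Ranking quality is what matters here: Spearman’s rank correlation is the Pearson correlation of the rank vectors, so it equals 1 for any monotone relation between predicted and true appeal, not just a linear one. Below is a minimal numpy sketch of the metric with mean-rank tie handling (scipy.stats.spearmanr computes the same quantity):

```python
import numpy as np

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum()
                 / np.sqrt((rx * rx).sum() * (ry * ry).sum()))
```

Applied to the predicted and ground-truth appeal ratings of the test partition, this yields the ρ values reported in Table 3.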


Table 3: Spearman’s ρ values for individual and combined feature sets, averaged over ten experiments.

Feature Set          Spearman’s Coefficient   Extraction Time
Visual only          0.3488 ± 0.0050          0.1638 s
Event-based only     0.4740 ± 0.0040          0.0043 s
Combined             0.4913 ± 0.0033          0.1681 s



Figure 5: Precision–recall curves for different feature types.