A Large-Scale Multi-Lingual Color Thesaurus

A. Lindner; B. Z. Li; N. Bonnier; S. Süsstrunk

2012. IS&T/SID 20th Color and Imaging Conference (CIC), Los Angeles, California, USA, November 12-16, 2012. p. 30-35.

We present a color thesaurus with over 9000 color names in ten different languages. Instead of using conventional psychophysical experiments, we use a statistical framework that is based on search results from Google Image Search. For each color name we compute a significance distribution in CIELAB space whose maximum indicates the location of the color name in CIELAB. A first analysis discusses the quality of the estimations in the context of human language. Further, we conduct an advanced analysis supporting our choice to use a statistical method. Finally, we demonstrate that a color name depends mainly on the chromatic values, while its location varies more along the lightness axis.

Example color estimations

Overview of 50 color names in ten languages. The samples are sorted by increasing hue angle of the English term from left to right.


Data acquisition

We used the 950 English color names derived in the XKCD study and translated them into nine other languages: Chinese, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish. All translations were done by native speakers with a good command of English.

For each color name in each language, we acquired the first one hundred images returned by Google Image Search. The search query was the color name in quotes plus the word for “color” in the respective language. Two example queries are “cloudy blue”+color for English and “bleu nuageux”+couleur for French.
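The query construction can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name build_query and the COLOR_WORD table are hypothetical, and only the two languages that appear in the example queries are filled in.

```python
# Hypothetical lookup table: the word for "color" in each language.
# Only the two entries confirmed by the example queries are listed.
COLOR_WORD = {"en": "color", "fr": "couleur"}


def build_query(color_name: str, lang: str) -> str:
    """Return the quoted color name plus the word for "color" in that language."""
    return f'"{color_name}"+{COLOR_WORD[lang]}'


print(build_query("cloudy blue", "en"))   # "cloudy blue"+color
print(build_query("bleu nuageux", "fr"))  # "bleu nuageux"+couleur
```

Quoting the color name keeps multi-word names such as “cloudy blue” together as a phrase, and the appended word for “color” steers the results toward the color sense of the term.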

Statistical analysis

We carry out a statistical test that assesses for each color bin whether images associated with a specific color name are likely to have a significantly higher (lower) bin count. This is reflected by a positive (negative) standardized significance value as shown in the figure below.
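A per-bin significance value of this kind can be sketched as below. The text here does not name the test, so the sketch assumes a rank-sum (Mann-Whitney-Wilcoxon) statistic with a normal approximation, which is one common choice for comparing two groups of histogram counts. The function names and the toy histograms are illustrative only.

```python
import math


def rank_sum_z(sample, rest):
    """Standardized rank-sum statistic of `sample` against `rest`.

    Assumption: normal approximation, average ranks for ties,
    no tie correction in the variance.
    """
    n1, n2 = len(sample), len(rest)
    pooled = sorted([(v, 0) for v in sample] + [(v, 1) for v in rest])
    rank_sum = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        rank_sum += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (rank_sum - mean) / std


def z_per_bin(named_hists, other_hists):
    """Compute one z value per histogram bin.

    named_hists: histograms of images retrieved for one color name.
    other_hists: histograms of the comparison images.
    """
    n_bins = len(named_hists[0])
    return [
        rank_sum_z([h[b] for h in named_hists], [h[b] for h in other_hists])
        for b in range(n_bins)
    ]


# Toy data: two bins, three images per group.
named = [[5, 1], [6, 2], [7, 3]]
other = [[1, 5], [2, 6], [3, 7]]
z = z_per_bin(named, other)
print(z)  # first bin positive (over-represented), second bin negative
```

A positive value marks a bin whose counts are systematically higher in the images for the color name, and a negative value marks a bin whose counts are systematically lower.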

The figures above show the z-value distributions for two color names (pink in English and green in Chinese) as 3-dimensional heat maps. The maximum is located at the crossing of the three orthogonal planes. The homogeneous dark areas at the plane borders are out-of-gamut values. At the bottom, we show the histogram bin colors for the constant-L plane through the maximum value for better orientation in CIELAB space.
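Locating the maximum of such a distribution and mapping it back to CIELAB coordinates can be sketched as follows. The grid size, the axis ranges, and the use of None for out-of-gamut bins are assumptions made for this illustration. They are not taken from the paper.

```python
def argmax_bin(z):
    """Return ((i, j, k), value) of the largest entry in a 3-D nested list.

    Entries set to None are treated as out-of-gamut and skipped.
    """
    best_val, best_idx = None, None
    for i, plane in enumerate(z):
        for j, row in enumerate(plane):
            for k, v in enumerate(row):
                if v is None:
                    continue
                if best_val is None or v > best_val:
                    best_val, best_idx = v, (i, j, k)
    return best_idx, best_val


def bin_center(index, lo, hi, n_bins):
    """Center of bin `index` when [lo, hi] is split into n_bins equal bins."""
    return lo + (index + 0.5) * (hi - lo) / n_bins


# Toy 2 x 2 x 2 grid indexed as z[L][a][b]; None marks out-of-gamut bins.
z = [
    [[0.1, None], [0.3, -0.2]],
    [[0.4, 2.5], [None, 1.0]],
]
(i, j, k), z_max = argmax_bin(z)

# Hypothetical axis ranges: L in [0, 100], a and b in [-80, 80].
L_star = bin_center(i, 0.0, 100.0, 2)
a_star = bin_center(j, -80.0, 80.0, 2)
b_star = bin_center(k, -80.0, 80.0, 2)
print((L_star, a_star, b_star), z_max)
```

The bin center of the maximum serves as the estimated location of the color name. A finer grid gives a more precise location at the cost of fewer images per bin.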