Item features play an important role in movie recommender systems, where
recommendations can be generated by using explicit or implicit preferences of
users on traditional features (attributes) such as tag, genre, and cast.
Typically, movie features are human-generated, either editorially (e.g., genre
and cast) or by leveraging the wisdom of the crowd (e.g., tag), and as such,
they are prone to noise and are expensive to collect. Moreover, these features
are often rare or absent for new items, making it difficult or even impossible
to provide good quality recommendations.
In this paper, we show that users' preferences on movies can be better
described in terms of the mise-en-sc\`ene features, i.e., the visual aspects of
a movie that characterize design, aesthetics and style (e.g., colors,
textures). We use both MPEG-7 visual descriptors and the hidden layers of deep
neural networks as examples of mise-en-sc\`ene features that visually describe
movies. Notably, mise-en-sc\`ene features can be computed automatically
from video files or even from trailers, offering more flexibility in handling
new items, avoiding the need for costly and error-prone human-based tagging,
and providing good scalability.
We conducted a set of experiments on a large catalogue of about 4,000 movies.
The results show that recommendations based on mise-en-sc\`ene features
consistently outperform recommendations based on richer sets of more
traditional features, such as genre and tag.
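
To make the idea of recommending from automatically extracted visual features concrete, the following is a minimal sketch of a generic content-based recommender. It is an illustration under stated assumptions, not the method evaluated in the paper: the 3-dimensional feature vectors, the movie identifiers, and the helper names (`cosine`, `user_profile`, `recommend`) are all hypothetical, and the vectors merely stand in for descriptors that would be extracted from trailers or full-length videos. The user profile is taken to be the mean of the feature vectors of liked movies, and candidates are ranked by cosine similarity to that profile.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def user_profile(liked_ids, features):
    """Average the visual feature vectors of the movies a user liked."""
    if not liked_ids:
        raise ValueError("at least one liked movie is required")
    vectors = [features[m] for m in liked_ids]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def recommend(liked_ids, features, k=2):
    """Return the k unseen movies whose features are closest to the user profile."""
    profile = user_profile(liked_ids, features)
    candidates = [m for m in features if m not in liked_ids]
    ranked = sorted(
        candidates,
        key=lambda m: cosine(profile, features[m]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical visual descriptors (assumption: stand-ins for real
# MPEG-7 or deep-network features, which have far more dimensions).
features = {
    "movie_a": [0.9, 0.1, 0.2],
    "movie_b": [0.8, 0.2, 0.1],
    "movie_c": [0.1, 0.9, 0.7],
    "new_movie": [0.85, 0.15, 0.15],  # no tags or ratings needed
}

print(recommend(["movie_a", "movie_b"], features, k=1))
```

In this toy catalogue, `new_movie` can be ranked even though it has no tags or ratings, because its feature vector is derived from the content alone; this is the property that makes such features attractive for new items.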