Computer Science - Multimedia Publications

The paper presents a novel concept that analyzes and visualizes worldwide fashion trends. Our goal is to reveal cutting-edge fashion trends without displaying an ordinary fashion style. To achieve the fashion-based analysis, we created a new fashion culture database (FCDB), which consists of 76 million geo-tagged images in 16 cosmopolitan cities.

This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. We first introduce a boosting-based approach to learn a correspondence structure which indicates the patch-wise matching probabilities between images from a target camera pair. The learned correspondence structure can not only capture the spatial correspondence pattern between cameras but also handle the viewpoint or human-pose variation in individual images.

Dance Dance Revolution (DDR) is a popular rhythm-based video game. Players perform steps on a dance platform in synchronization with music as directed by on-screen step charts. While many step charts are available in standardized packs, users may grow tired of existing charts, or wish to dance to a song for which no chart exists.

The details of a noisy image may be restored by removing the noise with a suitable image de-noising method. In this research, a new image de-noising method based on applying a median filter (MF) in the wavelet domain is proposed and tested. Various wavelet transform filters are used in conjunction with the median filter to obtain better de-noising results and, consequently, to select the best-suited filter.
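The general idea of median filtering in the wavelet domain can be sketched in one dimension. This is a minimal illustration, assuming a single-level Haar transform and a 3-tap median applied only to the detail band; the paper works on 2-D images and compares several wavelet filters, and the function names here are hypothetical.

```python
from statistics import median

def haar_forward(x):
    """Single-level orthonormal Haar transform of an even-length sequence."""
    s = 2 ** -0.5
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward exactly (up to floating-point rounding)."""
    s = 2 ** -0.5
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def median3(values):
    """3-tap median filter with edge replication."""
    padded = [values[0]] + list(values) + [values[-1]]
    return [median(padded[i:i + 3]) for i in range(len(values))]

def wavelet_median_denoise(signal):
    """Median-filter the detail band, then reconstruct (hypothetical pipeline)."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, median3(detail))
```

On `[5, 5, 5, 50, 5, 5, 5, 5]` the isolated spike is only reduced to two samples of about 27.5, because its energy in the approximation band is left untouched; multi-level, 2-D decompositions with other wavelet filters would behave differently.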

Teleradiology enables medical images to be transferred over computer networks for many purposes, including clinical interpretation, diagnosis, and archiving. In telemedicine, medical images can be manipulated during transfer. In addition, medical information security requirements are specified by legislation, and the entities concerned must adhere to them.

Steganography is a collection of methods for hiding secret information (the "payload") within non-secret information (the "container"). Its counterpart, steganalysis, is the practice of determining whether a message contains a hidden payload and recovering it if possible. The presence of hidden payloads is typically detected by a binary classifier.

Studies show that refining real-world categories into semantic subcategories contributes to better image modeling and classification. Previous image sub-categorization work, relying on labeled images and WordNet's hierarchy, is not only labor-intensive but also restricted to classifying images into noun subcategories. To tackle these problems, in this work we exploit general corpus information to automatically select web images and classify them into semantically rich (sub-)categories.

This paper reviews the causes of discomfort in viewing stereoscopic content. These include objective factors, such as misaligned images, as well as subjective factors, such as excessive disparity. Different approaches to the measurement of visual discomfort are also reviewed, in relation to the underlying physiological and psychophysical processes.

The emergence of smart Wi-Fi access points (APs) equipped with large storage space opens a new research area: how to use these resources at the network edge to improve users' quality of experience (QoE), e.g., a short startup delay and smooth playback.

Motion compensation is a fundamental technology in video coding for removing the temporal redundancy between video frames. To further improve coding efficiency, sub-pel motion compensation is used, which requires interpolation of fractional samples. Video coding standards usually adopt fixed interpolation filters derived from signal processing theory.
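A fixed interpolation filter of this kind can be sketched with the HEVC 8-tap half-sample luma filter, whose coefficients (-1, 4, -11, 40, 40, -11, 4, -1) sum to 64, applied along a single row. The single-stage rounding, edge clamping, and 8-bit clipping below are simplifying assumptions (the standard keeps higher intermediate precision for 2-D interpolation), and `half_pel_row` is a hypothetical name.

```python
# HEVC luma half-sample interpolation taps; they sum to 64.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)

def half_pel_row(row, bit_depth=8):
    """Interpolate the half-sample position between row[i] and row[i + 1]."""
    max_val = (1 << bit_depth) - 1
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            # Taps cover integer positions i-3 .. i+4; clamp indices at the borders.
            idx = min(max(i - 3 + k, 0), n - 1)
            acc += tap * row[idx]
        value = (acc + 32) >> 6  # divide by 64 with rounding
        out.append(min(max(value, 0), max_val))  # clip to the valid sample range
    return out
```

Because the filter is symmetric and its taps sum to 64, a flat row stays flat and a linear ramp is interpolated exactly at the midpoints.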

Streaming video is becoming the predominant type of Internet traffic, with reports forecasting that video content will account for 80% of all traffic by 2019. Given significant investment in the Internet backbone, the main bottleneck remains at the edge servers (e.g.

Progress in Multiple Object Tracking (MOT) has been historically limited by the size of the available datasets. We present an efficient framework to annotate trajectories and use it to produce a MOT dataset of unprecedented size. In our novel path supervision the annotator loosely follows the object with the cursor while watching the video, providing a path annotation for each object in the sequence.

Quality of experience (QoE) is generally subjective and context-dependent, so identifying and quantifying the factors that affect it is difficult. Recently, considerable effort has been devoted to estimating users' QoE in order to enhance video delivery. In the literature, most QoE-driven optimization schemes that trade off different quality metrics assume homogeneous user populations; however, people's perceptions of a given video quality may differ, which makes QoE optimization harder.

The use of peer-to-peer (P2P) networks for multimedia distribution has spread globally in recent years. Their mass popularity is driven primarily by cost-effective content distribution, which has also given rise to piracy. An end user (buyer/peer) of a P2P content distribution system does not want to reveal his/her identity during a transaction with a content owner (merchant), whereas the merchant does not want the buyer to redistribute the content illegally.

Music auto-tagging is often handled in a similar manner to image classification by regarding the 2D audio spectrogram as image data. However, music auto-tagging differs from image classification in that the tags are highly diverse and have different levels of abstraction. Considering this issue, we propose a convolutional neural network (CNN)-based architecture that embraces multi-level and multi-scale features.

Recently, the end-to-end approach that learns hierarchical representations from raw data using deep convolutional neural networks has been successfully explored in the image, text, and speech domains. This approach has also been applied to musical signals but has not yet been fully explored. To this end, we propose sample-level deep convolutional neural networks, which learn representations from very small grains of waveforms (e.

Sports data analysis is becoming increasingly large-scale, diversified, and shared, but difficulty persists in rapidly accessing the most crucial information. Previous surveys have focused on the methodologies of sports video analysis from the spatiotemporal viewpoint instead of a content-based viewpoint, and few of these studies have considered semantics. This study develops a deeper interpretation of content-aware sports video analysis by examining the insight offered by research into the structure of content under different scenarios.

The paper presents a novel approach to the occlusion-handling problem in depth estimation using three views. A solution based on modifying the similarity cost function is proposed. During depth estimation via optimization algorithms such as Graph Cut, the similarity metric is continually updated so that only non-occluded fragments in the side views are considered.

This paper presents a novel method for detecting LSB matching steganography in grayscale images. The method is based on analyzing the differences between neighboring pixels before and after random data embedding. In natural images, there is a strong correlation between adjacent pixels.
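LSB matching (±1 embedding) changes a pixel only when its least significant bit differs from the message bit, randomly adding or subtracting 1. This tends to break up runs of equal neighboring values, which is the kind of neighbor-difference effect such a detector looks for. The sketch below shows the embedding and a simple neighbor statistic; `zero_diff_ratio` is a hypothetical feature for illustration, not the paper's detector.

```python
import random

def lsb_match_embed(pixels, bits, rng):
    """LSB matching (+/-1 embedding) on a flat list of 8-bit grayscale values."""
    stego = list(pixels)
    for i, bit in enumerate(bits):
        p = stego[i]
        if (p & 1) != bit:
            if p == 0:
                p += 1            # cannot go below 0
            elif p == 255:
                p -= 1            # cannot go above 255
            else:
                p += rng.choice((-1, 1))  # random direction is what distinguishes LSB matching
            stego[i] = p
    return stego

def zero_diff_ratio(pixels):
    """Fraction of adjacent pixel pairs with identical values (hypothetical feature)."""
    pairs = list(zip(pixels, pixels[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)
```

On a perfectly flat cover every adjacent pair is equal, so after embedding random bits the ratio drops below 1; on natural images the shift is smaller and a trained classifier would be needed.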

In this paper, an unsupervised steganalysis method that combines artificial training sets and supervised classification is proposed. We provide a formal framework for unsupervised classification of stego and cover images in the typical situation of targeted steganalysis (i.e.

Mobile streaming video data accounts for a large and increasing percentage of wireless network traffic. The available bandwidth of modern wireless networks is often unstable, making it difficult to deliver smooth, high-quality video. Streaming service providers such as Netflix and YouTube attempt to adapt to these bandwidth limitations by changing the video bitrate or, failing that, allowing playback interruptions (rebuffering).
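As a rough illustration of bitrate adaptation, here is a minimal buffer-based rate selector. It is not Netflix's or YouTube's actual algorithm; the function name `select_bitrate`, the bitrate ladder, and the `reservoir_s`/`cushion_s` thresholds are all hypothetical.

```python
def select_bitrate(buffer_s, ladder_kbps, reservoir_s=5.0, cushion_s=20.0):
    """Pick a bitrate (kbps) from the current playback buffer level in seconds.

    Below the reservoir, use the lowest rung to avoid rebuffering; above
    reservoir + cushion, use the highest rung; in between, map the buffer
    level linearly onto the bitrate ladder.
    """
    ladder = sorted(ladder_kbps)
    if buffer_s <= reservoir_s:
        return ladder[0]
    if buffer_s >= reservoir_s + cushion_s:
        return ladder[-1]
    fraction = (buffer_s - reservoir_s) / cushion_s
    return ladder[int(fraction * (len(ladder) - 1))]
```

For example, with a ladder of `[300, 750, 1200, 2400, 4800]` and 15 s of buffer, the selector returns 1200 kbps. Buffer-based selection avoids relying on noisy wireless throughput estimates, at the cost of reacting only after the buffer has changed.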

Although protecting ownership and preventing unauthorized manipulation of digital images have become important concerns, authenticating the origin of an image is also a significant issue. This paper proposes a procedure for identifying the image source and content using a Public Key Cryptography Signature (PKCS). The procedure is based on PKCS watermarking of images captured by numerous automatic observation cameras in the Trap View cloud system.

Adversarial training was recently shown to be competitive with supervised learning methods on computer vision tasks; however, studies have mainly been confined to generative tasks such as image synthesis. In this paper, we apply adversarial training techniques to the discriminative task of learning a steganographic algorithm. Steganography is a collection of techniques for concealing information by embedding it within a non-secret medium, such as cover texts or images.

In the context of Social TV, the growing popularity of first- and second-screen users who interact and post content online creates new business opportunities and related technical challenges for enriching the user experience in such environments. The SAM (Socializing Around Media) project uses a Social Media-connected infrastructure to address these challenges, providing intelligent user context management models and mechanisms that capture social patterns in order to apply collaborative filtering and personalized recommendations. This paper presents SAM's Context Management mechanism, running in a Social TV environment to provide smart recommendations for first- and second-screen content.

HEVC (MPEG-H Part 2 and H.265) is a new coding technology expected to be deployed on the market along with new video services in the near future. HEVC is the successor to the currently widely used AVC (MPEG-4 Part 10 and H.

We present a data-driven approach that colorizes 3D furniture models and indoor scenes by leveraging indoor images on the Internet. Our approach colorizes the furniture automatically according to an example image. Its core is learning an image-guided mesh segmentation that segments the model into different parts according to the objects in the image.

Today's Internet has witnessed an increase in the popularity of mobile video streaming, which is expected to exceed 3/4 of global mobile data traffic by 2019. To satisfy the considerable volume of mobile video requests, video service providers have been pushing their content delivery infrastructure to edge networks, from regional CDN servers to peer CDN servers (e.g.

The paper presents a quantitative analysis of video quality losses in a homogeneous HEVC video transcoder. Using the HM15.0 reference software and a set of test video sequences, a cascaded pixel-domain video transcoder (CPDT) was used to gather all the data needed for the analysis.
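The abstract does not name the quality metric. Assuming PSNR, the extra loss introduced by the second encoding stage of a cascaded transcoder can be measured as follows. Frames are treated as flat lists of 8-bit luma samples, and `psnr`/`transcoding_loss_db` are hypothetical helpers, not part of HM.

```python
import math

def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)

def transcoding_loss_db(original, first_decode, second_decode):
    """Extra PSNR drop caused by re-encoding in a cascaded pixel-domain transcoder.

    first_decode:  original after the first encode/decode pass
    second_decode: first_decode after being decoded and re-encoded by the transcoder
    """
    return psnr(original, first_decode) - psnr(original, second_decode)
```

Comparing both decodes against the original, rather than against each other, isolates the quality that the transcoding stage removes on top of the initial compression.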

Internet-native audio-visual services are witnessing rapid development. Among these services, object-based audio-visual services are gaining importance. In 2014, we established the Software Defined Media (SDM) consortium to target new research areas and markets involving object-based digital media and Internet-by-design audio-visual environments.

Gamification represents an effective way to incentivize user behavior across a number of computing applications. However, despite the fact that physical activity is essential for a healthy lifestyle, surprisingly little is known about how gamification and in particular competitions shape human physical activity. Here we study how competitions affect physical activity.

3D steganalysis aims to identify subtle, invisible changes produced in graphical objects through digital watermarking or steganography. Sets of statistical representations of 3D features, extracted from both cover and stego 3D mesh objects, are used as inputs to machine learning classifiers to decide whether any information was hidden in a given graphical object. According to previous studies, sets of local geometry features can be used to characterize the differences between stego and cover objects.

Limited annotated data is available for research on estimating facial expression intensities, which makes training deep networks for automated expression assessment very challenging. Fortunately, fine-tuning from a data-rich pre-trained domain such as face verification can alleviate the problem. In this paper, we propose a transferred network that fine-tunes a state-of-the-art face verification network on expression-intensity labeled data with a regression layer.

Inspired by recent advances in image super-resolution using convolutional neural networks (CNNs), we propose a CNN-based block up-sampling scheme for intra frame coding. A block can be down-sampled before being compressed by normal intra coding, and then up-sampled to its original resolution. Unlike previous studies on down/up-sampling-based coding, the up-sampling interpolation filters in our scheme are obtained by training a CNN rather than being hand-crafted.

In this paper, we study a simplified affine motion model based coding framework to overcome the limitation of translational motion model and maintain low computational complexity. The proposed framework mainly has three key contributions. First, we propose to reduce the number of affine motion parameters from 6 to 4.
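With 4 parameters, the affine model keeps zoom, rotation, and translation. A widely used formulation derives the motion vector at position (x, y) of a block of width w from two control-point vectors, v0 at the top-left corner and v1 at the top-right corner: vx = a·x − b·y + v0x and vy = b·x + a·y + v0y, with a = (v1x − v0x)/w and b = (v1y − v0y)/w. The sketch below assumes that formulation and evaluates it at sub-block centers; the paper's exact parameterization may differ, and the function names are hypothetical.

```python
def affine4_mv(x, y, v0, v1, width):
    """Motion vector at (x, y) under a 4-parameter affine model.

    v0, v1: (vx, vy) control-point motion vectors at the block's
    top-left (0, 0) and top-right (width, 0) corners.
    """
    a = (v1[0] - v0[0]) / width  # combined zoom/rotation term
    b = (v1[1] - v0[1]) / width  # combined rotation/zoom term
    return (a * x - b * y + v0[0], b * x + a * y + v0[1])

def subblock_mvs(width, height, v0, v1, sub=4):
    """One motion vector per sub x sub sub-block, evaluated at its center."""
    return [[affine4_mv(x + sub / 2, y + sub / 2, v0, v1, width)
             for x in range(0, width, sub)]
            for y in range(0, height, sub)]
```

When v0 equals v1 the model collapses to pure translation, so only two motion vectors (four numbers) need to be signaled instead of the six parameters of the full affine model.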

This paper proposes a novel advanced motion model to handle the irregular motion in the cube map projection of 360-degree video. Since the irregular motion is mainly caused by the projection from the sphere to the cube map, we first project the pixels in both the current picture and the reference picture from the unfolded cube back onto the sphere. Then, exploiting the fact that most motion on the sphere is uniform, we derive the relationship between the motion vectors of different pixels in the unfolded cube.
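The first step, mapping a cube-face pixel back onto the unit sphere and back again, can be sketched as below. The face orientation table `FACES` is a hypothetical convention chosen for illustration; real 360-video tools define their own face layouts, and (u, v) are face coordinates normalized to [-1, 1].

```python
import math

# Hypothetical orientation convention: (u, v) in [-1, 1]^2 -> point on the cube surface.
FACES = {
    "+x": lambda u, v: (1.0, v, -u),
    "-x": lambda u, v: (-1.0, v, u),
    "+y": lambda u, v: (u, 1.0, -v),
    "-y": lambda u, v: (u, -1.0, v),
    "+z": lambda u, v: (u, v, 1.0),
    "-z": lambda u, v: (-u, v, -1.0),
}

def cube_to_sphere(face, u, v):
    """Project a cube-face point onto the unit sphere."""
    x, y, z = FACES[face](u, v)
    n = math.sqrt(x * x + y * y + z * z)
    return (x / n, y / n, z / n)

def sphere_to_cube(p):
    """Inverse of cube_to_sphere: pick the dominant axis, then rescale onto that face."""
    x, y, z = p
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return ("+x", -z / ax, y / ax) if x > 0 else ("-x", z / ax, y / ax)
    if ay >= az:
        return ("+y", x / ay, -z / ay) if y > 0 else ("-y", x / ay, z / ay)
    return ("+z", x / az, y / az) if z > 0 else ("-z", -x / az, y / az)
```

Motion can then be modeled on the sphere (for example, as a rotation of these unit vectors) and mapped back to the cube with `sphere_to_cube`, which is where the non-uniform pixel displacements on the unfolded cube arise.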

Researchers often summarize their work in the form of scientific posters. Posters provide a coherent and efficient way to convey the core ideas expressed in scientific papers. Generating a good scientific poster, however, is a complex and time-consuming cognitive task, since posters need to be readable, informative, and aesthetically pleasing.

Feature extraction is a critical component of many applied data science workflows. In recent years, rapid advances in artificial intelligence and machine learning have led to an explosion of feature extraction tools and services that allow data scientists to cheaply and effectively annotate their data along a vast array of dimensions, ranging from detecting faces in images to analyzing the sentiment expressed in coherent text. Unfortunately, the proliferation of powerful feature extraction services has been mirrored by a corresponding expansion in the number of distinct interfaces to these services.

What we eat is one of the most frequent and important health decisions we make in daily life, yet it remains notoriously difficult to capture and understand. Effective food journaling is thus a grand challenge in personal health informatics. In this paper we describe a system for food journaling called I Ate This, which is inspired by the Remote Food Photography Method (RFPM).

Photos are becoming spontaneous, objective, and universal sources of information. This paper develops evolving situation recognition using photo streams from disparate sources, combined with advances in deep learning. Using visual concepts in photos together with space and time information, we formulate situation detection as a semi-supervised learning problem and propose new graph-based models to solve it.

Conventional compressive sensing (CS) attempts to acquire the most important part of a signal directly. In fact, CS avoids acquiring the existing statistical redundancies of a signal. Since the sensitivity of the human eye differs across frequencies, an image also contains, in addition to statistical redundancies, perceptual redundancies that the human eye cannot detect.

The German Broadcasting Archive (DRA) maintains the cultural heritage of radio and television broadcasts of the former German Democratic Republic (GDR). The uniqueness and importance of the video material have stimulated considerable scientific interest in its content. In this paper, we present an automatic video analysis and retrieval system for searching historical collections of GDR television recordings.

We propose a method to generate multiple hypotheses for 3D human pose, all of them consistent with the 2D detection of joints in a monocular RGB image. To generate these pose hypotheses, we use a novel generative model defined in the space of anatomically plausible 3D poses that satisfy joint angle limits and limb length ratios. The proposed generative model is uniform in the space of anatomically valid poses and, as a result, does not suffer from the dataset bias of existing motion capture datasets such as Human3.

The discrete cosine transform (DCT) is a key step in many image and video coding standards. The 8-point DCT is an important special case, with several widely investigated low-complexity approximations. However, the 16-point DCT has energy compaction advantages.
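For reference, the exact orthonormal DCT-II that such approximations aim to mimic can be written directly from its definition, X_k = c_k Σ_n x_n cos(π(2n+1)k / 2N), with c_0 = √(1/N) and c_k = √(2/N) otherwise. This is an O(N²) sketch, not a fast algorithm, and it does not reproduce any particular approximation; `dct2` is a hypothetical name.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a sequence, computed directly from the definition."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out
```

Because the transform is orthonormal, total energy is preserved; for a smooth 8-sample ramp such as `[0, 1, ..., 7]`, the first two coefficients carry more than 99% of that energy, which is the compaction property that makes the DCT useful for coding.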

The best summary of a long video differs among different people due to its highly subjective nature. Even for the same person, the best summary may change with time or mood. In this paper, we introduce the task of generating customized video summaries through simple text.

One of the serious issues in communication between people is hiding information from others, and the best way to do so is to deceive them. Since face images are now mostly used in three-dimensional format, in this paper we apply steganography to 3D face images in such a way that curious people cannot detect it. Since only texture matters for face detection, we separate the texture from the shape matrices, eliminating half of the extra information: steganography is applied only to the face texture, and any other shape can be used to reconstruct the 3D face.

A low-complexity 8-point orthogonal approximate DCT is introduced. The proposed transform requires no multiplications or bit-shift operations. The derived fast algorithm requires only 14 additions, fewer than any existing DCT approximation.

In this study, a method to construct a full-colour volumetric display is presented using a commercially available inkjet printer. Photoreactive luminescence materials are minutely and automatically printed as the volume elements, and volumetric displays are constructed with high resolution using easy-to-fabricate means that exploit inkjet printing technologies. The results experimentally demonstrate the first prototype of an inkjet printing-based volumetric display composed of multiple layers of transparent films that yield a full-colour three-dimensional (3D) image.

The paper presents some theoretical and practical considerations regarding TV distribution in local (small and medium) networks, using different technologies and architectures. The SMATV concept is presented in detail. The most important design formulae are presented along with a software package that helps the network planner design and optimize the network.

We demonstrate an adaptive bandwidth-efficient 360 VR video streaming system based on MPEG-DASH SRD. We extend MPEG-DASH SRD to the 3D space of 360 VR videos, and showcase a dynamic view-aware adaptation technique to tackle the high bandwidth demands of streaming 360 VR videos to wireless VR headsets. We spatially partition the underlying 3D mesh into multiple 3D sub-meshes, and construct an efficient 3D geometry mesh called hexaface sphere to optimally represent tiled 360 VR videos in the 3D space.

Calibration and higher-order statistics (HOS) are standard components of many image steganalysis systems. These techniques have not yet received adequate attention in audio steganalysis. Specifically, most current works are either non-calibrated or based only on a noise removal approach.
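The general idea of calibrated HOS features can be sketched as follows: compute skewness and kurtosis of a noise-like residual of the audio, then subtract the same statistics computed on a reference version of the signal. The first-difference residual, the 3-tap moving-average reference, and all function names are hypothetical choices for illustration, not the feature set or calibration of any specific paper.

```python
import math

def hos(values):
    """Skewness and (non-excess) kurtosis of a sequence; (0, 0) if it is constant."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0, 0.0
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / (m2 * math.sqrt(m2)), m4 / (m2 * m2)

def moving_average(x, k=3):
    """Centered moving average with shrinking windows at the edges."""
    half = k // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def calibrated_hos(samples):
    """HOS of the first-difference residual minus HOS of a smoothed reference's residual."""
    residual = [b - a for a, b in zip(samples, samples[1:])]
    reference = moving_average(samples)  # hypothetical calibration reference
    ref_residual = [b - a for a, b in zip(reference, reference[1:])]
    return [f - g for f, g in zip(hos(residual), hos(ref_residual))]
```

The subtraction is what makes the features "calibrated": statistics that come from the audio content itself tend to appear in both terms and cancel, while changes caused by embedding are meant to survive.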