Computer Science - Multimedia Publications (50)


3D steganalysis aims to identify subtle invisible changes produced in graphical objects through digital watermarking or steganography. Sets of statistical representations of 3D features, extracted from both cover and stego 3D mesh objects, are used as inputs into machine learning classifiers in order to decide whether any information was hidden in the given graphical object. According to previous studies, sets of local geometry features can be used to define the differences between stego and cover-objects. Read More

Limited annotated data is available for the research of estimating facial expression intensities, which makes the training of deep networks for automated expression assessment very challenging. Fortunately, fine-tuning from a data-extensive pre-trained domain such as face verification can alleviate the problem. In this paper, we propose a transferred network that fine-tunes a state-of-the-art face verification network using expression-intensity labeled data with a regression layer. Read More

Inspired by the recent advances of image super-resolution using convolutional neural networks (CNNs), we propose a CNN-based block up-sampling scheme for intra frame coding. A block can be down-sampled before being compressed by normal intra coding, and then up-sampled to its original resolution. Different from previous studies on down/up-sampling based coding, the up-sampling interpolation filters in our scheme are designed by training a CNN rather than hand-crafted. Read More

In this paper, we study a simplified affine motion model based coding framework to overcome the limitation of translational motion model and maintain low computational complexity. The proposed framework mainly has three key contributions. First, we propose to reduce the number of affine motion parameters from 6 to 4. Read More
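
The reduction from six to four parameters can be illustrated with a minimal sketch. It assumes the four-parameter model restricts motion to rotation, uniform zoom, and translation, which is one common simplification; the truncated abstract does not confirm this choice, and all function names are hypothetical.

```python
import math

def affine6(x, y, a, b, c, d, e, f):
    """General 6-parameter affine map: (x, y) -> (a*x + b*y + c, d*x + e*y + f)."""
    return a * x + b * y + c, d * x + e * y + f

def affine4(x, y, a, b, c, f):
    """Assumed 4-parameter model: rotation + uniform zoom + translation.

    It is the 6-parameter map under the constraints e = a and d = -b.
    """
    return affine6(x, y, a, b, c, -b, a, f)

def similarity_params(scale, theta, tx, ty):
    """Express zoom `scale`, rotation `theta` (radians) and shift (tx, ty) as (a, b, c, f).

    Sign convention of this sketch: positive theta rotates clockwise when y points up.
    """
    return scale * math.cos(theta), scale * math.sin(theta), tx, ty
```

Under this assumption, two of the six coefficients are tied to the other two, so only four values need to be estimated and signalled per block.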

This paper proposes a novel advanced motion model to handle the irregular motion for the cubic map projection of 360-degree video. Since the irregular motion is mainly caused by the projection from the sphere to the cube map, we first try to project the pixels in both the current picture and reference picture from the unfolded cube back to the sphere. Then, by exploiting the characteristic that most of the motions in the sphere are uniform, we can derive the relationship between the motion vectors of various pixels in the unfolded cube. Read More

Researchers often summarize their work in the form of scientific posters. Posters provide a coherent and efficient way to convey core ideas expressed in scientific papers. Generating a good scientific poster, however, is a complex and time consuming cognitive task, since such posters need to be readable, informative, and visually aesthetic. Read More

Feature extraction is a critical component of many applied data science workflows. In recent years, rapid advances in artificial intelligence and machine learning have led to an explosion of feature extraction tools and services that allow data scientists to cheaply and effectively annotate their data along a vast array of dimensions, ranging from detecting faces in images to analyzing the sentiment expressed in coherent text. Unfortunately, the proliferation of powerful feature extraction services has been mirrored by a corresponding expansion in the number of distinct interfaces to feature extraction services. Read More

What we eat is one of the most frequent and important health decisions we make in daily life, yet it remains notoriously difficult to capture and understand. Effective food journaling is thus a grand challenge in personal health informatics. In this paper we describe a system for food journaling called I Ate This, which is inspired by the Remote Food Photography Method (RFPM). Read More

Photos are becoming spontaneous, objective, and universal sources of information. This paper develops evolving situation recognition using photo streams coming from disparate sources combined with the advances of deep learning. Using visual concepts in photos together with space and time information, we formulate the situation detection into a semi-supervised learning framework and propose new graph-based models to solve the problem. Read More

Conventional compressive sensing (CS) attempts to acquire the most important part of a signal directly. In fact, CS avoids acquiring the existing statistical redundancies of a signal. Since the sensitivity of the human eye differs across frequencies, an image contains, in addition to statistical redundancies, perceptual redundancies that the human eye cannot detect. Read More

The German Broadcasting Archive (DRA) maintains the cultural heritage of radio and television broadcasts of the former German Democratic Republic (GDR). The uniqueness and importance of the video material stimulates a large scientific interest in the video content. In this paper, we present an automatic video analysis and retrieval system for searching in historical collections of GDR television recordings. Read More

We propose a method to generate multiple hypotheses for human 3D pose, all of them consistent with the 2D detection of joints in a monocular RGB image. To generate these pose hypotheses, we use a novel generative model defined in the space of anatomically plausible 3D poses satisfying the joint angle limits and limb length ratios. The proposed generative model is uniform in the space of anatomically valid poses and, as a result, does not suffer from the dataset bias in existing motion capture datasets such as Human3. Read More

The discrete cosine transform (DCT) is the key step in many image and video coding standards. The 8-point DCT is an important special case, with several widely investigated low-complexity approximations. However, the 16-point DCT has energy compaction advantages. Read More

The best summary of a long video differs among different people due to its highly subjective nature. Even for the same person, the best summary may change with time or mood. In this paper, we introduce the task of generating customized video summaries through simple text. Read More

One of the serious issues in communication between people is hiding information from others, and the best way to do this is to deceive them. Since face images are nowadays mostly used in three-dimensional format, in this paper we apply steganography to 3D face images in a way that curious people cannot detect. Because only the texture matters in face detection, we separate the texture from the shape matrices to eliminate half of the extra information; steganography is applied only to the face texture, and any other shape can be used to reconstruct the 3D face. Read More

A low-complexity 8-point orthogonal approximate DCT is introduced. The proposed transform requires no multiplications or bit-shift operations. The derived fast algorithm requires only 14 additions, less than any existing DCT approximation. Read More

In this study, a method to construct a full-colour volumetric display is presented using a commercially available inkjet printer. Photoreactive luminescence materials are minutely and automatically printed as the volume elements, and volumetric displays are constructed with high resolution using easy-to-fabricate means that exploit inkjet printing technologies. The results experimentally demonstrate the first prototype of an inkjet printing-based volumetric display composed of multiple layers of transparent films that yield a full-colour three-dimensional (3D) image. Read More

We demonstrate an adaptive bandwidth-efficient 360 VR video streaming system based on MPEG-DASH SRD. We extend MPEG-DASH SRD to the 3D space of 360 VR videos, and showcase a dynamic view-aware adaptation technique to tackle the high bandwidth demands of streaming 360 VR videos to wireless VR headsets. We spatially partition the underlying 3D mesh into multiple 3D sub-meshes, and construct an efficient 3D geometry mesh called hexaface sphere to optimally represent tiled 360 VR videos in the 3D space. Read More

Calibration and higher order statistics (HOS) are standard components of many image steganalysis systems. These techniques have not yet received adequate attention in the audio steganalysis context. Specifically, most current works are either non-calibrated or based only on a noise-removal approach. Read More

Recently, merging signal processing techniques with information security services has attracted considerable attention. Steganography and steganalysis are among these emerging trends. Like their counterparts in cryptology, steganography and steganalysis are in a constant battle: steganography methods try to hide the presence of covert messages in innocuous-looking data, whereas steganalysis methods try to reveal the existence of such messages and to break steganography methods. Read More

The robustness and security of the biometric watermarking approach can be improved by using multiple watermarking. Multiple watermarking is proposed here to improve the security of biometric features and data. When an imposter tries to create a spoofed biometric feature, the invisible biometric watermark features can provide appropriate protection to multimedia data. Read More

This paper outlines the development and testing of a novel, feedback-enabled attention allocation aid (AAAD), which uses real-time physiological data to improve human performance in a realistic sequential visual search task. Indeed, by optimizing over search duration, the aid improves efficiency, while preserving decision accuracy, as the operator identifies and classifies targets within simulated aerial imagery. Specifically, using experimental eye-tracking data and measurements about target detectability across the human visual field, we develop functional models of detection accuracy as a function of search time, number of eye movements, scan path, and image clutter. Read More

To stretch a music piece to a given length is a common demand in people's daily lives, e.g., in audio-video synchronization and animation production. Read More

Currently successful methods for video description are based on encoder-decoder sentence generation using recurrent neural networks (RNNs). Recent work has shown the advantage of integrating temporal and/or spatial attention mechanisms into these models, in which the decoder network predicts each word in the description by selectively giving more weight to encoded features from specific time frames (temporal attention) or to features from specific spatial regions (spatial attention). In this paper, we propose to expand the attention model to selectively attend not just to specific times or spatial regions, but to specific modalities of input such as image features, motion features, and audio features. Read More

With the evolution of HDTV and Ultra HDTV, the bandwidth requirement for IP-based TV content is rapidly increasing. Consumers demand uninterrupted service with a high Quality of Experience (QoE). Service providers are constantly trying to differentiate themselves by innovating new ways of distributing content more efficiently with lower cost and higher penetration. Read More

Light field cameras can capture the 3D information in a scene with a single shot. This special feature makes light field cameras very appealing for a variety of applications: from the popular post-capture refocus, to depth estimation and image-based rendering. However, light field cameras suffer by design from strong limitations in their spatial resolution, which should therefore be augmented by computational methods. Read More

A new methodology to measure coded image/video quality using the just-noticeable-difference (JND) idea was proposed. Several small JND-based image/video quality datasets were released by the Media Communications Lab at the University of Southern California. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. Read More

HTTP adaptive streaming (HAS) has become the universal technology for video streaming over the Internet. Many HAS system designs aim at sharing the network bandwidth in a rate-fair manner. However, rate fairness is in general not equivalent to quality fairness as different video sequences might have different characteristics and resource requirements. Read More

We propose a new deep network for audio event recognition, called AENet. In contrast to speech, sounds coming from audio events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of clear sub-word units that are present in speech. Read More

Copy-move forgery is the most popular and simplest image manipulation method. In this type of forgery, an area of the image is copied and then, after post-processing such as rotation and scaling, placed at the destination. The goal of copy-move forgery is to hide or duplicate one or more objects in the image. Read More

Lately, the World Wide Web has brought about an evolution in the niche of videoconferencing applications. The latest technologies give browsers the capacity to initiate real-time communications. WebRTC is one of the free and open-source projects that aim to give users the freedom to enjoy real-time communications, and it does so by following and redefining the standards. Read More

Conventional state-of-the-art image steganalysis approaches usually consist of a classifier trained with features provided by rich image models. As both the feature extraction and classification steps are perfectly embodied in the deep learning architecture called Convolutional Neural Network (CNN), different studies have tried to design a CNN-based steganalyzer. The network designed by Xu et al. Read More

We introduce a dataset for facilitating audio-visual analysis of musical performances. The dataset comprises a number of simple multi-instrument musical pieces assembled from coordinated but separately recorded performances of individual tracks. For each piece, we provide the musical score in MIDI format, the audio recordings of the individual tracks, the audio and video recording of the assembled mixture, and ground-truth annotation files including frame-level and note-level transcriptions. Read More

It has long been considered a significant problem to improve the visual quality of lossy image and video compression. Recent advances in computing power together with the availability of large training data sets have increased interest in the application of deep learning CNNs to address image recognition and image processing tasks. Here, we present a powerful CNN tailored to the specific task of semantic image understanding to achieve higher visual quality in lossy compression. Read More

The recent rise of interest in Virtual Reality (VR) came with the availability of commodity commercial VR products, such as the Head Mounted Displays (HMD) created by Oculus and other vendors. To accelerate the user adoption of VR headsets, content providers should focus on producing high quality immersive content for these devices. Similarly, multimedia streaming service providers should enable the means to stream 360 VR content on their platforms. Read More

HEVC includes a Coding Unit (CU) level luminance-based perceptual quantization technique known as AdaptiveQP. AdaptiveQP perceptually adjusts the Quantization Parameter (QP) at the CU level based on the spatial activity of raw input video data in a luma Coding Block (CB). In this paper, we propose a novel cross-color channel adaptive quantization scheme which perceptually adjusts the CU level QP according to the spatial activity of raw input video data in the constituent luma and chroma CBs; i. Read More

A depth image provides partial geometric information of a 3D scene, namely the shapes of physical objects as observed from a particular viewpoint. This information is important when synthesizing images of different virtual camera viewpoints via depth-image-based rendering (DIBR). It has been shown that depth images can be efficiently coded using contour-adaptive codecs that preserve edge sharpness, resulting in visually pleasing DIBR-synthesized images. Read More

This paper studies the joint learning of action recognition and temporal localization in long, untrimmed videos. We employ a multi-task learning framework that performs the three highly related steps of action proposal, action recognition, and action localization refinement in parallel instead of the standard sequential pipeline that performs the steps in order. We develop a novel temporal actionness regression module that estimates what proportion of a clip contains action. Read More
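
As a rough illustration of what "proportion of a clip contains action" could mean as a regression target, the sketch below computes the fraction of a clip covered by ground-truth action segments. The abstract does not say how the target is derived, so the function name, the interval representation, and the merging of overlapping segments are all assumptions.

```python
def actionness(clip_start, clip_end, actions):
    """Fraction of [clip_start, clip_end] covered by the union of action intervals.

    `actions` is a list of (start, end) segments in the same time unit as the clip.
    """
    if clip_end <= clip_start:
        raise ValueError("clip must have positive duration")
    # Restrict every action to the clip window and drop empty intersections.
    clipped = sorted(
        (max(s, clip_start), min(e, clip_end))
        for s, e in actions
        if min(e, clip_end) > max(s, clip_start)
    )
    covered = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the running interval: flush it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping intervals are merged so they are not counted twice.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (clip_end - clip_start)
```

A clip lying entirely inside an action gets the value 1, a clip with no action gets 0, and clips at action boundaries fall in between.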

In this paper, we present a novel pseudo sequence based 2-D hierarchical reference structure for light-field image compression. In the proposed scheme, we first decompose the light-field image into multiple views and organize them into a 2-D coding structure according to the spatial coordinates of the corresponding microlens. Then we develop three main techniques to optimize the 2-D coding structure. Read More

Retrieval of live, user-broadcast video streams is an under-addressed and increasingly relevant challenge. The on-line nature of the problem requires temporal evaluation and the unforeseeable scope of potential queries motivates an approach which can accommodate arbitrary search queries. To account for the breadth of possible queries, we adopt a no-example approach to query retrieval, which uses a query's semantic relatedness to pre-trained concept classifiers. Read More

This article introduces a novel family of decentralised caching policies, applicable to wireless networks with finite storage at the edge-nodes (stations). These policies, that are based on the Least-Recently-Used replacement principle, are here referred to as spatial multi-LRU. They update cache inventories in a way that provides content diversity to users who are covered by, and thus have access to, more than one station. Read More
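
The idea of LRU caches that cooperate across overlapping coverage areas can be sketched as follows. This is a minimal illustration under an assumed update rule (refresh only the cache where the item is found, insert only into the nearest cache on a miss); the truncated abstract does not specify the exact policies, and the class and function names are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache; the most recently used item sits at the end."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.items = OrderedDict()

    def touch(self, item):
        """Mark `item` as most recently used, inserting and evicting if needed."""
        if item in self.items:
            self.items.move_to_end(item)
        else:
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)  # evict the least recently used
            self.items[item] = True

def request(covering, item):
    """Serve `item` for a user covered by the caches in `covering` (nearest first).

    Assumed rule: on a hit, refresh only the first cache that holds the item;
    on a miss, insert into the nearest cache only.
    Returns True on a hit and False on a miss.
    """
    for cache in covering:
        if item in cache.items:
            cache.touch(item)
            return True
    covering[0].touch(item)
    return False
```

Because a hit in a neighbouring station does not trigger a copy into the nearest one, overlapping caches tend to hold different items, which is the content diversity the abstract refers to.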

Two multiplierless pruned 8-point discrete cosine transform (DCT) approximations are presented. Both transforms present lower arithmetic complexity than state-of-the-art methods. The performance of such new methods was assessed in the image compression context. Read More

Many people enjoy classical symphonic music. Its diverse instrumentation makes for a rich listening experience. This diversity adds to the conductor's expressive freedom to shape the sound according to their imagination. Read More

The query-by-image video retrieval (QBIVR) task has been attracting considerable research attention recently. However, most existing methods represent a video by either aggregating or projecting all its frames into a single datum point, which may easily cause severe information loss. In this paper, we propose an efficient QBIVR framework to enable an effective and efficient video search with image query. Read More

Recently, multidimensional signal reconstruction from a small number of measurements has attracted great interest. Therefore, an effective sampling scheme is required that acquires as much information about the signal as possible using a small number of measurements. In this paper, we study a novel cube-based method for sampling and reconstruction of multidimensional signals. Read More

This paper introduces ALYSIA: Automated LYrical SongwrIting Application. ALYSIA is based on a machine learning model using Random Forests, and we discuss its success at pitch and rhythm prediction. Next, we show how ALYSIA was used to create original pop songs that were subsequently recorded and produced. Read More

Due to its remarkable energy compaction properties, the discrete cosine transform (DCT) is employed in a multitude of compression standards, such as JPEG and H.265/HEVC. Several low-complexity integer approximations for the DCT have been proposed for both 1-D and 2-D signal analysis. Read More
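
Energy compaction can be seen in a small numerical sketch. The code below computes the exact orthonormal 8-point DCT-II directly from its definition (not one of the low-complexity integer approximations the abstract refers to) and measures how much of a smooth signal's energy lands in the first coefficients; the ramp test signal and the function names are illustrative choices.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a sequence, computed directly in O(N^2)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * acc)
    return out

def energy_fraction(coeffs, keep):
    """Share of the total energy held by the first `keep` coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:keep]) / total

ramp = [float(i) for i in range(8)]  # a smooth 8-point test signal
coeffs = dct2(ramp)
```

For this ramp, `energy_fraction(coeffs, 2)` exceeds 0.99, so two of the eight coefficients carry almost all of the signal energy, while the orthonormal scaling keeps the total energy unchanged.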

In this paper, we propose a learning-based supervised discrete hashing method. Binary hashing is widely used for large-scale image retrieval as well as video and document searches because the compact representation of binary code is essential for data storage and reasonable for query searches using bit-operations. The recently proposed Supervised Discrete Hashing (SDH) efficiently solves mixed-integer programming problems by alternating optimization and the Discrete Cyclic Coordinate descent (DCC) method. Read More
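
The remark about query searches using bit operations can be made concrete with a short sketch. It shows only the retrieval step on already-computed binary codes, here stored as Python integers, and does not implement SDH or DCC; the function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")  # XOR marks differing bits, then count them

def nearest(query, database):
    """Index of the database code closest to `query` in Hamming distance."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))
```

Comparing two codes costs one XOR and one bit count regardless of the original feature dimension, which is why compact binary codes suit large-scale search.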

With the growth of user-generated content, we observe a constant rise in the number of companies, such as search engines and content aggregators, that operate on tremendous amounts of web content without being the services that host it. Thus, aiming to locate the most important content and promote it to users, they face the need to estimate the current content popularity and predict its future popularity. Read More

Steganography schemes are designed with the objective of minimizing a defined distortion function. In most existing state-of-the-art approaches, this distortion function is based on image feature preservation. Since smooth regions and clean edges define the core of an image, even a small modification in these areas largely modifies image features and is thus easily detectable. Read More