Computer Science - Multimedia Publications (50)


Computer Science - Multimedia Publications

The robustness and security of the biometric watermarking approach can be improved by using a multiple watermarking. This multiple watermarking proposed for improving security of biometric features and data. When the imposter tries to create the spoofed biometric feature, the invisible biometric watermark features can provide appropriate protection to multimedia data. Read More

This paper outlines the development and testing of a novel, feedback-enabled attention allocation aid (AAAD), which uses real-time physiological data to improve human performance in a realistic sequential visual search task. Indeed, by optimizing over search duration, the aid improves efficiency, while preserving decision accuracy, as the operator identifies and classifies targets within simulated aerial imagery. Specifically, using experimental eye-tracking data and measurements about target detectability across the human visual field, we develop functional models of detection accuracy as a function of search time, number of eye movements, scan path, and image clutter. Read More

To stretch a music piece to a given length is a common demand in people's daily lives, e.g., in audio-video synchronization and animation production. Read More

Currently successful methods for video description are based on encoder-decoder sentence generation using recur-rent neural networks (RNNs). Recent work has shown the advantage of integrating temporal and/or spatial attention mechanisms into these models, in which the decoder net-work predicts each word in the description by selectively giving more weight to encoded features from specific time frames (temporal attention) or to features from specific spatial regions (spatial attention). In this paper, we propose to expand the attention model to selectively attend not just to specific times or spatial regions, but to specific modalities of input such as image features, motion features, and audio features. Read More

With the evolution of HDTV and Ultra HDTV, the bandwidth requirement for IP-based TV content is rapidly increasing. Consumers demand uninterrupted service with a high Quality of Experience (QoE). Service providers are constantly trying to differentiate themselves by innovating new ways of distributing content more efficiently with lower cost and higher penetration. Read More

Light field cameras can capture the 3D information in a scene with a single shot. This special feature makes light field cameras very appealing for a variety of applications: from the popular post-capture refocus, to depth estimation and image-based rendering. However, light field cameras suffer by design from strong limitations in their spatial resolution, which should therefore be augmented by computational methods. Read More

A new methodology to measure coded image/video quality using the just-noticeable-difference (JND) idea was proposed. Several small JND-based image/video quality datasets were released by the Media Communications Lab at the University of Southern California. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. Read More

HTTP adaptive streaming (HAS) has become the universal technology for video streaming over the Internet. Many HAS system designs aim at sharing the network bandwidth in a rate-fair manner. However, rate fairness is in general not equivalent to quality fairness as different video sequences might have different characteristics and resource requirements. Read More

We propose a new deep network for audio event recognition, called AENet. In contrast to speech, sounds coming from audio events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of clear sub-word units that are present in speech. Read More

Copy-move forgery is the most popular and simplest image manipulation method. In this type of forgery, an area from the image copied, then after post processing such as rotation and scaling, placed on the destination. The goal of Copy-move forgery is to hide or duplicate one or more objects in the image. Read More

Conventional state-of-the-art image steganalysis approaches usually consist of a classifier trained with features provided by rich image models. As both features extraction and classification steps are perfectly embodied in the deep learning architecture called Convolutional Neural Network (CNN), different studies have tried to design a CNN-based steganalyzer. The network designed by Xu et al. Read More

We introduce a dataset for facilitating audio-visual analysis of musical performances. The dataset comprises a number of simple multi-instrument musical pieces assembled from coordinated but separately recorded performances of individual tracks. For each piece, we provide the musical score in MIDI format, the audio recordings of the individual tracks, the audio and video recording of the assembled mixture, and ground-truth annotation files including frame-level and note-level transcriptions. Read More

It has long been considered a significant problem to improve the visual quality of lossy image and video compression. Recent advances in computing power together with the availability of large training data sets has increased interest in the application of deep learning cnns to address image recognition and image processing tasks. Here, we present a powerful cnn tailored to the specific task of semantic image understanding to achieve higher visual quality in lossy compression. Read More

The recent rise of interest in Virtual Reality (VR) came with the availability of commodity commercial VR prod- ucts, such as the Head Mounted Displays (HMD) created by Oculus and other vendors. To accelerate the user adoption of VR headsets, content providers should focus on producing high quality immersive content for these devices. Similarly, multimedia streaming service providers should enable the means to stream 360 VR content on their platforms. Read More

HEVC includes a Coding Unit (CU) level luminance-based perceptual quantization technique known as AdaptiveQP. AdaptiveQP perceptually adjusts the Quantization Parameter (QP) at the CU level based on the spatial activity of raw input video data in a luma Coding Block (CB). In this paper, we propose a novel cross-color channel adaptive quantization scheme which perceptually adjusts the CU level QP according to the spatial activity of raw input video data in the constituent luma and chroma CBs; i. Read More

A depth image provides partial geometric information of a 3D scene, namely the shapes of physical objects as observed from a particular viewpoint. This information is important when synthesizing images of different virtual camera viewpoints via depth-image-based rendering (DIBR). It has been shown that depth images can be efficiently coded using contour-adaptive codecs that preserve edge sharpness, resulting in visually pleasing DIBR-synthesized images. Read More

This paper studies the joint learning of action recognition and temporal localization in long, untrimmed videos. We employ a multi-task learning framework that performs the three highly related steps of action proposal, action recognition, and action localization refinement in parallel instead of the standard sequential pipeline that performs the steps in order. We develop a novel temporal actionness regression module that estimates what proportion of a clip contains action. Read More

In this paper, we present a novel pseudo sequence based 2-D hierarchical reference structure for light-field image compression. In the proposed scheme, we first decompose the light-field image into multiple views and organize them into a 2-D coding structure according to the spatial coordinates of the corresponding microlens. Then we mainly develop three technologies to optimize the 2-D coding structure. Read More

Retrieval of live, user-broadcast video streams is an under-addressed and increasingly relevant challenge. The on-line nature of the problem requires temporal evaluation and the unforeseeable scope of potential queries motivates an approach which can accommodate arbitrary search queries. To account for the breadth of possible queries, we adopt a no-example approach to query retrieval, which uses a query's semantic relatedness to pre-trained concept classifiers. Read More

This article introduces a novel family of decentralised caching policies, applicable to wireless networks with finite storage at the edge-nodes (stations). These policies, that are based on the Least-Recently-Used replacement principle, are here referred to as spatial multi-LRU. They update cache inventories in a way that provides content diversity to users who are covered by, and thus have access to, more than one station. Read More

Two multiplierless pruned 8-point discrete cosine transform (DCT) approximation are presented. Both transforms present lower arithmetic complexity than state-of-the-art methods. The performance of such new methods was assessed in the image compression context. Read More

Many people enjoy classical symphonic music. Its diverse instrumentation makes for a rich listening experience. This diversity adds to the conductor's expressive freedom to shape the sound according to their imagination. Read More

The query-by-image video retrieval (QBIVR) task has been attracting considerable research attention recently. However, most existing methods represent a video by either aggregating or projecting all its frames into a single datum point, which may easily cause severe information loss. In this paper, we propose an efficient QBIVR framework to enable an effective and efficient video search with image query. Read More

Recently, multidimensional signal reconstruction using a low number of measurements is of great interest. Therefore, an effective sampling scheme which should acquire the most information of signal using a low number of measurements is required. In this paper, we study a novel cube-based method for sampling and reconstruction of multidimensional signals. Read More

This paper introduces ALYSIA: Automated LYrical SongwrIting Application. ALYSIA is based on a machine learning model using Random Forests, and we discuss its success at pitch and rhythm prediction. Next, we show how ALYSIA was used to create original pop songs that were subsequently recorded and produced. Read More

Due to its remarkable energy compaction properties, the discrete cosine transform (DCT) is employed in a multitude of compression standards, such as JPEG and H.265/HEVC. Several low-complexity integer approximations for the DCT have been proposed for both 1-D and 2-D signal analysis. Read More

In this paper, we propose a learning-based supervised discrete hashing method. Binary hashing is widely used for large-scale image retrieval as well as video and document searches because the compact representation of binary code is essential for data storage and reasonable for query searches using bit-operations. The recently proposed Supervised Discrete Hashing (SDH) efficiently solves mixed-integer programming problems by alternating optimization and the Discrete Cyclic Coordinate descent (DCC) method. Read More

With the growth of user-generated content, we observe the constant rise of the number of companies, such as search engines, content aggregators, etc., that operate with tremendous amounts of web content not being the services hosting it. Thus, aiming to locate the most important content and promote it to the users, they face the need of estimating the current and predicting the future content popularity. Read More

Steganography schemes are designed with the objective of minimizing a defined distortion function. In most existing state of the art approaches, this distortion function is based on image feature preservation. Since smooth regions or clean edges define image core, even a small modification in these areas largely modifies image features and is thus easily detectable. Read More

Music holds a significant cultural role in social identity and in the encouragement of socialization. Technology, by the destruction of physical and cultural distance, has lead to many changes in musical themes and the complete loss of forms. Yet, it also allows for the preservation and distribution of music from societies without a history of written sheet music. Read More

Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media. However, we pay for their high compression rate with visual artifacts degrading the user experience. Deep convolutional neural networks have become a widespread tool to address high-level computer vision tasks very successfully. Read More

Labelled image datasets have played a critical role in high-level image understanding; however the process of manual labelling is both time-consuming and labor intensive. To reduce the cost of manual labelling, there has been increased research interest in automatically constructing image datasets by exploiting web images. Datasets constructed by existing methods tend to have a weak domain adaptation ability, which is known as the "dataset bias problem". Read More

Numerous fake images spread on social media today and can severely jeopardize the credibility of online content to public. In this paper, we employ deep networks to learn distinct fake image related features. In contrast to authentic images, fake images tend to be eye-catching and visually striking. Read More

The Multilingual Visual Sentiment Ontology (MVSO) consists of 15,600 concepts in 12 different languages that are strongly related to emotions and sentiments expressed in images. These concepts are defined in the form of Adjective-Noun Pair (ANP), which are crawled and discovered from online image forum Flickr. In this work, we used Amazon Mechanical Turk as a crowd-sourcing platform to collect human judgments on sentiments expressed in images that are uniformly sampled over 3,911 English ANPs extracted from a tag-restricted subset of MVSO. Read More

We propose a scalable approach to learn video-based question answering (QA): answer a "free-form natural language question" about a video content. Our approach automatically harvests a large number of videos and descriptions freely available online. Then, a large number of candidate QA pairs are automatically generated from descriptions rather than manually annotated. Read More

Subjective questions such as `does neymar dive', or `is clinton lying', or `is trump a fascist', are popular queries to web search engines, as can be seen by autocompletion suggestions on Google, Yahoo and Bing. In the era of cognitive computing, beyond search, they could be handled as hypotheses issued for evaluation. Our vision is to leverage on unstructured data and metadata of the rich user-generated multimedia that is often shared as material evidence in favor or against hypotheses in social media platforms. Read More

Deep learning frameworks have achieved overwhelming superiority in many fields of pattern recognition in recent years. However, the application of deep learning frameworks in image steganalysis is still in its initial stage. In this paper we firstly proved that the convolution phase and the quantization & truncation phase are not learnable for deep neural networks. Read More

Simple quality metrics such as PSNR are known to not correlate well with subjective quality when tested across a wide spectrum of video content or quality regime. Recently, efforts have been made in designing objective quality metrics trained on subjective data, demonstrating better correlation with video quality perceived by human. Clearly, the accuracy of such a metric heavily depends on the quality of the subjective data that it is trained on. Read More

Phase Shift Keying on the Hypersphere (PSKH), a generalization of conventional Phase Shift Keying (PSK) for Multiple-Input Multiple-Output (MIMO) systems, is introduced. In PSKH, constellation points are distributed on a multidimensional hypersphere. The use of such constellations with a Peak-To-Average-Sum-Power-Ratio (PASPR) of 1 allows to use load-modulated transmitters which can cope with a small backoff, which in turn results in a high power efficiency. Read More

In IEEE 802.11, the retry limit is set the same value for all packets. In this paper, we dynamically classify video teleconferencing packets based on the type of the video frame that a packet carries and the packet loss events that have happened in the network, and assign them different retry limits. Read More

In image compression, classical block-based separable transforms tend to be inefficient when image blocks contain arbitrarily shaped discontinuities. For this reason, transforms incorporating directional information are an appealing alternative. In this paper, we propose a new approach to this problem, namely a discrete cosine transform (DCT) that can be steered in any chosen direction. Read More

People can recognize scenes across many different modalities beyond natural images. In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new cross-modal scene dataset. Read More

With the fast growth of communication networks, the video data transmission from these networks is extremely vulnerable. Error concealment is a technique to estimate the damaged data by employing the correctly received data at the decoder. In this paper, an efficient boundary matching algorithm for estimating damaged motion vectors (MVs) is proposed. Read More

An important concept in wireless systems has been quality of experience (QoE)-aware video transmission. Such communications are considered not only connection-based communications but also content-aware communications, since the video quality is closely related to the content itself. It becomes necessary therefore for video communications to utilize a cross-layer design (also known as joint source and channel coding). Read More

Sending compressed video data in error-prone environments (like the Internet and wireless networks) might cause data degradation. Error concealment techniques try to conceal the received data in the decoder side. In this paper, an adaptive boundary matching algorithm is presented for recovering the damaged motion vectors (MVs). Read More

Image Forensics has already achieved great results for the source camera identification task on images. Standard approaches for data coming from Social Network Platforms cannot be applied due to different processes involved (e.g. Read More

Steganography is the art of hiding data, in such a way that it is undetectable under traffic-pattern analysis and the data hidden is only known to the receiver and the sender. In this paper new method of text steganography over the silence interval of audio in a video file, is presented. In the proposed method first the audio signal is extracted from the video. Read More

This thesis report studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework. As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle Question-Answering (text based). We then modify the previous model to accept an image as an input in addition to the question. Read More

The latest High Efficiency Video Coding (HEVC) standard significantly improves coding efficiency over its previous video coding standards. The expense of such improvement is enormous computational complexity, from both encoding and decoding perspectives. Since the capability and capacity of power are diverse across portable devices, it is necessary to reduce decoding complexity to a target with tolerable quality loss, so called complexity control. Read More