Data deduplication effectively identifies and eliminates redundant
data, maintaining only a single copy of each file or chunk. Hence, it is widely
used in cloud storage systems to save storage space and network bandwidth.
However, the occurrence of deduplication can be easily identified by monitoring
and analyzing network traffic, which leads to the risk of user privacy leakage.
An attacker can mount a dangerous side-channel attack, namely the
learn-the-remaining-information (LRI) attack, to reveal users' private
information by exploiting the side channel of network traffic in deduplication.
Existing work mitigates the LRI attack but sacrifices the high bandwidth
efficiency of deduplication. To address this problem, we propose a
simple yet effective scheme, called randomized redundant chunk scheme (RRCS),
to significantly mitigate the risk of the LRI attack while maintaining the high
bandwidth efficiency of deduplication. The basic idea behind RRCS is to add
randomized redundant chunks that mask the real deduplication states of the
files targeted by the LRI attack, thereby obfuscating the view of an attacker
who attempts to exploit the side channel of network traffic.
Our security analysis shows that RRCS significantly mitigates the risk of
the LRI attack. We implement an RRCS prototype and evaluate it using three
large-scale real-world datasets. Experimental results demonstrate the
efficiency and efficacy of RRCS.
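
The basic idea can be illustrated with a minimal sketch. It is not the RRCS prototype: the function name select_chunks_to_upload, the is_duplicate map, and the uniform choice of how many redundant chunks to add are all assumptions made for illustration, since the text above does not specify how the random number is drawn. The sketch only shows that, once a random number of already-stored chunks is sent alongside the non-duplicate ones, the observed traffic no longer equals the true number of non-duplicate chunks.

```python
import random


def select_chunks_to_upload(chunk_ids, is_duplicate, rng=None):
    """Pick the chunks a client sends for one file (illustrative only).

    chunk_ids:    ordered list of chunk identifiers of the file.
    is_duplicate: dict mapping chunk id -> True if the server already stores it.
    rng:          optional random.Random instance, for reproducibility.
    """
    if rng is None:
        rng = random.Random()

    # Chunks the server lacks must always be uploaded.
    missing = [c for c in chunk_ids if not is_duplicate[c]]
    # Chunks the server already has are candidates for redundant upload.
    duplicates = [c for c in chunk_ids if is_duplicate[c]]

    # Assumption: the number of redundant chunks is drawn uniformly from
    # [0, len(duplicates)]; the actual distribution is a design choice.
    extra = rng.randint(0, len(duplicates))
    redundant = rng.sample(duplicates, extra)

    # Preserve the original chunk order in the upload list.
    upload = set(missing) | set(redundant)
    return [c for c in chunk_ids if c in upload]
```

With plain deduplication the client would send exactly the missing chunks, so traffic volume directly reveals the deduplication state. In this sketch an observer sees between len(missing) and len(chunk_ids) chunks, and bandwidth overhead is bounded by the number of redundant chunks drawn.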