In typical neural machine translation (NMT), the decoder generates a sentence
word by word, packing all linguistic granularities into the same RNN
time-scale. In this paper, we propose a new type of decoder for NMT, which splits the
decoder state into two parts and updates them at two different time-scales.
Specifically, we first predict a chunk time-scale state for phrasal modeling,
on top of which multiple word time-scale states are generated.
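
The two-time-scale update can be illustrated with a minimal sketch. It is not the model proposed here: the plain tanh cell, the random weights, the function names, and above all the fixed `chunk_lengths` input are assumptions made for illustration, since the text above does not specify how chunk boundaries are predicted. The sketch only shows the control flow, in which the chunk state is updated once per chunk and the word state once per word, conditioned on the current chunk state.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (illustrative, untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_step(weights, bias, inputs):
    """One plain tanh RNN update: tanh(W @ inputs + b)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def decode_two_timescales(chunk_lengths, dim=4, seed=0):
    """Run a toy decoder whose state is split into a chunk part and a word part.

    chunk_lengths is a hypothetical input giving the number of words in each
    chunk; a real model would have to predict these boundaries itself.
    Returns a list of (chunk_index, word_state) pairs, one per generated word.
    """
    rng = random.Random(seed)
    w_chunk = make_matrix(dim, 2 * dim, rng)  # reads [chunk_state; word_state]
    w_word = make_matrix(dim, 2 * dim, rng)   # reads [word_state; chunk_state]
    b_chunk = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    b_word = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    chunk_state = [0.0] * dim
    word_state = [0.0] * dim
    trace = []
    for chunk_idx, length in enumerate(chunk_lengths):
        # Slow time-scale: one update per chunk (phrase level).
        chunk_state = rnn_step(w_chunk, b_chunk, chunk_state + word_state)
        for _ in range(length):
            # Fast time-scale: one update per word, on top of the chunk state.
            word_state = rnn_step(w_word, b_word, word_state + chunk_state)
            trace.append((chunk_idx, list(word_state)))
    return trace


if __name__ == "__main__":
    for chunk_idx, state in decode_two_timescales([2, 3]):
        print(chunk_idx, [round(v, 3) for v in state])
```

Calling `decode_two_timescales([2, 3])` produces five word-level states, the first two computed under the first chunk state and the remaining three under the second.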