European Parliament and Council of the European Union, Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), 2016.

A. Nautsch, C. Jasserand, E. Kindt, M. Todisco, I. Trancoso et al., The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps Towards a Common Understanding, Proc. Interspeech, pp.3695-3699, 2019.

N. Tomashenko, B. M. Srivastava, X. Wang, E. Vincent, A. Nautsch et al., Introducing the VoicePrivacy Initiative, Proc. Interspeech, 2020.
URL : https://hal.archives-ouvertes.fr/hal-02562199

F. Fang, X. Wang, J. Yamagishi, I. Echizen, M. Todisco et al., Speaker Anonymization Using X-vector and Neural Waveform Models, Proc. 10th ISCA Speech Synthesis Workshop, pp.155-160, 2019.

C. Magariños, P. Lopez-Otero, L. Docio-Fernandez, E. Rodriguez-Banga, D. Erro et al., Reversible speaker de-identification using pre-trained transformation functions, Computer Speech & Language, vol.46, pp.36-52, 2017.

S. H. Mohammadi and A. Kain, An overview of voice conversion systems, Speech Communication, vol.88, pp.65-82, 2017.

Z. Wu, T. Virtanen, E. S. Chng, and H. Li, Exemplar-based sparse representation with residual compensation for voice conversion, IEEE/ACM Transactions on Audio, Speech, and Language Processing, pp.1506-1521, 2014.

L. Sun, K. Li, H. Wang, S. Kang, and H. Meng, Phonetic posteriorgrams for many-to-one voice conversion without parallel data training, IEEE International Conference on Multimedia and Expo (ICME), pp.1-6, 2016.

Y. Saito, Y. Ijima, K. Nishida, and S. Takamichi, Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5274-5278, 2018.

Y. Adi, N. Zeghidour, R. Collobert, N. Usunier, V. Liptchinsky et al., To reverse the gradient or not: an empirical comparison of adversarial and multi-task learning in speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3742-3746, 2019.

T. Tsuchiya, N. Tawara, T. Ogawa, and T. Kobayashi, Speaker invariant feature extraction for zero-resource languages with adversarial learning, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2381-2385, 2018.

C. Feutry, P. Piantanida, Y. Bengio, and P. Duhamel, Learning anonymized representations with adversarial neural networks, arXiv preprint, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01742447

B. M. Srivastava, A. Bellet, M. Tommasi, and E. Vincent, Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?, Proc. Interspeech, pp.3700-3704, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02166434

T. Ryffel, D. Pointcheval, F. Bach, E. Dufour-sans, and R. Gay, Partially encrypted deep learning using functional encryption, Advances in Neural Information Processing Systems, vol.32, pp.4517-4528, 2019.

S. McAdams, Spectral fusion, spectral parsing and the formation of the auditory image, 1984.

X. Wang and J. Yamagishi, Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis, Proc. 10th ISCA Speech Synthesis Workshop, pp.1-6, 2019.

V. Peddinti, D. Povey, and S. Khudanpur, A time delay neural network architecture for efficient modeling of long temporal contexts, Proc. Interspeech, pp.3214-3218, 2015.

D. Povey, G. Cheng, Y. Wang, K. Li, H. Xu et al., Semi-orthogonal low-rank matrix factorization for deep neural networks, Proc. Interspeech, pp.3743-3747, 2018.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, 2011.

R. Sennrich, B. Haddow, and A. Birch, Neural machine translation of rare words with subword units, Association for Computational Linguistics, pp.1715-1725, 2016.

T. Kudo, Subword regularization: Improving neural network translation models with multiple subword candidates, Association for Computational Linguistics, pp.66-75, 2018.

C. Qin, D. Qu, and L. Zhang, Towards end-to-end speech recognition with transfer learning, EURASIP Journal on Audio, Speech, and Music Processing, vol.2018, 2018.

L. Samarakoon, B. Mak, and A. Y. Lam, Domain adaptation of end-to-end speech recognition in low-resource settings, IEEE Spoken Language Technology Workshop (SLT), pp.382-388, 2018.

S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba et al., ESPnet: End-to-end speech processing toolkit, Proc. Interspeech, pp.2207-2211, 2018.

S. Watanabe, T. Hori, S. Kim, J. R. Hershey, and T. Hayashi, Hybrid CTC/attention architecture for end-to-end speech recognition, IEEE Journal of Selected Topics in Signal Processing, vol.11, issue.8, pp.1240-1253, 2017.

D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, X-vectors: Robust DNN embeddings for speaker recognition, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018.

V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, Librispeech: An ASR corpus based on public domain audio books, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5206-5210, 2015.

J. Sas and A. Sas, Gender recognition using neural networks and ASR techniques, Journal of Medical Informatics & Technologies, vol.22, pp.179-187, 2013.

C. Gussenhoven, Pitch in Language I: Stress and Intonation, ser. Research Surveys in Linguistics, pp.12-25, 2004.