## 5  参考文献

[1] D. Wang and J. Chen, Supervised speech separation based on deep learning: An overview, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, pp. 1702 1726, 2018.

[2] Y. Wang, A. Narayanan, and D. Wang, On training targets for supervised speech separation, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 22, no. 12, pp. 1849 1858, 2014.

[3] H. Erdogan, J. R. Hershey, S.Watanabe, and J. Le Roux, Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks, in ICASSP, 2015, pp. 708 712.

[4] D. S. Williamson, Y. Wang, and D. Wang, Complex ratio masking for monaural speech separation, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 24, no. 3, pp. 483 492, 2016.

[5] D. Wang and J. Lim, The unimportance of phase in speech enhancement, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 30, no. 4, pp. 679 681, 1982.

[6] K. Paliwal, K. W ojcicki, and B. Shannon, The importance of phase in speech enhancement, speech communication, vol. 53, no. 4, pp. 465 494, 2011.

[7] S.-W. Fu, T.-y. Hu, Y. Tsao, and X. Lu, Complex spectrogram enhancement by convolutional neural network with multi-metrics learning, in International Workshop on Machine Learning for Signal Processing, 2017, pp. 1 6.

[8] K. Tan and D. Wang, Complex spectral mapping with a convolutional recurrent network for monaural speech enhancement, in ICASSP, 2019, pp. 6865 6869.

[9] H.-S. Choi, J.-H. Kim, J. Huh, A. Kim, J.-W. Ha, and K. Lee, Phase-aware speech enhancement with deep complex U-Net, arXiv preprint arXiv:1903.03107, 2019.

[10] A. Pandey and D. Wang, Exploring deep complex networks for complex spectrogram enhancement, in ICASSP, 2019, pp. 6885 6889.

[11] , A new framework for supervised speech enhancement in the time domain, in INTERSPEECH, 2018, pp. 1136 1140.

[12] S.-W. Fu, T.-W. Wang, Y. Tsao, X. Lu, and H. Kawai, End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 26, no. 9, pp. 1570 1584, 2018.

[13] A. Pandey and D. Wang, TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain, in ICASSP, 2019, pp. 6875 6879.

[14] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, Densely connected convolutional networks, in IEEE conference on computer vision and pattern recognition, 2017, pp. 4700 4708.

[15] J. L. Ba, J. R. Kiros, and G. E. Hinton, Layer normalization, arXiv preprint arXiv:1607.06450, 2016.

[16] K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in IEEE International Conference on Computer Vision, 2015, pp. 1026 1034.

[17] W. Shi, J. Caballero, F. Husz ar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z.Wang, Real-time single image and video super-resolution using an efficient subpixel convolutional neural network, in IEEE conference on computer vision and pattern recognition, 2016, pp. 1874 1883.

[18] A. Odena, V. Dumoulin, and C. Olah, Deconvolution and checkerboard artifacts, Distill, 2016. [Online]. Available: http://distill.pub/2016/deconv-checkerboard

[19] A. Pandey and D. Wang, On adversarial training and loss functions for speech enhancement, in ICASSP, 2018, pp. 5414 5418.

[20] , A new framework for cnn-based speech enhancement in the time domain, IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 27, no. 7, pp. 1179 1188, 2019.

[21] D. B. Paul and J. M. Baker, The design for the wall street journal-based CSR corpus, in Workshop on Speech and Natural Language, 1992, pp. 357 362.

[22] J. Chen, Y. Wang, S. E. Yoho, D. Wang, and E. W. Healy, Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises, The Journal of the Acoustical Society of America, vol. 139, no. 5, pp. 2604 2612, 2016.

[23] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, An algorithm for intelligibility prediction of time frequency weighted noisy speech, IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125 2136, 2011.

[24] A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, Perceptual evaluation of speech quality (PESQ) – a new method for speech quality assessment of telephone networks and codecs, in ICASSP, 2001, pp. 749 752.