## A  Signal Model

$$y(n)=x(n)+u(n) \tag{1}$$

$$w(n)=\sin\!\left[\frac{\pi}{2}\sin^{2}\!\left(\frac{\pi n}{N}\right)\right] \tag{2}$$
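As a brief sketch of Eq. (2) — this is the Vorbis window, and it satisfies the Princen–Bradley condition needed for perfect reconstruction with 50% overlap-add. The window length $N=960$ (20 ms at 48 kHz) is an assumption, chosen to be consistent with the 481-dimensional spectrum mentioned later:

```python
import numpy as np

N = 960  # assumed window length: 20 ms at 48 kHz (yields 481 frequency bins)
n = np.arange(N)
w = np.sin(0.5 * np.pi * np.sin(np.pi * n / N) ** 2)  # Vorbis window, Eq. (2)

# Princen-Bradley condition for perfect reconstruction with 50% overlap:
# w(n)^2 + w(n + N/2)^2 = 1 for n in [0, N/2)
pb = w[: N // 2] ** 2 + w[N // 2 :] ** 2
print(np.allclose(pb, 1.0))  # True
```

The identity holds because $\sin^2(\pi(n+N/2)/N)=\cos^2(\pi n/N)$, so the two shifted window values are $\sin\theta$ and $\cos\theta$ for the same $\theta$.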

$$Y(l,k)=X(l,k)+U(l,k) \tag{3}$$

## B  Band Structure and Feature Representation

$$w_{m}(k)=\left\{\begin{array}{ll} 0, & k<f(m-1) \\ \frac{k-f(m-1)}{f(m)-f(m-1)}, & f(m-1) \leq k \leq f(m) \\ \frac{f(m+1)-k}{f(m+1)-f(m)}, & f(m)<k \leq f(m+1) \\ 0, & k>f(m+1) \end{array}\right. \tag{4}$$

$$f(m)=\left(\frac{N}{f_{s}}\right) F_{\text{mel}}^{-1}\left(F_{\text{mel}}\left(f_{1}\right)+m\,\frac{F_{\text{mel}}\left(f_{h}\right)-F_{\text{mel}}\left(f_{1}\right)}{M+1}\right) \tag{5}$$

$$F_{\text{mel}}(f)=1125\ln\!\left(1+\frac{f}{700}\right) \tag{6}$$

Here $M=48$ is the number of bands. Audio is processed at a 48 kHz sampling rate, so $f_h=24000$, $f_1=0$, and $f_s=48000$; $F^{-1}_{\text{mel}}(f)$ is the inverse function of $F_{\text{mel}}(f)$. The 481-dimensional spectral features are then compressed to 48 dimensions.

$$w'_m(k)=\frac{w_m(k)}{\sum_k w_m(k)},\qquad \sum_k w'_m(k)=1 \tag{7}$$
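Eqs. (4)–(7) together define a normalized triangular mel filterbank. The following is a non-authoritative sketch under the stated parameters ($M=48$, $f_s=48000$, $f_h=24000$, $f_1=0$), with $N=960$ assumed from the 481-dimensional spectrum; the triangle of Eq. (4) is built with an equivalent min/clip formulation:

```python
import numpy as np

M, N, fs = 48, 960, 48000   # N = 960 is assumed from the 481-dim feature
f_l, f_h = 0.0, 24000.0

mel = lambda f: 1125.0 * np.log(1.0 + f / 700.0)        # Eq. (6)
mel_inv = lambda m: 700.0 * (np.exp(m / 1125.0) - 1.0)  # inverse of Eq. (6)

# Band edges in FFT-bin units, Eq. (5), for m = 0 .. M+1
m = np.arange(M + 2)
f_edges = (N / fs) * mel_inv(mel(f_l) + m * (mel(f_h) - mel(f_l)) / (M + 1))

# Triangular weights, Eq. (4), then per-band normalization, Eq. (7)
k = np.arange(N // 2 + 1)                     # 481 frequency bins
W = np.zeros((M, k.size))
for b in range(1, M + 1):
    lo, c, hi = f_edges[b - 1], f_edges[b], f_edges[b + 1]
    rise = (k - lo) / (c - lo)                # rising edge of the triangle
    fall = (hi - k) / (hi - c)                # falling edge of the triangle
    W[b - 1] = np.clip(np.minimum(rise, fall), 0.0, None)
W /= W.sum(axis=1, keepdims=True)             # each row now sums to 1
```

`W` has shape `(48, 481)`, so multiplying it with a 481-bin power spectrum performs the 481-to-48 compression described above.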

$$E_{Y}(l, m)=\log \left(\max \left(\sum_{k} w_{m}^{\prime}(k)|Y(l, k)|^{2},\, \alpha\right)\right) \tag{8}$$
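A minimal sketch of the band-energy feature of Eq. (8). The floor value `alpha` is an assumption (the source does not specify it), and `W` stands for the normalized filterbank of Eq. (7), passed in as a parameter:

```python
import numpy as np

def band_energies(Y_frame, W, alpha=1e-2):
    """Log band energies, Eq. (8).

    Y_frame: complex STFT frame (one bin per row of W's columns).
    W:       normalized triangular filterbank, shape (bands, bins).
    alpha:   lower floor on band energy (assumed value).
    """
    return np.log(np.maximum(W @ np.abs(Y_frame) ** 2, alpha))
```

The floor keeps the logarithm bounded in silent or near-silent bands.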

$$SF(l)=10 \log _{10}\left(\frac{\exp \left(\frac{1}{K} \sum_{k} \ln |Y(l, k)|\right)}{\frac{1}{K} \sum_{k} |Y(l, k)|}\right) \tag{9}$$

$$SF_{\text{smooth}}(l)=\gamma\, SF_{\text{smooth}}(l-1)+(1-\gamma)\, SF(l) \tag{10}$$
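Eqs. (9)–(10) can be sketched as below. The ratio of geometric to arithmetic mean is at most 1, so the flatness is 0 dB for a perfectly flat spectrum and strongly negative for a tonal one. The small magnitude floor and the smoothing constant `gamma` are assumed values, not taken from the source:

```python
import numpy as np

def spectral_flatness_db(Y_mag):
    """Spectral flatness of a magnitude spectrum in dB, Eq. (9)."""
    Y = np.maximum(Y_mag, 1e-12)            # floor keeps the log finite (assumed)
    geo = np.exp(np.mean(np.log(Y)))        # geometric mean
    arith = np.mean(Y)                      # arithmetic mean
    return 10.0 * np.log10(geo / arith)

def smooth_flatness(sf_prev, sf, gamma=0.9):
    """First-order recursive smoothing, Eq. (10); gamma is an assumed value."""
    return gamma * sf_prev + (1.0 - gamma) * sf
```

For example, a flat spectrum gives 0 dB, while a spectrum concentrated in one bin gives a large negative value, which is what makes the feature useful for distinguishing noise-like from tonal frames.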

## C  Feature Normalization

$$\mu_{E(Y)}(l, m)=\lambda \mu_{E(Y)}(l-1, m)+(1-\lambda) E_{Y}(l, m) \tag{11}$$

$$\sigma_{E(Y)}^{2}(l, m)=\lambda \sigma_{E(Y)}^{2}(l-1, m)+(1-\lambda) E_{Y}^{2}(l, m) \tag{12}$$

$$E_{Y}^{\prime}(l, m)=\frac{E_{Y}(l, m)-\mu_{E(Y)}(l, m)}{\sqrt{\sigma_{E(Y)}^{2}(l, m)-\mu_{E(Y)}^{2}(l, m)}} \tag{13}$$
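Note that $\sigma^2_{E(Y)}$ in Eq. (12) tracks the running *second moment*, not the variance; Eq. (13) recovers the variance as $\sigma^2-\mu^2$. A sketch of the whole online normalization, with an assumed smoothing constant `lam` and an assumed variance floor to avoid division by zero:

```python
import numpy as np

def normalize(features, lam=0.99):
    """Online mean/variance normalization of E_Y(l, m), Eqs. (11)-(13).

    features: array of shape (frames, bands); lam is an assumed value.
    """
    mu = np.zeros(features.shape[1])
    sq = np.zeros(features.shape[1])
    out = np.empty_like(features)
    for l, e in enumerate(features):
        mu = lam * mu + (1.0 - lam) * e           # Eq. (11): running mean
        sq = lam * sq + (1.0 - lam) * e ** 2      # Eq. (12): running 2nd moment
        var = np.maximum(sq - mu ** 2, 1e-12)     # variance floor (assumed)
        out[l] = (e - mu) / np.sqrt(var)          # Eq. (13)
    return out
```

Processing frame by frame keeps the normalization causal, which matters for real-time enhancement.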

## D  Learning Machine and Training Setup

$$g_{\mathrm{birm}}(m)=\max\!\left(\sqrt{\frac{E_{X}(m)}{E_{Y}(m)}},\,10^{-3}\right) \tag{14}$$
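A one-line sketch of the band gain target of Eq. (14), taking the energies as linear-domain band energies (an assumption; Eq. (8) defines the log-domain feature, so the exponential would be applied first):

```python
import numpy as np

def ideal_band_gain(E_x, E_y):
    """Band-wise ideal ratio mask, Eq. (14): sqrt energy ratio, floored at 1e-3."""
    return np.maximum(np.sqrt(E_x / E_y), 1e-3)
```

The floor keeps the training target bounded away from zero in bands dominated by noise.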

$$L=\varsigma\left(\sqrt{\hat{g}_{\mathrm{birm}}(m)}-\sqrt{g_{\mathrm{birm}}(m)}\right)^{2}+(1-\varsigma)\left(-10 \log _{10} \frac{\left\|\frac{\hat{x}^{T} x}{\|x\|^{2}} x\right\|^{2}}{\left\|\frac{\hat{x}^{T} x}{\|x\|^{2}} x-\hat{x}\right\|^{2}}\right) \tag{15}$$
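The second term of Eq. (15) is the negated scale-invariant SDR (SI-SDR): the estimate $\hat{x}$ is projected onto the target $x$, and the ratio of projection energy to residual energy is taken in dB. A sketch, where averaging the mask term over bands and the weight `zeta` (the $\varsigma$ of Eq. (15)) are assumptions:

```python
import numpy as np

def si_sdr_db(x_hat, x):
    """Scale-invariant SDR in dB (argument of the log in Eq. (15))."""
    proj = (x_hat @ x) / (x @ x) * x        # projection of estimate onto target
    return 10.0 * np.log10(np.sum(proj ** 2) / np.sum((proj - x_hat) ** 2))

def loss(g_hat, g, x_hat, x, zeta=0.5):
    """Combined loss of Eq. (15); zeta and the band averaging are assumed."""
    mask_term = np.mean((np.sqrt(g_hat) - np.sqrt(g)) ** 2)
    return zeta * mask_term + (1.0 - zeta) * (-si_sdr_db(x_hat, x))
```

Because the projection removes any global scaling of $\hat{x}$, the SI-SDR term penalizes waveform distortion rather than level differences.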

## E  Strategy for Combining the RNN with an Improved OMLSA

$$g=\min(g_{\mathrm{omlsa}},\, g_{\mathrm{min}}) \tag{16}$$
