Research Article
- L. Perez and J. Wang, "The effectiveness of data augmentation in image classification using deep learning", CoRR, pp. 1-8, 2017.
- O. Abayomi-Alli, R. Damaševičius, A. Qazi, M. Adedoyin-Olowe, and S. Misra, "Data augmentation and deep learning methods in sound classification: a systematic review", Electronics, Vol. 11, No. 22, Article 3795, pp. 1-32, 2022. doi: 10.3390/electronics11223795
- H. Chu, Y. Zhang, and H. Chiang, "A CNN Sound Classification Mechanism Using Data Augmentation", Sensors, Vol. 23, No. 15, Article 6972, 2023. doi: 10.3390/s23156972
- W. Han, Z. Zhang, Y. Zhang, J. Yu, C.-C. Chiu, J. Qin, A. Gulati, R. Pang, and Y. Wu, "ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context", in Proceedings of Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Shanghai, pp. 3610-3614, 2020. doi: 10.21437/Interspeech.2020-2059
- M. Orken, O. Dina, A. Keylan, T. Tolganay, and O. Mohamed, "A study of transformer-based end-to-end speech recognition system for Kazakh language", Sci. Rep., Vol. 12, Article 8337, 2022. doi: 10.1038/s41598-022-12260-y
- Y. Zhang, "Music Recommendation System and Recommendation Model Based on Convolutional Neural Network", Mob. Inf. Syst., Article 3387598, 2022. doi: 10.1155/2022/3387598
- K. Huang, H. Qin, X. Zhang, and H. Zhang, "Music recommender system based on graph convolutional neural networks with attention mechanism", Neural Netw., Vol. 135, pp. 107-117, 2021.
- M. Green and D. Murphy, "Environmental sound monitoring using machine learning on mobile devices", Appl. Acoust., Vol. 159, Article 107041, 2020. doi: 10.1016/j.apacoust.2019.107041
- A. Nogueira, H. Oliveira, J. Machado, and J. Tavares, "Sound Classification and Processing of Urban Environments: A Systematic Literature Review", Sensors, Vol. 22, Article 8608, 2022. doi: 10.3390/s22228608
- T. Nishida, K. Dohi, T. Endo, M. Yamamoto, and Y. Kawaguchi, "Anomalous Sound Detection Based on Machine Activity Detection", in Proceedings of the 30th European Signal Processing Conference (EUSIPCO), pp. 269-273, 2022. doi: 10.23919/EUSIPCO55093.2022.9909901
- Y. Wang, Y. Zheng, Y. Zhang, Y. Xie, S. Xu, Y. Hu, and L. He, "Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Using Classification-Based Methods", Appl. Sci., Vol. 11, Article 11128, 2021. doi: 10.3390/app112311128
- M. Crocco, M. Cristani, A. Trucco, and V. Murino, "Audio surveillance: A systematic review", ACM Comput. Surv., Vol. 48, pp. 1-46, 2016. doi: 10.1145/2871183
- Y. Leng, W. Zhao, C. Lin, C. Sun, R. Wang, Q. Yuan, and D. Li, "LDA-based data augmentation algorithm for acoustic scene classification", Knowl.-Based Syst., Vol. 195, Article 105600, 2020. doi: 10.1016/j.knosys.2020.105600
- M. Lech, M. Stolar, C. Best, and R. Bolia, "Real-Time Speech Emotion Recognition Using a Pre-trained Image Classification Network: Effects of Bandwidth Reduction and Companding", Frontiers in Computer Science, Vol. 2, No. 14, pp. 1-14, 2020. doi: 10.3389/fcomp.2020.00014
- N. Takahashi, M. Gygli, and L. Van Gool, "AENet: Learning deep audio features for video analysis", IEEE Trans. Multimedia, Vol. 20, pp. 513-524, 2018. doi: 10.1109/TMM.2017.2751969
- H. Liu, Z. Chen, Y. Yuan, X. Mei, X. Liu, D. Mandic, W. Wang, and M. D. Plumbley, "AudioLDM: Text-to-Audio Generation with Latent Diffusion Models", International Conference on Machine Learning (ICML), pp. 1-25, 2023.
- H. Alonso, M. Barragán Pulido, J. Gil Bordón, M. Ferrer Ballester, and C. Travieso González, "Speech evaluation of patients with Alzheimer's disease using an automatic interviewer", Expert Syst. Appl., Vol. 192, Article 116386, 2022. doi: 10.1016/j.eswa.2021.116386
- Y. Jeong, J. Kim, D. Kim, J. Kim, and K. Lee, "Methods for improving deep learning-based cardiac auscultation accuracy: Data augmentation and data generalization", Appl. Sci., Vol. 11, Article 4544, 2021. doi: 10.3390/app11104544
- Y. Sun, A. Wong, and M. Kamel, "Classification of Imbalanced Data: A Review", Int. J. Pattern Recognit. Artif. Intell., Vol. 23, pp. 687-719, 2009. doi: 10.1142/S0218001409007326
- L. Ferreira-Paiva, E. Alfaro-Espinoza, V. Almeida, L. Felix, and R. Neves, "A Survey of Data Augmentation for Audio Classification", Sociedade Brasileira de Automática (SBA), Vol. 3, No. 1, pp. 2165-2172, 2022. doi: 10.20906/CBA2022/3469
- H. Purwins, B. Li, T. Virtanen, J. Schlüter, S.-Y. Chang, and T. Sainath, "Deep Learning for Audio Signal Processing", IEEE Journal of Selected Topics in Signal Processing, Vol. 13, No. 2, pp. 206-219, 2019. doi: 10.1109/JSTSP.2019.2908700
- D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition", in Proceedings of Interspeech 2019, pp. 2613-2617, 2019. doi: 10.21437/Interspeech.2019-2680
- A. Jain, P. R. Samala, D. Mittal, P. Jyothi, and M. Singh, "SpliceOut: A Simple and Efficient Audio Augmentation Method", in Proceedings of Interspeech 2022, pp. 2678-2682, 2022. doi: 10.21437/Interspeech.2022-572
- S. Yun, D. Han, S. Chun, S. Oh, Y. Yoo, and J. Choe, "CutMix: Regularization strategy to train strong classifiers with localizable features", in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6022-6031, 2019. doi: 10.1109/ICCV.2019.00612
- G. Kim, D. K. Han, and H. Ko, "SpecMix: Data Augmentation for Speech Recognition", in Proceedings of Interspeech 2021, Vol. 1, pp. 6-10, 2021.
- H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, "Mixup: Beyond empirical risk minimization", in International Conference on Learning Representations (ICLR), pp. 1-13, 2018.
- C. Donahue, J. McAuley, and M. Puckette, "Adversarial Audio Synthesis", International Conference on Learning Representations (ICLR), pp. 1-15, 2018.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks", in Advances in Neural Information Processing Systems (NIPS), pp. 2672-2680, 2014.
- K. Kumar, R. Kumar, T. de Boissiere, L. Gestin, W. Z. Teoh, J. Sotelo, A. de Brebisson, Y. Bengio, and A. Courville, "MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis", Advances in Neural Information Processing Systems (NeurIPS), pp. 14910-14921, 2019.
- H. Kameoka, T. Kaneko, K. Tanaka, and N. Hojo, "CycleGAN-VC: Non-parallel Voice Conversion using Cycle-Consistent Adversarial Networks", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5279-5283, 2018. doi: 10.1109/ICASSP.2018.8462342
- A. Mustafa, N. Pia, and G. Fuchs, "StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6034-6038, 2021.
- C. Aironi, S. Cornell, L. Serafini, and S. Squartini, "A Time-Frequency Generative Adversarial based method for Audio Packet Loss Concealment", European Signal Processing Conference (EUSIPCO), pp. 1-5, 2023. doi: 10.23919/EUSIPCO58844.2023.10290027
- R. Prenger, R. Valle, and B. Catanzaro, "WaveGlow: A Flow-based Generative Network for Speech Synthesis", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3617-3621, 2019. doi: 10.1109/ICASSP.2019.8683143
- R. Yamamoto, E. Song, and J. Kim, "Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6199-6203, 2020. doi: 10.1109/ICASSP40776.2020.9053795
- J. Kong, J. Kim, and J. Bae, "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis", Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 1-14, 2020.
- J. Engel, K. K. Agrawal, S. Chen, I. Gulrajani, C. Donahue, and A. Roberts, "GANSynth: Adversarial Neural Audio Synthesis", International Conference on Learning Representations (ICLR), pp. 1-17, 2019.
- M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN", Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 214-223, 2017.
- I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, "Improved training of Wasserstein GANs", in Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), pp. 5769-5779, 2017.
- T. Karras, S. Laine, and T. Aila, "A Style-Based Generator Architecture for Generative Adversarial Networks", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 43, No. 12, pp. 4217-4228, 2021. doi: 10.1109/TPAMI.2020.2970919
- P. Isola, J. Zhu, T. Zhou, and A. A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967-5976, 2017. doi: 10.1109/CVPR.2017.632
- M. Mirza and S. Osindero, "Conditional Generative Adversarial Nets", arXiv preprint, pp. 1-7, 2014.
- D. P. Kingma and P. Dhariwal, "Glow: Generative flow with invertible 1x1 convolutions", Advances in Neural Information Processing Systems (NeurIPS 2018), pp. 1-10, 2018.
- A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio", Speech Synthesis Workshop, pp. 1-15, 2016.
- S. Mehri, K. Kumar, I. Gulrajani, R. Kumar, S. Jain, J. Sotelo, A. Courville, and Y. Bengio, "SampleRNN: An unconditional end-to-end neural audio generation model", International Conference on Learning Representations (ICLR), 2017.
- Publisher: The Society of Convergence Knowledge
- Publisher (Ko): 융복합지식학회
- Journal Title: The Society of Convergence Knowledge Transactions
- Journal Title (Ko): 융복합지식학회논문지
- Volume: 12
- No.: 3
- Pages: 81-103
- DOI: https://doi.org/10.22716/sckt.2024.12.3.007

