<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>ScholarWorks Collection:</title>
    <link>https://scholar.korea.ac.kr/handle/2021.sw.korea/819</link>
    <description />
    <pubDate>Wed, 08 Apr 2026 12:21:17 GMT</pubDate>
    <dc:date>2026-04-08T12:21:17Z</dc:date>
    <item>
      <title>Semantically complex audio to video generation with audio source separation</title>
      <link>https://scholar.korea.ac.kr/handle/2021.sw.korea/268501</link>
      <description>Title: Semantically complex audio to video generation with audio source separation
Authors: Kim, Sieun; Jeong, Jaehwan; In, Sumin; Lee, Seung Hyun; Kim, Seungryong; Kim, Saerom; Baek, Wooyeol; Yoon, Sang Ho; Culurciello, Eugenio; Kim, Sangpil
Abstract: Recent advancements in artificial intelligence for audio-to-video generation have shown the ability to generate high-quality videos from audio, particularly by focusing on temporal semantics and magnitude. However, existing works struggle to capture all semantics from audio, as real-world audio often consists of mixed sources, making it challenging to generate semantically aligned videos. To solve this problem, we present a novel multi-source audio-to-video generation framework that incorporates multiple decomposed audio sources into video generative models. Specifically, our proposed Attention Mosaic directly maps each decomposed audio feature to the corresponding spatial attention feature. In addition, our condition injection module helps produce more natural contexts with non-audible objects by leveraging the knowledge of existing generative models. Our experiments show that the proposed framework achieves state-of-the-art performance in both multi-source and single-source audio-to-video generation.</description>
      <pubDate>Sun, 01 Jun 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">https://scholar.korea.ac.kr/handle/2021.sw.korea/268501</guid>
      <dc:date>2025-06-01T00:00:00Z</dc:date>
    </item>
    <item>
      <title>IdenBAT: Disentangled representation learning for identity-preserved brain age transformation</title>
      <link>https://scholar.korea.ac.kr/handle/2021.sw.korea/269259</link>
      <description>Title: IdenBAT: Disentangled representation learning for identity-preserved brain age transformation
Authors: Maeng, Junyeong; Oh, Kwanseok; Jung, Wonsik; Suk, Heung-Il
Abstract: Brain age transformation aims to convert reference brain images into synthesized images that accurately reflect the age-specific features of a target age group. The primary objective of this task is to modify only the age-related attributes of the reference image while preserving all other age-irrelevant attributes. However, achieving this goal poses substantial challenges due to the inherent entanglement of various image attributes within features extracted from a backbone encoder, resulting in simultaneous alterations during image generation. To address this challenge, we propose a novel architecture that employs disentangled representation learning for identity-preserved brain age transformation, called IdenBAT. This approach facilitates the decomposition of image features, ensuring the preservation of individual traits while selectively transforming age-related characteristics to match those of the target age group. Through comprehensive experiments conducted on both 2D and full-size 3D brain datasets, our method adeptly converts input images to the target age while accurately retaining individual characteristics. Furthermore, our approach demonstrates superiority over existing state-of-the-art methods in terms of fidelity. The code is available at: https://github.com/kumilab/IdenBAT.</description>
      <pubDate>Sun, 01 Jun 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">https://scholar.korea.ac.kr/handle/2021.sw.korea/269259</guid>
      <dc:date>2025-06-01T00:00:00Z</dc:date>
    </item>
    <item>
      <title>Towards symbolic XAI - explanation through human understandable logical relationships between features</title>
      <link>https://scholar.korea.ac.kr/handle/2021.sw.korea/268191</link>
      <description>Title: Towards symbolic XAI - explanation through human understandable logical relationships between features
Authors: Schnake, Thomas; Jafari, Farnoush Rezaei; Lederer, Jonas; Xiong, Ping; Nakajima, Shinichi; Gugler, Stefan; Montavon, Gregoire; Mueller, Klaus-Robert
Abstract: Explainable Artificial Intelligence (XAI) plays a crucial role in fostering transparency and trust in AI systems. Traditional XAI methods typically provide a single level of abstraction for explanations, often in the form of heatmaps in post-hoc attribution methods. Alternatively, XAI offers rule-based explanations that are expressive and composed of logical formulas but often fail to faithfully capture the model's decision-making process or impose strict limitations on the model's learning capabilities by requiring it to be inherently self-explainable. We aim to bridge these two approaches by developing post-hoc explanations that attribute relevance to complex logical relationships between input features while faithfully aligning with the model's intricate prediction processes and imposing no restrictions on the model's architecture. To this end, we propose a framework called Symbolic XAI, which attributes relevance to symbolic formulas expressing logical relationships between input features. Our method naturally extends propagation-based explanation approaches, such as layer-wise relevance propagation or GNN-LRP, and perturbation-based approaches, such as Shapley values. Beyond relevance attribution of logical formulas for a model's prediction, our framework introduces a strategy to automatically identify logical formulas that best summarize the model's decision strategy, eliminating the need to predefine these formulas. We demonstrate the effectiveness of our framework in domains such as natural language processing (NLP), computer vision, and chemistry, where abstract symbolic domain knowledge is abundant and critically valuable to users. In summary, the Symbolic XAI framework provides a local understanding of the model's decision-making process that is both flexible for customization by the user and human-readable through logical formulas.</description>
      <pubDate>Sun, 01 Jun 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">https://scholar.korea.ac.kr/handle/2021.sw.korea/268191</guid>
      <dc:date>2025-06-01T00:00:00Z</dc:date>
    </item>
    <item>
      <title>High-quality three-dimensional cartoon avatar reconstruction with Gaussian splatting</title>
      <link>https://scholar.korea.ac.kr/handle/2021.sw.korea/267256</link>
      <description>Title: High-quality three-dimensional cartoon avatar reconstruction with Gaussian splatting
Authors: Jang, Minhyuk; Kim, Jong Wook; Jang, Youngdong; Kim, Donghyun; Roh, Wonseok; Hwang, Inyong; Lin, Guang; Kim, Sangpil
Abstract: The growth of the augmented reality industry has increased demand for three-dimensional (3D) cartoon avatars, which traditionally require expertise from computer graphics designers. Recent 3D Gaussian splatting methods have successfully reconstructed 3D avatars from videos, establishing them as a promising solution for this task. However, these methods primarily focus on real-world videos, limiting their effectiveness in the cartoon domain. In this paper, we present an artificial intelligence (AI)-based method for 3D avatar reconstruction from animated cartoon videos, addressing the physically unrealistic and unstructured geometries of cartoons, as well as the varying texture styles across frames. Our surface fitting module models the unstructured geometry of cartoon characters by integrating the surfaces observed from multiple views into a 3D avatar. We design a style normalizer that adjusts color distributions to reduce texture color inconsistencies in each frame of animated cartoons. Additionally, to better capture the simplified color distributions of cartoons, we design a frequency transform loss that focuses on low-frequency components. Our method significantly outperforms state-of-the-art methods, achieving approximately a 25% improvement in Learned Perceptual Image Patch Similarity (LPIPS) over baselines, with a score of 0.052 across the Cartoon Neuman and ToonVid datasets, which comprise 10 videos with diverse styles and poses. Consequently, this paper presents a promising solution to meet the growing demand for high-quality 3D cartoon avatar modeling.</description>
      <pubDate>Thu, 15 May 2025 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">https://scholar.korea.ac.kr/handle/2021.sw.korea/267256</guid>
      <dc:date>2025-05-15T00:00:00Z</dc:date>
    </item>
  </channel>
</rss>

