Abstract: Traditional multimodal sentiment analysis methods often suffer from information redundancy during feature concatenation and fusion, making it difficult to capture fine-grained, complex emotional features, and they exhibit limited robustness in modality-missing and cross-domain transfer scenarios. Meanwhile, most existing mixture-of-experts (MoE) methods adopt a single-layer structure with ambiguous expert specialization, leading to functional overlap and suboptimal generalization. To address these issues, this study proposes a hierarchical gated expert mixture (H-GEM) model built on a three-layer hierarchical expert architecture: a modality expert layer extracts modality-specific features, a fusion-and-abstraction expert layer adaptively selects fusion strategies, and a sentiment-polarity expert layer performs fine-grained sentiment modeling. In addition, information-theoretic and discriminative constraints are incorporated to enhance the semantic discriminability and sparsity of expert selection. By leveraging hierarchical gating for progressive decision-making, H-GEM ensures differentiated expert specialization and cross-task modeling. Experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that H-GEM outperforms baseline models across a range of metrics. Compared with single-layer MoE architectures, its significantly lower routing entropy indicates effective mitigation of expert redundancy. Moreover, the proposed model is more robust in low-resource and modality-missing scenarios, highlighting its strong practical applicability.
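
The three-layer gated routing described above can be illustrated with a minimal NumPy sketch. Random linear maps stand in for trained expert networks, and all names, layer widths, and expert counts are illustrative assumptions, not details taken from the paper; the sketch only shows how per-layer softmax gates route features through modality, fusion, and polarity expert layers, and how routing entropy measures sparsity of expert selection.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # feature dimension (illustrative, not from the paper)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gated_expert_layer(x, n_experts):
    """One gated expert layer: a softmax gate routes input x over
    n_experts random linear experts (stand-ins for learned
    sub-networks) and returns the gated mixture plus the routing
    distribution."""
    gate_w = rng.standard_normal((x.size, n_experts))
    gates = softmax(x @ gate_w)                    # routing weights, sum to 1
    experts = rng.standard_normal((n_experts, x.size, D))
    outputs = np.einsum("i,eij->ej", x, experts)   # each expert's output
    return gates @ outputs, gates                  # mixture, routing dist.

# Toy unimodal inputs (text / audio / visual)
modalities = {m: rng.standard_normal(D) for m in ("text", "audio", "visual")}

# Layer 1: modality expert layer, one gated layer per modality
mod_out = [gated_expert_layer(x, n_experts=4)[0] for x in modalities.values()]

# Layer 2: fusion-and-abstraction experts over concatenated modality features
fused, fuse_gates = gated_expert_layer(np.concatenate(mod_out), n_experts=4)

# Layer 3: sentiment-polarity experts yield the final representation
polarity, pol_gates = gated_expert_layer(fused, n_experts=3)

# Routing entropy: lower entropy = sparser, more specialized expert selection
entropy = -np.sum(pol_gates * np.log(pol_gates + 1e-12))
print(polarity.shape, float(entropy))
```

In a trained model the gate and expert weights would be learned, and the entropy term above is the kind of quantity an information-theoretic constraint can penalize to encourage sparse, differentiated expert usage.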