Journal of Computer Applications (《计算机应用》), sole official website


Semi-EM algorithm for estimating Gamma mixture model of multimodal probability distribution

CHEN Jiaqi¹, HE Yulin², CHENG Yingchao², HUANG Zhexue³

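  1. Shenzhen University
  2. Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen)
  3. Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University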
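  • Received: 2024-07-08 Revised: 2024-10-09 Online: 2024-11-19 Published: 2024-11-19
  • Corresponding author: CHEN Jiaqi
  • Funding:
    General Program of the Natural Science Foundation of Guangdong Province; Key Project of Shenzhen Basic Research; General Program of Shenzhen Basic Research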


Abstract: The Expectation-Maximization (EM) algorithm plays an important role in estimating the parameters of mixture models. However, existing EM algorithms for the Gamma Mixture Model (GaMM) have two limitations: the approximate computations they rely on yield low-quality parameter estimates, and the large amount of numerical computation they require makes them computationally inefficient. To overcome these limitations and make full use of the multimodality of the data, a Semi-EM algorithm was proposed for solving the GaMM used to estimate multimodal probability distributions. Firstly, the algorithm explored the spatial distribution of the data by clustering and used the result to initialize the GaMM parameters, so that the multimodality of the data could be characterized more accurately. Secondly, within the EM framework, because the shape parameters of the GaMM have no closed-form update expression, a custom heuristic strategy was used to adjust them step by step in the direction that increases the log-likelihood, while the remaining parameters were updated in closed form. Finally, a series of experiments were conducted to verify the feasibility, rationality, and effectiveness of the proposed Semi-EM algorithm. Experimental results show that Semi-EM outperforms the four comparison algorithms in accurately estimating multimodal probability distributions: it achieves lower error metrics and higher log-likelihood values, indicating that it provides more accurate model parameter estimates and therefore a more precise characterization of the multimodality of the data.

Key words: multimodal probability density function, Gamma Mixture Model (GaMM), Expectation-Maximization (EM) algorithm, clustering, log-likelihood function
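The abstract describes the Semi-EM loop only at a high level: clustering-based initialization, closed-form updates for the mixing weights and the remaining parameters, and a heuristic update for the shape parameters, which lack a closed form. The Python sketch below illustrates that structure for one-dimensional positive data. It is not the authors' implementation. The 1-D k-means initialization with method-of-moments estimates, the multiplicative grow/shrink search on each shape parameter, and every function name (`semi_em_sketch`, `init_by_clustering`, `profiled_q`, and so on) are hypothetical choices made for illustration, because the abstract does not specify the heuristic. Given a shape α_j, the scale is set to its closed-form maximizer θ_j = Σ_i r_ij x_i / (α_j Σ_i r_ij).

```python
import math
import numpy as np


def gamma_logpdf(x, shape, scale):
    """Log-density of Gamma(shape, scale) at positive x (vectorised over x)."""
    return ((shape - 1.0) * np.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def init_by_clustering(x, k, iters=100):
    """Hypothetical init: 1-D k-means, then method-of-moments Gamma parameters.

    Assumption: every cluster stays non-empty and holds at least two points.
    """
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([x[labels == j].mean() for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    weights, shapes, scales = np.empty(k), np.empty(k), np.empty(k)
    for j in range(k):
        xj = x[labels == j]
        m, v = xj.mean(), xj.var()
        weights[j] = xj.size / x.size
        shapes[j] = m * m / v  # Gamma mean = a*s, var = a*s^2, so a = m^2 / v
        scales[j] = v / m      # and s = v / m
    return weights, shapes, scales


def e_step(x, w, a, s):
    """Responsibilities r[i, j] and observed-data log-likelihood (log-sum-exp)."""
    logp = np.column_stack([np.log(w[j]) + gamma_logpdf(x, a[j], s[j])
                            for j in range(len(w))])
    mx = logp.max(axis=1, keepdims=True)
    lse = mx[:, 0] + np.log(np.exp(logp - mx).sum(axis=1))
    return np.exp(logp - lse[:, None]), float(lse.sum())


def profiled_q(x, r, shape):
    """Weighted component log-likelihood with the scale at its closed form."""
    scale = np.sum(r * x) / (shape * np.sum(r))
    return float(np.sum(r * gamma_logpdf(x, shape, scale))), scale


def semi_em_sketch(x, k, n_iter=500, tol=1e-8, min_step=1e-6):
    """Hypothetical Semi-EM-style loop; returns weights, shapes, scales, history."""
    x = np.asarray(x, dtype=float)
    w, a, s = init_by_clustering(x, k)
    steps = np.full(k, 0.5)  # hypothetical multiplicative search step per shape
    history = []
    for _ in range(n_iter):
        r, ll = e_step(x, w, a, s)
        history.append(ll)
        if (len(history) > 1 and history[-1] - history[-2] < tol
                and steps.max() < min_step):
            break
        w = r.mean(axis=0)  # closed-form mixing weights
        for j in range(k):
            best_q, best_s = profiled_q(x, r[:, j], a[j])
            best_a = a[j]
            # Hypothetical heuristic: try growing and shrinking the shape.
            for cand in (a[j] * (1.0 + steps[j]), a[j] / (1.0 + steps[j])):
                q, sc = profiled_q(x, r[:, j], cand)
                if q > best_q:
                    best_q, best_a, best_s = q, cand, sc
            if best_a == a[j]:
                steps[j] *= 0.5  # no improvement: refine the search step
            a[j], s[j] = best_a, best_s  # scale in closed form given the shape
    return w, a, s, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.gamma(2.0, 1.0, 500), rng.gamma(30.0, 1.0, 500)])
    w, a, s, hist = semi_em_sketch(data, k=2)
    print("weights", w, "shapes", a, "scales", s, "final log-likelihood", hist[-1])
```

In this sketch the current shape is always one of the candidates and each candidate is scored with its scale already at the closed-form optimum, so no M-step decreases the expected complete-data log-likelihood; by the generalized EM argument, the observed log-likelihood recorded in `history` is non-decreasing. Whether the paper's own heuristic has this property is not stated in the abstract.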