Journal of Computer Applications (《计算机应用》), 2024, Vol. 44, Issue (10): 3158-3166. DOI: 10.11772/j.issn.1001-9081.2023101427

• Advanced computing •

Symbolic regression method for integer sequence based on self-learning

Kaiming SUN, Dongfeng CAI, Yu BAI

  1. School of Computer Science, Shenyang Aerospace University, Shenyang Liaoning 110136, China
  • Received: 2023-10-23 Revised: 2024-01-11 Accepted: 2024-01-18 Online: 2024-10-15 Published: 2024-10-10
  • Contact: Yu BAI
  • About author: SUN Kaiming, born in 2000, from Shijiazhuang, Hebei, M. S. candidate. His research interests include natural language processing.
    CAI Dongfeng, born in 1958, from Shenyang, Liaoning, Ph. D., professor. His research interests include artificial intelligence and natural language processing.
    BAI Yu, born in 1982 (Hui ethnicity), from Chifeng, Inner Mongolia, Ph. D. candidate, associate professor, CCF member. His research interests include information retrieval, and language and knowledge computing. nlpxiaobai@163.com

Abstract:

Existing symbolic regression methods generalize poorly to sequences in the On-line Encyclopedia of Integer Sequences (OEIS). To address this problem, a symbolic regression method for integer sequences based on Self-Learning (SL) was proposed. Firstly, a variety of learning data were constructed by programs; high-order linear recurrence data were incorporated according to the characteristics of OEIS data, and OEIS initial terms were used to generate recurrence sequences. Secondly, the learning data were converted into OEIS data, and a strategy of fusing multiple kinds of OEIS data as the data for the initial iteration was proposed. Finally, formulas of OEIS sequences were discovered step by step through self-learning iteration, with each iteration divided into four stages: Learn, Search, Check and Select. Experimental results show that the proposed method outperforms the Deep Symbolic Regression (DSR) method and Mathematica's built-in function. Compared with DSR, the accuracy of the proposed method is higher by 9.66, 4.17 and 5.14 percentage points on the Easy, Sign and Base test sets, respectively. In total, formulas for 27 433 OEIS sequences were discovered, and the newly discovered formulas can assist mathematicians in studying the related theory.

Key words: symbolic regression, Self-Learning (SL), formula discovery, On-line Encyclopedia of Integer Sequences (OEIS), Transformer
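The abstract mentions two mechanical ingredients that are easy to picture in code: building recurrence sequences from initial terms, and a Check stage that tests a proposed formula against known terms. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation. The names `linear_recurrence` and `check` are hypothetical, and the Transformer that would propose candidate formulas in the Learn and Search stages is not modeled; a hand-written lambda stands in for a candidate.

```python
from typing import Callable, List, Sequence


def linear_recurrence(initial: Sequence[int], coeffs: Sequence[int], n_terms: int) -> List[int]:
    """Extend `initial` using a(n) = sum(coeffs[i] * a(n - 1 - i)). Hypothetical helper."""
    if len(initial) != len(coeffs):
        raise ValueError("need exactly one initial term per coefficient")
    terms = list(initial)
    order = len(coeffs)
    while len(terms) < n_terms:
        recent = terms[-order:][::-1]  # a(n-1), a(n-2), ..., a(n-order)
        terms.append(sum(c * t for c, t in zip(coeffs, recent)))
    return terms[:n_terms]


def check(candidate: Callable[[int, List[int]], int], sequence: Sequence[int], order: int) -> bool:
    """Return True if `candidate` reproduces every term after the first `order` terms.

    Assumed interface: `candidate(n, a)` predicts a(n) from the earlier terms a[0..n-1].
    """
    known = list(sequence[:order])
    for n in range(order, len(sequence)):
        if candidate(n, known) != sequence[n]:
            return False
        known.append(sequence[n])
    return True


# Second-order recurrence data built from initial terms (0, 1) and coefficients (1, 1).
fib = linear_recurrence([0, 1], [1, 1], 10)

# A candidate formula as a search stage might propose it: a(n) = a(n-1) + a(n-2).
is_valid = check(lambda n, a: a[n - 1] + a[n - 2], fib, order=2)
```

Here `fib` holds the first ten Fibonacci numbers and `is_valid` is `True`; a wrong candidate such as `a(n) = 2 * a(n-1)` would be rejected at the first mismatching term. Exact integer comparison is used because the sequences are integer-valued, so no numerical tolerance is needed.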

CLC Number: