Journal of Computer Applications ›› 2009, Vol. 29 ›› Issue (10): 2766-2771.

• Data mining

Query schedule and load shedding model in data stream system

  

  • Received:2009-04-15 Revised:2009-06-10 Online:2009-10-28 Published:2009-10-01

A query scheduling and load shedding model for data stream systems (数据流系统中的一种查询调度及负载脱落模型)

WANG Dan (王丹)1, LI Maozeng (李茂增)2

  1. College of Computer Science, Beijing University of Technology (北京工业大学计算机学院)
    2.
  • Corresponding author: WANG Dan (王丹)

Abstract: A major task of a data stream system is to execute queries promptly, with minimal loss of performance and precision, when system resources are limited. This paper addresses the problem from two aspects: optimizing operator scheduling and performing load shedding. Taking the features of different operators into account, a scheduling strategy based on operator priority is presented that considers both operator-related factors and the running state of the system. To adjust operator priority dynamically, an artificial neural network learning algorithm is introduced, which modifies operator priority according to system performance. To handle the potential overload caused by unpredictable data arrival in a data stream management system, load shedding is also studied. For queries containing join operators over two streams, a semantic-based load shedding technique is applied. A data stream load shedding model was designed and implemented that addresses four questions: when to shed and anti-shed load, how much, where, and by which predicate. Experimental results show that the proposed model effectively avoids the low processing efficiency that occurs when the system is overloaded and keeps the arriving data in balance with the system's processing capability.

Key words: data stream, query, schedule, priority, load shedding
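
The priority-based scheduling with learned weights described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not list the factors or name the learning algorithm, so the three features (`queue_fill`, `selectivity`, `cost`), the linear priority function, and the delta-rule (single-neuron) weight update are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    """Hypothetical operator state; field names are assumptions."""
    name: str
    queue_fill: float   # input queue occupancy, 0..1
    selectivity: float  # fraction of input tuples that pass, 0..1
    cost: float         # average processing time per tuple, > 0


def features(op):
    # Assumed factors, each scaled into [0, 1] so the update below stays stable.
    return [op.queue_fill, 1.0 - op.selectivity, 1.0 / (1.0 + op.cost)]


def priority(op, weights):
    # Linear combination of the factors: the output of a single neuron.
    return sum(w * x for w, x in zip(weights, features(op)))


def pick_next(ops, weights):
    # Schedule the operator with the highest current priority.
    return max(ops, key=lambda op: priority(op, weights))


def update_weights(weights, op, target, rate=0.1):
    # Delta rule: w <- w + rate * (target - output) * x.
    # `target` stands in for a priority derived from observed system
    # performance; how it is obtained is outside this sketch.
    error = target - priority(op, weights)
    return [w + rate * error * x for w, x in zip(weights, features(op))]
```

In a running system, `pick_next` would be called each scheduling round and `update_weights` whenever a performance measurement is available, so that the priorities drift toward whatever ordering improves the measured performance.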

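Abstract (translated from the Chinese): Executing queries quickly under limited resources while minimizing the loss of query precision is one of the main tasks of data stream query processing. This problem is studied from two aspects: optimized operator scheduling and load shedding. The main factors that affect operator scheduling are analyzed, and a priority-based scheduling model is designed and implemented that takes into account both the different ways operators process different tuples and the running state of the system. An artificial neural network algorithm is used to train the weight coefficients that determine operator priority, yielding scheduling based on dynamic priorities. When a large burst of stream tuples enters the system and cannot be processed, load shedding lets the system drop part of the data in time, keeping the system running normally and improving the availability of query processing. For queries that contain join operators over two data streams, the timing, amount, location, and predicate of load shedding and anti-shedding are studied, and a semantic-based load shedding model is designed and implemented. Results from running the algorithm and the model show that the system sheds load promptly when overloaded and performs anti-shedding promptly when underloaded, reducing the loss of performance.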

Key words (translated from the Chinese): data stream, query, scheduling, priority, load shedding
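
The four load shedding questions named in the abstract (when, how much, where, and by which predicate) can be illustrated with a small sketch. It is only a sketch under stated assumptions: the watermark thresholds, the fixed step size, and the utility function (how often a tuple's join key has matched in the other stream) are hypothetical and are not taken from the paper. The shedder is assumed to sit in front of the join operator.

```python
class LoadShedder:
    """Decides when and how much to shed (all thresholds are assumptions)."""

    def __init__(self, capacity, high=0.9, low=0.6, step=0.1):
        self.capacity = capacity    # tuples per second the system can process
        self.high = high            # shed more above this load
        self.low = low              # anti-shed (shed less) below this load
        self.step = step            # change in drop fraction per adjustment
        self.drop_fraction = 0.0    # share of incoming tuples to drop

    def adjust(self, arrival_rate):
        load = arrival_rate / self.capacity
        if load > self.high:        # overloaded: shed more
            self.drop_fraction = min(1.0, self.drop_fraction + self.step)
        elif load < self.low:       # underloaded: anti-shedding
            self.drop_fraction = max(0.0, self.drop_fraction - self.step)
        return self.drop_fraction


def shed(batch, drop_fraction, utility):
    """Semantic shedding: drop the lowest-utility tuples, keep arrival order."""
    n_drop = int(len(batch) * drop_fraction)
    # Rank tuple positions by utility and mark the least useful ones.
    order = sorted(range(len(batch)), key=lambda i: utility(batch[i]))
    dropped = set(order[:n_drop])
    return [t for i, t in enumerate(batch) if i not in dropped]


# Usage: tuples are (join_key, payload); utility is the assumed match count.
match_counts = {"a": 5, "b": 0, "c": 2, "d": 1}
batch = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5)]
kept = shed(batch, 0.2, lambda t: match_counts[t[0]])
```

Dropping by utility rather than at random is what makes the shedding semantic: tuples that are unlikely to produce join results are discarded first, so the loss in query precision per dropped tuple is kept small.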