Survey on image holistic scene understanding based on probabilistic graphical model

Journal of Computer Applications, 2014, Vol. 34, Issue 10: 2913-2921. DOI: 10.11772/j.issn.1001-9081.2014.10.2913

LI Lin¹,², LIAN Jin², WU Yue¹, YE Mao¹

1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu Sichuan 611731, China
2. Department of Electronic Commerce, Sichuan TOP IT Vocational Institute, Chengdu Sichuan 611743, China

Received: 2014-04-28 Revised: 2014-06-16 Online: 2014-10-01 Published: 2014-10-30
Corresponding author: LI Lin
About the authors: LI Lin (born 1973), male, from Pujiang, Sichuan, associate professor, Ph.D. candidate, CCF member; research interests: machine learning algorithms and their application to computer image understanding. LIAN Jin (born 1981), male, from Chengdu, Sichuan, lecturer, M.S.; research interests: image processing and its application in digital teaching. WU Yue (born 1958), male, from Chengdu, Sichuan, professor, M.S.; research interests: computer networks, data mining. YE Mao (born 1973), male, from Dazu, Chongqing, professor, Ph.D.; research interests: data mining, computer vision, intelligent information processing.
Supported by: Humanities and Social Sciences Research Project of the Ministry of Education; Sichuan Outstanding Youth Fund project

Abstract

In recent years, computer image understanding has been widely applied in intelligent transportation, satellite remote sensing, machine vision, medical image analysis, Internet image search, and other fields. As an extension of image understanding, holistic scene understanding of images is considerably more complex and integrative than basic image understanding tasks. In view of this, the basic framework of image understanding, the research value and significance of holistic scene understanding, and typical models were summarized and analyzed. Four representative holistic scene understanding models were introduced, and their architectures were compared in detail. Finally, the shortcomings of current research on holistic scene understanding and directions for future development were pointed out, providing a reference for further research in this area.

CLC Number: TP391.413

LI Lin, LIAN Jin, WU Yue, YE Mao. Survey on image holistic scene understanding based on probabilistic graphical model[J]. Journal of Computer Applications, 2014, 34(10): 2913-2921.

