Overview of Big Data Analysis and Processing
CAI Jianghui (蔡江辉), YANG Yuqing (杨雨晴)
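Abstract:
Data is growing explosively at an unprecedented rate, and massive data has become an important strategic resource for every industry. Since big data emerged in 2008, many related fields have made notable achievements while also facing numerous challenges. Big data analysis and processing, as the core problem of the big data field, has long been a focus of attention in China and abroad. To help readers better understand the basic theory of big data analysis and processing and the challenges it faces, this paper presents a survey based on a thorough investigation of the relevant technologies. It first briefly introduces the meaning and characteristics of big data and presents the technical architecture of big data analysis and processing. It then reviews the state of research in China and abroad from four perspectives: text big data analysis and mining, network big data analysis and mining, multimedia big data analysis and mining, and mobile big data analysis and mining. Finally, it summarizes and analyzes the main problems and challenges that big data analysis and processing currently faces.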
Keywords: big data; big data analysis and processing; problems and challenges
Foundation: National Natural Science Foundation of China (U1931209, U1731126)
Authors: CAI Jianghui (蔡江辉), YANG Yuqing (杨雨晴)