A survey towards an integration of big data analytics to big insights for value-creation.docx

A survey towards an integration of big data analytics to big insights for value-creation

Mandeep Kaur Saggi⁎, Sushma Jain
Department of Computer Science, Thapar University, Patiala, India

ARTICLE INFO
Keywords: Big data; Data analytics; Machine learning; Big data visualization; Decision-making; Smart agriculture; Smart city application; Value-creation; Value-discover; Value-realization

ABSTRACT
Big Data Analytics (BDA) is increasingly becoming a trending practice that generates an enormous amount of data and provides a new opportunity that is helpful in relevant decision-making. The developments in Big Data Analytics provide a new paradigm and solutions for big data sources, storage, and advanced analytics. BDA provides a nuanced view of big data development, and insights on how it can truly create value for firms and customers. This article presents a comprehensive, well-informed examination and realistic analysis of deploying big data analytics successfully in companies. It provides an overview of the architecture of BDA including six components, namely: (i) data generation, (ii) data acquisition, (iii) data storage, (iv) advanced data analytics, (v) data visualization, and (vi) decision-making for value-creation. In this paper, the seven V's characteristics of BDA, namely Volume, Velocity, Variety, Valence, Veracity, Variability, and Value, are explored. The various big data analytics tools, techniques, and technologies are described. Furthermore, the paper presents a methodical analysis of the usage of Big Data Analytics in various applications such as agriculture, healthcare, cyber security, and smart city. This paper also highlights the previous research, challenges, current status, and future directions of big data analytics for various application platforms.
This overview highlights three issues, namely (i) concepts, characteristics, and processing paradigms of Big Data Analytics; (ii) the state-of-the-art framework for decision-making in BDA for companies to gain insight for value-creation; and (iii) the current challenges of Big Data Analytics as well as possible future directions.

⁎ Corresponding author.
E-mail addresses: mandeepsaggi90@gmail.com (M.K. Saggi), sjain@thapar.edu (S. Jain).
https://doi.org/10.1016/j.ipm.2018.01.010
Received 20 December 2016; Received in revised form 26 January 2018; Accepted 30 January 2018; Available online 09 February 2018
0306-4573/© 2018 Elsevier Ltd. All rights reserved.

1. Introduction and motivation

The notion of Big Data Analytics (BDA) is driven by underpinning new waves of innovation, analytic services with intelligence, and stirring advances in technology over the last few decades. The emerging applications of BDA have attracted the attention of many academic researchers, industry practitioners, and government organizations. It is a technology-driven ecosystem, where better decision-making will help many organizations to extract knowledge from data in an interpretable and appropriate form.

Strawn (2012) described Big Data as the “fourth paradigm of science”, whereas Hagstrom (2012) defined it as a “new paradigm of knowledge assets”, or “the next frontier for innovation, competition, and productivity” (Manyika et al., 2011). Gantz and Reinsel (2011) defined Big Data as “a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling the high-velocity capture, discovery, and analysis”. It has been described as an integrated approach to organizing, processing, and analyzing the six characteristics (namely volume, variety, velocity, veracity, valence, and value). BDA is used to generate action for delivering insights and value, measuring performance, and establishing competitive advantages (Wamba, Akter, Edwards, Chopin, & Gnanzou, 2015).
De Mauro, Greco, and Grimaldi (2016) stated that “Big Data is the information asset characterized by such a high volume, velocity, and variety to require specific technology and analytical methods for its transformation into value”. BDA, as a scientific topic of investigation, has produced significant and insightful findings from various researchers. However, a systematic review of innovative analytical methods, techniques, and tools for making insightful decisions in various domains is still needed. Indeed, BDA has become a key component of decision-making processes in business (Hagel, 2015). Big data and advanced data analytics techniques can be used for the development of analytical and computational models (Iqbal, Doctor, More, Mahmud, & Yousuf, 2017). There is still considerable research interest in how to develop the infrastructure and apply various data mining and machine learning algorithms in different domains. BDA is concerned with modern statistical and machine learning techniques for analyzing huge amounts of data (Suthaharan, 2014). Researchers have suggested that Big Data Analytics and deep learning have the potential to provide new-generation applications based on modeling and simulation (Chen & Lin, 2014; Tolk, 2015). Traditional tools are not able to address the issues of scalability, adaptability, and usability, whereas such issues are critical to the success of BDA because they influence how big data is developed, managed, and analyzed.

BDA is characterized by the requirement for advanced data acquisition, data storage, data management, data analysis, and visualization. To turn BDA into big insights for value-creation, there are great challenges in terms of data, process, analytical modeling, and management for different applications. Big Data should not be considered synonymous with data collected through the internet, as data can originate from sources such as commercial transactions taking place in supermarkets, banks, etc.
Big Data can also originate from sensors (satellite and GPS tracking data from mobile phones) and administrative data (education records, medical records, and tax records) (Eagle, Pentland, & Lazer, 2009). BDA helps in acquiring a deep understanding of, and useful insights into, various sectors such as agriculture, healthcare, cyber-physical systems, smart cities, and social media analytics. This enormous amount of information needs to be analyzed in an iterative and time-sensitive manner (Jukic, Sharma, Nestorov, & Jukic, 2015). Advanced BDA tools such as NoSQL, BigQuery, MapReduce, Hadoop, Flume, Mahout, Spark, WibiData, and Skytree provide insight in a desirable form to enhance the ability and decision-making process in various sectors such as business intelligence analytics (Chen, Chiang, & Storey, 2012), healthcare analytics (Archenaa & Anita, 2015), smart agriculture or farming analytics (Majumdar, Naraseeyappa, & Ankalaki, 2017; Wolfert, Ge, Verdouw, & Bogaardt, 2017), social media analytics (Vatrapu, Mukkamala, Hussain, & Flesch, 2016), smart cities (Khan, Anjum, Soomro, & Tahir, 2015), intelligent transport management (Fiosina, Fiosins, & Müller, 2013), finance and accounting (Sledgianowski, Gomaa, & Tan, 2017), financial risk management (Cerchiello & Giudici, 2016), energy management (Tu, He, Shuai, & Jiang, 2017), and future predictions (Waller & Fawcett, 2013).

BDA is a data-driven decision framework. This article aims to study BDA comprehensively in order to solve the challenges, gain insight, and make informed decisions by using various data analytics approaches. This paper summarizes an extensive and systematic methodological review of various tools and technologies of BDA and also reports the research gaps for further investigation. In more detail, our review article addresses the following research questions:
• RQ1: What are the most important seven characteristics of Big Data Analytics?
• RQ2: How to design BDA-DM framework?
• RQ3: What are the main tools, techniques, and technologies of Big Data Analytics?
• RQ4: What are the main application areas of Big Data Analytics?
• RQ5: What is the relation between value-creation and Big Data Analytics?
• RQ6: Which specific aspects of data management, data transformation, and utilization drive value for companies?
• RQ7: Can the value of data be monetized, tracked, and considered for financial accounting?
• RQ8: What are the different challenges of each component of the BDA framework?

This article attempts to answer the above research questions (RQs). The RQs guide and centre our research work and focus it on specific topics to indicate our distinctive perception. In addition, this work leads to a new advancement in the conceptual framework of BDA. The contributions of this research article are as follows:
• Categorize the current approaches and general requirements for various components of the BDA architecture by demonstrating the open state-of-the-art frameworks and challenges.
• Summarize various existing tools, methods, and technologies in advanced BDA.
• Provide a summary of the key technology for value-creation applications of BDA in financial companies.
• Present the future research directions relating BDA to newly emerging technologies.

This paper is structured into eight sections. Section 2 describes the relevant research methodology and summarizes the review studies. Section 3 presents an ecosystem of the Big Data Analytics and Decision-Making Framework (BDA-DMF). Section 4 presents the big data management phase of the framework. Section 5 presents the Big Data Analytics techniques, technologies, tools, and applications phase. In that section, we present a concise statement of the different steps of the data analytics framework. A brief review of different areas of application such as Agriculture, Healthcare, Cyber security, and Smart City is also presented. Section 6 covers the visualization phase of the BDA framework.
Section 7 describes the value-creation need, benefits, and framework of BDA for financial and accounting companies. Section 8 describes the conclusion and future research directions in the area of Big Data Analytics.

2. A glimpse of big data research method: systematic mapping process

In this paper, the articles from the Web of Science digital database are considered. To ensure thoroughness and consistency in our review, the guidelines presented by Brereton, Kitchenham, Budgen, Turner, and Khalil (2007) were followed, and the digital library databases (Springer, ScienceDirect, Google Scholar, IEEE Xplore, ACM Digital Library) were used. The Web of Science database index contains several types of documents, namely articles, reviews, proceeding papers, meeting abstracts, editorial material, book reviews, and book chapters. Significant research publications were obtained from Web of Science on BDA, Big Data Analytics-Management (BDA-M), and Big Data Analytics-Machine Learning (BDA-ML) for a considerably long period of nearly 20 years (2000–2017).

2.1. Data inclusion and exclusion process

In this paper, 70 primary studies have been selected and analyzed through a process that formulates criteria for the inclusion and exclusion of articles for review. The selection process of primary studies is summarized in Fig. 1.
• Firstly, the selection process of primary studies is conducted based on different queries. The queries are executed topic-wise, title-wise, and as a combination of both. The last filter is based on the abstract and a full reading of the paper. If a paper is not relevant to the study, it is excluded. The inclusion/exclusion criterion was applied at the end, after reading the abstracts of the results.
• Secondly, papers reporting theoretical, empirical, and both qualitative and quantitative case studies have been considered.

Table 1
Examples of query extracting process.
Query number | Topic/Title | Keywords
Q1 | A1 | TS=(“Big Data” AND “Big Data Analytics”)
Q2 | A1 | TS=(“Big Data Analytics” AND “Big Data” AND “Management”)
Q3 | A1 | TS=(“Big Data Analytics” AND “Big Data” AND “Machine Learning”)
Q1 | A2 | TI=(“Big Data” AND “Big Data Analytics”)
Q2 | A2 | TI=(“Management” AND “Big Data”) OR (“Management” AND “Big Data Analytics”)
Q3 | A2 | TI=(“Machine Learning” AND “Big Data”) OR (“Machine Learning” AND “Big Data Analytics”)
Q1 | A3 | TS=(“Big Data Analytics” AND “Big Data”) AND TI=(“Big Data Analytics” AND “Big Data”)
Q2 | A3 | (TS=(“Management” AND “Big Data”) OR (“Management” AND “Big Data Analytics”)) AND (TI=(“Management” AND “Big Data”) OR (“Management” AND “Big Data Analytics”))
Q3 | A3 | (TS=(“Machine Learning” AND “Big Data Analytics”) OR (“Machine Learning” AND “Big Data”)) AND (TI=(“Machine Learning” AND “Big Data Analytics”) OR (“Machine Learning” AND “Big Data”))

Table 2
Top 18 research areas of the existing Big Data Analytics contributions.

Research areas | Topic Q1 | Topic Q2 | Topic Q3 | Title Q1 | Title Q2 | Title Q3 | Topic and title Q1 | Topic and title Q2 | Topic and title Q3
Computer Science | 483 | 127 | 63 | 125 | 50 | 13 | 124 | 49 | 13
Engineering | 196 | 63 | 20 | 56 | 25 | 6 | 55 | 25 | 5
Business Economics | 128 | 55 | 2 | 29 | 18 | 1 | 29 | 18 | 2
Telecommunications | 78 | 18 | 7 | 32 | 11 | 3 | 32 | 11 | 2
Information Science Library Science | 69 | 18 | 2 | 15 | 4 | 1 | 14 | 3 | 1
Operations Research Management Science | 61 | 23 | 1 | 11 | 6 | 1 | 11 | 6 | 1
Science Technology Other Topics | 35 | 9 | 5 | 9 | 5 | 1 | 9 | 5 | 1
Health Care Sciences Services | 32 | 8 | 4 | 9 | 1 | 1 | 9 | 1 | 1
Automation Control Systems | 10 | 6 | 2 | 2 | 3 | 1 | 2 | 3 | 1
Medical Informatics | 24 | 6 | 5 | 2 | 1 | 1 | 2 | 1 | 1
Mathematics | 18 | 1 | 1 | 3 | 1 | 2 | 3 | 1 | 1
Environmental Sciences Ecology | 15 | 6 | 6 | 1 | 6 | 2 | 1 | 6 | 2
Neurosciences Neurology | 13 | 1 | 1 | 3 | 1 | 1 | 3 | 1 | 1
Remote Sensing | 11 | 1 | 1 | 3 | 2 | 1 | 3 | 2 | 1
Mathematical Computational Biology | 10 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1
Communication | 8 | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 1
Biotechnology Applied Microbiology | 8 | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 1
Agriculture | 6 | 2 | 2 | 2 | 1 | 1 | 2 | 1 | 1

2.2. Study selection process and data analysis

The review article selection process is based on the “Query Extracting Process”.
The queries are numbered Q1, Q2, and Q3 and use a combination of various keywords. Table 1 shows some examples of the executed queries. In the first scenario, 1027 research articles are selected on the basis of the topic-wise execution of “Q1”, which includes 867 articles, 84 editorial material, 66 reviews, 26 proceeding papers, 3 book reviews, 7 book chapters, and 1 meeting abstract. On execution of “Q2”, a total of 272 research articles are listed, comprising 222 articles, 30 reviews, 19 editorial material, 1 letter, 2 book chapters, and 7 proceeding papers. Furthermore, 105 research articles are obtained on execution of “Q3”, among which there are 87 articles, 11 reviews, 4 editorial material, 3 meeting abstracts, and 3 proceeding papers.

In the second scenario, the aforementioned queries are searched title-wise. As a result, 239 research articles are selected on execution of “Q1”, comprising 159 articles, 54 editorial material, 13 reviews, 2 proceeding papers, 3 book reviews, 1 book chapter, and 10 meeting abstracts. Further, 120 research articles are listed on execution of “Q2”, consisting of 66 articles, 6 reviews, 43 editorial material, 1 letter, 4 meeting abstracts, and 1 proceeding paper. A total of 35 research articles are listed on executing “Q3”, which includes 18 articles, 6 reviews, 6 editorial material, 3 meeting abstracts, 1 correction, and 1 proceeding paper. Finally, the topic-wise and title-wise queries are combined and executed simultaneously, which results in 90, 40, and 20 papers on the execution of Q1, Q2, and Q3, respectively. After the manual cleaning of the data, 70 primary study papers were obtained.

Table 3
Top 16 journals publishing articles of Big Data Analytics.
Journal | Topic Q1 | Topic Q2 | Topic Q3 | Title Q1 | Title Q2 | Title Q3 | Topic and title Q1 | Topic and title Q2 | Topic and title Q3
Big data | 32 | 6 | 13 | 3 | 1 | 1 | 8 | 1 | 1
IEEE Access | 18 | 2 | 3 | 8 | 1 | 1 | 3 | 2 | 1
IBM Journal of Research And Development | 17 | 5 | 1 | 3 | 1 | 1 | 3 | 2 | 1
PLOS one | 11 | 1 | 1 | 3 | 1 | 1 | 2 | 1 | 1
Decision Support Systems | 11 | 6 | 1 | 2 | 1 | 1 | 4 | 2 | 1
Big Data Research | 11 | 4 | 1 | 4 | 2 | 2 | 4 | 2 | 1
IEEE Network | 9 | 2 | 1 | 4 | 1 | 1 | 5 | 1 | 1
Computer | 9 | 1 | 1 | 5 | 1 | 1 | 1 | 1 | 1
Information System | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Future Generation Computer Systems The International Journal of Grid Computing And Science | 5 | 2 | 1 | 1 | 1 | 1 | 3 | 1 | 1
Communications of The Acm | 5 | 1 | 1 | 3 | 1 | 1 | 2 | 1 | 1
Cluster Computing The Journal of Networks Software Tools And Applications | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Expert Systems With Applications | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Knowledge And Information Systems | 3 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1
IEEE Communication & Survey Tutorial | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1

Fig. 2. Distribution of articles by year of publications in WoS.

2.3. Result related work

The summary is presented in Table 2, Table 3, Fig. 2, and Fig. 3. Table 2 gives the number of publications for the top 18 research areas, Table 3 lists the number of publications for the top 16 journals, Fig. 2 gives the number of publications per year to show the trends of BDA (2000–2016), and Fig. 3 gives the percentage of publications in the area of BDA. Further, Fig. 3 represents the topic-wise, title-wise, and combined topic-wise and title-wise percentages of publications in the areas of BDA, BDA-M, and BDA-ML. Fig. 2 shows the number of publications of BDA (537 and 110), BDA-Management (146 and 73), and BDA-Machine Learning (63 and 20) on the basis of topic-wise and title-wise queries. The topic-wise (TS) queries Q1, Q2, and Q3 are represented with red, blue, and green lines, respectively, while the title-wise (TI) queries Q1, Q2, and Q3 are represented with magenta, yellow, and cyan lines, respectively.

2.4. Past, present, and future of big data analytics

In 1997, the term “big data” was first described by Cox and Ellsworth (1997).
Visualization provides an interesting challenge for computer systems because data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. McKinsey & Company leveraged resources to document a 5–6% increase in global productivity from data-driven analytics relative to companies that are not big-data friendly. International Data Corporation (IDC) found that the created and copied data volume in the world was 1.8 zettabytes (ZB). IBM indicates that 2.5 EB of data is created every day. Cisco predicted that, by 2020, 50 billion devices will be connected to networks and to the Internet. Fig. 4 shows the timeline of the Big Data processing paradigms and technologies (Nino & Illarramendi, 2015; Buyya, Calheiros, & Dastjerdi, 2016).

Fig. 3. Publications in the area of big data analytics, management, and machine learning.
Fig. 4. The timeline of the big data processing and technologies.

Big data resources present a great opportunity for digital business models, as can be seen with Google, eBay, Amazon, Facebook, Netflix, Borders, and many other businesses (Mithas & Lucas, 2010). In 1999, the Apache Software Foundation (ASF) was established. For Big Data, batch processing was introduced in 2003 and 2004, when Google published its papers on the Google File System and MapReduce. The first generation of Big Data was initiated in 2006, when Hadoop was born. Similarly, Apache Pig was originally developed and endorsed by Yahoo, and Facebook began the development of an open-source tool with the same purpose, Apache Hive. Yahoo! introduced Pig in 2008 and Facebook started Hive in 2009 (Casado & Younas, 2015). The second generation introduced Apache Storm for real-time processing.
It was started by Nathan Marz and released as open source by Twitter. Other companies such as Cloudera and LinkedIn presented interesting technologies such as Flume and Kafka. By 2014, these open source developments had defined an ecosystem of Big Data tools around Apache Hadoop, together with other components such as Apache Spark, Mahout (machine learning), Sqoop (data transfer between Hadoop and other systems), Oozie (job scheduling and monitoring on Hadoop), Zookeeper (distributed process configuration and coordination), and Apache Giraph (processing data stores as graphs).

Since 1997, many characteristics have been added to Big Data. The first three V's (volume, variety, and velocity) were popularized by Gartner (2011), and the fourth V, veracity, was included by Dwaine Snow in his blog “Dwaine Snow's Thoughts on Databases and Data Management” in 2012. The first 3Vs (volume, velocity, variety) are described by Chen and Zhang (2014), and the 4Vs (volume, velocity, variety, and veracity) by Abbasi, Sarker, and Chiang (2016) and Zikopoulos and Eaton (2011). Both variety and velocity essentially work alongside the veracity of the data: these V's decrease the capacity to cleanse the data before analyzing it and drawing useful insights. The 5V's are volume, velocity, variety, veracity, and value (Oracle, 2012); the fifth V, introduced by Gamble and Goble (2011), refers to worthwhile and valuable data for business. The 7V's are volume, velocity, variety, veracity, value, variability, and visualization (Seddon & Currie, 2017). Variability and complexity are two other factors specific to analytical areas.

RQ1: What are the most important seven characteristics of Big Data Analytics?
Some of the technical challenges have been associated with different “V” characteristics, in particular “Volume” (support for very high data volumes), “Velocity” (fast analysis of data streams), “Variety” (support for diverse kinds of data), “Veracity” (support for high data quality), “Value” (the value of the insights and benefits), “Variability” (support for constantly changing data), and “Valence” (support for connectivity in data). The seven characteristics of BDA involve some exploration of the different steps and processes of data analytics. These seven aspects represent different difficulties in analyzing big data. Our major aim is to provide a comprehensive picture of each characteristic and to describe its challenges. These seven characteristics of BDA are shown in Table 4 and further explained as follows (Sivarajah, Kamal, Irani, & Weerakkody, 2017).

Currently, Big Data Analytics has become a trendy practice in business intelligence that involves massive datasets and advanced analytic techniques. Villars, Olofson, and Eastwood (2011) stated that businesses and organizations can “extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and analysis”. Kambatla, Kollias, Kumar, and Grama (2014) presented a literature survey on Big Data Analytics. Assuncao, Calheiros, Bianchi, Netto, and Buyya (2015) stated that cloud computing plays a key role for Big Data because it can act as a business model under popular terms such as Analytics as a Service (AaaS) or Big Data as a Service (BDaaS). Zhang and Xiang (2015) discussed how Big Data integration, data quality issues, privacy, and analytics can be used for effective business decisions. Chen, Kazman, and Haziyev (2016) introduced an architecture-centric approach called Architecture-centric Agile Big data Analytics (AABA). Its purpose is to address technical and organizational challenges in big data system development and the agile delivery of big data analytics for web-based systems.
Fogelman-Soulié and Lu (2016) presented an application of Big Data Analytics in business (e.g. credit-card fraud detection). The framework developed in that study showed how companies can store their Big Data in a data lake if they want to implement many Big Data projects.

Table 4
The Seven V's characteristics of Big Data Analytics.

Name | Description | Examples | Challenges
Volume (Barnaghi, Sheth, & Henson, 2013) | Volume of big data is explained in terms of its size and exponential growth. The large scale and sheer volume of data is a big challenge. It is known as size. Applications: medical data, social media | Scale of data: terabyte, petabyte, exabyte, yottabyte | Data storage; data acquisition; processing of data; performance; cost
Variety (Chen et al., 2013) | It refers to the complexity of large data sets, which may be semi-structured, unstructured, or structured. It is known as complexity. Applications: weather data, DNA sequencing, biology | Different forms of data: text, documents; images, voice, audio, video; geo-spatial data; network data; sensor data | Heterogeneity of data; diverse; dissimilar forms
Velocity (Sivarajah et al., 2017) | It is a high rate of data inflow with non-homogeneous structure. It is known as speed. Applications: financial market, ad agencies | Analysis of streaming data: batch processing; real-time processing; streaming processing | Complexity; slow and expensive nature of data processing
Veracity (Vasarhelyi et al., 2015) | The veracity feature measures the accuracy of data and its potential use for analysis. It is known as quality. | Uncertainty of data: increasingly complex data structure; inconsistency in large data sets | Accuracy of data; reliability of the data sources; context within analysis; inaccuracy, latency, subjectivity
Valence (Sivarajah et al., 2017) | It refers to the connectivity of big data in the form of graphs. It is known as connectedness. Applications: healthcare data | Measure of connectivity: data connectivity | More complex data exploration algorithm; modeling and prediction of valence changes; group event detection; emergent behavior analysis
Value (Sivarajah et al., 2017) | Big Data = Data + Value? It is the heart of the data challenge. It extracts knowledgeable value from vast amounts of structured and unstructured data without loss, for end users. Applications: business or industries | Seven V's: size; complexity; quality; connectedness; speed; variations; value (important) | Increase revenue; decrease operational costs; serve customers
Variability (Sivarajah et al., 2017) | It refers to data whose meaning is changed constantly and rapidly. It remains a constant challenge. Applications: stock market, finance data | Variation in data flow rates | Inconsistency of data; peak-level computing demand; periodic peaks and troughs

Ahmed et al. (2017) explored the recent advances and key requirements for managing BDA in the Internet of Things (IoT) environment. Bashir and Gill (2016) proposed an IoT big data analytics framework to overcome the challenges of storing and analyzing a large amount of data originating from smart buildings. Rathore, Ahmad, and Paul (2016) proposed a smart city management system based on IoT that exploits big data and analytics. Sezer, Dogdu, Ozbayoglu, and Onal (2016) proposed an augmented framework that integrates semantic web technologies, big data, and IoT.
For the processingand analysisof Big Data,various recently usedplatformsare investigated for large amountof IoT generateddata as follows:(i) enablingcapabilityfor storing & processinglargeamount of data(Apache Hadoop, 2011), (ii) enabling capability for advanced data analytics: extraction, transfer and loading (ETL) (1010data), (iii) enabling capability of big data IoT processing and analytics (SAP-Hana, 2013), (iv) enabling capability that support for Hadoop in order to big data processing and analysis (Cloudera, 2008), (v) enabling capability for parallel processing, analysis and security for unstructured data (HP-HAVEn, 2013), (vi) enabling capability for Hadoop based processing and analysis on large amount of data (Hortonworks, 2011), (vii) enabling capability for analytical database that combine massively parallel processing (MPP) petabyte scale volume data (Pivotal big data suite, 2016), (viii) enabling capability for data analyze and management problem solving up to 50 terabyte (Infobright, 2005), (ix) enabling capability for fast processing, analyzing, and predictive capabilities (MapReduce, 2008). Further, the structures of the top primary studies are classiﬁed. The structure for classiﬁcation is based on the method which was proposed by (Jabbour, 2013). The classiﬁcation scheme includes six categories: - namely study, objective, focus, capabilities, beneﬁts, and their results as shown in Table 5. • Study: It consists conceptual, theoretical, empirical, literature review, and case study. • Objective: Various objectives of BDA, related review, and research. • Focus: Various researches focus on direction of BDA in diﬀerent domains of application. • Capabilities: It includes important data capabilities such as analytics, prediction, decision, and management. • Beneﬁts: Various beneﬁts and impact of BDA. Table5 Top20Primaryresearchescomposedthesample. Primary Study Objective Focus Capabilities Beneﬁts Result 1. 
(Pääkkönen & Pakkala, -Conceptual Survey and use case on BDA Referenced architecture on commercial -Analytical Commercial product & services for BDA A new perspective on 2015) -Theoretical applications product and services for BDA system system research 2. (Oussous, Benjelloun, -Conceptual Survey of BDA technologies and Various tools and technologies -Analytical BDA opportunities, application, challenges A new perspective on Lahcen, & Belfkih, -Theoretical algorithms -Decision and issues research 2017) 3. (Liu, Li, Li, & Wu, -Theoretical Survey of empirical studies on Big data errors in spatial information -Decision Reduced big data error in data collection A new perspective on 2016) -Empirical data quality & data usage of BDA science processing and analysis research -Case Study 4. (Zhang & Xiang, 2015) -Conceptual Survey of BDA data quality Data quality solutions for business -Analytical Increase the data privacy, security, quality Consistent with previously -Theoretical organization issues published literature 5. (Mikalef, Pappas, -Conceptual Literature survey on BDA Resource base theory (RBT) and -Analytical Theoretical framework on business value and Research in domain of Krogstie, & -Theoretical capabilities of BDA -Decision competitive advantage for BDA application knowledge to gain insights Giannakos, 2017) -Literature -Management through analysis Review 6. (Yaqoob et al., 2016) -Theoretical Survey on BDA processing, Usage in many multidisciplinary -Analytical Increase in productivity of industries/ Research in domain of -Case study technologies & organization case application -Predictive companies and provide consumer density of knowledge to gain insights study -Decision the ﬁrm with BDA through analysis 7. (Zhou et al., 2016) -Conceptual Systematic review of BDA for Industrial development of big data -Decision Energy Eﬃcient big data- driven optimization Research in domain of -Theoretical smart energy management. 
driven smart energy management -Management & real-time monitoring and forecasting knowledge to gain insights through analytics 8. (Tu et al., 2017) -Conceptual Survey on smart grid integration Empirical studies on smart grid and -Analytical Stability & Reliability Utilization & Eﬃciency A new perspective on -Theoretical of BD management and BDA energy big data analytics -Decision Better Customer Satisfaction research algorithm -Management 9. (Kshetri, 2016) -Modeling Survey on role of big data in Analytics in Financial companies -Analytical The use of BDA helps to overcome the A new perspective on -Literature facilitating the access to ﬁnancial -Decision reducing information opacity and transaction research review services in china -Management costs 10. (Cerchiello & Giudici, -Modeling Systemic risk model based on big Financial Risk management/ markets -Predictive Understanding of Financial services Replication to a diﬀerent 2016) -Literature data and tweets -Management context or period review 11. (Yang, Zhong, Liu, & -Conceptual Theoretical framework of Improve Big data storage mode. -Analytical Eliminate data noise and to remove data Consistent with previously Feng, 2014). -Theoretical ﬁnancial data classiﬁcation redundancy published literature standard 12. (Sun, Chen, & Yu, -Modeling Generalized optimal wavelet Big Financial and ﬁnancial analytics -Analytical FA provide better understand the viability, Research in domain of 2015) -Empirical decomposing algorithm for big -Predictive stability, and proﬁtability of business/ knowledge to gain insights ﬁnancial data -Decision market/beneﬁcial decisions through analytics -Management (continued on next page) Table 5 (continued) Primary Study Objective Focus Capabilities Beneﬁts Result 13. 
(Crawley & Wahlen, -Literature Survey on analytics in empirical/ new analytics for testing hypotheses -Analytical Informed business professionals about Consistent with previously 2014) Review achieving ﬁnancial accounting questionnaires analytics in accounting ﬁnancial accounting published literature 14. (Tian, Han, Wang, Lu, -Empirical System architecture for Big data Analysis on the critical latency analytics -Management Useful for the banking and ﬁnancial Comparative research & Zhan, 2015) analytics requirement in ﬁnance using BDA organizations 15. (Edwards & Taborda, -Theoretical Review on domain of analytics, To understand relationship between the -Analytics Understand the knowledge domain using data Replication to a diﬀerent 2016) risk management, and knowledge knowledge data, techniques, and -Decision analytic capabilities context or period management experience -Management 16. (Wu, Li, Cheng, & Lin, -Theoretical Healthcare-wearable technology Bring new opportunities for healthcare- -Analytics higher-quality ﬁrms, the optimal quality level A new perspective on 2016b) -Empirical optimize insights wearable device providers for health and biomedical sector research 17. (Cetin, Demirçiftçi, & -Theoretical Review of revenue management Hotel revenue managers with KSAs -Decision Helps to understand the revenue management Research in domain of Bilgihan, 2016) challenges required in managing inventory and -Management challenges knowledge to gain insights prices through analytics 18. (Addo-Tenkorang and -Theoretical Review on ‘‘big data,” its Big data applications attempting to -Analytics The four main attributes or factors identiﬁed Comparative research Helo, 2016) -Literature Review applicationandanalysisof operations or supply-chain identifyandunderstandthechallenges in industrial or supply chain -Decision -Management with ‘‘big data” Variety, Velocity, Volume, Veracity, and Value-adding management 19. 
(Hazen, Skipper, -Theoretical Review of big data and predictive Focus on eight theory-driven impact of -Analytics Identifying development of BDA Prediction A new perspective on Ezell, & Boone, 2016) analytics BDPA's on supply chain management -Management and modern competitive upon firm research performance
20. (ur Rehman et al., -Theoretical Review of the big data analytics Proposed knowledge-driven based big -Analytics It enables local knowledge availability, Research in domain of 2016) -Case Study process and popular relevant tools data reduction framework for value -Management privacy preservation, and secure data sharing knowledge to gain insights for value-creation creation functions to build trust between customers through analytics and enterprises.
Fig. 5. Review structure of the paper.
• Result: It shows the useful value insights that result from these primary study articles.
The studies primarily focused on different application areas of BDA. Researchers have proposed, among others: a framework for Big Data and Product Lifecycle Management (BDA-PLM) (Zhang, Ren, Liu, Sakao, & Huisingh, 2017); the big scholarly data lifecycle (Assuncao et al., 2015; Khan, Liu, Shakil, & Alam, 2017); the 3-As Data Quality-in-Use model of data quality characteristics for use in Big Data projects (Merino, Caballero, Rivas, Serrano, & Piattini, 2016); the Unified Theory of Acceptance and Use of Technology (UTAUT) aligned with the idea of Big Data as a Service (Shin, 2016); a novel conceptual basis for operational business intelligence (OpBI) systems designed with value-based business requirements (Hänel & Felden, 2015); a unified and dynamic framework for analyzing big data business value and the managerial, operational, and organizational changes led by a data-driven approach (Sheng, Amankwah-Amoah, & Wang, 2017); and a conceptual model of the seven V's of big data analytics to gain a deeper understanding of the strategies and practices of high-frequency trading (HFT) in financial markets (Seddon & Currie, 2017).
3. Big data analytics & decision-making framework (BDA-DMF)
The Big Data Analytics and Decision-Making Framework (BDA-DMF) is shown in Fig. 5 to discover value in the business ecosystem. This figure indicates the big data management, big data analytics, data visualization, and decision-making for value-creation that are discussed in Sections 4, 5, 6, and 7, respectively.
RQ2: How to design BDA-DM framework?
Big data analytics is a data-intensive architecture that provides various technologies and platforms used in phases such as data generation, data acquisition, data storage, advanced data analytics, visualization, and decision-making for value-creation, as shown in Fig. 7. It follows a top-down approach. It consists of various techniques and technologies, e.g., Hadoop, HBase, Cassandra, MongoDB, NoSQL, and so on. As a limitation, these tools and techniques cannot solve the real-world problems of data storing, data searching, data sharing, data visualization, and real-time analysis.
Fig. 6. Different types of data domain.
Fig. 7. Architecture of big data analytics.
4. Big data management
Big data management (BDM) provides an infrastructure to Big Data Analytics, where data management techniques, tools, and platforms including storage, pre-processing, processing, and security can be applied (Bilal et al., 2016; Siddiqa et al., 2016). The components involved in BDM are described as follows.
4.1. Data sources
Big data generation refers to the generation of data from various relevant sources. It can be generated by humans, machines, business processes, and data techniques that are descriptive, predictive, and prescriptive.
4.1.1. Data domain
A flourishing domain of data is expressed by a variety of descriptive terms such as structured, unstructured, machine- and sensor-generated data, batch and real-time processing data, biometric data, human-generated data, and business-generated data. Fig.
6 shows the relevance for various generations of big data analytics domains.
Machine-Generated Data: Machine-generated data comes from computer networks, sensors, satellites, audio, video streaming, mobile phone applications, and the prediction of security breaches.
Human-Generated Data: It can be collected from people, for example identification details such as name, address, age, occupation, salary, and qualification. Real streaming data, in contrast, can be generated by various files, documents, log files, research, emails, and social media websites such as Facebook, Twitter, YouTube, and LinkedIn.
Business-Generated Data: The volume of business data of all companies worldwide, such as transactional data, corporate data, and government agencies data, is estimated to double every 1.2 years. When business intelligence (BI) of BDA is discussed, it means: value (does the data contain any valuable information for my business needs?), visibility (focus on insight and foresight of a problem and an adequate solution associated with it), and verdict (potential for decision-makers based on the problem, computational capacity, and resources) within the business intelligence domain (Wu, Buyya, & Ramamohanarao, 2016a).
4.1.2. Data types
The following are the three types of analytics that organizations and industries can use to learn and get insights to promote their business.
Descriptive: It is composed of various technologies and summaries of inferred data that represent current and previous happenings. Standard reporting, ad-hoc reporting, dashboards, querying, and drilling down are examples of descriptive analytics. It is defined as a look into the past in order to draw inferences: “What has happened?”
Predictive: Predictive analytic modeling includes root-cause analysis, Monte Carlo simulations, and data mining. It is sometimes used in real-time or in batch-time processes.
Siegal (2010) illustrated that seven sequential objectives are achieved by adopting predictive analytics, namely compete, grow, enforce, improve, satisfy, learn, and act. It predicts future trends: “What could happen?”
Prescriptive: This technique is applicable to future scenarios and advises a solution or insightful actions based on the predictions. Basu (2013) presented the five pillars of prescriptive analytics, namely hybrid data, integrated predictions and prescriptions, prescriptions and side effects, adaptive algorithms, and a feedback mechanism: “What should we do?”
4.2. Data acquisition
Data acquisition covers a broad spectrum of collecting, filtering, and cleaning data before it is ingested into a data warehouse or any other database. Chen, Mao, and Liu (2014) observed that data acquisition must support heterogeneity because of the variety of devices.
4.2.1. Data collection
It is the process of acquiring unprocessed data from the real-world environment and handling it proficiently. Log files are widely used for data collection; they are generated by multiple sources and by all applications working on electronic devices, in formats such as the extended log format (W3C), the common log file format (NCSA), and the IIS log format (Microsoft). Sensors are another source: they measure a physical quantity and transfer it in readable form as digital signals. There exist several types of sensors, such as audible, sound, automotive, vibration, electric current, weather, thermal, and pressure sensors, whose readings are transferred through wired or wireless networks. Web crawlers are generally used to collect data for website-based applications such as web search engines or web caches (Castillo, 2005).
4.2.2. Data staging
Data staging is defined as a process for handling the wide variety of collected data sets, which may be noisy, redundant, or inconsistent. It is divided into two alternative models, namely the streaming processing model and the batch-processing model.
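The difference between the two models can be illustrated with a minimal sketch, under stated assumptions: the function names `batch_mean` and `streaming_mean` and the sample readings are hypothetical and are not taken from any of the systems cited in this section. In the batch model the complete data set is stored before it is analyzed, whereas in the streaming model the result is updated as each record arrives.

```python
from typing import Iterable, Iterator, List


def batch_mean(readings: List[float]) -> float:
    """Batch model: the full data set is stored first, then analyzed in one pass."""
    return sum(readings) / len(readings)


def streaming_mean(readings: Iterable[float]) -> Iterator[float]:
    """Streaming model: update the result per record without storing the stream."""
    count = 0
    mean = 0.0
    for value in readings:
        count += 1
        mean += (value - mean) / count  # incremental update of the running mean
        yield mean


# Hypothetical sensor readings, used only for illustration.
readings = [21.0, 23.0, 22.0, 26.0]

batch_result = batch_mean(readings)                    # needs all readings up front
stream_results = list(streaming_mean(iter(readings)))  # one result per arriving record

print("batch mean:", batch_result)
print("running means:", stream_results)
```

Both models arrive at the same final mean in this sketch; the streaming version additionally exposes an intermediate result after every record, which is what makes it suitable for data that arrives continuously.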
The streaming processing model analyzes the data as soon as possible to derive its results, since the data arrives continuously and at very high speed. Open source systems that support it include Storm, S4, and Kafka (Hu, Wen, Chua, & Li, 2014). In the batch-processing model, data is first stored and then analyzed. In this model, MapReduce (Dean & Ghemawat, 2008) has become the dominant platform. Fig. 8 shows (a) data staging, divided into data exploration and data pre-processing, and (b) the predictive model.
Data Exploration: There are two main aims of data exploration. Firstly, to determine and understand the nature and characteristics of the data. Secondly, to determine the data quality issues that can badly affect the model. Data exploration and data mining are widely used to discover new insights. Examples are the data quality report (mean, mode, median, and range; standard deviation and percentiles; bar plots, histograms, and boxplots) and data quality issues (valid or invalid).
Pre-Processing: To extract meaningful information from big data, it is necessary to clean, integrate, and transform the data (Hu et al., 2014) through tools such as Apache Hadoop, NoSQL, and MapReduce. Pre-processing involves a series of steps, namely how to integrate data, how to transform data, how to select the right model for analysis, and how to provide the results.
Fig. 8. (a) Data staging and (b) The predictive model.
Fig. 9. The platform of various big data storage.
-Cleaning: It is an essential goal of pre-processing, which cleans the data and addresses its quality and format because of its messy nature. It enables us to discover imprecise, insufficient, or immoderate data that requires altering or removing in order to improve data quality.
-Integration: With the use of the extract, transform, and load (ETL) process, the data can be cleaned, transformed, and made applicable to data mining and various online analytics.
-Transformation: The transformation of the raw data makes it suitable for analysis and gets the data into shape, for example by integrating and packaging data using tools such as ETL, DMT, and Pig. Various actions can be applied to data in real-time format, such as splitting data, merging it, performing computations, connecting it with outside data domains, and spreading data to multiple destinations.
4.3. Data storage & processing
It is the process of managing data storage. It performs activities in parallel to optimize the storage process. Data clustering, replication, and indexing are key activities for accomplishing the storage phase in big data management (Siddiqa et al., 2016). It refers to how numerous types of data can be stored in different forms after collecting them from different sources. Useful tools for big data storage include HBase, NoSQL, Gluster, HDFS, and GFS (Gandomi & Haider, 2015; Pole & Gera, 2016). Cheptsov and Koller (2015) introduced an innovative approach to parallelizing data-centric applications based on the message passing interface. Fig. 9 describes big data storage for various platforms (Hu et al., 2014; Chen, 2014).
Fig. 10. Classification of different data analytics techniques.
5. Big data analytics
The advanced Big Data Analytics process refers to analyzing heterogeneous data and mining insightful information from unknown patterns by applying various predictive algorithms, semantic analysis, statistical analysis methods, and technologies. Collection and transportation of big data share a common goal: analyzing the data for insights and better application guidance (Li & Jain, 2013). Fahad et al. (2014) described a few efficient approaches such as sampling, data condensation approaches, density-based approaches, grid-based approaches, divide and conquer, incremental learning, and distributed computing. Fayyad, Piatetsky-Shapiro, and Smyth (1996) presented the steps that compose the knowledge discovery in databases process.
They defined significant iterations such as selection of data, pre-processing of data, transformation of data, the application of data mining algorithms to enumerate patterns, and the proper interpretation of results to ensure useful knowledge discovery from data. Tsai, Lai, Chao, and Vasilakos (2015) presented big data analytics infrastructures categorized as (i) Processing or Computing: Hadoop, Nvidia CUDA, or Twitter Storm; (ii) Storing: Titan or HDFS; and (iii) Analyzing: MLPACK or Mahout. Other tools are suited to particular data sizes: Whiteboard, R, MATLAB, and Octave (kilobytes to low megabytes); NumPy, SciPy, Weka, and BLAS (megabytes to low gigabytes); and Hive, Mahout, Hama, and Giraph (gigabytes to terabytes).
5.1. BDA techniques
The recent advancements in techniques and technologies have enabled many enterprises to handle big data efficiently. The data analytics techniques include machine learning, data mining, statistics, artificial neural networks, extreme machine learning, natural language processing, and deep learning. Fig. 11 shows the origin of BDA techniques. BDA has led to numerous technologies for performing analytics. An overview of machine learning tools for Big Data Analytics is given in Appendix A.
RQ3: What are the main tools, techniques, and technologies of Big Data Analytics?
5.1.1. Advanced machine learning
Advanced machine learning (ML) analytics is an umbrella term for the selection of an analytical technique to build a model and evaluate it for an efficient result. By tradition, machine-learning research is divided into two categories: logical representations and statistical ones. Initially, input data and a technique are selected to build a predictive model, which then generates model output or is validated. Fig. 8(b) shows the predictive model as an iterative process that includes build, explore, scale, report, and act.
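The build-and-validate step can be illustrated with a minimal sketch, assuming a simple least-squares line as the predictive model. The data values and the helper names `fit_line` and `mean_abs_error` are hypothetical illustrations and are not part of the surveyed tools.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Build step: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(slope: float, intercept: float,
                   xs: List[float], ys: List[float]) -> float:
    """Validate step: average absolute prediction error on held-out examples."""
    errors = [abs((slope * x + intercept) - y) for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Hypothetical historical examples (training) and held-out examples (validation).
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
valid_x, valid_y = [5.0, 6.0], [11.0, 13.0]

slope, intercept = fit_line(train_x, train_y)
validation_error = mean_abs_error(slope, intercept, valid_x, valid_y)

print("model: y = %.2f * x + %.2f" % (slope, intercept))
print("validation error:", validation_error)
```

Keeping the validation examples separate from the historical examples used for fitting is the design choice that matters here: the error is measured on data the model has not seen.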
The most common predictive techniques used for advanced data analytics are classification, clustering, regression, association analysis, graph analysis, and decision trees. Predictive data analytic applications use supervised and unsupervised ML algorithms. Supervised ML methods are self-learning models that represent the relationship between a set of descriptive features and a target feature based on historical examples. In supervised machine learning, the first category is regression, which includes linear regression, generalized linear models, ensemble methods, decision trees, and neural networks. Fig. 10 shows the classification of different data analytics techniques.
Classification: To predict the categories of input data, e.g., weather attributes such as sunny, windy, and rainy.
Regression: To predict a numeric value, e.g., the price of stocks.
Clustering: To organize similar items into groups, e.g., grouping a company's customers into seniors, adults, and teenagers.
Association Analysis: To find interesting relationships between sets of variables.
Graph Analysis: To use graph structure to find connections between entities.
Decision Tree: To predict the value of objective variables by learning simple decision rules inferred from the data features.
The second category is classification, which includes algorithms such as support vector machines, discriminant analysis, naive Bayes, and nearest neighbour. Unsupervised machine learning uses clustering techniques, which include models such as k-means clustering, k-medoids, fuzzy c-means, hierarchical clustering, Gaussian mixtures, neural networks, and hidden Markov models. These are used in various real-time applications such as medical diagnosis, stock trading, energy load forecasting, and weather forecasting.
5.1.2.
Advanced statistics
Advanced statistics analytics is primarily based on various tools and techniques for collecting, analyzing, and visualizing the results from large-scale data. It includes different domains of analytics that derive techniques from statistics and data-driven analysis that executes statistical algorithms. Statistical techniques refer to cluster analytics, data mining, and predictive modeling methods.
5.1.3. Advanced data mining
Big data mining is more challenging than traditional data mining tasks such as pattern discovery and extraction. Data mining depends on techniques such as statistics, machine learning methods, and pattern recognition (Chen & Zhang, 2014). Multiple linear regression and logistic regression are also commonly used in data mining, which includes various algorithms such as k-means clustering, association analysis, and decision trees. An overview of big data analytics techniques and their application areas is shown in Table 6.
5.2. Big data analytics & applications
There are many techniques that can be used to analyze big data. This work presents various analytic technology areas in which BDA is applicable, as follows.
5.2.1. Social analytics
Social analytics is an important and growing area of real-time data analytics. It is categorized into social networks (e.g., Facebook and LinkedIn), blogs (e.g., Blogger and WordPress), microblogs (e.g., Twitter and Tumblr), social news (e.g., Digg and Reddit), social bookmarking (e.g., Delicious and StumbleUpon), media sharing (e.g., Instagram and YouTube), wikis (e.g., Wikipedia and Wikihow), question-and-answer sites (e.g., Yahoo! Answers and Ask.com), review sites (e.g., Yelp and TripAdvisor), and general sites (Li, Chen, Wang, & Zhang, 2013) like Facebook, Instagram, Foursquare, Twitter, and Pinterest, which produce immense amounts of unstructured data.
5.2.2. Mobile analytics
Personal mobile devices can be used as instruments to collect and monitor learning analytics towards self-regulation.
It has discovered previously unknown meaningful patterns and knowledge in data ranging from a few dozen terabytes to numerous petabytes, collected from mobile users at the network level or the application level (Yazti & Krishnaswamy, D. Z. 2014). There are some studies about mobile and ubiquitous learning analytics tools (Fulantelli, Taibi, & Arrigo, 2013), and Alsheikh, Niyato, Lin, Tan, and Han (2016) presented a scalable Apache Spark-based framework for deep learning in mobile big data analytics.
5.2.3. Living analytics
It is associated with the study of the social and behavioral forms of individuals and societal groups. The domain of analytical social science integrally uses advances in storage and computing abilities to process big data readily (Lazer et al., 2009). Several common challenges of living analytics with big data include high volume, high velocity, high dimensionality, sparse data, and a variety of diverse data sources and formats.
5.2.4. Video and visual analytics
Video analytics is the research field that addresses the scalable and reliable analysis of video data. Visual analytics is described as “the science of analytical reasoning facilitated by interactive visual interfaces” and its general goal is to generate insight from data. It is a fascinating branch of big data investigation that provides analytical reasoning over collaborative visual interfaces.
5.2.5. Text analytics
It refers to techniques that can extract information from textual data. It draws on statistics, computational linguistics, and machine learning (Gandomi & Haider, 2015). Text analytics assists businesses in turning large volumes of human-generated text into meaningful insights, which supports evidence-based decision-making. Broadly speaking, summarization follows two approaches: the
Table 6. Overview of Big Data Analytics techniques and their application area.
Name Review papers/Title Reference Application area Reference
Machine learning Strategies and principles of distributed machine (Xing, Ho, Xie, & Wei, 2016) -Analyzing social networks Interpreting (Airoldi, Blei, Fienberg, & Xing, 2008), (Chandola, Banerjee, & learning on big data. texts, images, and videos Kumar, 2009), (Lee & Xing, 2012), (Zhao & Xing, 2014) Machine Learning on Big Data: Opportunities and (Zhou, Pan, Wang, & Vasilakos, -Identifying disease and treatment Challenges. 2017) paths A survey of machine learning for big data (Qiu, Wu, Ding, Xu, & Feng, -Tracking anomalous activity for cyber- processing. 2016) security
Extreme machine Trends in extreme learning machines: A review. (Huang, Huang, Song, & You, -Computer vision (He et al., 2014) learning 2015) Extreme learning machines: a survey. (Huang, Wang, & Lan, 2011) -Image processing (An & Bhanu, 2012) Extreme learning machine: algorithm, theory and (Ding, Zhao, Zhang, Xu, & Nie, -System modeling and prediction (Tian & Mao, 2010) applications. 2015) medical/biomedical application (You, Lei, Zhu, Xia, & Wang, 2013) -Time series analysis (Butcher, Verstraeten, Schrauwen, Day, & Haycock, 2013)
Artificial Neural Artificial neural network learning: A comparative (Sovilj, Sorjamaa, Yu, Miche, & -Chemical engineering (Zhang, 2000) Network review. & Severin, 2010) -Cancer prediction (Himmelblau, 2000) Artificial neural networks in business: Two (Neocleous & Schizas, 2002) -Disease Prediction (Agrawal & Agrawal, 2015) decades of research. Artificial neural networks and its applications. (Tkáč & Verner, 2016). -Agriculture (Weng, Huang, & Han, 2016) (Francik et al., 2016) Neural networks for classification: a survey. (Jha, 2007).
Data Mining Educational data mining: A survey and a data (Peña-Ayala, 2014) -Educational data mining (Chaturvedi & Ezeife, 2012) mining-based analysis of recent works. -Business & Management (Baker & Yacef, 2009) Data mining techniques in social media: A survey.
(Injadat, Salo, & Nassif, 2016) -Medical and Health (Moss, Corsar, & Piper, 2012) -Social Networks (Alowibdi, Buy, Philip, & Stenneth, 2014) Application of data mining techniques in (Ngai, Xiu, & Chau, 2009) -Wind energy systems (Soman, Zareipour, Malik, & Mandal, 2010) customer relationship management: A literature review and classification. -Biomedicine (Phillips & Buchanan, 2001) Data mining techniques and applications – A decade review from 2000 to 2011. (Liao, Chu, & Hsiao, 2012) -Finance (Vavpetic, Novak, Grcar, Mozetic, & Lavrac, 2013) Data mining and wind power prediction: A (Colak, Sagiroglu, & literature review. Yesilbudak, 2012)
Deep Learning Deep learning for visual understanding: A review. (Guo et al., 2016) -Image classification (Krizhevsky, Sutskever, & Hinton, 2012) -Object detection (Hoffman et al., 2014) Deep learning applications and challenges in big (Najafabadi et al., 2015) -Image retrieval (Liu, Guo, Wu, & Lew, 2015) data analytics. -Semantic segmentation (Dong, Chen, Yan, & Yuille, 2014) A survey of deep neural network architectures and (Liu et al., 2017) -Human pose estimation (Ouyang, Chu, & Wang, 2014) their applications. -Speech recognition (Bengio, 2013)
Natural Language Machine learning and natural language (Marquez, 2000) -Spelling and grammar checking (Zuker & Sankoff, 1984) Processing processing. A tutorial on techniques and applications for (Hayes & Carbonell, 1983) -information retrieval (Saidi, Maddouri, & Nguifo, 2010) natural language processing.
extractive approach and the abstractive approach. In the extractive approach, a summary is produced from the original text units. In comparison, the abstractive approach involves extracting semantic information from the text.
5.2.6. Audio analytics
Audio analytics analyzes and extracts data from unstructured audio data such as human spoken language, and it is also referred to as speech analytics (Gandomi & Haider, 2015).
The benefits that can be achieved by using these techniques are summarized for specific application areas such as storage, pre-processing, and analysis.
RQ4: What are the main application areas of Big Data Analytics?
Various data analytics applications, such as smart agriculture, smart healthcare, cyber-physical security, and smart cities, are briefly described below.
5.2.7. Smart agriculture
As technology has spread rapidly over the last few decades, big data analytics has become the key to fostering a new revolution in agriculture. It has evolved technology to solve real-world problems based on historical data, machine-generated data, and real-time streaming data. Agricultural IoT generates a large volume of agricultural information (Lee, Hwang, & Yoe, 2013). Agriculture firms are adopting big data technologies with the promise of gaining insights from large amounts of heterogeneous data, in order to address real-time processing, to manage data incompleteness and the lack of prior knowledge, and to capture a variety of data in complex forms.
Smart agriculture is a beneficial ‘use case’ in big IoT data analytics. Sensors are the actors in the smart agriculture ‘use case’. These are installed in fields to obtain data on the moisture level of soil, trunk diameter of plants, micro-climate condition, and humidity level, as well as to forecast weather. The data passes through an IoT gateway and the internet to reach the analytics layer (Marjani et al., 2017). The analytics layer processes the data obtained from the sensor network to issue commands. Automatic climate control of harvesting, timely controlled irrigation, and humidity control for fungus prevention are examples of actions performed on the basis of big data analytics (Gubbi, Buyya, Marusic, & Palaniswami, 2013). Kshetri (2014) presented a case study of agriculture covering the benefits, opportunities, and threats of implementing BDA, which advises farmers about soil status, extreme variations in weather patterns, new ways of planting, and topography.
It also provides information regarding variable market conditions. Jiang, Chen, Dong, and Wang (2013) pointed out the difficulties of storing and analyzing the large amount of data produced by sensors, and therefore put forward distributed storage based on a DSM architecture, combined with an agriculture PaaS platform, to provide services.
Xie, Zhang, Sun, and Hao (2015) proposed a big data processing technology to obtain a hierarchy of agricultural information systems from the following aspects: gathering, storing, analyzing, and visualizing agricultural big data. Their paper described how to deal with the flood of agricultural data from the view of big data technologies using the MapReduce tool. BDA provides new insight for advanced decision support to improve yield productivity and avoid unnecessary costs related to chemical fertilizers and pesticides. Bendre, Thool, and Thool (2016) presented the different sources and types of big data in precision agriculture and ICT-based e-Agriculture. Finally, they discussed a rainfall prediction application using supervised and unsupervised methods for data processing and forecasting.
5.2.8. Smart healthcare
Big data analytics is an emerging revolution in healthcare and medical research for Research and Development (R&D), treatment, testing, and diagnosis for health management. As healthcare organizations expand day by day because of the growing number of patients, there is also an expansion in the medications used for their treatment. This creates challenges in storing, processing, and analyzing data. Hence, the demand for BDA is relevant in this field also. Health-care organizations are critically applying ‘wearable real-time sensors’ to analyze the current condition of patients, diagnose them correctly, and provide medical treatment.
Therefore, during diagnosis and treatment, there is a vast collection of data such as structured and unstructured data, self-monitoring health data, real-time sensor data, images, videos, various reports, and documents. Presently, there are different healthcare systems, such as health-care management, innovative drug discovery, face recognition, and verification of signatures, fingerprints, and irises. Fig. 12 shows the process of analyzing unstructured data in a health organization (Wang, Kung, & Byrd, 2016). Big Data in health-care maintains information regarding patients such as case history, physician notes, lab reports, X-ray reports, diet rules, the list of doctors and nurses in a specific hospital, national health register data, and the identification of medicine and surgical instrument expiry dates based on RFID data. These organizations further depend on BD technology to collect data from a patient to get more insights into care and treatment.
Moreover, data analytics creates a dedicated center for health analytics and insights to address the increasing demand from hospitals, clinics, and health professionals across the world. New big data health-care platforms include CHESS (Batarseh & Latif, 2016), EHR, LIMS, MQIC, and CMS (Ward, Marsolo, & Froehle, 2014). Big Data analytics is used to analyze health insurance claims and to leverage big data to detect fraud, waste, and error (Srinivasan & Arunasalam, 2013). Dolin, Rogers, and Jaffe (2015) presented two case studies to predict asthma using clinical document architecture (CDA) with a BDA approach.
Fig. 12. Big data analytics in healthcare sector.
5.2.9. Cyber-physical systems
Organizations and governments protect their sensitive information by using computer security networks. Big Data is used to collect, organize, and store the data. Information technology is applied by cyber defenders to protect their data efficiently, detect
all malware, and identify cyber attackers. Developers must synchronize hardware components and make them compatible with software applications across computer networks, wired or wireless sensors, and different operating systems, data formats, and analytic systems.
Hence, BDA plays a critical role in overcoming the serious issues of security and privacy, and thus in authenticating various organizations to access data and gain a complete insight into the business. The emergence of cyber-physical systems, which can be used for production, transportation, logistics, and other sectors, brings new challenges for simulation and planning, and for monitoring, control, and interaction with machinery or data usage applications (Becker, 2016).
5.2.10. Smart cities
Smart city is a wide concept, which takes into account not only the physical structure but also human and social aspects. It utilizes several technologies to improve the performance of health, transportation, energy, education, and water services, leading to an advanced level of comfort for citizens. The application of BDA is effective for data storage and processing to generate information for diverse environments such as the Smart grid environment (SGE) (Zhou, Fu, & Yang, 2016) and the Smart City (Ortiz-Rangel, M. 2015; Strohbach, Ziekow, Gazis, & Akiva, 2015). Smart healthcare is used to predict or diagnose disease early (Demirkan, 2013; Roy, Pallapa, & Das, 2007).
Most survey papers on Big Data Analytics have focused on discussing the opportunities, challenges, and architecture. Here, in contrast, the concepts, architecture, challenges, and new future directions of Big Data Analytics are presented together. This study thus provides useful insights through the integration of various technologies used in the application of Big Data Analytics. Data sources and application areas of big data analytics for development are shown in Table 7.
6. Visualization
A big data visualization method is concerned with the design of a graphical representation in the form of tables, images, diagrams, and intuitive display methods to understand the data. Visual analytics has potentially brought a new federation of data mining and machine learning tools.
Visual perception, design, data quality, missing data, and end-user visual analytics are future trends of visualization (Becker, 2016). There are various well-known visualization and analytical tools such as Dive, Rattle, FlockDB, Orange (Pole & Gera, 2016), Flare, Amcharts, and Protovis. Recently, different companies such as Amazon, Twitter, Apple, Facebook, and Google have been searching for visualization tools that can provide useful insights into various business aspects (Simon, 2014). Fig. 13 shows the evolution of visualization methodology (Song, 2014). These tools and methods appear in the form of charts, graphs, histograms, box plots, Excel spreadsheets, heat maps, geographical maps, etc. Interpretation deals with the presentation and visualization of the inferences drawn in a comprehensible manner. Two main mechanisms are often used to interpret big data: visualization and modeling. The use of big data has significant implications for modeling and theory development from a statistical-scientific point of view (Ramannavar & Sidnal, 2016).
7. BDA & decision-making framework for value-creation
In the Big Data Analytics and Decision-Making Framework (BDA-DMF) for the value-creation model, a framework is presented by which BDA can create value for financial & accounting companies. Framework for BDA and business insights for financial accounting
Table 7. Data sources and application areas of big data analytics for development.
Analytics Data type Medium Application area References
Social analytics -Movie revenues -Websites, blogs -Sentiment analysis of social data (Asur, & Huberman, 2010)
Mobile analytics -Call detail Records -Cell phones -Social Network Analysis, Population Mobility (Laurila et al., 2012) Patterns, Transportation System -Planning, Awareness Campaigns, Mobile App.
-Usage Patterns, Mobile data, traffic Analysis
Living analytics -Tweets and Comments -Social Media Sites -Social Network Analysis, Sentiment Analysis (Technical report, 2014) -Text -The Internet -Cultural Changes, Policy Effectiveness -Personal Health Data -Wearables -Healthcare
Visual analytics -Images -Sensors -Weather Forecasting, Pollution Control, (Centro de, 2015) -Climate Variables, Temperature, -Camera Urban Planning Pollutant Levels
Video analytics -Anonymous -Intel's Audience -Market Research, (Balkan & Kholod, 2015) Audience Data Impression Metrics (AIM) Suite, -Public Security System -Multimedia, Images -Camera -Automated security and surveillance Systems (Xu, Mei, Hu, & Liu, 2016)
Text analytics -Text Data -Social network feeds, email, blogs, online forum, survey -Stock market based, Sentiment analysis (Gandomi & Haider, 2015) responses, corporate documents, news and call-center logs. -financial news (Chung, 2014)
Audio analytics -Voice (audio data) -Human spoken Language -Speech analytics, Customer call-center, (Hirschberg, Hjalmarsson, & Healthcare, Interactive Voice Response Elhadad, 2010)
Smart agriculture -Sensors, Text Data, Images, -Documents, sensors device, GPS -Watershed management analysis (Hu, Cai, & DuPont, 2015) Audio, Video -Website -Crop modeling, Irrigation Water Management, (Wolfert et al., 2017) Irrigation Scheduling
Smart health -Health Related Databases -Wearable devices, sensor data, machine generated -Electronic Health Record Analysis, clinical (Raghupathi, & Raghupathi, 2014) decision support, disease surveillance,
Cyber-physical System -Expert Databases -Sensors, controller, networked manufacturing system -CPS based Industry 4.0 Systems (Lee, Bagheri, & Kao, 2014) -CPS for TES Systems (Lee, Jin, & Liu, 2017)
Smart cities -Databases -Smart phones, Computer, -Transportation, Healthcare, Power grid, Smart (Al Nuaimi, Al Neyadi, Mohamed, & education, Energy Al-Jaroodi, 2015) -Environmental sensors, -Cameras
-GeographicalPositioningSystems Fig.13.Theevolutionofvisualizationmethodology. isshowninFig. 14. This ﬁgureindicatesthethree-phasemethodbywhichbigdataanalyticscancreatevalueforcustomersand ﬁrms:- • Phase1 Value Discover: (i) Big Data Sources, (ii) Big Data Processing, • Phase 2 Value Creation: (iii) Big Data Analytics, (iv) Big Data Analytics Capabilities, Fig. 14. Framework for big data analytics and business insights for ﬁnancialaccounting. • Phase 3 Value Realization:(v)BigDataAnalytics-ValueInsights. 7.1. Value discover Intheﬁrst phase, BDA can create new insights that improve business-driven decision-making. For example, BDA can show that how ﬁrms can improve customer satisfaction and the speciﬁc features of the service experience. The growing importance of big data as a company asset is driving the development of new ways to value data assets. In the past, customer databases were considered as an important asset for ﬁrms (Srivastava, Tasadduq, & Fahey, 1998). For example, these databases could be used to create stronger relationships with customers, achieve higher loyalty, and create more eﬃcient and eﬀective (cross)-selling techniques. Big data holds tremendous potential for ﬁnancial services ﬁrms to develop new and innovative solutions that result in a signiﬁcant business value. The BDA mainly focuses on three major values to discover by the implementation of big data technology, for example, minimize hardware costs, check the value of big data before committing signiﬁcant company resources, and reduce processing costs (Leavitt, 2013). It requires congruence between business objectives, the big data storage and analytics approach. (Serrato & Ramirez, 2017) discussed three challenges for managers and decision-makers in order to take advantage of BDA. 
The first is to think critically about analytics techniques and the analyses based on such data, the second is to identify opportunities for creating value using BD, and the third is to estimate the value created while using BD to address an opportunity.

Stage 1 Data Sources: Value discovery facilitates ideas or insights that lead to better decision-making for the big data model. The process begins with data input from various sources. For example, external users include investors, creditors, regulators, customers, and competitors, and internal users include owners, managers, and employees. Initially, the data are pre-processed to clean and transform them into meaningful Big Data. This results in the creation of knowledge for big data discovery.

Stage 2 Big Data Management: In assessing the value of BDA to organizations, the key benefits are recognized as timely access to decision-making information, greater transparency, scalability, and better change management. Big Data Management (BDM) systems are of great value because they can monitor and report the exact information a user wishes to analyze. Cloudera is an example of an "Analytics infrastructure", which provides a Hadoop-based platform for executing data analytics jobs in an enterprise environment. 10Gen provides "Operational infrastructure" for enterprise-grade databases and management based on MongoDB technology.

7.2. Value creation

In the second phase, the value-creation benefit of BDA is the development of more effective marketing campaigns by selecting the right customers. However, the classical definition of marketing by Armstrong, Adam, Denize, and Kotler (2014) highlights that marketing should focus on creating superior value for customers (through high quality, attractive brand propositions, and striving for an appropriate relationship), and that firms can capture value from customers in return for value creation. Verhoef, Kooge, and Walk (2016) defined value-to-firm and customer-to-firm metrics.
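As a rough illustration of what "selecting the right customers" for a campaign could mean in practice, the following minimal Python sketch ranks customers by a simple expected campaign value. The scoring rule (response probability times expected margin, minus contact cost), the field names, and all figures are assumptions made for this example only; they are not taken from Verhoef et al. (2016) or from the framework described here.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    response_probability: float  # hypothetical output of a predictive model
    expected_margin: float       # hypothetical profit if the customer responds


def expected_campaign_value(customer: Customer, contact_cost: float) -> float:
    """Expected profit of contacting one customer (illustrative rule only)."""
    return customer.response_probability * customer.expected_margin - contact_cost


def select_customers(customers, contact_cost):
    """Return ids of customers with positive expected value, best first."""
    scored = [(expected_campaign_value(c, contact_cost), c) for c in customers]
    worthwhile = [(value, c) for value, c in scored if value > 0]
    # Sort on the numeric value only, so Customer objects are never compared.
    worthwhile.sort(key=lambda pair: pair[0], reverse=True)
    return [c.customer_id for _, c in worthwhile]


# Made-up example data.
customers = [
    Customer("A", 0.30, 50.0),
    Customer("B", 0.02, 40.0),
    Customer("C", 0.10, 200.0),
]
targets = select_customers(customers, contact_cost=2.0)
print(targets)
```

With these made-up figures, customer B is dropped because the expected return does not cover the contact cost. In a real deployment the response probability would come from the analytics stage rather than being fixed by hand.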
RQ5: What is the relation between value-creation and Big Data Analytics?

Value creation is a major sustainability factor for companies, in addition to profit maximization, customer retention, business goals, and revenue generation. The adoption of Internet of Things (IoT), big data, and cloud computing technologies by companies and organizations has led to better value creation at the customer and enterprise ends (Haile & Altmann, 2016). For example, enterprise applications are designed to collect direct customer feedback and information from internal business operations (Verhoef et al., 2016). Adopting big data analytics as a firm-level innovation aims to achieve firm heterogeneity; it therefore affords higher value and contributes directly to the overall value-creation performance of banking firms, financial services, supply chain management, IT companies, etc. McKinsey & Co. added 'Value' as the fourth 'V' to define big data (Chen et al., 2014). Value refers to the worth of the insights hidden inside big data. It represents the transactional, strategic, and informational benefits of big data. Moreover, it represents the extent to which big data generates economically worthy insights and benefits through extraction and transformation (Wamba et al., 2015). Big data has the potential to transform almost every aspect of business, from research and development to sales and marketing and supply-chain management, providing new opportunities for growth.

Stage 3 Big Data Analytics: The process of data collection, processing, and analysis for BDA is expected to play a key role in the financial & accounting sector. Big data has become a large pool of structured and unstructured data that can be captured, communicated, aggregated, stored, and analyzed, and it is now treated as a part of every sector and function of the global economy.

RQ6: Which specific aspects of data management, data transformation, and utilization drive value for companies?
In today's digital landscape, data is more readily available and more easily gathered than ever before. With the ability to track most customer interactions and transactions across devices and channels, companies are looking at, and leveraging, their data in new and innovative ways. Dealing with a large amount of information from different data sources is a concern of Big Data management. Issues such as how to store, integrate, and process data effectively and efficiently have been pointed out by Brereton et al. (2007). For storing and processing large datasets, companies can use traditional parallel database systems, Apache Hadoop technologies, key-value data stores (HBase, NoSQL databases), etc. Apache Spark, SparkR, Tachyon, MLbase, Mahout, and Splunk are relatively new analytical tools that are becoming increasingly popular, mostly among web companies.

The BDA framework process helps to determine the right business model: (i) Raw data: companies that generate a rich pool of raw data can sell it with little investment; (ii) Processed data: processed data comes from multiple sources and is stored, managed, and analyzed for others to consume; (iii) Insights: the use of data science, predictive modeling, machine learning, and analytics helps to perform complex correlations on data and gain business insights; (iv) Presentation: the ability to present the data, insights, and analytic models to key business partners helps them to build
