A survey towards an integration of big data analytics to big insights for value-creation
Mandeep Kaur Saggi⁎, Sushma Jain
Department of Computer Science, Thapar University, Patiala, India
ARTICLE INFO
Keywords: Big data; Data analytics; Machine learning; Big data visualization; Decision-making; Smart agriculture; Smart city application; Value-creation; Value-discover; Value-realization
ABSTRACT
Big Data Analytics (BDA) is increasingly becoming a trending practice that generates an enormous amount of data and provides new opportunities for relevant decision-making. Developments in Big Data Analytics provide a new paradigm and solutions for big data sources, storage, and advanced analytics. BDA provides a nuanced view of big data development and insights into how it can truly create value for firms and customers. This article presents a comprehensive, well-informed examination and realistic analysis of deploying big data analytics successfully in companies. It provides an overview of the architecture of BDA, comprising six components: (i) data generation, (ii) data acquisition, (iii) data storage, (iv) advanced data analytics, (v) data visualization, and (vi) decision-making for value-creation. In this paper, the seven V's characteristics of BDA, namely Volume, Velocity, Variety, Valence, Veracity, Variability, and Value, are explored. The various big data analytics tools, techniques, and technologies are described. Furthermore, the paper presents a methodical analysis of the usage of Big Data Analytics in various applications such as agriculture, healthcare, cyber security, and smart city. It also highlights previous research, challenges, current status, and future directions of big data analytics for various application platforms. This overview highlights three issues: (i) concepts, characteristics, and processing paradigms of Big Data Analytics; (ii) the state-of-the-art framework for decision-making in BDA for companies to gain insight for value-creation; and (iii) the current challenges of Big Data Analytics as well as possible future directions.
1. Introduction and motivation
The notion of Big Data Analytics (BDA) is driven by new waves of innovation, analytic services with intelligence, and stirring advances in technology over the last few decades. The emerging applications of BDA have attracted the attention of many academic researchers, industry practitioners, and government organizations. It is a technology-driven ecosystem, where better decision-making will help many organizations to extract knowledge from data in an interpretable and appropriate form.
Strawn (2012) described Big Data as the “fourth paradigm of science”, whereas Hagstrom (2012) defined it as a “new paradigm of knowledge assets”, and Manyika et al. (2011) as “the next frontier for innovation, competition, and productivity”. Gantz and Reinsel (2011) defined Big Data as “a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling the high-velocity capture, discovery, and analysis”. It has also been described as an integrated approach to organizing, processing, and analyzing six characteristics (namely volume, variety, velocity, veracity, valence, and value). BDA is used to generate action for delivering insights and value, measuring performance, and establishing competitive advantages (Wamba, Akter, Edwards, Chopin, & Gnanzou, 2015). De Mauro, Greco, and Grimaldi (2016) stated that “Big Data is the information asset characterized by such a high volume, velocity, and variety to require specific technology and analytical methods for its transformation into value”.

⁎ Corresponding author.
E-mail addresses: mandeepsaggi90@gmail.com (M.K. Saggi), sjain@thapar.edu (S. Jain).
https://doi.org/10.1016/j.ipm.2018.01.010
Received 20 December 2016; Received in revised form 26 January 2018; Accepted 30 January 2018
Available online 09 February 2018
0306-4573/© 2018 Elsevier Ltd. All rights reserved.
BDA, as a scientific topic of investigation, has produced significant and insightful findings from various researchers. However, a systematic review of innovative analytical methods, techniques, and tools for making insightful decisions in various domains is still needed. Indeed, BDA has become a key component of decision-making processes in business (Hagel, 2015).
Big data and advanced data analytics techniques can be used for the development of analytical and computational models (Iqbal, Doctor, More, Mahmud, & Yousuf, 2017). There is still considerable research interest in how to develop the infrastructure and apply various data mining and machine learning algorithms in different domains. BDA is concerned with modern statistical and machine learning techniques to analyze huge amounts of data (Suthaharan, 2014). Researchers have suggested that Big Data Analytics and deep learning have the potential to provide new-generation applications based on modeling and simulation (Chen & Lin, 2014; Tolk, 2015).
Traditional tools are not able to address the issues of scalability, adaptability, and usability, although such issues are critical to the success of BDA because they influence how big data is developed, managed, and analyzed. BDA is characterized by the requirement for advanced data acquisition, data storage, data management, data analysis, and visualization. To turn BDA into big insights for value-creation, there are great challenges in terms of data, process, analytical modeling, and management for different applications. Big Data should not be considered synonymous with data collected through the internet, as data can originate from sources such as commercial transactions taking place in supermarkets, banks, etc. Big Data can also originate from sensors (satellite and GPS tracking data from mobile phones) and administrative data (education records, medical records, and tax records) (Eagle, Pentland, & Lazer, 2009).
BDA helps in acquiring a deep understanding of, and useful insights into, various sectors such as agriculture, healthcare, cyber-physical systems, smart cities, and social media analytics. An enormous amount of information needs to be analyzed in an iterative and time-sensitive manner (Jukic, Sharma, Nestorov, & Jukic, 2015). Advanced BDA tools such as NoSQL, BigQuery, MapReduce, Hadoop, Flume, Mahout, Spark, WibiData, and Skytree provide insight in a desirable form to enhance the decision-making process in various sectors such as business intelligence analytics (Chen, Chiang, & Storey, 2012), healthcare analytics (Archenaa & Anita, 2015), smart agriculture or farming analytics (Majumdar, Naraseeyappa, & Ankalaki, 2017; Wolfert, Ge, Verdouw, & Bogaardt, 2017), social media analytics (Vatrapu, Mukkamala, Hussain, & Flesch, 2016), smart cities (Khan, Anjum, Soomro, & Tahir, 2015), intelligent transport management (Fiosina, Fiosins, & Müller, 2013), finance and accounting (Sledgianowski, Gomaa, & Tan, 2017), financial risk management (Cerchiello & Giudici, 2016), energy management (Tu, He, Shuai, & Jiang, 2017), and future predictions (Waller & Fawcett, 2013).
BDA is a data-driven decision framework. This article aims to comprehensively study BDA in order to solve challenges, gain insight, and make informed decisions by using various data analytics approaches. This paper summarizes an extensive and systematic methodological review of various tools and technologies of BDA and also reports the research gaps for further investigation. In more detail, our review article addresses the following research questions:
• RQ1: What are the most important seven characteristics of Big Data Analytics?
• RQ2: How to design BDA-DM framework?
• RQ3: What are the main tools, techniques, and technologies of Big Data Analytics?
• RQ4: What are the main application areas of Big Data Analytics?
• RQ5: What is the relation between value-creation and Big Data Analytics?
• RQ6: Which specific aspects of data management, data transformation, and utilization drive value for companies?
• RQ7: Can the value of data be monetized, tracked and considered for financial accounting?
• RQ8: What are the different challenges of each component of the BDA framework?
This article attempts to answer the above research questions (RQs). The RQs guide and centre our research work and focus it on specific topics to indicate our distinctive perception. This work also leads to a new advancement of the conceptual framework of BDA.
The contributions of this research article are as follows:
• Categorize the current approaches and general requirements for various components of BDA architecture by demonstrating the open state-of-the-art frameworks and challenges.
• Summarize various existing tools, methods, and technologies in advanced BDA.
• Provide a summary of the key technology for value-creation applications of BDA in financial companies.
• Present future research directions relating BDA to new emerging technologies.
This paper is structured into eight sections. Section 2 describes the relevant research methodology and summarizes the review studies. Section 3 presents an ecosystem of the Big Data Analytics and Decision-Making Framework (BDA-DMF). Section 4 presents the big data management phase of the framework. Section 5 presents the phase covering Big Data Analytics techniques, technologies, tools, and applications. In that section, we present a concise statement of the different steps of the data analytics framework. A brief review of different application areas such as agriculture, healthcare, cyber security, and smart city is also presented. Section 6 covers the visualization phase of the BDA framework. Section 7 describes the value-creation need, benefits, and framework of BDA for financial and accounting companies. Section 8 describes the conclusion and future research directions in the area of Big Data Analytics.
2. A glimpse of big data research method: systematic mapping process
In this paper, articles from the Web of Science digital database are considered. To ensure thoroughness and consistency in our review, we followed the guidelines presented by Brereton, Kitchenham, Budgen, Turner, and Khalil (2007) and also used digital library databases (Springer, Science Direct, Google Scholar, IEEE Xplore, ACM Library).
The Web of Science database index contains several types of documents, namely articles, reviews, proceeding papers, meeting abstracts, editorial material, book reviews, and book chapters. Significant research publications on BDA, Big Data Analytics-Management (BDA-M), and Big Data Analytics-Machine Learning (BDA-ML) were obtained from Web of Science for a period of nearly 20 years (2000–2017).
2.1. Data inclusion and exclusion process
In this paper, 70 primary studies have been selected and analyzed through a process that formulates criteria for the inclusion and exclusion of articles for review. The selection process of primary studies is summarized in Fig. 1.
• Firstly, the selection of primary studies is conducted based on different queries. The queries are executed topic-wise, title-wise, and as a combination of both. The last filter is based on reading the abstract and the full text of the paper; if a paper is not relevant to the study, it is excluded. The inclusion/exclusion criterion was applied at the end, on reading the abstracts of the results.
• Secondly, papers reporting theoretical, empirical, and both qualitative and quantitative case studies have been considered.
Table 1
Examples of query extracting process.

| Query Number | Topic/Title | Keywords |
|---|---|---|
| Q1 | A1 | TS=(“Big Data” AND “Big Data Analytics”) |
| Q2 | A1 | TS=(“Big Data Analytics” AND “Big Data” AND “Management”) |
| Q3 | A1 | TS=(“Big Data Analytics” AND “Big Data” AND “Machine Learning”) |
| Q1 | A2 | TI=(“Big Data” AND “Big Data Analytics”) |
| Q2 | A2 | TI=(“Management” AND “Big Data”) OR (“Management” AND “Big Data Analytics”) |
| Q3 | A2 | TI=(“Machine Learning” AND “Big Data”) OR (“Machine Learning” AND “Big Data Analytics”) |
| Q1 | A3 | TS=(“Big Data Analytics” AND “Big Data”) AND TI=(“Big Data Analytics” AND “Big Data”) |
| Q2 | A3 | (TS=(“Management” AND “Big Data”) OR (“Management” AND “Big Data Analytics”)) AND (TI=(“Management” AND “Big Data”) OR (“Management” AND “Big Data Analytics”)) |
| Q3 | A3 | (TS=(“Machine Learning” AND “Big Data Analytics”) OR (“Machine Learning” AND “Big Data”)) AND (TI=(“Machine Learning” AND “Big Data Analytics”) OR (“Machine Learning” AND “Big Data”)) |
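Table 2
Top 18 research areas of the existing Big Data Analytics contributions.

| Research areas | Topic Q1 | Topic Q2 | Topic Q3 | Title Q1 | Title Q2 | Title Q3 | Topic and title Q1 | Topic and title Q2 | Topic and title Q3 |
|---|---|---|---|---|---|---|---|---|---|
| Computer Science | 483 | 127 | 63 | 125 | 50 | 13 | 124 | 49 | 13 |
| Engineering | 196 | 63 | 20 | 56 | 25 | 6 | 55 | 25 | 5 |
| Business Economics | 128 | 55 | 2 | 29 | 18 | 1 | 29 | 18 | 2 |
| Telecommunications | 78 | 18 | 7 | 32 | 11 | 3 | 32 | 11 | 2 |
| Information Science Library Science | 69 | 18 | 2 | 15 | 4 | 1 | 14 | 3 | 1 |
| Operations Research Management Science | 61 | 23 | 1 | 11 | 6 | 1 | 11 | 6 | 1 |
| Science Technology Other Topics | 35 | 9 | 5 | 9 | 5 | 1 | 9 | 5 | 1 |
| Health Care Sciences Services | 32 | 8 | 4 | 9 | 1 | 1 | 9 | 1 | 1 |
| Automation Control Systems | 10 | 6 | 2 | 2 | 3 | 1 | 2 | 3 | 1 |
| Medical Informatics | 24 | 6 | 5 | 2 | 1 | 1 | 2 | 1 | 1 |
| Mathematics | 18 | 1 | 1 | 3 | 1 | 2 | 3 | 1 | 1 |
| Environmental Sciences Ecology | 15 | 6 | 6 | 1 | 6 | 2 | 1 | 6 | 2 |
| Neurosciences Neurology | 13 | 1 | 1 | 3 | 1 | 1 | 3 | 1 | 1 |
| Remote Sensing | 11 | 1 | 1 | 3 | 2 | 1 | 3 | 2 | 1 |
| Mathematical Computational Biology | 10 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |
| Communication | 8 | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 1 |
| Biotechnology Applied Microbiology | 8 | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 1 |
| Agriculture | 6 | 2 | 2 | 2 | 1 | 1 | 2 | 1 | 1 |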
2.2. Study selection process and data analysis
The review article selection process is based on the “Query Extracting Process”. The queries are numbered Q1, Q2, and Q3 and use combinations of various keywords. Table 1 shows some examples of the executed queries.
In the first scenario, 1027 research articles are selected on the basis of the topic-wise execution of “Q1”, which includes 867 articles, 84 editorial materials, 66 reviews, 26 proceeding papers, 3 book reviews, 7 book chapters, and 1 meeting abstract. On executing “Q2”, a total of 272 research articles are listed, comprising 222 articles, 30 reviews, 19 editorial materials, 1 letter, 2 book chapters, and 7 proceeding papers. Furthermore, 105 research articles are obtained on execution of “Q3”, among which there are 87 articles, 11 reviews, 4 editorial materials, 3 meeting abstracts, and 3 proceeding papers.
In the second scenario, the aforementioned queries are searched title-wise. As a result, 239 research articles are selected on execution of “Q1”, comprising 159 articles, 54 editorial materials, 13 reviews, 2 proceeding papers, 3 book reviews, 1 book chapter, and 10 meeting abstracts. Further, 120 research articles are listed on the execution of “Q2”, consisting of 66 articles, 6 reviews, 43 editorial materials, 1 letter, 4 meeting abstracts, and 1 proceeding paper. The 35 research articles listed on executing “Q3” include 18 articles, 6 reviews, 6 editorials, 3 meeting abstracts, 1 correction, and 1 proceeding paper. Finally, the topic-wise and title-wise queries are combined and executed simultaneously, which results in 90, 40, and 20 papers on the execution of Q1, Q2, and Q3, respectively. After manual cleaning of the data, 70 primary study papers were obtained.
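The three query scenarios described above (topic-wise, title-wise, and both combined) can be illustrated with a small query builder. This is a minimal sketch, not the authors' tooling: the helper names (`and_group`, `field_query`, `combined_query`, `build_all`) are hypothetical, straight quotes are used instead of typographic ones, and for simplicity the same keyword groups are reused in all three scenarios, whereas Table 1 uses a pure AND form for the topic-wise Q2 and Q3.

```python
def and_group(*terms: str) -> str:
    """Join quoted phrases with AND inside one pair of parentheses."""
    return "(" + " AND ".join(f'"{t}"' for t in terms) + ")"


def field_query(tag: str, *groups: str) -> str:
    """Prefix AND-groups with a field tag (TS = topic, TI = title).

    A single group is used as-is; several groups are OR-ed together.
    """
    body = groups[0] if len(groups) == 1 else "(" + " OR ".join(groups) + ")"
    return f"{tag}={body}"


def combined_query(*groups: str) -> str:
    """Scenario A3: the keywords must match in both topic and title."""
    return f"{field_query('TS', *groups)} AND {field_query('TI', *groups)}"


# Keyword groups for Q1 (BDA), Q2 (BDA-Management), Q3 (BDA-Machine Learning).
KEYWORDS = {
    "Q1": [and_group("Big Data", "Big Data Analytics")],
    "Q2": [and_group("Management", "Big Data"),
           and_group("Management", "Big Data Analytics")],
    "Q3": [and_group("Machine Learning", "Big Data"),
           and_group("Machine Learning", "Big Data Analytics")],
}


def build_all(keywords: dict) -> dict:
    """Return {(query, scenario): query string} for scenarios A1, A2, A3."""
    queries = {}
    for name, groups in keywords.items():
        queries[(name, "A1")] = field_query("TS", *groups)   # topic-wise
        queries[(name, "A2")] = field_query("TI", *groups)   # title-wise
        queries[(name, "A3")] = combined_query(*groups)      # topic and title
    return queries


if __name__ == "__main__":
    for key, query in build_all(KEYWORDS).items():
        print(key, query)
```

Generating the nine query strings from one keyword table keeps the topic-wise and title-wise variants consistent, which matters when the result counts of the scenarios are later compared.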
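Table 3
Top 16 journals publishing articles on Big Data Analytics.

| Journal | Topic Q1 | Topic Q2 | Topic Q3 | Title Q1 | Title Q2 | Title Q3 | Topic and title Q1 | Topic and title Q2 | Topic and title Q3 |
|---|---|---|---|---|---|---|---|---|---|
| Big Data | 32 | 6 | 13 | 3 | 1 | 1 | 8 | 1 | 1 |
| IEEE Access | 18 | 2 | 3 | 8 | 1 | 1 | 3 | 2 | 1 |
| IBM Journal of Research and Development | 17 | 5 | 1 | 3 | 1 | 1 | 3 | 2 | 1 |
| PLOS ONE | 11 | 1 | 1 | 3 | 1 | 1 | 2 | 1 | 1 |
| Decision Support Systems | 11 | 6 | 1 | 2 | 1 | 1 | 4 | 2 | 1 |
| Big Data Research | 11 | 4 | 1 | 4 | 2 | 2 | 4 | 2 | 1 |
| IEEE Network | 9 | 2 | 1 | 4 | 1 | 1 | 5 | 1 | 1 |
| Computer | 9 | 1 | 1 | 5 | 1 | 1 | 1 | 1 | 1 |
| Information System | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Future Generation Computer Systems The International Journal of Grid Computing And Science | 5 | 2 | 1 | 1 | 1 | 1 | 3 | 1 | 1 |
| Communications of the ACM | 5 | 1 | 1 | 3 | 1 | 1 | 2 | 1 | 1 |
| Cluster Computing The Journal of Networks Software Tools And Applications | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Expert Systems With Applications | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Knowledge And Information Systems | 3 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |
| IEEE Communication & Survey Tutorial | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |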
Fig. 2. Distribution of articles by year of publications in WoS.
2.3. Result related work
The summary is presented in Table 2, Table 3, Fig. 2, and Fig. 3. Table 2 indicates the number of publications by the top 18 research areas, Table 3 lists the number of publications by the top 16 journals, Fig. 2 indicates the number of publications per year to show the trend of BDA (2000–2016), and Fig. 3 indicates the percentage of publications in the area of BDA. Further, it represents the topic-wise, title-wise, and combined topic-wise and title-wise percentage of publications in the areas of BDA, BDA-M, and BDA-ML.
Fig. 2 shows the number of publications on BDA (537 and 110), BDA-Management (146 and 73), and BDA-Machine Learning (63 and 20) on the basis of topic-wise and title-wise queries. The topic-wise (TS) queries Q1, Q2, and Q3 are represented with red, blue, and green lines, respectively, while the title-wise (TI) queries Q1, Q2, and Q3 are represented with magenta, yellow, and cyan lines, respectively.
2.4. Past, present, and future of big data analytics
In 1997, the term “big data” was first described by Cox and Ellsworth (1997), who noted that visualization provides an interesting challenge for computer systems because data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. McKinsey and Company leveraged resources to document a 5–6% increase in global productivity from data-driven analytics relative to companies that are not big-data-friendly. The International Data Corporation (IDC) found that the volume of data created and copied in the world was 1.8 zettabytes (ZB). IBM indicates that 2.5 EB of data is created every day. CISCO predicted that, by 2020, 50 billion devices will be connected to networks and to the Internet. Fig. 4 shows the timeline of the Big Data processing paradigms and technologies (Nino & Illarramendi, 2015; Buyya, Calheiros, & Dastjerdi, 2016).
Big data resources present a great opportunity for digital business models, as can be seen with Google, eBay, Amazon, Facebook, Netflix, Borders, and many other businesses (Mithas & Lucas, 2010). In 1999, the Apache Software Foundation (ASF) was established. Batch processing for Big Data was introduced in 2003 and 2004, when Google published its papers on the Google File System and MapReduce. The first generation of Big Data began in 2006, when Hadoop was born. Similarly, Apache Pig was originally developed and endorsed by Yahoo, and Facebook began the development of an open-source tool with a similar aim, Apache Hive. Yahoo! introduced Pig in 2008 and Facebook started Hive in 2009 (Casado & Younas, 2015).
Fig. 3. Publications in the area of big data analytics, management, and machine learning.
Fig. 4. The timeline of the big data processing and technologies.
The second generation introduced Apache Storm for real-time processing. It was started by Nathan Marz and released as open source by Twitter. Other companies such as Cloudera and LinkedIn presented interesting technologies such as Flume and Kafka. These open source developments have defined an ecosystem of Big Data tools around Apache Hadoop, together with other components such as Apache Spark, Mahout (machine learning), Sqoop (data transfer between Hadoop and other systems), Oozie (job scheduling and monitoring on Hadoop), Zookeeper (distributed process configuration and coordination), and Apache Giraph (to process data stored as graphs) in 2014.
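The batch-processing model behind MapReduce, which the first-generation tools named above build on, can be illustrated with a single-process word count. This is a minimal sketch in plain Python under the assumption that everything fits in memory; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data analytics", "big insights from big data"]))
```

In a real cluster the map and reduce functions run in parallel on different nodes and the shuffle step moves data across the network; the sketch only shows the data flow between the three phases.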
Since 1997, many characteristics have been added to Big Data. The first three V's, volume, variety, and velocity, were popularized by Gartner (2011); the fourth V, veracity, was included by Dwaine Snow in his blog “Dwaine Snow's Thoughts on Databases and Data Management” in 2012. The 3Vs (volume, velocity, variety) (Chen & Zhang, 2014) and the 4Vs (volume, velocity, variety, and veracity) (Abbasi, Sarker, & Chiang, 2016; Zikopoulos & Eaton, 2011) have been described. Both variety and velocity essentially work against the veracity of the data, since they decrease the capacity to cleanse the data before analyzing it and deriving useful insights. The 5V's are volume, velocity, variety, veracity, and value (Oracle, 2012); the fifth V, introduced by Gamble and Goble (2011), refers to worthwhile and valuable data for business. The 7V's are volume, velocity, variety, veracity, value, variability, and visualization (Seddon & Currie, 2017). Variability and complexity are two other aspects specific to analytical areas.
RQ1: What are the most important seven characteristics of Big Data Analytics?
Some of the technical challenges have been associated with different “V” characteristics, in particular “Volume” (support of very high data volumes), “Velocity” (fast analysis of data streams), “Variety” (support for diverse kinds of data), “Veracity” (support for high data quality), “Value” (the value of the insights and benefits), “Variability” (support for constantly changing data), and “Valence” (support of connectivity in data).
The seven characteristics of BDA include some exploration of different steps and processes of data analytics. These seven aspects represent different difficulties in analyzing big data. Our major aim is to provide a comprehensive picture of each characteristic and also to describe its challenges. These seven characteristics of BDA are shown in Table 4 and further explained as follows (Sivarajah, Kamal, Irani, & Weerakkody, 2017):
Table 4
The Seven V's characteristics of Big Data Analytics.

| Name | Description | Examples | Challenges |
|---|---|---|---|
| Volume (Barnaghi, Sheth, & Henson, 2013) | Volume of big data is explained in terms of its size and exponential growth. The large scale and sheer volume of data is a big challenge. It is known as size. Applications: medical data, social media | Scale of data: terabyte, petabyte, exabyte, yottabyte | Data storage; data acquisition; processing of data; performance; cost |
| Variety (Chen et al., 2013) | It refers to the complexity of a large data set, which may be semi-structured, unstructured, or structured. It is known as complexity. Applications: weather data, DNA sequencing, biology | Different forms of data: text, documents; images, voice, audio, video; geo-spatial data; network data; sensor data | Heterogeneity of data; diverse; dissimilar forms |
| Velocity (Sivarajah et al., 2017) | It is a high rate of data inflow with non-homogeneous structure. It is known as speed. Applications: financial market, ad agencies | Analysis of streaming data: batch processing; real-time processing; streaming processing | Complexity; slow and expensive nature of data processing |
| Veracity (Vasarhelyi et al., 2015) | The veracity feature measures the accuracy of data and its potential use for analysis. It is known as quality. | Uncertainty of data: increasingly complex data structure; inconsistency in large data sets | Accuracy of data; reliability of the data sources; context within analysis; inaccuracy, latency, subjectivity |
| Valence (Sivarajah et al., 2017) | It refers to the connectivity of big data in the form of graphs. It is known as connectedness. Applications: healthcare data | Measure of connectivity: data connectivity | More complex data exploration algorithms; modeling and prediction of valence changes; group event detection; emergent behavior analysis |
| Value (Sivarajah et al., 2017) | Big Data = Data + Value? It is the heart of the data challenge. It extracts knowledgeable value from vast amounts of structured and unstructured data without loss, for end users. Applications: business or industries | Seven V's: size; complexity; quality; connectedness; speed; variations; value (important) | Increase revenue; decrease operational costs; serve customers |
| Variability (Sivarajah et al., 2017) | It refers to data whose meaning changes constantly and rapidly. It remains a constant challenge. Applications: stock market, finance data | Variation in data flow rates | Inconsistency of data; peak-level computing demand; periodic peaks and troughs |

Currently, Big Data Analytics has become a trendy practice in business intelligence that involves massive datasets and advanced analytic techniques. Villars, Olofson, and Eastwood (2011) stated that businesses and organizations can “extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and analysis”. Kambatla, Kollias, Kumar, and Grama (2014) presented a literature survey on Big Data Analytics. Assuncao, Calheiros, Bianchi, Netto, and Buyya (2015) stated that cloud computing plays a key role for Big Data because it can act as a business model under popular terms such as Analytics as a Service (AaaS) or Big Data as a Service (BDaaS). Zhang and Xiang (2015) discussed how BD integration, data quality issues, privacy, and analytics can be used for effective business decisions.
Chen, Kazman, and Haziyev (2016) introduced an architecture-centric approach, called Architecture-centric Agile Big data Analytics (AABA). Its purpose is to address technical and organizational challenges in big data system development and agile delivery of big data analytics for web-based systems. Fogelman-Soulié and Lu (2016) presented an application of Big Data Analytics in business (e.g., credit-card fraud detection). The framework developed in that study showed how companies can store their Big Data in a data lake if they want to implement many Big Data projects.
Ahmed et al. (2017) explored the recent advances and key requirements for managing BDA in the Internet of Things (IoT) environment. Bashir and Gill (2016) proposed an IoT big data analytics framework to overcome the challenges of storing and analyzing a large amount of data originating from smart buildings. Rathore, Ahmad, and Paul (2016) proposed a smart city management system based on IoT that exploits big data and analytics. Sezer, Dogdu, Ozbayoglu, and Onal (2016) proposed an augmented framework that integrates semantic web technologies, big data, and IoT.
For the processing and analysis of Big Data, various recently used platforms have been investigated for the large amount of IoT-generated data, as follows: (i) enabling capability for storing and processing large amounts of data (Apache Hadoop, 2011); (ii) enabling capability for advanced data analytics: extraction, transformation, and loading (ETL) (1010data); (iii) enabling capability for big data IoT processing and analytics (SAP-Hana, 2013); (iv) enabling capability that supports Hadoop for big data processing and analysis (Cloudera, 2008); (v) enabling capability for parallel processing, analysis, and security for unstructured data (HP-HAVEn, 2013); (vi) enabling capability for Hadoop-based processing and analysis of large amounts of data (Hortonworks, 2011); (vii) enabling capability for an analytical database that combines massively parallel processing (MPP) with petabyte-scale data volumes (Pivotal big data suite, 2016); (viii) enabling capability for data analysis and management problem solving up to 50 terabytes (Infobright, 2005); and (ix) enabling capability for fast processing, analysis, and prediction (MapReduce, 2008).
Further, the top primary studies are classified by structure. The classification is based on the method proposed by Jabbour (2013). The classification scheme includes six categories, namely study, objective, focus, capabilities, benefits, and result, as shown in Table 5.
• Study: The type of study: conceptual, theoretical, empirical, literature review, or case study.
• Objective: The objectives of the BDA-related review or research.
• Focus: The research focus and direction of BDA in different application domains.
• Capabilities: Important data capabilities such as analytics, prediction, decision, and management.
• Benefits: The benefits and impact of BDA.
Table5
Top20Primaryresearchescomposedthesample.
Primary
Study
Objective
Focus
Capabilities
Benefits
Result
1. (Pääkkönen & Pakkala,
-Conceptual
Survey and use case on BDA
Referenced architecture on commercial
-Analytical
Commercial product & services for BDA
A new perspective on
2015)
-Theoretical
applications
product and services for BDA system
system
research
2. (Oussous, Benjelloun,
-Conceptual
Survey of BDA technologies and
Various tools and technologies
-Analytical
BDA opportunities, application, challenges
A new perspective on
Lahcen, & Belfkih,
-Theoretical
algorithms
-Decision
and issues
research
2017)
3. (Liu, Li, Li, & Wu,
-Theoretical
Survey of empirical studies on
Big data errors in spatial information
-Decision
Reduced big data error in data collection
A new perspective on
2016)
-Empirical
data quality & data usage of BDA
science
processing and analysis
research
-Case Study
4. (Zhang & Xiang, 2015)
-Conceptual
Survey of BDA data quality
Data quality solutions for business
-Analytical
Increase the data privacy, security, quality
Consistent with previously
-Theoretical
organization
issues
published literature
5. (Mikalef, Pappas,
-Conceptual
Literature survey on BDA
Resource base theory (RBT) and
-Analytical
Theoretical framework on business value and
Research in domain of
Krogstie, &
-Theoretical
capabilities of BDA
-Decision
competitive advantage for BDA application
knowledge to gain insights
Giannakos, 2017)
-Literature
-Management
through analysis
Review
6. (Yaqoob et al., 2016)
-Theoretical
Survey on BDA processing,
Usage in many multidisciplinary
-Analytical
Increase in productivity of industries/
Research in domain of
-Case study
technologies & organization case
application
-Predictive
companies and provide consumer density of
knowledge to gain insights
study
-Decision
the firm with BDA
through analysis
7. (Zhou et al., 2016)
-Conceptual
Systematic review of BDA for
Industrial development of big data
-Decision
Energy Efficient big data- driven optimization
Research in domain of
-Theoretical
smart energy management.
driven smart energy management
-Management
& real-time monitoring and forecasting
knowledge to gain insights
through analytics
8. (Tu et al., 2017)
-Conceptual
Survey on smart grid integration
Empirical studies on smart grid and
-Analytical
Stability & Reliability Utilization & Efficiency
A new perspective on
-Theoretical
of BD management and BDA
energy big data analytics
-Decision
Better Customer Satisfaction
research
algorithm
-Management
9. (Kshetri, 2016)
-Modeling
Survey on role of big data in
Analytics in Financial companies
-Analytical
The use of BDA helps to overcome the
A new perspective on
-Literature
facilitating the access to financial
-Decision
reducing information opacity and transaction
research
review
services in china
-Management
costs
10. (Cerchiello & Giudici,
-Modeling
Systemic risk model based on big
Financial Risk management/ markets
-Predictive
Understanding of Financial services
Replication to a different
2016)
-Literature
data
and tweets
-Management
context or period
review
11. (Yang, Zhong, Liu, &
-Conceptual
Theoretical framework of
Improve Big data storage mode.
-Analytical
Eliminate data noise and to remove data
Consistent with previously
Feng, 2014).
-Theoretical
financial data classification
redundancy
published literature
standard
12. (Sun, Chen, & Yu,
-Modeling
Generalized optimal wavelet
Big Financial and financial analytics
-Analytical
FA provide better understand the viability,
Research in domain of
2015)
-Empirical
decomposing algorithm for big
-Predictive
stability, and profitability of business/
knowledge to gain insights
financial data
-Decision
market/beneficial decisions
through analytics
-Management
(continued on next page)
Table 5 (continued)
Primary
Study
Objective
Focus
Capabilities
Benefits
Result
13. (Crawley & Wahlen,
-Literature
Survey on analytics in empirical/
new analytics for testing hypotheses
-Analytical
Informed business professionals about
Consistent with previously
2014)
Review
achieving financial accounting
questionnaires analytics in accounting
financial accounting
published literature
14. (Tian, Han, Wang, Lu,
-Empirical
System architecture for Big data
Analysis on the critical latency analytics
-Management
Useful for the banking and financial
Comparative research
& Zhan, 2015)
analytics
requirement in finance using BDA
organizations
15. (Edwards & Taborda,
-Theoretical
Review on domain of analytics,
To understand relationship between the
-Analytics
Understand the knowledge domain using data
Replication to a different
2016)
risk management, and knowledge
knowledge data, techniques, and
-Decision
analytic capabilities
context or period
management
experience
-Management
16. (Wu, Li, Cheng, & Lin,
-Theoretical
Healthcare-wearable technology
Bring new opportunities for healthcare-
-Analytics
higher-quality firms, the optimal quality level
A new perspective on
2016b)
-Empirical
optimize insights
wearable device providers
for health and biomedical sector
research
17. (Cetin, Demirçiftçi, &
-Theoretical
Review of revenue management
Hotel revenue managers with KSAs
-Decision
Helps to understand the revenue management
Research in domain of
Bilgihan, 2016)
challenges
required in managing inventory and
-Management
challenges
knowledge to gain insights
prices
through analytics
18. (Addo-Tenkorang and
-Theoretical
Review on ‘‘big data,” its
Big data applications attempting to
-Analytics
The four main attributes or factors identified
Comparative research
Helo, 2016)
-Literature
Review
application and analysis of
operations or supply-chain
identify and understand the challenges
in industrial or supply chain
-Decision
-Management
with ‘‘big data” Variety, Velocity, Volume,
Veracity, and Value-adding
management
19. (Hazen, Skipper,
-Theoretical
Review of big data and predictive
Focus on eight theory-driven impact of
-Analytics
Identifying development of BDA Prediction
A new perspective on
Ezell, & Boone, 2016)
analytics
BDPA's on supply chain management
-Management
and modern competitive upon firm
research
performance
20. (urRehman et al.,
-Theoretical
Review of the big data analytics
Proposed knowledge-driven based big
-Analytics
It enables local knowledge availability,
Research in domain of
2016)
-Case Study
process and popular relevant tools
data reduction framework for value
-Management
privacy preservation, and secure data sharing
knowledge to gain insights
for value-creation
creation
functions to build trust between customers
through analytics
and enterprises.
Fig. 5. Review structure of the paper.
• Result: It shows the useful value insights obtained as results from these primary study articles.
The studies focused primarily on different application areas of BDA. Researchers have proposed a framework for Big Data and Product Lifecycle Management (BDA-PLM) (Zhang, Ren, Liu, Sakao, & Huisingh, 2017); the big scholarly data lifecycle (Assuncao et al., 2015; Khan, Liu, Shakil, & Alam, 2017); the 3-As Data Quality-in-Use model of data quality characteristics for Big Data projects (Merino, Caballero, Rivas, Serrano, & Piattini, 2016); the Unified Technology Acceptance and Usage Theory (UTAUT) aligned with the idea of Big Data as a Service (Shin, 2016); a novel conceptual basis for operational business intelligence (OpBI) systems designed around value-based business requirements (Hänel & Felden, 2015); a unified and dynamic framework for analyzing big data business values and the managerial, operational, and organizational changes led by a data-driven approach (Sheng, Amankwah-Amoah, & Wang, 2017); and a conceptual model of the seven V's of big data analytics to gain a deeper understanding of the strategies and practices of high-frequency trading (HFT) in financial markets (Seddon & Currie, 2017).
Big data analytics & decision-making framework (BDA-DMF)
The Big Data Analytics and Decision-Making Framework (BDA-DMF), shown in Fig. 5, is used to discover value in the business ecosystem. The figure covers big data management, big data analytics, data visualization, and decision-making for value-creation, which are discussed in Sections 4, 5, 6, and 7, respectively.
RQ2: How to design BDA-DM framework?
Big data analytics is a data-intensive architecture that provides various technologies and platforms used in phases such as data generation, data acquisition, data storage, advanced data analytics, visualization, and decision-making for value-creation, as shown in Fig. 7. It follows a top-down approach. It consists of various techniques and technologies, e.g., Hadoop, HBase, Cassandra, MongoDB, NoSQL, and so on. As a limitation, these tools and techniques cannot solve the real-world problems of data storing, data searching, data sharing, data visualization, and real-time analysis.
Fig. 6. Different types of data domain.
Fig. 7. Architecture of big data analytics.
Big data management
Big data management (BDM) provides an infrastructure for Big Data Analytics, in which data management techniques, tools, and platforms, including storage, pre-processing, processing, and security, can be applied (Bilal et al., 2016; Siddiqa et al., 2016). The components involved in BDM are described as follows:
4.1. Data sources
Big data generation refers to the generation of data from various relevant sources. Data can be generated by humans, machines, business processes, and data techniques that are descriptive, predictive, and prescriptive.
4.1.1. Data domain
A flourishing domain of data is expressed by a variety of descriptive terms, such as structured, unstructured, machine- and sensor-generated data, batch and real-time processing data, biometric data, human-generated data, and business-generated data. Fig. 6 shows the relevance of the various generations of big data analytics domains.
Machine-Generated Data: Machine-generated data comes from computer networks, sensors, satellites, audio and video streaming, mobile phone applications, and the prediction of security breaches.
Human-Generated Data: It is collected from people, for example identification details such as name, address, age, occupation, salary, and qualification. Real streaming data, in contrast, can be generated by various files, documents, log files, research, emails, and social media websites such as Facebook, Twitter, YouTube, and LinkedIn.
Business-Generated Data: The volume of business data across companies worldwide, such as transactional data, corporate data, and government agency data, is estimated to double every 1.2 years. When the business intelligence (BI) of BDA is discussed, it means value (does the data contain any valuable information for my business needs?), visibility (focus of insight and foresight of a problem and an adequate solution associated with it), and verdict (potential for decision-makers based on the problem, computational capacity, and resources) within the business intelligence domain (Wu, Buyya, & Ramamohanarao, 2016a).
4.1.2. Data types
The following three types of analytics can be used by organizations and industries to learn and gain insights that promote their business.
Descriptive: Descriptive analytics is composed of technologies and summaries of data that represent current and past processes. Standard reporting, ad-hoc reporting, dashboards, querying, and drill-down are examples of descriptive analytics. It is defined as looking into the past in order to draw inferences: “What has happened?”
Predictive: Predictive analytic modeling includes root-cause analysis, Monte Carlo simulation, and data mining. It is used in real-time or batch processes. Siegal (2010) illustrated seven sequential objectives organized by adopting predictive analytics, namely compete, grow, enforce, improve, satisfy, learn, and act. It predicts future trends: “What could happen?”
Prescriptive: This technique applies to future scenarios and advises a solution or insightful actions based on the predictions. Basu (2013) presented the five pillars of prescriptive analytics, namely hybrid data, integrated predictions and prescriptions, prescriptions and side effects, adaptive algorithms, and a feedback mechanism: “What should we do?”
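The three questions can be contrasted on a toy series. The following is a minimal sketch only: the sales figures, the stock level, and the reorder rule are hypothetical values invented for illustration, and a least-squares line stands in for whatever predictive model an organization would actually use.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical monthly unit sales, invented for illustration.
monthly_sales = [100.0, 110.0, 120.0, 130.0]
months = list(range(len(monthly_sales)))

# Descriptive: what has happened?
average_sales = sum(monthly_sales) / len(monthly_sales)

# Predictive: what could happen next month?
slope, intercept = fit_line(months, monthly_sales)
forecast = slope * len(monthly_sales) + intercept

# Prescriptive: what should we do? (hypothetical reorder rule)
stock_on_hand = 125.0
reorder_quantity = max(0.0, forecast - stock_on_hand)

print(average_sales, forecast, reorder_quantity)
```

The descriptive step only summarizes history, the predictive step extrapolates it, and the prescriptive step turns the prediction into a recommended action.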
4.2. Data acquisition
Data acquisition covers a broad spectrum of collecting, filtering, and cleaning data before it is ingested into a data warehouse or any other database. Chen, Mao, and Liu (2014) observed that data acquisition supports heterogeneity due to the variety of devices.
4.2.1. Data collection
Data collection is the process of acquiring unprocessed data from the real-world environment and handling it proficiently. Log files, which are generated by multiple sources and by all applications running on electronic devices, are widely used for data collection. Common formats include the extended log format (W3C), the common log file format (NCSA), and the IIS log format (Microsoft).
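As an illustration of log-based collection, the sketch below parses lines in the NCSA common log file format (host, identity, user, timestamp, request, status, size) with a regular expression. The sample lines and the helper name `parse_clf_line` are hypothetical and serve only to show the idea.

```python
import re
from collections import Counter

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means that no bytes were returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample lines, invented for illustration.
SAMPLE_LOG = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.7 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.png HTTP/1.0" 404 -',
    'this line is noise and will be skipped',
]

records = [r for r in (parse_clf_line(line) for line in SAMPLE_LOG) if r is not None]
status_counts = Counter(r["status"] for r in records)
print(status_counts)
```

Lines that do not match the format are dropped here; a production collector would more likely route them to a separate stream for inspection.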
Sensors are another source: a sensor measures a physical quantity and converts it into a readable form as a digital signal. There are several types of sensors, such as audible, sound, automotive, vibration, electric current, weather, thermal, and pressure sensors, whose data is transferred through wired or wireless networks. Web crawlers are generally used to collect data for website-based applications such as web search engines or web caches (Castillo, 2005).
4.2.2. Data staging
Further, data staging is defined as a process for collecting a wide variety of data sets, including noisy, redundant, and consistent data. It is divided into two alternative models: the stream-processing model and the batch-processing model. The stream-processing model analyzes data as soon as possible to derive results, since the data arrives continuously and at very high speed. Open-source systems that support it include Storm, S4, and Kafka (Hu, Wen, Chua, & Li, 2014).
In the batch-processing model, data is first stored and then analyzed. In this model, MapReduce (Dean & Ghemawat, 2008) has become the dominant platform. Fig. 8 shows (a) data staging, divided into data exploration and data pre-processing, and (b) the predictive model.
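The MapReduce programming model behind batch processing can be sketched in a few lines. The example below is a single-process illustration of the map, shuffle, and reduce phases using the customary word-count task; it is not a distributed implementation, and the function names and sample documents are assumptions made for this sketch.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    for document in documents:  # on a cluster, each document could go to a different mapper
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


# The stored batch: hypothetical documents, invented for illustration.
documents = ["big data analytics", "big data storage", "data visualization"]
counts = word_count(documents)
print(counts)
```

Because each call to `map_phase` depends on one document only and each key is reduced independently, both phases can in principle be spread across many machines, which is what makes the model suitable for stored batches.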
Data Exploration: Data exploration has two main aims: first, to determine and understand the nature and characteristics of the data; second, to identify data quality issues that can badly affect the model. Data exploration and data mining are widely used to discover new insights. Examples include a data quality report (mean, mode, median, and range; standard deviation and percentiles; bar plots, histograms, and box plots) and data quality issues (valid or invalid).
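A data quality report of this kind can be sketched with the Python standard library. In the example below, the sensor readings are hypothetical, `None` is assumed to mark a missing value, and quartiles stand in for the percentiles mentioned above.

```python
import statistics


def data_quality_report(values):
    """Summarize one numeric feature; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
        "range": max(present) - min(present),
        "std_dev": statistics.stdev(present),
        # Cut points for the 25th, 50th, and 75th percentiles.
        "percentiles": statistics.quantiles(present, n=4),
    }


# Hypothetical temperature readings with two gaps and one suspicious value.
readings = [21.0, 22.5, None, 22.5, 23.0, 95.0, 21.5, None, 22.0]
report = data_quality_report(readings)
for name, value in report.items():
    print(name, value)
```

In this sample, the gap between the mean and the median, together with the wide range, points to the reading of 95.0 as a candidate data quality issue to examine before modeling.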
Pre-Processing: To extract meaningful information from big data, it is necessary to clean, integrate, and transform the data (Hu et al., 2014) through various tools, namely Apache Hadoop, NoSQL, and MapReduce. Pre-processing involves a series of steps: how to integrate data, how to transform data, how to select the right model for analysis, and how to provide the results.
-Cleaning: Cleaning is an essential goal of pre-processing; it addresses data quality and format, because raw data is messy. It enables us to discover imprecise, insufficient, or immoderate data that requires altering or removing in order to improve data quality.
-Integration: With the extract, transform, and load (ETL) process, the data can be cleaned, transformed, and made applicable to data mining and various online analytics.
-Transformation: The transformation of raw data makes it suitable for analysis and gets it into shape, for example by integrating and packaging data using tools such as ETL, DMT, and Pig. Various actions can be applied to data in real-time format, such as splitting data, merging it, performing computations, connecting it with outside data domains, and spreading data to multiple destinations.
Fig. 8. (a) Data staging and (b) the predictive model.
Fig. 9. The platform of various big data storage.
4.3. Data storage & processing
Data storage is the process of managing stored data. It performs activities in parallel to optimize the storage process. Data clustering, replication, and indexing are significant activities for accomplishing the storage phase in big data management (Siddiqa et al., 2016).
It refers to how numerous types of data can be stored in different forms after being collected from different sources. Useful tools for big data storage include HBase, NoSQL, Gluster, HDFS, and GFS (Gandomi & Haider, 2015; Pole & Gera, 2016). Cheptsov and Koller (2015) introduced an innovative approach to parallelizing data-centric applications based on the message passing interface. Fig. 9 describes big data storage for various platforms (Hu et al., 2014; Chen, 2014).
Fig. 10. Classification of different data analytics techniques.
Big data analytics
The advanced Big Data Analytics process refers to analyzing heterogeneous data and mining insightful information from unknown patterns by applying various predictive algorithms, semantic analysis, statistical analysis methods, and technologies. The collection and transportation of big data share a common goal: analyzing the data for insights and better application guidance (Li & Jain, 2013). Fahad et al. (2014) described a few efficient algorithms, such as sampling, data condensation approaches, density-based approaches, grid-based approaches, divide and conquer, incremental learning, and distributed computing. Fayyad, Piatetsky-Shapiro, and Smyth (1996) presented the steps that make up the knowledge discovery in databases process. They defined significant iterative steps: selection of data, pre-processing of data, transformation of data, application of data mining algorithms to enumerate patterns, and proper interpretation of the results to ensure useful knowledge discovery from data.
Tsai, Lai, Chao, and Vasilakos (2015) presented big data analytics infrastructures categorized as follows: (i) processing or computing: Hadoop, Nvidia CUDA, or Twitter Storm; (ii) storing: Titan or HDFS; and (iii) analyzing: MLPACK or Mahout. Other tools are grouped by data size: Whiteboard, R, MATLAB, and Octave (kilobytes to low megabytes); NumPy, SciPy, Weka, and BLAS (megabytes to low gigabytes); and Hive, Mahout, Hama, and Giraph (gigabytes to terabytes).
5.1. BDA techniques
Recent advancements in techniques and technologies have enabled many enterprises to handle big data efficiently. Data analytics techniques include machine learning, data mining, statistics, artificial neural networks, extreme machine learning, natural language processing, and deep learning. Fig. 11 shows the origin of BDA techniques. BDA has led to numerous technologies for performing analytics. An overview of Big Data Analytics machine learning tools is given in Appendix A.
RQ3: What are the main tools, techniques, and technologies of Big Data Analytics?
5.1.1. Advanced machine learning
Advanced machine learning (ML) analytics is an umbrella term for selecting an analytical technique to build a model and evaluate it for an efficient result. By tradition, machine-learning research is divided into two categories: logical representations and statistical ones. Initially, input data is selected to build a predictive model, which then generates model output or is validated. Fig. 8(b) shows the predictive model as an iterative process of activities including build, explore, scale, report, and act.
The most common predictive techniques used for advanced data analytics are classification, clustering, regression, association analysis, graph analysis, and decision trees. Predictive data analytics applications use supervised and unsupervised ML algorithms. Supervised ML methods are self-learning models that represent the relationship between a set of descriptive features and a target feature based on historical examples. In supervised machine learning, the first category is regression, which includes linear regression, generalized linear models, ensemble methods, decision trees, and neural networks. Fig. 10 shows the classification of different data analytics techniques.
Classification: To predict the categories of input data, e.g., weather attributes such as sunny, windy, or rainy.
Regression: To predict a numeric value, e.g., the price of stocks.
Clustering: To organize similar items into groups, e.g., grouping a company's customers into seniors, adults, and teenagers.
Association Analysis: To find interesting relationships between sets of variables.
Graph Analysis: To use graph structure to find connections between entities.
Decision Tree: To predict the value of a target variable by learning simple decision rules inferred from the data features.
Further, supervised machine learning includes classification algorithms such as support vector machines, discriminant analysis, naive Bayes, and nearest neighbour. Unsupervised machine learning uses clustering techniques, which include models such as k-means clustering, k-medoids, fuzzy c-means, hierarchical clustering, Gaussian mixture models, neural networks, and hidden Markov models. These are used in various real-time applications such as medical diagnosis, stock trading, energy load forecasting, and weather forecasting.
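To make the clustering idea concrete, the following minimal sketch runs k-means (Lloyd's algorithm) on one-dimensional data. The ages and the initial centroids are hypothetical values chosen for illustration; practical implementations work on multi-dimensional data and choose initial centroids more carefully.

```python
def k_means_1d(points, centroids, max_iterations=100):
    """Lloyd's algorithm on 1-D data with caller-supplied initial centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids, clusters


# Hypothetical customer ages, invented for illustration.
ages = [15, 16, 17, 34, 36, 38, 67, 70, 73]
centroids, clusters = k_means_1d(ages, centroids=[10, 40, 80])
print(centroids)
print(clusters)
```

No labels are supplied: the three groups (roughly teenagers, adults, and seniors) emerge from the distances alone, which is what distinguishes clustering from the supervised techniques listed above.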
5.1.2. Advanced statistics
Advanced statistical analytics is based primarily on tools and techniques for collecting, analyzing, and visualizing results from large-scale data. It includes different domains of analytics that derive techniques from statistics and from data-driven analysis that executes statistical algorithms. Statistical techniques include clustered analytics, data mining, and predictive modeling methods.
5.1.3. Advanced data mining
Big data mining is more challenging than traditional data mining tasks such as pattern discovery and extraction. Data mining depends on techniques such as statistics, machine learning methods, and pattern recognition (Chen & Zhang, 2014). Multiple linear regression and logistic regression are also commonly used in data mining, which includes various algorithms such as k-means clustering, association analysis, and decision trees. An overview of big data analytics techniques and their application areas is shown in Table 6.
5.2. Big data analytics & applications
Many techniques can be used to analyze big data. This work presents the various analytic technology areas in which BDA is applicable, as follows.
5.2.1. Social analytics
Social analytics is an important and growing area of real-time data analytics. Social media is categorized into social networks (e.g., Facebook and LinkedIn), blogs (e.g., Blogger and WordPress), microblogs (e.g., Twitter and Tumblr), social news (e.g., Digg and Reddit), social bookmarking (e.g., Delicious and StumbleUpon), media sharing (e.g., Instagram and YouTube), wikis (e.g., Wikipedia and wikiHow), question-and-answer sites (e.g., Yahoo! Answers and Ask.com), and review sites (e.g., Yelp and TripAdvisor). General sites (Li, Chen, Wang, & Zhang, 2013) such as Facebook, Instagram, Foursquare, Twitter, and Pinterest produce immense amounts of unstructured data.
5.2.2. Mobile analytics
Personal mobile devices can be used as instruments to collect and monitor learning analytics towards self-regulation. Mobile analytics discovers previously unknown meaningful patterns and knowledge in data ranging from a few dozen terabytes to numerous petabytes, collected from mobile users at the network level or the application level (Yazti & Krishnaswamy, D. Z. 2014). Studies in this area cover mobile and ubiquitous learning analytics tools and a scalable Apache Spark-based framework for deep learning in mobile big data analytics (Alsheikh, Niyato, Lin, Tan, & Han, 2016; Fulantelli, Taibi, & Arrigo, 2013).
5.2.3. Living analytics
Living analytics is associated with the study of the social and behavioral forms of individuals and societal groups. The domain of analytical social science integrally uses advances in storage and computing abilities to process big data readily (Lazer et al., 2009). Common challenges of living analytics with big data include high volume, high velocity, high dimensionality, sparse data, and a variety of diverse data sources and formats.
5.2.4. Video and visual analytics
Video analytics is the research field that addresses the scalable and reliable analysis of video data. Visual analytics is described as “the science of analytical reasoning facilitated by interactive visual interfaces”, and its general goal is to generate insight from data. It is a fascinating branch of big data investigation that provides analytical reasoning over collaborative visual interfaces.
5.2.5. Text analytics
Text analytics refers to techniques that extract information from textual data. It draws on statistical analysis, computational linguistics, and machine learning (Gandomi & Haider, 2015). Text analytics helps businesses convert large volumes of human-generated text into meaningful insights, which supports evidence-based decision-making. Broadly speaking, summarization follows two approaches: the
Table 6
Overview of Big Data Analytics techniques and their application area.
Name
Review papers/Title
Reference
Application area
Reference
Machine learning
Strategies and principles of distributed machine
(Xing, Ho, Xie, & Wei, 2016)
-Analyzing social networks Interpreting
(Airoldi, Blei, Fienberg, & Xing, 2008), (Chandola, Banerjee, &
learning on big data.
texts, images, and videos
Kumar, 2009), (Lee & Xing, 2012), (Zhao & Xing, 2014)
Machine Learning on Big Data: Opportunities and
(Zhou, Pan, Wang, & Vasilakos,
-Identifying disease and treatment
Challenges.
2017)
paths
A survey of machine learning for big data
(Qiu, Wu, Ding, Xu, & Feng,
-Tracking anomalous activity for cyber-
processing.
2016)
security
Extreme machine
Trends in extreme learning machines: A review.
(Huang, Huang, Song, & You,
-Computer vision
(He et al., 2014)
learning
2015)
Extreme learning machines: a survey.
(Huang, Wang, & Lan, 2011)
-Image processing
(An & Bhanu, 2012)
Extreme learning machine: algorithm, theory and
(Ding, Zhao, Zhang, Xu, & Nie,
-System modeling and prediction
(Tian & Mao, 2010)
applications.
2015)
medical/biomedical application
(You, Lei, Zhu, Xia, & Wang, 2013)
-Time series analysis
(Butcher, Verstraeten, Schrauwen, Day, & Haycock, 2013)
Artificial Neural
Artificial neural network learning: A comparative
(Sovilj, Sorjamaa, Yu, Miche, &
-Chemical engineering
(Zhang, 2000)
Network
review.
Severin, 2010)
-Cancer prediction
(Himmelblau, 2000)
Artificial neural networks in business: Two
(Neocleous & Schizas, 2002)
-Disease Prediction
(Agrawal & Agrawal, 2015)
decades of research.
Artificial neural networks and its applications.
(Tkáč & Verner, 2016).
-Agriculture
(Weng, Huang, & Han, 2016) (Francik et al., 2016)
Neural networks for classification: a survey.
(Jha, 2007).
Data Mining
Educational data mining: A survey and a data
(Peña-Ayala, 2014)
-Educational data mining
(Chaturvedi & Ezeife, 2012)
mining-based analysis of recent works.
-Business & Management
(Baker & Yacef, 2009)
Data mining techniques in social media: A survey.
(Injadat, Salo, & Nassif, 2016)
-Medical and Health
(Moss, Corsar, & Piper, 2012)
-Social Networks
(Alowibdi, Buy, Philip, & Stenneth, 2014)
Application of data mining techniques in
(Ngai, Xiu, & Chau, 2009)
-Wind energy systems
(Soman, Zareipour, Malik, & Mandal, 2010)
customer relationship management:
A literature review and classification.
-Biomedicine
(Phillips & Buchanan, 2001)
Data mining techniques and applications – A decade review from 2000 to 2011.
(Liao, Chu, & Hsiao, 2012)
-Finance
(Vavpetic, Novak, Grcar, Mozetic, & Lavrac, 2013)
Data mining and wind power prediction: A
(Colak, Sagiroglu, &
literature review.
Yesilbudak, 2012)
Deep Learning
Deep learning for visual understanding: A review.
(Guo et al., 2016)
-Image classification
(Krizhevsky, Sutskever, & Hinton, 2012)
-Object detection
(Hoffman et al., 2014)
Deep learning applications and challenges in big
(Najafabadi et al., 2015)
-Image retrieval
(Liu, Guo, Wu, & Lew, 2015)
data analytics.
-Semantic segmentation
(Dong, Chen, Yan, & Yuille, 2014)
A survey of deep neural network architectures and
(Liu et al., 2017)
-Human pose estimation
(Ouyang, Chu, & Wang, 2014)
their applications.
-Speech recognition
(Bengio, 2013)
Natural Language
Machine learning and natural language
(Marquez, 2000)
-Spelling and grammar checking
(Zuker & Sankoff, 1984)
Processing
processing.
A tutorial on techniques and applications for
(Hayes & Carbonell, 1983)
-information retrieval
(Saidi, Maddouri, & Nguifo, 2010)
natural language processing.
extractive approach and the abstractive approach. In the extractive approach, a summary is produced from the original text units. In comparison, the abstractive approach involves extracting semantic information from the text.
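The extractive approach can be illustrated with a simple frequency-based scorer: each sentence is scored by the summed corpus frequency of its words, and the top-scoring sentences are returned unchanged. This is a minimal sketch under simplifying assumptions (no stop-word removal, no length normalization, so long sentences are favored), and the sample text is invented for illustration.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=1):
    """Pick the highest-scoring sentences, scored by summed word frequency."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        return sum(frequency[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Keep the selected sentences in their original order.
    return [s for s in sentences if s in ranked]


# Hypothetical input text, invented for illustration.
text = (
    "Big data needs storage. "
    "Big data analytics turns big data into insight. "
    "Sensors produce data."
)
summary = extractive_summary(text, num_sentences=1)
print(summary)
```

Every sentence in the output appears verbatim in the input, which is the defining property of extractive summarization; an abstractive system would instead generate new wording from a semantic representation of the text.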
5.2.6. Audio analytics
Audio analytics analyzes and extracts information from unstructured audio data such as human spoken language, in which case it is referred to as speech analytics (Gandomi & Haider, 2015). The benefits that can be achieved by using these techniques are summarized for specific application areas of storage, pre-processing, analysis, etc.
RQ4: What are the main application areas of Big Data Analytics?
The various data analytics applications, such as smart agriculture, smart healthcare, cyber-physical security, and smart cities, are briefly described as follows.
5.2.7. Smart agriculture
As technology has spread rapidly in the last few decades, big data analytics has become the key to fostering a new revolution in agriculture. The technology has evolved to solve real-world problems based on historical data, machine-generated data, and real-time streaming data. Agricultural IoT generates a large volume of agricultural information (Lee, Hwang, & Yoe, 2013). Agriculture firms are adopting big data technologies with the promise of gaining insights from large amounts of heterogeneous data, solving real-time problems, managing data incompleteness and a lack of prior knowledge, and capturing a variety of data in complex forms.
Smart agriculture is a beneficial use case of big IoT data analytics. Sensors are the actors in the smart agriculture use case. They are installed in fields to obtain data on the soil moisture level, the trunk diameter of plants, micro-climate conditions, and the humidity level, as well as to forecast weather. The data passes through an IoT gateway and the internet to reach the analytics layer (Marjani et al., 2017). The analytics layer processes the data obtained from the sensor network to issue commands. Automatic climate control for harvesting, timely controlled irrigation, and humidity control for fungus prevention are examples of actions performed on the basis of big data analytics (Gubbi, Buyya, Marusic, & Palaniswami, 2013).
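The path from sensor readings to commands can be sketched as a simple rule-based analytics layer. Everything in the example below is hypothetical: the `SensorReading` fields, the threshold values, and the command names are assumptions made for illustration and are not taken from the cited systems.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    field_id: str
    soil_moisture_pct: float
    humidity_pct: float


# Hypothetical thresholds, chosen only for illustration.
SOIL_MOISTURE_MIN = 30.0
HUMIDITY_MAX = 85.0


def issue_commands(readings):
    """Turn sensor readings into actuator commands with fixed threshold rules."""
    commands = []
    for reading in readings:
        if reading.soil_moisture_pct < SOIL_MOISTURE_MIN:
            commands.append((reading.field_id, "START_IRRIGATION"))
        if reading.humidity_pct > HUMIDITY_MAX:
            # High humidity is treated here as a fungus risk.
            commands.append((reading.field_id, "REDUCE_HUMIDITY"))
    return commands


readings = [
    SensorReading("field-A", soil_moisture_pct=22.0, humidity_pct=60.0),
    SensorReading("field-B", soil_moisture_pct=45.0, humidity_pct=91.0),
    SensorReading("field-C", soil_moisture_pct=50.0, humidity_pct=55.0),
]
commands = issue_commands(readings)
print(commands)
```

A deployed analytics layer would typically replace the fixed thresholds with models learned from historical and streaming data, but the input and output of the step stay the same: readings in, commands out.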
Kshetri (2014) presented a case study of agriculture covering the benefits, opportunities, and threats of implementing BDA, which can advise farmers about soil status, extreme variations in weather patterns, new ways of planting, and topography. It also provides information regarding variable market conditions. Jiang, Chen, Dong, and Wang (2013) identified the difficulties of storing and analyzing the large amount of data produced by sensors. They therefore put forward distributed storage based on a DSM architecture, combined with an agriculture PaaS platform, to provide services.
Xie, Zhang, Sun, and Hao (2015) proposed a big data processing technology to obtain a hierarchy of agricultural information systems covering the gathering, storing, analyzing, and visualization of agricultural big data. Their paper described how to deal with the flood of agricultural data from the viewpoint of big data technologies using the MapReduce tool. BDA provides new insight for advanced decision support to improve yield and productivity and to avoid unnecessary costs related to chemical fertilizers and pesticides. Bendre, Thool, and Thool (2016) presented the different sources and types of big data in precision agriculture and ICT-based e-Agriculture. Finally, they discussed a rainfall prediction application using supervised and unsupervised methods for data processing and forecasting.
5.2.8. Smart healthcare
Big data analytics is an emerging revolution in healthcare and medical research, covering research and development (R&D), treatment, testing, and diagnosis for health management. As healthcare organizations expand day by day with the growing number of patients, the range of medications used for their treatment expands as well. This creates challenges in storing, processing, and analyzing data. Hence, the demand for BDA is relevant in this field too. Healthcare organizations are applying wearable real-time sensors to analyze the current condition of patients and to provide medical treatment according to a correct diagnosis.
Therefore, during diagnosis and treatment, a vast collection of data arises, such as structured and unstructured data, self-monitoring health data, real-time sensor device data, images, videos, various reports, and documents. Presently, there are different healthcare systems, such as healthcare management, innovative drug discovery, face recognition, and verification of signatures, fingerprints, and irises. Fig. 12 shows the process of analyzing unstructured data in a health organization (Wang, Kung, & Byrd, 2016). Big data in healthcare maintains patient-related information such as case history, physician notes, lab reports, X-ray reports, diet rules, lists of doctors and nurses in a specific hospital, national health register data, and expiry-date identification of medicines and surgical instruments based on RFID data. These organizations further depend on big data technology to collect data from patients to gain more insight into care and treatment.
Moreover, data analytics creates a dedicated Center for Health Analytics and Insights to address the increasing demand from hospitals, clinics, and health professionals across the world. New big data healthcare platforms include CHESS (Batarseh & Latif, 2016), EHR, LIMS, MQIC, and CMS (Ward, Marsolo, & Froehle, 2014). Big data analytics is used to analyze health insurance claims and to detect fraud, waste, and error (Srinivasan & Arunasalam, 2013). Dolin, Rogers, and Jaffe (2015) presented two case studies that predict asthma from clinical document architecture (CDA) data by using a BDA approach.
5.2.9. Cyber-physical systems
Organizations and governments protect their sensitive information by using computer security networks. Big data is used to collect, organize, and store the data. Cyber defenders apply information technology to protect their data efficiently and to detect malware and cyber attackers. Developers must synchronize hardware components and make them compatible with software applications using computer networks, wired or wireless sensors, and different operating systems, data formats, and analytic systems.
Hence, BDA plays a critical role in overcoming serious security and privacy issues, thereby authorizing various organizations to access data and gain a complete insight into the business. The emergence of cyber-physical systems, which can be used in production, transportation, logistics, and other sectors, brings new challenges for simulation and planning, for monitoring, control, and interaction with machinery or data usage applications (Becker, 2016).
Fig. 12. Big data analytics in healthcare sector.
5.2.10. Smart cities
A smart city is a wide concept that takes into account not only the physical structure but also human and social aspects. It uses several technologies to improve the performance of health, transportation, energy, education, and water services, leading to a higher level of comfort for citizens. The application of BDA is effective for data storage and processing to generate information for diverse environments such as the smart grid environment (SGE) (Zhou, Fu, & Yang, 2016) and the smart city (Ortiz-Rangel, M. 2015; Strohbach, Ziekow, Gazis, & Akiva, 2015). Smart healthcare is used to predict or diagnose disease early (Demirkan, 2013; Roy, Pallapa, & Das, 2007).
Most survey papers on Big Data Analytics have focused on discussing opportunities, challenges, and architecture. In contrast, this study presents the concepts, architecture, challenges, and new future directions of Big Data Analytics. It thus provides useful insights through the integration of the various technologies used in the application of Big Data Analytics. Data sources and application areas of big data analytics for development are shown in Table 7.
Visualization
Abigdatavisualizationmethodisconcernedwiththedesignofagraphicalrepresentationintheformofatable,images,diagrams, and spontaneous display ways to understand the data. Visual analytics has potentially brought the new federation of datamining and machine learning tools. Visual perception, design, data quality, missing data, end-user visual analytics are future trends ofvisualization(Becker, 2016). There are various well-known visualization analytical tools such as Dive, Rattle, FlockDB, Orange (Pole & Gera, 2016), Flare, Amcharts, and Protovis. Recently, different companies such as Amazon, Twitter, Apple, Facebook, and Google are searching visualization tools for solutions that can provide useful insights from various business aspects (Simon, 2014). The Fig. 13 shows the evolution of visualization methodology (Song, 2014). These tools and methods are appearing in form of charts, graphs, histogram, box plots, excel spread-sheets, heat maps, geographical maps etc. Interpretation is tackled with the presentation and visualization of inferences drawn in a comprehensible manner. Two main mechanisms are often used to interpret big data: - visualization and modeling. The use of big data has significant implications for modeling and theory development from a statistical- scientific point of view (Ramannavar & Sidnal, 2016).
BDA & decision-making framework for value-creation
In the Big Data Analytics and Decision-Making Framework (BDA-DMF) for value-creation model, a framework is presented by which BDA can create value for financial & accounting companies. The framework for BDA and business insights for financial accounting
Table 7
Data sources and application areas of big data analytics for development.
| Analytics | Data type | Medium | Application area | References |
|---|---|---|---|---|
| Social analytics | Movie revenues | Websites, blogs | Sentiment analysis of social data | (Asur & Huberman, 2010) |
| Mobile analytics | Call detail records | Cell phones | Social network analysis, population mobility patterns, transportation system planning, awareness campaigns, mobile app usage patterns, mobile data, traffic analysis | (Laurila et al., 2012) |
| Living analytics | Tweets and comments | Social media sites | Social network analysis, sentiment analysis | (Technical report, 2014) |
| | Text | The Internet | Cultural changes, policy effectiveness | |
| | Personal health data | Wearables | Healthcare | |
| Visual analytics | Images; climate variables, temperature, pollutant levels | Sensors; camera | Weather forecasting, pollution control, urban planning | (Centro de, 2015) |
| Video analytics | Anonymous audience data | Intel's Audience Impression Metrics (AIM) Suite | Market research, public security system | (Balkan & Kholod, 2015) |
| | Multimedia, images | Camera | Automated security and surveillance systems | (Xu, Mei, Hu, & Liu, 2016) |
| Text analytics | Text data | Social network feeds, email, blogs, online forums, survey responses, corporate documents, news, and call-center logs | Stock market based, sentiment analysis; financial news | (Gandomi & Haider, 2015); (Chung, 2014) |
| Audio analytics | Voice (audio data) | Human spoken language | Speech analytics, customer call-center, healthcare, Interactive Voice Response | (Hirschberg, Hjalmarsson, & Elhadad, 2010) |
| Smart agriculture | Sensors, text data, images, audio, video | Documents, sensor devices, GPS; website | Watershed management analysis; crop modeling, irrigation water management, irrigation scheduling | (Hu, Cai, & DuPont, 2015); (Wolfert et al., 2017) |
| Smart health | Health-related databases | Wearable devices, sensor data, machine generated | Electronic Health Record analysis, clinical decision support, disease surveillance | (Raghupathi & Raghupathi, 2014) |
| Cyber-physical system | Expert databases | Sensors, controller, networked manufacturing system | CPS-based Industry 4.0 systems; CPS for TES systems | (Lee, Bagheri, & Kao, 2014); (Lee, Jin, & Liu, 2017) |
| Smart cities | Databases | Smart phones, computers, environmental sensors, cameras, Geographical Positioning Systems | Transportation, healthcare, power grid, smart education, energy | (Al Nuaimi, Al Neyadi, Mohamed, & Al-Jaroodi, 2015) |
Fig. 13. The evolution of visualization methodology.
is shown in Fig. 14. This figure indicates the three-phase method by which big data analytics can create value for customers and firms:
• Phase 1 Value Discover: (i) Big Data Sources, (ii) Big Data Processing,
• Phase 2 Value Creation: (iii) Big Data Analytics, (iv) Big Data Analytics Capabilities,
• Phase 3 Value Realization: (v) Big Data Analytics-Value Insights.
Fig. 14. Framework for big data analytics and business insights for financial accounting.
7.1. Value discover
In the first phase, BDA can create new insights that improve business-driven decision-making. For example, BDA can show how firms can improve customer satisfaction and specific features of the service experience. The growing importance of big data as a company asset is driving the development of new ways to value data assets. In the past, customer databases were considered an important asset for firms (Srivastava, Tasadduq, & Fahey, 1998). For example, these databases could be used to create stronger relationships with customers, achieve higher loyalty, and create more efficient and effective (cross-)selling techniques. Big data holds tremendous potential for financial services firms to develop new and innovative solutions that result in significant business value. BDA mainly focuses on three major values to be discovered through the implementation of big data technology: minimizing hardware costs, checking the value of big data before committing significant company resources, and reducing processing costs (Leavitt, 2013). This requires congruence between business objectives and the big data storage and analytics approach. Serrato and Ramirez (2017) discussed three challenges for managers and decision-makers who want to take advantage of BDA. The first is to think critically about analytics techniques and the analyses based on such data, the second is to identify opportunities for creating value using BD, and the third is to estimate the value created while using BD to address an opportunity.
Stage 1 Data Sources: Value discovery facilitates ideas and insights that lead to better decision-making for the big data model. The process begins with data input from various sources. For example, external users include investors, creditors, regulators, customers, and competitors, and internal users include owners, managers, and employees. Initially, the data are pre-processed to clean and transform them into meaningful Big Data. This results in the creation of knowledge for big data discovery.
Stage 2 Big Data Management: In assessing the value of BDA to organizations, the key benefits are recognized as timely access to decision-making information, greater transparency, scalability, and better change management. Big Data Management (BDM) systems are of great value because they can monitor and report the exact information a user wishes to analyze. Cloudera is an example of an “Analytics infrastructure”, which provides a Hadoop-based platform for the execution of data analytics jobs in an enterprise environment. 10Gen provides “Operational infrastructure” for enterprise-grade databases and management based on MongoDB technology.
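The Stage 1 pre-processing step (cleaning and transforming input from external and internal sources) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the record fields `source` and `amount`, the `Record` class, and the helper names are hypothetical and do not come from the framework itself. Only the external and internal user categories follow the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# User categories named in Stage 1; the exact labels are illustrative.
EXTERNAL = {"investor", "creditor", "regulator", "customer", "competitor"}
INTERNAL = {"owner", "manager", "employee"}


@dataclass(frozen=True)
class Record:
    source: str     # normalized source label
    user_type: str  # "external" or "internal"
    amount: float   # hypothetical numeric field carried by each input row


def clean(raw: dict) -> Optional[Record]:
    """Normalize one raw row; return None if it cannot be used."""
    source = str(raw.get("source", "")).strip().lower()
    if source in EXTERNAL:
        user_type = "external"
    elif source in INTERNAL:
        user_type = "internal"
    else:
        return None  # unknown source: drop the row
    try:
        amount = float(raw.get("amount"))
    except (TypeError, ValueError):
        return None  # missing or non-numeric amount: drop the row
    return Record(source, user_type, amount)


def preprocess(rows: Iterable[dict]) -> List[Record]:
    """Clean all rows and remove exact duplicates, keeping input order."""
    seen = set()
    result = []
    for raw in rows:
        record = clean(raw)
        if record is not None and record not in seen:
            seen.add(record)
            result.append(record)
    return result
```

In a production setting this logic would typically run inside a distributed platform rather than a single process; the sketch only shows the cleaning rules themselves.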
7.2. Value creation
In the second phase, the value-creation benefit of BDA is the development of more effective marketing campaigns by selecting the right customers. However, the classical definition of marketing by Armstrong, Adam, Denize, and Kotler (2014) highlights that marketing should focus on creating superior value for customers (through high quality, attractive brand propositions, and striving for an appropriate relationship), and that firms can capture value from customers in return for value creation. Verhoef, Kooge, and Walk (2016) defined value-to-firm and customer-to-firm metrics.
RQ5: What is the relation between value-creation and Big Data Analytics?
Value-creation is a major sustainability factor for companies, in addition to profit maximization, customer retention, business goals, and revenue generation. The adoption of Internet of Things (IoT), big data, and cloud computing technologies by companies and organizations has led to better value-creation at the customer and enterprise ends (Haile & Altmann, 2016). For example, enterprise applications are designed to collect direct customer feedback and information from internal business operations (Verhoef et al., 2016).
Adopting big data analytics as a firm-level innovation aims to achieve firm heterogeneity; it therefore affords higher value and contributes directly to the overall value-creation performance of banking firms, financial services, supply chain management, IT companies, etc. McKinsey & Co. added ‘Value’ as the fourth ‘V’ to define big data (Chen et al., 2014). Value refers to the worth of hidden insights inside big data. Value represents the transactional, strategic, and informational benefits of big data. Moreover, it represents the extent to which big data generates economically worthy insights and benefits through extraction and transformation (Wamba et al., 2015). Big data has the potential to transform almost every aspect of business, from research and development to sales and marketing and supply-chain management, providing new opportunities for growth.
Stage 3 Big Data Analytics: The process of data collection, processing, and analysis for BDA is expected to play a key role in the financial & accounting sector. Big Data has become a large pool of unstructured and structured data that can be captured, communicated, aggregated, stored, and analyzed, and it is now treated as a part of every sector and function of the global economy.
RQ6: Which specific aspects of data management, data transformation, and utilization drive value for companies?
In today's digital landscape, data is more readily available and more easily gathered than ever before. With the ability to track most customer interactions and transactions across devices and channels, companies are looking at, and leveraging, their data in new and innovative ways. Dealing with a large amount of information from different data sources is a concern of Big Data management. Issues such as how to store, integrate, and process data effectively and efficiently have been pointed out by Brereton et al. (2007). For storing and processing large datasets, companies can use traditional parallel database systems, Apache Hadoop technologies, key-value data stores (HBase, NoSQL databases), etc. Apache Spark, SparkR, Tachyon, MLbase, Mahout, and Splunk are relatively new types of analytical tools, which are becoming more and more popular, mostly among web companies.
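To make the processing side of this more concrete, the sketch below imitates the map, shuffle, and reduce steps of the MapReduce programming model used by Hadoop, applied to customer interactions across channels. It is a single-process, in-memory illustration in plain Python and does not use the Hadoop or Spark APIs. The interaction tuple layout and all function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical interaction log entry: (customer_id, channel, purchase_amount).
Interaction = Tuple[str, str, float]


def map_interaction(item: Interaction) -> Iterator[Tuple[str, float]]:
    """Map step: emit one (channel, amount) pair per interaction."""
    _customer, channel, amount = item
    yield channel, amount


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_channel(values: List[float]) -> float:
    """Reduce step: total revenue for one channel."""
    return sum(values)


def revenue_by_channel(log: Iterable[Interaction]) -> Dict[str, float]:
    """Run map, shuffle, and reduce over the whole interaction log."""
    pairs = (pair for item in log for pair in map_interaction(item))
    return {key: reduce_channel(vals) for key, vals in shuffle(pairs).items()}
```

In a real cluster the map and reduce functions would run in parallel on different nodes and the shuffle would move data over the network; here all three steps run sequentially so that the data flow is easy to follow.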
The BDA framework process helps to determine the right business model. (i) Raw data: Companies that generate a rich pool of raw data can sell it with little investment. (ii) Processed data: Processed data comes from multiple sources and is stored, managed, and analyzed for others to consume. (iii) Insights: The use of data science, predictive modeling, machine learning, and analytics helps perform complex correlations on data and gain business insights. (iv) Presentation: The ability to present the data, insights, and analytic models to key business partners helps them to build
