Big Data Analysis using BigQuery on Cloud Computing Platform

In the age of digitalization, data are generated every second from numerous online and offline sources. These data, with their considerable size and varied properties, are termed Bigdata. It is challenging to store, manage, process, analyze, visualize, and extract useful information from Bigdata using traditional approaches on local machines. The cloud computing platform is a solution to this problem. Cloud computing offers high-level processing units, storage, and applications that do not depend on the performance of users' devices. Many users can remotely access resources and on-demand services from the cloud on a pay-as-you-use basis, so they do not need to buy and install costly resources locally. Cloud service providers such as Google, AWS, IBM, and Microsoft offer robust, cost-efficient systems and products for analyzing Bigdata. Many Cloud Service Providers (CSPs) offer different services in the field of Bigdata analysis. In this paper, we discuss BigQuery, an excellent data warehouse service from Google, to analyze and represent numerous sample datasets in real time for making the right decisions within a short time.


INTRODUCTION:
With the advancement of modern technologies, data are produced very rapidly at every moment through the internet, business organizations, web applications, networking, artificial intelligence, scientific research, and social media. These data generally have vast volume and varied properties and categories, and are termed Bigdata. The significant issues are managing, organizing, processing, presenting, storing, and analyzing Bigdata for decision making in future actions (Rahman and Hasibul, 2019).
Extracting information from data gathered from multiple sources is challenging with traditional analysis approaches because of the cost of infrastructure, high-end processing units, sophisticated analysis tools, storage, and robust algorithms. In this case, the cloud platform is a remarkable solution for managing and handling Bigdata quickly and efficiently within a concise time (Khedekar and Tian, 2020).
The research's main goal is to analyze a large volume of structured or unstructured data, defined as Bigdata, on the Google Cloud Platform (GCP) using BigQuery with SQL commands. BigQuery is a Google product that is fully managed, petabyte-scale, and cost-effective for analyzing data with low execution time.
Several works have examined BigQuery on the cloud computing platform for research and educational purposes, recommending only how the platform can be utilized in data analysis. (Tomar and Tomar, 2018) present an overview of Bigdata and cloud computing integration from two sources, i.e., RedBus and Twitter; the paper discusses a data analysis framework and some methods but does not present the analysis process in detail. (Kotecha and Joshiyara, 2018) present a method of managing and handling non-relational data on BigQuery and calculating the processing time; the paper only covers analysis time against dataset size using the Google SDK rather than extracting the necessary values from the dataset. (Harsha, 2017) discusses some important roles and analysis tools of Bigdata regarding cloud computing technology; however, these papers do not present any Bigdata analysis method in a clear view. (Bathla, 2018) presents a theoretical discussion of Bigdata management tools on the cloud platform rather than analyzing non-structured data within a concise time. (Riahi and Riahi, 2018) discuss the fundamentals of Bigdata, its challenges, and its applications in data analytics; the paper covers the use of Hadoop, an open-source framework, to manage and analyze data from different sources. Our paper highlights cloud-based Bigdata analysis techniques using Google's BigQuery service, without infrastructural development or database administrators. BigQuery's SQL commands provide an easy way to extract useful information from Bigdata in real time.

Bigdata
Nowadays, a large volume of data is produced from several offline and online sources every second. These data are referred to as Bigdata. It is difficult to store, process, and analyze Bigdata with traditional database technologies. Bigdata is indistinct and requires substantial processes of classification and conversion of knowledge into new insights. Gartner defined Bigdata as "high volume, high velocity, and a wide variety of information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization" (Harjinder, 2019).

Categories of Bigdata
We can classify Bigdata according to these five aspects: (a) data sources, (b) content format, (c) data stores, (d) data staging, and (e) data processing (Hashem et al., 2015).

Bigdata Analytics
Bigdata analytics is a methodology used to examine enormous data sets containing varied types of data in order to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. These results can lead to new revenue opportunities, improved customer service, enhanced operational efficiency, competitive advantages over rival organizations, and other business benefits. Analytics may be categorized into the following types: 1. Descriptive analytics: the most direct class of analysis, which allows one to condense Bigdata into smaller, more meaningful pieces of information. 2. Predictive analytics: the next step up in the reduction of information, which uses a combination of statistical, modeling, data mining, and AI techniques to study recent and historical data, allowing analysts to make predictions about the future. 3. Prescriptive analytics: generally, an extension of predictive analysis that recommends an action, so that a business leader can take this information and act on it (Memon et al., 2017).

Bigdata Applications
Bigdata has many vital applications in the field of technology. Major applications include the following (Memon et al., 2017).
a) The Third Eye-Data idea b) In Banking c) In Agriculture d) In Finance e) In Economy f) Manufacturing g) Bioinformatics, etc.

Cloud Computing
It is a technology that can supply enormous resources to users on demand, online, at large scale (Harjinder, 2019). Cloud computing is a paradigm for allowing ubiquitous, easy, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, software, and services) that can be rapidly provisioned and released with minimal maintenance effort or service-provider interaction. Five essential characteristics, three service models, and four deployment models compose this cloud model (Pritzker, 2011).
In the digital world, the amount of Bigdata increases very rapidly. Managing, processing, storing, and analyzing Bigdata with the hardware and applications of local machines is a significant issue. We can resolve the issue by using a cloud computing platform. Some cloud service providers are Google, AWS, IBM, and Microsoft, and they offer robust, cost-efficient Bigdata analysis systems and products (Islam and Reza, 2019). However, the Google Cloud Platform (GCP) is a completely managed platform that provides excellent services to business users.

Google Cloud Platform (GCP)
GCP has powerful tools for managing, storing, and efficiently analyzing Bigdata while reducing cost and time. BigQuery is Google's data warehouse product, used for analyzing and representing numerous types of sample data sets in real time for making the right decisions for industrial or business purposes (Saif and Wazir, 2018). BigQuery, Cloud Dataflow, Cloud Datalab, Google Cloud Dataproc, Cloud Pub/Sub, and Google Genomics are the main Google data analysis services. Among these services, BigQuery is a serverless, user-friendly, low-cost data warehouse for analytics (Kumar, 2016).

BigQuery
BigQuery is a fully managed data warehouse on the cloud platform. The warehouse allows running queries over substantial amounts of data economically, at the speeds one would expect from Google. Taking advantage of low pricing and Google's world-class scalability and protection infrastructure, it provides strong business insights (Kotecha and Joshiyara, 2018). BigQuery is petabyte-scale and one of the fastest data warehouse solutions for Bigdata analysis. Without infrastructure or a database administrator, one can easily query, represent, and analyze Bigdata with SQL-like commands in BigQuery. Hence, most institutions and business organizations use it, from startups to Fortune 500 companies (Kumar, 2016). Fig 3 shows the data sources that are integrated into BigQuery (Bussiness2Community, 2020).

Bigdata Integration into Cloud
Bigdata and cloud computing are very closely interrelated. We cannot think about analyzing Bigdata on our local machine, considering the processing time and environmental setup. Besides, with data in different forms and from different sources, it is not easy to extract useful information for decision-making. Fig 4 shows the basic architecture of the integration of Bigdata into cloud computing from multiple sources (Tomar & Tomar, 2018).

Case Study
We focus our study on Bigdata on Google Cloud. We consider a problem statement for the case of dataset 01 (ted_main.csv) and dataset 02 (appstore_games.csv). Any dataset of the following formats can be loaded into Google Cloud. Explanation of the steps: we create a project on the Google Cloud Platform and use the BigQuery service. It is possible to access publicly available datasets and to query them through structured query language (SQL) to see various outputs and the data processing speed in BigQuery's data warehouse.
1. Accessing publicly available sample data sets in the BigQuery data warehouse: a) click on products and services; b) click on the BigQuery product category; c) click on bigquery-public-data-sets. 2. Browsing publicly available datasets and running some queries with the query editor. 3. After clicking on the tables, for example, wikipedia and natality, one can see metadata about the table. Metadata represents information about data, as shown in Fig 5 and Fig 6.
In the following section, the publicly available wikipedia and natality data sets are taken, loaded into the BigQuery data warehouse, and queried to display the above results. This section aims to find a publicly available dataset, upload it into the BigQuery data warehouse, and then run a query to find the result. We take a sample TED Talks dataset collected from www.kaggle.com in CSV format. The dataset provides metadata on all TED Talk audio-video recordings posted to TED.com's official website until March 2020 (TED, 2020). The downloaded dataset has information about all the recordings that were uploaded on YouTube at different times. TED stands for "Technology, Entertainment, and Design"; it is a media company that publishes free talks online under the slogan "ideas worth spreading" (WIKIPEDIA, 2020). Using BigQuery in the data warehouse, the following steps were performed to achieve the desired result for the case of dataset 01 (ted_main.csv).
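The browsing-and-querying steps above come down to typing a standard SQL statement into the BigQuery editor. As a minimal sketch, the helper below only assembles such a query string; the table path `bigquery-public-data.samples.wikipedia` and its `title` column come from the public sample dataset mentioned above, while the revision-count grouping is an assumed example, not the paper's exact query.

```python
def build_top_titles_query(table: str = "bigquery-public-data.samples.wikipedia",
                           limit: int = 10) -> str:
    """Assemble a standard-SQL query counting revisions per Wikipedia title.

    This only builds the string one would paste into the BigQuery editor;
    actually running it requires a GCP project, which is out of scope here.
    """
    return (
        f"SELECT title, COUNT(*) AS revisions\n"
        f"FROM `{table}`\n"
        f"GROUP BY title\n"
        f"ORDER BY revisions DESC\n"
        f"LIMIT {limit}"
    )

print(build_top_titles_query())
```

Pasting the printed statement into the BigQuery query editor (step 2 above) and clicking Run returns the result table along with the bytes processed and execution time.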
1. Finding the datasets: after some research on Google, a website named www.kaggle.com was found to have multiple publicly available datasets. Two steps are needed for the dataset download: log in to www.kaggle.com with an email id and password, then follow the URL https://www.kaggle.com/rounakbanik/tedtalks, from which a CSV file with all Ted Main Dataset records was downloaded to the local computer.

2. Uploading the dataset to the BigQuery data warehouse
a) Logging into BigQuery at the following URL: https://bigquery.cloud.google.com/dataset/bigquery-256708:BigData_on_cloud. b) Creating new datasets in BigQuery: after logging into BigQuery, we clicked on the new project (Fig 7). The drop-down menu from 'create project' highlights a few options, and the first option is to create a new dataset. Creating a dataset is the process of uploading data to the BigQuery data warehouse. For dataset 01 (ted_main.csv): in Fig 8, more details are added for table creation based on the available source data, i.e., the CSV file named ted_main.csv is uploaded from the local computer. In the next row, after setting the table name, the create-table button at the top of the page is clicked to create the table in the BigQuery data warehouse. Following the above steps, the data set exists on BigQuery. The next step is to upload the data sources to the BigQuery data warehouse.
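When BigQuery creates a table from an uploaded CSV file, it can derive a schema from the header row and sampled values. The stdlib sketch below only illustrates that idea at toy scale: the type names mirror BigQuery's (STRING/INTEGER/FLOAT), but the one-row sampling is a deliberate simplification, not BigQuery's actual auto-detect algorithm.

```python
import csv
import io

def sketch_schema(csv_text: str) -> list:
    """Naive schema guess from a CSV header row plus the first data row.

    A column is INTEGER or FLOAT if its sample value parses as a number,
    otherwise STRING. Illustrative only; BigQuery samples many rows and
    supports more types (BOOLEAN, TIMESTAMP, etc.).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, first = rows[0], rows[1]
    schema = []
    for name, value in zip(header, first):
        try:
            int(value)
            kind = "INTEGER"
        except ValueError:
            try:
                float(value)
                kind = "FLOAT"
            except ValueError:
                kind = "STRING"
        schema.append((name, kind))
    return schema

print(sketch_schema("name,views\nTED Talk,47227110\n"))
```

In the console this corresponds to ticking the schema auto-detect option in the table-creation form of Fig 8 instead of typing field names and types by hand.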
In Fig 9, the path of the file downloaded from www.kaggle.com is given.

Querying the table in the editor
The table is now ready for querying, and we find the top 2000 topics by maximum view count. This is achieved with the query shown in Fig 11. Fig 10 shows a preview of the dataset.
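For readers without a GCP project, the aggregation behind the Fig 11 query (top talks ordered by view count) can be reproduced locally with the stdlib on a small sample; the `name` and `views` column names are assumptions based on the Kaggle ted_main.csv file, not an exact transcription of the paper's query.

```python
import csv
import io

def top_talks(csv_text: str, n: int = 2000) -> list:
    """Local analogue of `SELECT name, views FROM ted_main
    ORDER BY views DESC LIMIT n` for a CSV string.

    Column names `name` and `views` are assumed from the Kaggle file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["name"], int(r["views"])) for r in reader]
    rows.sort(key=lambda t: t[1], reverse=True)  # most-viewed first
    return rows[:n]

sample = "name,views\nTalk A,5\nTalk B,9\nTalk C,7\n"
print(top_talks(sample, n=2))  # → [('Talk B', 9), ('Talk C', 7)]
```

On the real 2500-row ted_main.csv, BigQuery runs the equivalent SQL in seconds without the file ever touching the local machine again.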

RESULTS AND DISCUSSION:
This study highlights how easy it is to start Bigdata analytics on the cloud. Datasets may be collected from various sources; in our study, data samples are taken from the publicly available datasets on www.kaggle.com. The datasets, in CSV format, are denoted as ted_main and appstore_games. For the dataset ted_main, a SQL query displayed the top 2000 topics by maximum view count. Moreover, for the dataset appstore_games, a SQL query displayed the top 1500 games by maximum user rating count. In both cases, after querying the data, we saved the results to Google Sheets for graphical presentation, from which we can easily observe the query results within the defined parameters. The study shows that working on the cloud platform is relatively quick and easy. Google's main analysis product, BigQuery, is used here for its smooth managing and handling capability. The outcome of this study is the analysis of Bigdata (structured, semi-structured, and non-structured) in a real-life scenario. Data from several sources are processed and represented cost-effectively without infrastructure development or database administrators.
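The appstore_games query follows the same pattern as the TED one. As a hedged local sketch, `heapq.nlargest` returns the top-n rows by rating count without fully sorting the list; representing each game as a `(name, user_rating_count)` pair is an assumption about the Kaggle appstore_games.csv columns, and the sample values are illustrative.

```python
import heapq

def top_games(rows, n=1500):
    """Local analogue of `ORDER BY user_rating_count DESC LIMIT n`.

    `rows` is a list of (name, user_rating_count) pairs; the pair layout
    is an assumption about the Kaggle appstore_games.csv file.
    """
    return heapq.nlargest(n, rows, key=lambda r: r[1])

sample = [("Sudoku", 21292), ("Reversi", 284), ("Morocco", 8376)]
print(top_games(sample, n=2))  # → [('Sudoku', 21292), ('Morocco', 8376)]
```

The exported query result can then be charted in Google Sheets exactly as described above.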

CONCLUSION AND FUTURE WORK:
Data is an important capital for firms, organizations, and other business areas in technology. The processing of data is essential for an organization or industry to take the right decision instantly. However, processing and storing a massive amount of different data types (in the form of text, audio, video, etc.) using traditional techniques and methods is complicated, time-consuming, and costly. Moreover, traditional servers and databases have limitations in handling these data categories efficiently, which is why the evolution to cloud computing began. In our paper, we use Google's BigQuery data warehouse service to solve this issue. Data (structured, semi-structured, and non-structured) from real-time sources is uploaded to the Google Cloud Platform, and using BigQuery we can instantly extract the necessary information from the data. Moreover, BigQuery is serverless, cost-effective, and easily handled. Without infrastructure development or an administrator, we can query, analyze, and represent Bigdata within a few seconds.
The paper's main outcome is a representation of Bigdata from different perspectives so that organizations, industries, or any business area can instantly take action. This research will be extended in the future with a more significant number of experimental datasets. The work will also be expanded to BigQuery GIS. GIS, rooted in spatial science, incorporates multiple data types; spatial locations are analyzed, and information layers are structured into visualizations using maps and 3D scenes.

ACKNOWLEDGEMENT:
Firstly, I acknowledge Almighty Allah's help, because the work would have been impossible without the help of Allah. Furthermore, I thank the co-authors and my honorable teachers of the Department of Information and Communication Engineering, Pabna University of Science and Technology (PUST), for supervising me and giving me the proper support to complete the research work.

CONFLICTS OF INTEREST:
The authors declare that they have no competing interests regarding the publication of the paper.