Configuring Azure HDInsight: A Step-by-Step Guide
Step 1: We will learn how to create a Hadoop cluster, upload a CSV file, and query the file using Hive (a query language for Hadoop).
- You can choose among various cluster types, such as Hadoop, HBase, and Storm. For this example, select Hadoop. You can use Linux or Windows as the Hadoop OS, and you can also choose the Hadoop version.
- In Cluster Tier, you can pick the Standard or Premium tier. The Premium tier is more expensive and includes AD integration and secure Hadoop (Ranger).
Step 2: Cluster Configuration
- You can create many different configurations. Hadoop is the traditional cluster type.
- HBase is used for columnar NoSQL data.
- Storm is used for stream analytics and real-time processing.
- Spark is used for in-memory processing, interactive queries, and micro-batch stream processing.
- Interactive Hive is used for interactive queries with in-memory caching.
- R Server is mainly used for machine learning tasks.
Step 3: In the credentials section, you need a login account to access and administer the cluster, and another account to use SSH.
- SSH (Secure Shell) lets you administer remote servers from a local machine using the command line.
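As a rough illustration of how the SSH account is used, the sketch below builds the command line for connecting to a cluster. The cluster name `mycluster` and user `sshuser` are hypothetical placeholders, and the host name pattern `<cluster>-ssh.azurehdinsight.net` is an assumption about how the SSH endpoint is typically named; check the connection details shown in the portal for your own cluster.

```python
import shlex


def build_ssh_command(cluster_name, ssh_user):
    """Build an ssh command for a cluster (host name pattern is an assumption)."""
    host = f"{cluster_name}-ssh.azurehdinsight.net"
    return ["ssh", f"{ssh_user}@{host}"]


# Hypothetical values; replace them with the names chosen when creating the cluster.
command = build_ssh_command("mycluster", "sshuser")

# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(command))
```

The command is only printed here; you would paste it into a terminal and enter the SSH password defined in the credentials section.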
Step 4: HDInsight data is stored in an Azure Storage account, inside a container. A container is similar to a folder for storing information in Azure. You can also specify the storage location. Usually, the location should be close to your own location.
- Pricing is very important. More cores and RAM increase the price. The cluster has worker nodes, which contain the data, and head nodes, which host the services.
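To make the pricing point concrete, here is a minimal Python sketch that estimates an hourly cost from the number of nodes, cores, and RAM. The `NodeGroup` class, the per-core and per-GB rates, and the node sizes are all made-up assumptions for illustration; they are not real Azure prices, and the actual pricing model may differ.

```python
from dataclasses import dataclass

# Hypothetical rates, NOT real Azure prices.
RATE_PER_CORE_HOUR = 0.05
RATE_PER_GB_RAM_HOUR = 0.01


@dataclass
class NodeGroup:
    """A group of identical nodes (hypothetical model for this sketch)."""
    role: str    # "head" or "worker"
    count: int   # number of nodes in the group
    cores: int   # CPU cores per node
    ram_gb: int  # RAM per node, in GB


def hourly_cost(groups):
    """Sum the hourly cost over all node groups."""
    return sum(
        g.count * (g.cores * RATE_PER_CORE_HOUR + g.ram_gb * RATE_PER_GB_RAM_HOUR)
        for g in groups
    )


small = [NodeGroup("head", 2, 4, 28), NodeGroup("worker", 2, 4, 28)]
large = [NodeGroup("head", 2, 4, 28), NodeGroup("worker", 8, 8, 56)]

print(f"small cluster: {hourly_cost(small):.2f} per hour")
print(f"large cluster: {hourly_cost(large):.2f} per hour")
```

In this model the head nodes are the same in both clusters, so the difference in cost comes entirely from adding more and larger worker nodes.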
Step 5: Press View all to see all the different options:
Step 6: Select an existing resource group or create a new one. Resource groups are used to group resources and make administration easier. Then press Create:
Step 7: Log in with the credentials you created and press Log In:
Step 8: The Dashboard shows hardware information such as disk usage, node time, number of live nodes, memory, and network usage:
Step 9: Go to the Query tab and click Default. The table created by default is the Hive sample table:
Step 10: You can query the customers CSV file using the following query:
Step 11: In the results, you see the values of the CSV file as if it were a table:
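The idea of treating a CSV file as if it were a table can be sketched in plain Python. The example below is only an analogy, not how Hive works internally: the column names and rows are hypothetical sample data (not the customers file from this guide), and `read_as_table` and `select` are illustrative helper names.

```python
import csv
import io

# Hypothetical sample data; NOT the customers file used in this guide.
CSV_TEXT = """id,name,city
1,Ana,Lima
2,Ben,Oslo
3,Cho,Seoul
"""


def read_as_table(text):
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def select(rows, columns, where=None):
    """Return the chosen columns for rows matching an optional predicate."""
    return [
        {col: row[col] for col in columns}
        for row in rows
        if where is None or where(row)
    ]


rows = read_as_table(CSV_TEXT)

# Roughly analogous to: SELECT name, city FROM ... WHERE city <> 'Oslo'
for row in select(rows, ["name", "city"], where=lambda r: r["city"] != "Oslo"):
    print(row)
```

The header line supplies the column names, and each remaining line becomes one row, which is the same mental model as querying the file through a table definition.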
Step 12: Once MASE (Microsoft Azure Storage Explorer) is installed, connect to the Azure Storage account and blob container created in Step 4, and go to the hive folder:
Create and Configure Azure HDInsight
In our chapter about PolyBase, we presented this SQL Server 2016 feature for querying CSV files stored in Azure Storage accounts. We mentioned that with PolyBase you can also query data in Hadoop (HDInsight) using SQL Server. HDInsight is a very popular service in Azure that you will eventually need to interact with if you use SQL Server. That is why we provide an introduction to it for newcomers.