
AVR posted:
2 years ago
What is Databricks & What do you know about Databricks?

Databricks is a new analytics service.
Azure Databricks is a fast, easy, scalable, and collaborative Apache Spark-based analytics service on Azure.

Why do we call it fast? - Because it runs on a Spark cluster, which processes data in parallel across nodes.
Why do we call it easy? - We don't need a separate IDE such as Eclipse, PyCharm, or Visual Studio to write the code; we write it in Databricks notebooks.
Why do we call it scalable? - Resources (nodes) can be allocated dynamically as required. The more data we process in Databricks, the more nodes we need.
What is collaborative? - Data engineers, data scientists, and business users can work together in a Databricks notebook. Instead of working in isolation, they all work in Databricks to be more productive.
We can seamlessly connect from Databricks to other Azure services (Data Lake, Blob Storage account, SQL Server, Azure Synapse). It reduces cost and complexity with a managed platform that auto-scales up and down.
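
As a rough illustration of connecting to a Data Lake, the sketch below builds the `abfss://` URI that Azure Data Lake Storage Gen2 uses to address a file. The helper name `abfss_uri` and the container, account, and file names are made up for this example, and authentication to the storage account has to be configured separately (not shown here).

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path in Azure Data Lake Storage Gen2.

    Hypothetical helper for illustration; it only formats the string.
    """
    if not container or not account:
        raise ValueError("container and account are required")
    # Strip a leading slash so the URI does not end up with '//' before the path.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Example values are assumptions, not real resources.
uri = abfss_uri("raw", "mystorageacct", "/sales/2022/orders.csv")
print(uri)

# Inside a Databricks notebook, where `spark` is predefined, the URI could
# then be passed to a reader, for example:
#   df = spark.read.csv(uri, header=True)
```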

Let's understand more about Azure Databricks Architecture
Once a Databricks workspace is created, we have the flexibility to create clusters.
We also have the option to upload data via DBFS, though this is NOT recommended at the enterprise level, where security is a high priority.
DBFS is the Databricks File System.
When we store data internally via DBFS, it is stored in the backend storage account of the cloud we choose (AWS/Azure).
If we choose AWS, an EC2 instance spins up and the data is stored internally in AWS S3.
If we choose Azure, a VM spins up and the data is stored internally in a Blob Storage account.
Databricks knows all the dependencies at the time of workspace creation. It creates all the prerequisites that the Databricks workspace needs.
A Databricks cluster is nothing but a group of VMs.
When we create a cluster, the VMs get created in the backend and appear in the Azure Portal.
To run a notebook, we need a Databricks cluster in place. We attach the notebook to the cluster, and the notebook code is executed on that cluster.

Databricks has two cost-saving options. One is auto-scaling, and the other is automatic termination of the cluster after a period of inactivity. Both are very helpful in reducing cost.
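
To make these two options concrete, here is a minimal sketch of a cluster definition written as a Python dictionary. The field names (`autoscale`, `min_workers`, `max_workers`, `autotermination_minutes`) follow the Databricks Clusters API; the helper `build_cluster_spec`, the cluster name, and the runtime and VM size values are assumptions for this example, so check what is available in your own workspace.

```python
import json


def build_cluster_spec(name, spark_version, node_type_id,
                       min_workers, max_workers, idle_minutes):
    """Return a cluster definition with auto-scaling and auto-termination.

    Hypothetical helper: it only builds and checks the dictionary; it does
    not call Databricks.
    """
    if min_workers < 1 or max_workers < min_workers:
        raise ValueError("need 1 <= min_workers <= max_workers")
    if idle_minutes < 0:
        raise ValueError("idle_minutes must not be negative")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Auto-scaling: the worker count moves between these two bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Auto-termination: stop the cluster after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


# Example values only (assumed runtime version and Azure VM size).
spec = build_cluster_spec(
    name="demo-cluster",
    spark_version="10.4.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    min_workers=2,
    max_workers=8,
    idle_minutes=30,
)
print(json.dumps(spec, indent=2))
```

With these values the cluster would run between 2 and 8 workers depending on load, and shut down after 30 minutes without activity.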
Posted in: Azure | ID: Q90 |
April 18, 2022, 11:19 AM | 0 Replies