100% Free Real Updated Databricks-Certified-Professional-Data-Engineer Questions & Answers Pass Your Exam Easily [Q95-Q111]



Easily To Pass New Databricks-Certified-Professional-Data-Engineer Verified & Correct Answers


The Databricks-Certified-Professional-Data-Engineer certification exam is a valuable credential for data engineers who work with Databricks. The certification demonstrates the candidate's expertise in Databricks technology and data engineering concepts. The certification also demonstrates the candidate's commitment to professional development and continuous learning.

 

NEW QUESTION # 95
The operations team is interested in monitoring the recently launched product; the team wants to set up an email alert when the number of units sold increases by more than 10,000 units. They want to monitor this every 5 minutes.
Fill in the blanks below to complete the required steps:
* Create ___ query that calculates total units sold
* Set up ____ with the query on trigger condition Units Sold > 10,000
* Set up ____ to run every 5 minutes
* Add destination ______

  • A. SQL, Alert, Refresh, email address
  • B. SQL, Job, SQL Cluster, email address
  • C. Python, Job, SQL Cluster, email address
  • D. Python, Job, Refresh, email address
  • E. SQL, Job, Refresh, email address

Answer: A

Explanation:
The answer is SQL, Alert, Refresh, email address.
Here are the steps from the Databricks documentation:
Create an alert
Follow these steps to create an alert on a single column of a query.
1. Do one of the following:
*Click Create in the sidebar and select Alert.
*Click Alerts in the sidebar and click the + New Alert button.
2. Search for a target query.

To alert on multiple columns, you need to modify your query. See Alert on multiple columns.
3. In the Trigger when field, configure the alert.
*The Value column drop-down controls which field of your query result is evaluated.
*The Condition drop-down controls the logical operation to be applied.
*The Threshold text input is compared against the Value column using the Condition you specify.

Note
If a target query returns multiple records, Databricks SQL alerts act on the first one. As you change the Value column setting, the current value of that field in the top row is shown beneath it.
4. In the When triggered, send notification field, select how many notifications are sent when your alert is triggered:
*Just once: Send a notification when the alert status changes from OK to TRIGGERED.
*Each time alert is evaluated: Send a notification whenever the alert status is TRIGGERED regardless of its status at the previous evaluation.
*At most every: Send a notification whenever the alert status is TRIGGERED at a specific interval. This choice lets you avoid notification spam for alerts that trigger often.
Regardless of which notification setting you choose, you receive a notification whenever the status goes from OK to TRIGGERED or from TRIGGERED to OK. The schedule settings affect how many notifications you will receive if the status remains TRIGGERED from one execution to the next. For details, see Notification frequency.
5. In the Template drop-down, choose a template:
*Use default template: Alert notification is a message with links to the Alert configuration screen and the Query screen.
*Use custom template: Alert notification includes more specific information about the alert.
a. A box displays, consisting of input fields for subject and body. Any static content is valid, and you can incorporate built-in template variables:
*ALERT_STATUS: The evaluated alert status (string).
*ALERT_CONDITION: The alert condition operator (string).
*ALERT_THRESHOLD: The alert threshold (string or number).
*ALERT_NAME: The alert name (string).
*ALERT_URL: The alert page URL (string).
*QUERY_NAME: The associated query name (string).
*QUERY_URL: The associated query page URL (string).
*QUERY_RESULT_VALUE: The query result value (string or number).
*QUERY_RESULT_ROWS: The query result rows (value array).
*QUERY_RESULT_COLS: The query result columns (string array).
An example subject, for instance, could be: Alert "{{ALERT_NAME}}" changed status to {{ALERT_STATUS}}.
b. Click the Preview toggle button to preview the rendered result.
Important
The preview is useful for verifying that template variables are rendered correctly. It is not an accurate representation of the eventual notification content, as each alert destination can display notifications differently.
c. Click the Save Changes button.
6. In Refresh, set a refresh schedule. An alert's refresh schedule is independent of the query's refresh schedule.
*If the query is a Run as owner query, the query runs using the query owner's credential on the alert's refresh schedule.
*If the query is a Run as viewer query, the query runs using the alert creator's credential on the alert's refresh schedule.
7. Click Create Alert.
8. Choose an alert destination.
Important
If you skip this step you will not be notified when the alert is triggered.
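To make the notification settings above concrete, here is a small Python simulation of how an alert with the trigger condition Units Sold > 10,000 and a 5-minute refresh schedule might decide when to notify. It is a toy model under stated assumptions, not Databricks SQL code: the mode names just_once, each_time and at_most_every are hypothetical labels for the three UI options, and "At most every" is interpreted here as at most one repeat notification per interval while the status stays TRIGGERED.

```python
from enum import Enum


class Status(Enum):
    OK = "OK"
    TRIGGERED = "TRIGGERED"


def evaluate(units_sold, threshold=10_000):
    """Trigger condition from the question: Units Sold > 10,000."""
    return Status.TRIGGERED if units_sold > threshold else Status.OK


def simulate(results, mode, step_minutes=5, interval_minutes=15):
    """Return the minutes at which a notification would be sent.

    results: one query result per refresh (one refresh every `step_minutes`).
    mode: "just_once", "each_time" or "at_most_every" (hypothetical labels
    for the three "When triggered, send notification" options).
    """
    prev = Status.OK  # assume the alert starts in the OK state
    last_sent = None
    sent = []
    for i, value in enumerate(results):
        now = i * step_minutes
        curr = evaluate(value)
        if curr != prev:
            # OK -> TRIGGERED and TRIGGERED -> OK always notify
            notify = True
        elif curr is Status.TRIGGERED:
            # Status stayed TRIGGERED: the notification setting decides
            if mode == "just_once":
                notify = False
            elif mode == "each_time":
                notify = True
            elif mode == "at_most_every":
                notify = last_sent is None or now - last_sent >= interval_minutes
            else:
                raise ValueError(f"unknown mode: {mode}")
        else:
            notify = False  # stayed OK
        if notify:
            sent.append(now)
            last_sent = now
        prev = curr
    return sent


units = [5_000, 12_000, 13_000, 14_000, 15_000, 9_000]
print(simulate(units, "just_once"))                           # [5, 25]
print(simulate(units, "each_time"))                           # [5, 10, 15, 20, 25]
print(simulate(units, "at_most_every", interval_minutes=10))  # [5, 15, 25]
```

In all three modes the transitions at minute 5 (OK to TRIGGERED) and minute 25 (TRIGGERED to OK) notify; the modes only differ while the status stays TRIGGERED.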


NEW QUESTION # 96
How can you determine whether a table is a managed table or an external table?

  • A. Run SQL command SHOW TABLES to see the type of the table
  • B. Run IS_MANAGED('table_name') function
  • C. Run SQL command DESCRIBE EXTENDED table_name and check type
  • D. All external tables are stored in data lake, managed tables are stored in DELTA lake
  • E. All managed tables are stored in unity catalog

Answer: C

Explanation:
The answer is: run the SQL command DESCRIBE EXTENDED table_name and check the Type row, which shows EXTERNAL for an external table and MANAGED for a managed table.
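A minimal sketch of reading the Type value out of DESCRIBE EXTENDED output, assuming the rows have the (col_name, data_type, comment) shape that command returns. The sample rows are hypothetical and abbreviated; in a Databricks notebook you might obtain real rows with spark.sql("DESCRIBE EXTENDED table_name").collect().

```python
def table_type(describe_rows):
    """Return the value of the 'Type' row (e.g. 'MANAGED' or 'EXTERNAL').

    describe_rows: (col_name, data_type, comment) tuples, i.e. the shape of
    DESCRIBE EXTENDED output.
    """
    for col_name, data_type, _comment in describe_rows:
        if col_name.strip() == "Type":
            return data_type
    raise ValueError("no 'Type' row in DESCRIBE EXTENDED output")


# Hypothetical, abbreviated output for an external table
rows = [
    ("id", "int", None),
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Catalog", "spark_catalog", ""),
    ("Table", "sales", ""),
    ("Type", "EXTERNAL", ""),
    ("Location", "s3://example-bucket/sales", ""),
]
print(table_type(rows))  # EXTERNAL
```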


NEW QUESTION # 97
The data engineering team is required to share data with the data science team, and the two teams use different workspaces in the same organization. Which of the following techniques can be used to simplify sharing data across workspaces?
*Please note the question is asking how data is shared within an organization across multiple workspaces.

  • A. DELTA LIVE Pipelines
  • B. Unity Catalog
  • C. Use a single storage location
  • D. Data Sharing
  • E. DELTA lake

Answer: B

Explanation:
The answer is Unity Catalog.

Unity Catalog works at the account level: you can create a metastore and attach that metastore to many workspaces, so the same metastore can be shared by both teams' workspaces. Before Unity Catalog, the option was to use a single cloud object storage location and manually mount it in the second Databricks workspace; Unity Catalog greatly simplifies this.

https://databricks.com/product/unity-catalog


NEW QUESTION # 98
Data science team members are using a single cluster to perform data analysis. Although the cluster size was chosen to handle multiple users and auto-scaling was enabled, the team realized queries are still running slowly. What would be the suggested fix?

  • A. Increase the size of the driver node
  • B. Use High concurrency mode instead of the standard mode
  • C. Setup multiple clusters so each team member has their own cluster
  • D. Disable the auto-scaling feature

Answer: B

Explanation:
The answer is: use High Concurrency mode instead of Standard mode.
https://docs.databricks.com/clusters/cluster-config-best-practices.html#cluster-mode
High Concurrency clusters are ideal for groups of users who need to share resources or run ad-hoc jobs. Databricks recommends enabling autoscaling for High Concurrency clusters.


NEW QUESTION # 99
One of the team members, Steve, who has the ability to create views, created a new view called regional_sales_vw on the existing table called sales, which is owned by John. A second team member, Kevin, who works with regional sales managers, wanted to query the data in regional_sales_vw, so Steve granted him permission using the command GRANT VIEW, USAGE ON regional_sales_vw to [email protected]. Why is Kevin still unable to access the view?

  • A. Kevin needs owner access on the view regional_sales_vw
  • B. Kevin is not the owner of the sales table
  • C. Steve is not the owner of the sales table
  • D. Kevin needs select access on the table sales
  • E. Table access control is not enabled on the table and view

Answer: C

Explanation:
Ownership determines whether you can grant privileges on derived objects to other users. Since Steve is not the owner of the underlying sales table, he cannot grant access to the table, or indirectly to the data in the table through a view.
Only the owner (user or group) of an object can grant access to it.
https://docs.microsoft.com/en-us/azure/databricks/security/access-control/table-acls/object-privileges#a-user-has Data object privileges - Azure Databricks | Microsoft Doc
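The ownership rule behind this answer can be modeled in a few lines of Python. This is a deliberately simplified, hypothetical model (the Table and View classes and can_query_view are illustrative, not a Databricks API): it only checks that the user holds a grant on the view and that the view's owner also owns the underlying table.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    owner: str


@dataclass
class View:
    name: str
    owner: str
    base_table: Table
    select_grants: set = field(default_factory=set)


def can_query_view(user: str, view: View) -> bool:
    """Simplified ownership-chain check (hypothetical model, not Databricks code)."""
    # The user must own the view or have been granted access on it
    has_view_privilege = user == view.owner or user in view.select_grants
    # The grant on the view only reaches the underlying data when the
    # view's owner (the grantor here) also owns the base table
    ownership_chain_ok = view.owner == view.base_table.owner
    return has_view_privilege and ownership_chain_ok


sales = Table("sales", owner="john")
regional_sales_vw = View("regional_sales_vw", owner="steve", base_table=sales)
regional_sales_vw.select_grants.add("kevin")
print(can_query_view("kevin", regional_sales_vw))  # False: steve does not own sales

sales.owner = "steve"
print(can_query_view("kevin", regional_sales_vw))  # True once the chain is intact
```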


NEW QUESTION # 100
What is the purpose of the gold layer in the multi-hop architecture?

  • A. Eliminate duplicate records
  • B. Data quality checks and schema enforcement
  • C. Optimizes ETL throughput and analytic query performance
  • D. Optimized query performance for business-critical data
  • E. Preserves grain of original data, without any aggregations

Answer: D

Explanation:
Medallion Architecture - Databricks
Gold Layer:
1. Powers ML applications, reporting, dashboards, ad hoc analytics
2. Refined views of data, typically with aggregations
3. Reduces strain on production systems
4. Optimizes query performance for business-critical data
Exam focus: understand the role of each layer (bronze, silver, gold) in the medallion architecture; you will see varying questions targeting each layer and its purpose.


NEW QUESTION # 101
You are working on IoT data where each device has 5 readings, collected in Celsius and stored in an array. You were asked to convert each individual reading from Celsius to Fahrenheit. Fill in the blank with an appropriate function that can be used in this scenario.
Schema: deviceId INT, deviceTempC ARRAY<double>

SELECT deviceId, __(deviceTempC,i-> (i * 9/5) + 32) as deviceTempF
FROM sensors

  • A. MULTIPLY
  • B. APPLY
  • C. FORALL
  • D. TRANSFORM
  • E. ARRAYEXPR

Answer: D

Explanation:
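TRANSFORM: transforms the elements of the array expr using the function func.
transform(expr, func)
The completed query is therefore:
SELECT deviceId, TRANSFORM(deviceTempC, i -> (i * 9/5) + 32) AS deviceTempF
FROM sensors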
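For readers more familiar with Python, the sketch below mirrors what TRANSFORM does: it applies the lambda to every element of each row's array and returns a new array of the same length. The transform helper and the sensor rows are hypothetical stand-ins; in PySpark the equivalent built-in is pyspark.sql.functions.transform (Spark 3.1+).

```python
def transform(values, func):
    """Pure-Python analogue of Spark SQL transform(expr, func): apply func to each element."""
    return [func(x) for x in values]


def c_to_f(c):
    return (c * 9 / 5) + 32


# Hypothetical rows matching the schema deviceId INT, deviceTempC ARRAY<double>
sensors = [
    {"deviceId": 1, "deviceTempC": [0.0, 10.0, 20.0, 30.0, 100.0]},
    {"deviceId": 2, "deviceTempC": [-40.0, 0.0, 50.0, 25.0, 5.0]},
]

# SELECT deviceId, TRANSFORM(deviceTempC, i -> (i * 9/5) + 32) AS deviceTempF FROM sensors
result = [
    {"deviceId": r["deviceId"], "deviceTempF": transform(r["deviceTempC"], c_to_f)}
    for r in sensors
]
print(result[0]["deviceTempF"])  # [32.0, 50.0, 68.0, 86.0, 212.0]
```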


NEW QUESTION # 102
A data engineer has ingested a JSON file into a table raw_table with the following schema:
transaction_id STRING,
payload STRUCT<customer_id:STRING, date:TIMESTAMP, store_id:STRING>
The data engineer wants to efficiently extract the date of each transaction into a table with the following schema:
transaction_id STRING,
date TIMESTAMP
Which of the following commands should the data engineer run to complete this task?

  • A. SELECT transaction_id, payload[date]
    FROM raw_table;
  • B. SELECT transaction_id, explode(payload)
    FROM raw_table;
  • C. SELECT transaction_id, date
    FROM raw_table;
  • D. SELECT transaction_id, date from payload
    FROM raw_table;
  • E. SELECT transaction_id, payload.date
    FROM raw_table;

Answer: E


NEW QUESTION # 103
What steps need to be taken to set up a DELTA LIVE PIPELINE as a job using the workspace UI?

  • A. Select Workflows UI and Delta live tables tab, under task type select Delta live tables pipeline and select the notebook
  • B. Select Workflows UI and Delta live tables tab, under task type select Delta live tables pipeline and select the pipeline JSON file
  • C. Use Pipeline creation UI, select a new pipeline and job cluster
  • D. DELTA LIVE TABLES do not support job cluster

Answer: A

Explanation:
The answer is: select the Workflows UI and the Delta Live Tables tab, under task type select Delta Live Tables pipeline, and select the notebook.
Create a pipeline
To create a new pipeline using the Delta Live Tables notebook:
1. Click Workflows in the sidebar, click the Delta Live Tables tab, and click Create Pipeline.
2. Give the pipeline a name and click to select a notebook.
3. Optionally enter a storage location for output data from the pipeline. The system uses a default location if you leave Storage Location empty.
4. Select Triggered for Pipeline Mode.
5. Click Create.
The system displays the Pipeline Details page after you click Create. You can also access your pipeline by clicking the pipeline name in the Delta Live Tables tab.


NEW QUESTION # 104
Which of the following Auto loader structured streaming commands successfully performs a hop from the landing area into Bronze?

  • A. spark\
    .readStream\
    .format("csv")\
    .option("cloudFiles.schemaLocation", checkpoint_directory)\
    .load("landing")\
    .writeStream.option("checkpointLocation", checkpoint_directory)\
    .table(raw)
  • B. spark\
    .readStream\
    .format("cloudFiles")\
    .option("cloudFiles.format","csv")\
    .option("cloudFiles.schemaLocation", checkpoint_directory)\
    .load("landing")\
    .writeStream.option("checkpointLocation", checkpoint_directory)\
    .table(raw)
  • C. spark\
    .read\
    .format("cloudFiles")\
    .option("cloudFiles.format","csv")\
    .option("cloudFiles.schemaLocation", checkpoint_directory)\
    .load("landing")\
    .writeStream.option("checkpointLocation", checkpoint_directory)\
    .table(raw)
  • D. spark\
    .readStream\
    .load(rawSalesLocation)\
    .writeStream \
    .option("checkpointLocation", checkpointPath).outputMode("append")\
    .table("uncleanedSales")
  • E. spark\
    .read\
    .load(rawSalesLocation) \
    .writeStream\
    .option("checkpointLocation", checkpointPath) \
    .outputMode("append")\
    .table("uncleanedSales")

Answer: B

Explanation:
The answer is B:
(spark.readStream
    .format("cloudFiles")                                       # use Auto Loader
    .option("cloudFiles.format", "csv")                         # the landing files are CSV
    .option("cloudFiles.schemaLocation", checkpoint_directory)  # where the inferred schema is tracked
    .load("landing")
    .writeStream
    .option("checkpointLocation", checkpoint_directory)
    .table(raw))
Note: option C is incorrect because it uses spark.read (a batch read) instead of spark.readStream; Auto Loader is a Structured Streaming source, and writeStream can only be called on a streaming DataFrame:
spark.read.format("cloudFiles")
.option("cloudFiles.format","csv")
...
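Auto Loader's key property in this answer is that each run picks up only files it has not seen before, with progress recorded under the checkpoint location. The Python sketch below is a toy model of that idea using a JSON file as the checkpoint; ingest_new_files is a hypothetical helper and says nothing about how Auto Loader is implemented internally.

```python
import json
import os
import tempfile


def ingest_new_files(landing_dir, checkpoint_path, bronze):
    """Append lines from files not recorded in the checkpoint; return the new file names.

    Toy model of checkpointed, incremental file ingestion (hypothetical helper,
    not Auto Loader's implementation).
    """
    processed = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            processed = set(json.load(f))
    new_files = sorted(n for n in os.listdir(landing_dir) if n not in processed)
    for name in new_files:
        with open(os.path.join(landing_dir, name)) as f:
            bronze.extend(line.rstrip("\n") for line in f if line.strip())
    # Record progress so the next run skips files already ingested
    with open(checkpoint_path, "w") as f:
        json.dump(sorted(processed | set(new_files)), f)
    return new_files


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        landing = os.path.join(tmp, "landing")
        os.makedirs(landing)
        checkpoint = os.path.join(tmp, "checkpoint.json")  # kept outside landing
        bronze = []
        with open(os.path.join(landing, "a.csv"), "w") as f:
            f.write("1,widget\n")
        first = ingest_new_files(landing, checkpoint, bronze)
        with open(os.path.join(landing, "b.csv"), "w") as f:
            f.write("2,gadget\n")  # a new file lands between runs
        second = ingest_new_files(landing, checkpoint, bronze)
        third = ingest_new_files(landing, checkpoint, bronze)
        return first, second, third, bronze


print(demo())  # (['a.csv'], ['b.csv'], [], ['1,widget', '2,gadget'])
```

The third run finds nothing new, which is the behavior the checkpoint buys: re-running the stream does not re-ingest files already written to bronze.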


NEW QUESTION # 105
How do you create a delta live tables pipeline and deploy using DLT UI?

  • A. Within the Workspace UI, click on SQL Endpoint, select Delta Live tables and create pipeline and select the notebook with DLT code.
  • B. Under Cluster UI, select SPARK UI and select Structured Streaming and click create pipeline and select the notebook with DLT code.
  • C. There is no UI, you can only setup DELTA LIVE TABLES using Python and SQL API and select the notebook with DLT code.
  • D. Use VS Code and download DBX plugin, once the plugin is loaded you can build DLT pipelines and select the notebook with DLT code.
  • E. Within the Workspace UI, click on Workflows, select Delta Live tables and create a pipeline and select the notebook with DLT code.

Answer: E

Explanation:
The answer is, Within the Workspace UI, click on Workflows, select Delta Live tables and create a pipeline and select the notebook with DLT code.
https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-quickstart.html


NEW QUESTION # 106
When defining external tables using formats CSV, JSON, TEXT, or BINARY, any query on the external tables caches the data and location for performance reasons, so within a given Spark session any new files that arrive will not be available after the initial query. How can we address this limitation?

  • A. BROADCAST TABLE table_name
  • B. CLEAR CACHE table_name
  • C. CACHE TABLE table_name
  • D. REFRESH TABLE table_name
  • E. UNCACHE TABLE table_name

Answer: D

Explanation:
The answer is REFRESH TABLE table_name.
REFRESH TABLE table_name forces Spark to refresh its cached information about the external table's files, so newly arrived files and other changes become visible.
When Spark queries an external table, it caches the files associated with it so that repeated queries can reuse the cached files instead of retrieving them again from cloud object storage. The drawback is that Spark does not know about newly arrived files until the REFRESH TABLE command is run.
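The caching behavior described above can be sketched as a toy Python model: the first query lists the table's files and caches the listing, later queries reuse it, and a refresh() call, playing the role of REFRESH TABLE, invalidates the cache. ExternalTable is a hypothetical class, not Spark's implementation.

```python
import os
import tempfile


class ExternalTable:
    """Toy model: a file-based table whose file listing is cached after the first query."""

    def __init__(self, location):
        self.location = location
        self._cached_files = None

    def query(self):
        if self._cached_files is None:  # first query lists the files
            self._cached_files = sorted(os.listdir(self.location))
        return list(self._cached_files)  # later queries reuse the cached listing

    def refresh(self):
        """Analogue of REFRESH TABLE: invalidate the cache so the next query re-lists."""
        self._cached_files = None


def demo():
    with tempfile.TemporaryDirectory() as location:
        open(os.path.join(location, "part-0.csv"), "w").close()
        table = ExternalTable(location)
        before = table.query()
        open(os.path.join(location, "part-1.csv"), "w").close()  # new file arrives
        stale = table.query()
        table.refresh()
        fresh = table.query()
        return before, stale, fresh


print(demo())  # (['part-0.csv'], ['part-0.csv'], ['part-0.csv', 'part-1.csv'])
```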


NEW QUESTION # 107
Which of the following is a true statement about the global temporary view?

  • A. A global temporary view is stored in a user database
  • B. A global temporary view is available only on the cluster it was created, when the cluster restarts global temporary view is automatically dropped.
  • C. A global temporary view persists even if the cluster is restarted
  • D. A global temporary view is available on all clusters for a given workspace
  • E. A global temporary view is automatically dropped after 7 days

Answer: B

Explanation:
The answer is: a global temporary view is available only on the cluster where it was created.
Two types of temporary views can be created: session-scoped and global.
*A session-scoped temporary view is only available within a Spark session, so another notebook on the same cluster cannot access it. If a notebook is detached and reattached, the temporary view is lost.
*A global temporary view is available to all notebooks on the cluster; if the cluster restarts, the global temporary view is lost.
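A toy Python model of the two scopes, assuming one Session object per notebook and one Cluster object holding the shared global_temp registry. The classes are hypothetical; the only detail taken from Spark itself is that global temporary views live in the system database global_temp and are referenced as global_temp.view_name.

```python
class Cluster:
    """Toy model of temp-view scoping (hypothetical classes, not Spark's API)."""

    def __init__(self):
        self.global_temp = {}  # shared by every session on this cluster

    def new_session(self):
        return Session(self)

    def restart(self):
        # Global temp views do not survive a restart (sessions end too; not modeled)
        self.global_temp.clear()


class Session:
    def __init__(self, cluster):
        self.cluster = cluster
        self.temp_views = {}  # visible only inside this session

    def create_temp_view(self, name, rows):
        self.temp_views[name] = rows

    def create_global_temp_view(self, name, rows):
        self.cluster.global_temp[name] = rows

    def lookup(self, name):
        # Global temp views must be qualified with the global_temp database
        if name.startswith("global_temp."):
            return self.cluster.global_temp.get(name[len("global_temp."):])
        return self.temp_views.get(name)


cluster = Cluster()
nb1, nb2 = cluster.new_session(), cluster.new_session()  # two notebooks
nb1.create_temp_view("sales_tmp", [1, 2])
nb1.create_global_temp_view("sales_gtmp", [1, 2])
print(nb2.lookup("sales_tmp"))               # None: session-scoped
print(nb2.lookup("global_temp.sales_gtmp"))  # [1, 2]: visible cluster-wide
cluster.restart()
print(nb2.lookup("global_temp.sales_gtmp"))  # None: dropped on restart
```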


NEW QUESTION # 108
Which of the following data workloads will utilize a silver table as its source?

  • A. A job that cleans data by removing malformatted records
  • B. A job that queries aggregated data that already feeds into a dashboard
  • C. A job that ingests raw data from a streaming source into the Lakehouse
  • D. A job that aggregates cleaned data to create standard summary statistics
  • E. A job that enriches data by parsing its timestamps into a human-readable format

Answer: D

Explanation:
The answer is: a job that aggregates cleaned data to create standard summary statistics. The silver zone maintains the grain of the original data; in this scenario, the job takes data from the silver zone as its source, aggregates it, and stores the results in the gold zone.
Medallion Architecture - Databricks
Silver Layer:
1. Reduces data storage complexity, latency, and redundancy
2. Optimizes ETL throughput and analytic query performance
3. Preserves grain of original data (without aggregation)
4. Eliminates duplicate records
5. Production schema enforced
6. Data quality checks, quarantine corrupt data
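A minimal pure-Python sketch of the hops described above, using hypothetical order records: to_silver enforces types, quarantines a malformed row and drops a duplicate while keeping one row per order (the original grain), and to_gold is the kind of aggregation job from the answer that reads silver and produces summary statistics. Real pipelines would do this with Spark and Delta tables; this only illustrates the division of work between layers.

```python
from collections import defaultdict

# Hypothetical bronze records: raw strings, with a duplicate and a malformed row
bronze = [
    {"order_id": "1", "region": "EU", "units": "5"},
    {"order_id": "1", "region": "EU", "units": "5"},
    {"order_id": "2", "region": "US", "units": "oops"},
    {"order_id": "3", "region": "US", "units": "7"},
    {"order_id": "4", "region": "EU", "units": "2"},
]


def to_silver(records):
    """Bronze -> silver: enforce types, quarantine bad rows, drop duplicates, keep grain."""
    seen, silver, quarantine = set(), [], []
    for r in records:
        try:
            row = {"order_id": int(r["order_id"]), "region": r["region"], "units": int(r["units"])}
        except ValueError:
            quarantine.append(r)
            continue
        if row["order_id"] in seen:
            continue  # duplicate record
        seen.add(row["order_id"])
        silver.append(row)
    return silver, quarantine


def to_gold(silver):
    """Silver -> gold: the aggregation job from the answer (units per region)."""
    totals = defaultdict(int)
    for row in silver:
        totals[row["region"]] += row["units"]
    return dict(totals)


silver, quarantine = to_silver(bronze)
print([r["order_id"] for r in silver])  # [1, 3, 4]
print(to_gold(silver))                  # {'EU': 7, 'US': 7}
```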


NEW QUESTION # 109
Which of the following statements is correct about how Delta Lake implements a lakehouse?

  • A. Delta lake uses a proprietary format to write data, optimized for cloud storage
  • B. Delta lake always stores meta data in memory vs storage
  • C. Using Apache Hadoop on cloud object storage
  • D. Delta lake stores data and meta data in computes memory
  • E. Delta lake uses open source, open format, optimized cloud storage and scalable meta data

Answer: E

Explanation:
Delta lake is
* Open source
* Builds upon standard data formats (Parquet)
* Optimized for cloud object storage
* Built for scalable metadata handling
Delta lake is not
* Proprietary technology
* Storage format
* Storage medium
* Database service or data warehouse
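To illustrate what open format and scalable metadata mean in practice: a Delta table's metadata is a transaction log of JSON commit files under _delta_log, and the current set of data files is obtained by replaying the add and remove actions in order. The sketch below is a heavily simplified replay under that assumption; it ignores checkpoints and the other action types, and the file names are hypothetical.

```python
import json


def active_files(commits):
    """Replay commit files in version order and return the data files currently in the table.

    commits: contents of _delta_log JSON commit files, oldest first; each line
    is one JSON action. Only "add"/"remove" are handled here; real logs also carry
    protocol, metaData and commitInfo actions plus periodic Parquet checkpoints.
    """
    files = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


# Hypothetical, heavily trimmed commits: version 0 writes two files,
# version 1 compacts them into one (e.g. after OPTIMIZE)
v0 = "\n".join([
    json.dumps({"add": {"path": "part-000.parquet", "dataChange": True}}),
    json.dumps({"add": {"path": "part-001.parquet", "dataChange": True}}),
])
v1 = "\n".join([
    json.dumps({"remove": {"path": "part-000.parquet", "dataChange": False}}),
    json.dumps({"remove": {"path": "part-001.parquet", "dataChange": False}}),
    json.dumps({"add": {"path": "part-002.parquet", "dataChange": False}}),
])
print(sorted(active_files([v0])))      # ['part-000.parquet', 'part-001.parquet']
print(sorted(active_files([v0, v1])))  # ['part-002.parquet']
```

Because the log is plain JSON next to Parquet data files, any engine that understands the protocol can read the table; no proprietary service is needed.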


NEW QUESTION # 110
Which of the following SQL commands can be used to insert, update, or delete rows based on a condition that checks whether a row exists?

  • A. MERGE INTO table_name
  • B. COPY INTO table_name
  • C. UPDATE table_name
  • D. INSERT IF EXISTS table_name
  • E. INSERT INTO OVERWRITE table_name

Answer: A

Explanation:
Here is additional documentation for your review:
https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-merge-into.html
MERGE INTO target_table_name [target_alias]
   USING source_table_reference [source_alias]
   ON merge_condition
   [ WHEN MATCHED [ AND condition ] THEN matched_action ] [...]
   [ WHEN NOT MATCHED [ AND condition ] THEN not_matched_action ] [...]

matched_action
 { DELETE |
   UPDATE SET * |
   UPDATE SET { column1 = value1 } [, ...] }

not_matched_action
 { INSERT * |
   INSERT (column1 [, ...] ) VALUES (value1 [, ...]) }
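The semantics of the syntax above can be sketched with an in-memory Python analogue. The merge_into function, the id key and the is_deleted flag column are hypothetical; the docstring shows the MERGE statement the function is meant to mirror: matched rows are deleted or updated, and unmatched rows are inserted.

```python
def merge_into(target, source):
    """In-memory analogue of:

        MERGE INTO target t USING source s ON t.id = s.id
        WHEN MATCHED AND s.is_deleted THEN DELETE
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED AND NOT s.is_deleted THEN INSERT *

    target, source: dicts keyed by id (so each source key matches at most one
    target row). is_deleted is a hypothetical flag column.
    """
    result = {k: dict(v) for k, v in target.items()}  # leave the input untouched
    for key, row in source.items():
        deleted = row.get("is_deleted", False)
        if key in result:                # WHEN MATCHED
            if deleted:
                del result[key]          # ... AND s.is_deleted THEN DELETE
            else:
                result[key] = dict(row)  # THEN UPDATE SET *
        elif not deleted:                # WHEN NOT MATCHED ... THEN INSERT *
            result[key] = dict(row)
    return result


target = {1: {"id": 1, "qty": 10}, 2: {"id": 2, "qty": 20}}
source = {
    1: {"id": 1, "qty": 0, "is_deleted": True},
    2: {"id": 2, "qty": 25},
    3: {"id": 3, "qty": 30},
}
print(merge_into(target, source))
# {2: {'id': 2, 'qty': 25}, 3: {'id': 3, 'qty': 30}}
```

One statement covers all three outcomes, which is why MERGE INTO, rather than separate UPDATE or INSERT commands, is the answer.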


NEW QUESTION # 111
......

Free Databricks-Certified-Professional-Data-Engineer Exam Files Downloaded Instantly: https://www.actualtestsquiz.com/Databricks-Certified-Professional-Data-Engineer-test-torrent.html