Free Databricks Databricks-Certified-Data-Engineer-Associate Exam Actual Questions

The questions for Databricks-Certified-Data-Engineer-Associate were last updated on May 4, 2024.

Question No. 1

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Production mode using the Continuous Pipeline Mode.

What is the expected outcome after clicking Start to update the pipeline assuming previously unprocessed data exists and all definitions are valid?

Correct Answer: D

In Delta Live Tables (DLT), when configured to run in Continuous Pipeline Mode, particularly in a production environment, the system is designed to continuously process and update data as it becomes available. This mode keeps the compute resources active to handle ongoing data processing and automatically updates all datasets defined in the pipeline at predefined intervals. Once the pipeline is manually stopped, the compute resources are terminated to conserve resources and reduce costs. This mode is suitable for production environments where datasets need to be kept up-to-date with the latest data.
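The interaction between the two settings described above can be summarized as a small lookup. The following Python sketch is illustrative only: `describe_update` and the wording of its return values are assumptions made for this example, not part of any Databricks library.

```python
# Illustrative summary of how the two Delta Live Tables settings interact.
# `describe_update` is a hypothetical helper, not a Databricks API.

def describe_update(pipeline_mode: str, environment: str) -> str:
    """Return a plain-English description of what happens after clicking Start."""
    pipeline_mode = pipeline_mode.lower()
    environment = environment.lower()
    if pipeline_mode not in {"triggered", "continuous"}:
        raise ValueError(f"unknown pipeline mode: {pipeline_mode}")
    if environment not in {"development", "production"}:
        raise ValueError(f"unknown environment: {environment}")

    # Pipeline mode decides how often the datasets are updated.
    if pipeline_mode == "continuous":
        datasets = "all datasets are updated repeatedly until the pipeline is shut down"
    else:
        datasets = "all datasets are updated once and the pipeline then stops"

    # Development/Production decides what happens to the compute resources.
    if environment == "production":
        compute = "compute is terminated when the pipeline stops"
    else:
        compute = "compute stays up after the update to allow further testing"

    return f"{datasets}; {compute}"


print(describe_update("Continuous", "Production"))
```

The scenario in this question corresponds to the `("Continuous", "Production")` combination.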

Reference: Databricks documentation on Delta Live Tables: Delta Live Tables Guide


Question No. 2

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which change will need to be made to the pipeline when migrating to Delta Live Tables?

Correct Answer: A

When migrating a pipeline that uses different languages for different layers to Delta Live Tables (DLT), the pipeline does not need to be unified into a single language. Delta Live Tables supports multi-language pipelines, so data engineers and data analysts can keep working in their preferred languages, such as Python for the raw, bronze, and silver layers and SQL for the gold layer. This is particularly useful in collaborative settings, because each stage of data processing can use the language best suited to it.
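One way to picture a multi-language pipeline is a single pipeline definition that lists several source notebooks, each written in its own language. The Python sketch below builds such a settings document. The pipeline name and notebook paths are hypothetical, and the field names (`libraries`, `notebook`, `path`, `continuous`, `development`) are assumed to match the pipeline settings JSON; check them against the current Databricks documentation before relying on them.

```python
import json

# Hypothetical helper that assembles a pipeline settings document.
# Field names are assumed to follow the Delta Live Tables settings JSON.
def build_pipeline_settings(name, notebook_paths, continuous=False, development=True):
    return {
        "name": name,
        "continuous": continuous,
        "development": development,
        # One entry per source notebook; each notebook may use a different language.
        "libraries": [{"notebook": {"path": path}} for path in notebook_paths],
    }


settings = build_pipeline_settings(
    "example_pipeline",  # hypothetical pipeline name
    [
        "/Repos/example/bronze_silver_python",  # hypothetical Python notebook
        "/Repos/example/gold_sql",              # hypothetical SQL notebook
    ],
)
print(json.dumps(settings, indent=2))
```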

Reference: Databricks documentation on Delta Live Tables: Delta Live Tables Guide


Question No. 3

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which approach can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Correct Answer: B

To minimize the total running time of the SQL endpoint used in the refresh schedule of a dashboard in Databricks, the most effective approach is to utilize the Auto Stop feature. This feature allows the SQL endpoint to automatically stop after a period of inactivity, ensuring that it only runs when necessary, such as during the dashboard refresh or when actively queried. This minimizes resource usage and associated costs by ensuring the SQL endpoint is not running idle outside of these operations.
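The effect of Auto Stop on running time can be estimated with simple arithmetic. The sketch below is a rough model under stated assumptions: the refresh runs once per hour, the endpoint stays up for the duration of the refresh plus the idle timeout, and startup time is ignored. The function name and the example numbers are hypothetical.

```python
from typing import Optional

# Rough estimate of how many minutes per hour the SQL endpoint is running.
# Assumes one refresh per hour and ignores endpoint startup time.
def running_minutes_per_hour(refresh_minutes: float,
                             auto_stop_minutes: Optional[float]) -> float:
    if refresh_minutes < 0 or (auto_stop_minutes is not None and auto_stop_minutes < 0):
        raise ValueError("durations must be non-negative")
    if auto_stop_minutes is None:
        # Without Auto Stop the endpoint never shuts down between refreshes.
        return 60.0
    # With Auto Stop the endpoint runs for the refresh plus the idle timeout,
    # capped at the full hour.
    return min(60.0, refresh_minutes + auto_stop_minutes)


# Hypothetical numbers: a 5-minute refresh with a 10-minute idle timeout.
with_auto_stop = running_minutes_per_hour(5, 10)
without_auto_stop = running_minutes_per_hour(5, None)
print(with_auto_stop, without_auto_stop)
```

Under these assumptions, a shorter idle timeout directly reduces the time the endpoint runs between hourly refreshes.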

Reference: Databricks documentation on SQL endpoints: SQL Endpoints in Databricks


Question No. 4

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which approach can be used to identify the owner of new_table?

Correct Answer: D

To find the owner of a table in Databricks, one can utilize the Data Explorer feature. The Data Explorer provides detailed information about various data objects, including tables. By navigating to the specific table's page in Data Explorer, a data engineer can review the Owner field, which identifies the individual or role that owns the table. This information is crucial for obtaining the necessary permissions or for any administrative actions related to the table.
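As a complement to the Data Explorer page, the same Owner field typically also appears in the output of `DESCRIBE TABLE EXTENDED`, provided the user is allowed to read the table's metadata. The Python sketch below shows how the owner could be picked out of such output. It assumes the rows arrive as `(col_name, data_type, comment)` tuples with an `Owner` entry in the detailed information section; `find_owner` and the sample rows are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Row = Tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical helper: scan DESCRIBE TABLE EXTENDED style rows for the owner.
def find_owner(rows: Iterable[Row]) -> Optional[str]:
    in_details = False
    for col_name, data_type, _comment in rows:
        key = (col_name or "").strip().lower()
        if key.startswith("# detailed table information"):
            # Rows before this marker describe columns, which could
            # themselves be named "owner", so they are skipped.
            in_details = True
            continue
        if in_details and key == "owner":
            return (data_type or "").strip() or None
    return None


# Hypothetical sample output for illustration only.
sample_rows = [
    ("id", "bigint", None),
    ("owner", "string", "a column that happens to be called owner"),
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Name", "default.new_table", ""),
    ("Owner", "someone@example.com", ""),
]
print(find_owner(sample_rows))
```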

Reference: Databricks documentation on Data Explorer: Using Data Explorer in Databricks


Question No. 5

A new data engineering team, team, has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which command can be used to grant full permissions on the table to the new data engineering team?

Correct Answer: A

To grant full privileges on a table such as 'sales' to a group like 'team', the correct SQL command in Databricks is:

GRANT ALL PRIVILEGES ON TABLE sales TO team;

This command assigns every privilege that applies to the table, such as SELECT and MODIFY (the privilege that covers inserting, updating, and deleting data), to the specified team. This is typically necessary when a team needs full control over a table to manage and manipulate it as part of a project or ongoing maintenance.
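When grants like this are generated from a script rather than typed by hand, it helps to validate the names before building the statement. The following is a minimal Python sketch, assuming the table and principal are simple unquoted identifiers; `grant_all_on_table` is a hypothetical helper, and the resulting string would still need to be executed through whatever SQL interface is in use.

```python
import re

# Accept only simple unquoted identifiers (letters, digits, underscores).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Hypothetical helper that builds the GRANT statement as text.
def grant_all_on_table(table: str, principal: str) -> str:
    # A table name may be qualified, e.g. schema.table, so check each part.
    for part in table.split("."):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"invalid table name: {table!r}")
    if not _IDENTIFIER.match(principal):
        raise ValueError(f"invalid principal: {principal!r}")
    return f"GRANT ALL PRIVILEGES ON TABLE {table} TO {principal};"


print(grant_all_on_table("sales", "team"))
```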

Reference: Databricks documentation on SQL permissions: SQL Permissions in Databricks