Metastore Admins hold the highest administrative privileges within a Unity Catalog metastore. They can transfer ownership of any Unity Catalog object, including catalogs, schemas, tables, storage credentials, and external locations. Metastore Admins are also the role required to manage Delta Sharing objects, such as creating shares and recipients or transferring their ownership.
Account Admins, by contrast, create metastores at the account level but cannot transfer object ownership or manage Delta Sharing objects. Workspace Admins are limited to workspace-level management and do not have metastore-wide privileges.
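For illustration, ownership transfer and Delta Sharing objects are managed with SQL that only a Metastore Admin (or the current owner) can run; this is a minimal sketch, and the catalog, table, share, and principal names are hypothetical:

# Transfer ownership of a catalog and a table (metastore admin or current owner only).
spark.sql("ALTER CATALOG sales OWNER TO `data-platform-admins`")
spark.sql("ALTER TABLE sales.reporting.orders OWNER TO `analytics-team`")
# Delta Sharing objects are created and populated with SQL as well.
spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
spark.sql("ALTER SHARE sales_share ADD TABLE sales.reporting.orders")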
Reference Source: Databricks Unity Catalog Administration Guide – “Metastore admin privileges and ownership transfer.”
====================
QUESTION NO: 2
A data engineer is tasked with building a nightly batch ETL pipeline that processes very large volumes of raw JSON logs from a data lake into Delta tables for reporting. The data arrives in bulk once per day, and the pipeline takes several hours to complete. Cost efficiency is important, but performance and reliability of completing the pipeline are the highest priorities.
Which type of Databricks cluster should the data engineer configure?
A. A lightweight single-node cluster with low worker node count to reduce costs.
B. A high-concurrency cluster designed for interactive SQL workloads.
C. An all-purpose cluster always kept running to ensure low-latency job startup times.
D. A job cluster configured to autoscale across multiple workers during the pipeline run.
Answer: D
Job clusters are optimized for automated production workloads. They start when a job is triggered and terminate automatically once the task completes. This ensures cost control while maintaining performance and reliability for batch ETL. Autoscaling allows Databricks to add or remove workers dynamically based on workload size, ensuring large data volumes are processed efficiently.
All-purpose clusters are intended for development or ad-hoc workloads, not scheduled ETL.
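For illustration, a minimal sketch of such a job expressed as a Jobs API-style payload; the notebook path, node type, and schedule values are hypothetical:

job_payload = {
    "name": "nightly-json-to-delta",
    "tasks": [
        {
            "task_key": "ingest_logs",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest_raw_logs"},
            "job_cluster_key": "etl_cluster",
        }
    ],
    "job_clusters": [
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                # Autoscaling adds or removes workers as the nightly volume varies;
                # the job cluster terminates automatically when the run finishes.
                "autoscale": {"min_workers": 2, "max_workers": 8},
            },
        }
    ],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}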
Reference Source: Databricks Compute and Job Cluster Configuration Documentation – “Autoscaling and Job Clusters.”
====================
QUESTION NO: 10
A data engineer deploys a multi-task Databricks job that orchestrates three notebooks. One task intermittently fails with Exit Code 1 but succeeds on retry. The engineer needs to collect detailed logs for the failing attempts, including stdout/stderr and cluster lifecycle context, and share them with the platform team.
Which steps should the data engineer follow using built-in tools?
A. Use the notebook interactive debugger to re-run the entire multi-task job, and capture step-through traces for the failing task.
B. Download worker logs directly from the Spark UI and ignore driver logs, as worker logs contain stdout/stderr for all tasks and cluster events.
C. Export the notebook run results to HTML; this bundle includes complete stdout, stderr, and cluster event history across all tasks.
D. From the job run details page, export the job's logs or configure log delivery; then retrieve the compute driver logs and event logs from the compute details page to correlate stdout/stderr with cluster events.
Answer: D
The recommended way to troubleshoot and collect detailed job logs is through the Job Run Details page in Databricks. From there, engineers can export run logs or configure automatic log delivery to a storage destination. The driver and event logs available under compute details provide stdout, stderr, and cluster lifecycle context required for root-cause analysis.
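As a minimal sketch, log delivery can be enabled on the job's cluster definition so that driver stdout/stderr and event logs land in storage for sharing with the platform team; the destination path and other values here are hypothetical:

new_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    # Driver logs (stdout/stderr) and event logs are delivered to this location
    # and remain available after the cluster terminates.
    "cluster_log_conf": {"dbfs": {"destination": "dbfs:/cluster-logs/nightly-etl"}},
}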
Reference Source: Databricks Jobs Monitoring and Logging Documentation – “Access driver logs and configure log delivery.”
====================
QUESTION NO: 19
A Data Engineer is building a simple data pipeline using Lakeflow Declarative Pipelines (LDP) in Databricks to ingest customer data. The raw customer data is stored in a cloud storage location in JSON format. The task is to create Lakeflow Declarative Pipelines that read the raw JSON data and write it into a Delta table for further processing.
Which code snippet will correctly ingest the raw JSON data and create a Delta table using LDP?
A.
import dlt
@dlt.table
def raw_customers():
    return spark.read.format("csv").load("s3://my-bucket/raw-customers/")
B.
import dlt
@dlt.table
def raw_customers():
    return spark.read.json("s3://my-bucket/raw-customers/")
C.
import dlt
@dlt.table
def raw_customers():
    return spark.read.format("parquet").load("s3://my-bucket/raw-customers/")
D.
import dlt
@dlt.view
def raw_customers():
    return spark.format.json("s3://my-bucket/raw-customers/")
Answer: B
The correct method to define a table in Lakeflow Declarative Pipelines (LDP) is the @dlt.table decorator, which persists the decorated function's output as a managed Delta table. When ingesting raw JSON data, spark.read.json() or spark.read.format("json").load() is the standard approach: the JSON files are read from the source location and the result is written to a Delta table that Databricks manages automatically.
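For reference, a minimal sketch of the equivalent explicit-format form mentioned above; the bucket path and table comment are illustrative:

import dlt

@dlt.table(comment="Raw customer records ingested from JSON files")
def raw_customers():
    # Equivalent to spark.read.json(); the function's result is persisted
    # as a Delta table managed by the pipeline.
    return spark.read.format("json").load("s3://my-bucket/raw-customers/")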
Reference Source: Databricks Lakeflow Declarative Pipelines Developer Guide – “Create tables from raw JSON and Delta sources.”
====================
QUESTION NO: 27
A platform team is creating a standardized template for Databricks Asset Bundles to support CI/CD. The template must specify defaults for artifacts, workspace root paths, and a run identity, while allowing a “dev” target to be the default and override specific paths.
How should the team use databricks.yml to satisfy these requirements?
A. Use deployment, builds, context, identity, and environments; set dev as default environment and override paths under builds.
B. Use roots, modules, profiles, actor, and targets; where profiles contain workspace and artifacts defaults and actor sets run identity.
C. Use project, packages, environment, identity, and stages; set dev as default stage and override workspace under environment.
D. Use bundle, artifacts, workspace, run_as, and targets at the top level; set one target with default: true and override workspace paths or artifacts under that target.
Answer: D
In Databricks Asset Bundles, the databricks.yml file defines all top-level configuration keys, including bundle, artifacts, workspace, run_as, and targets. The targets section defines specific deployment contexts (for example, dev, test, prod). Setting default: true for a target marks it as the default environment. Overrides for workspace paths and artifact configurations can be defined inside each target while keeping defaults at the top level.
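A minimal databricks.yml sketch using these keys might look like the following; the bundle name, artifact, paths, and run identity are hypothetical:

bundle:
  name: analytics_pipelines

artifacts:
  etl_wheel:
    type: whl
    path: ./libs/etl

workspace:
  root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}

run_as:
  # Application ID of the service principal that deployed jobs run as (hypothetical value).
  service_principal_name: "12345678-1234-1234-1234-123456789012"

targets:
  dev:
    # Marks dev as the default target for bundle commands.
    default: true
    workspace:
      root_path: /Workspace/Users/dev/.bundle/${bundle.name}/dev
  prod:
    workspace:
      root_path: /Workspace/Shared/.bundle/${bundle.name}/prod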
Reference Source: Databricks Asset Bundle Configuration Guide – “Structure of databricks.yml and target overrides.”
====================
QUESTION NO: 31
A data engineer inherits a Delta table whose historical partitions by country are badly skewed. Queries often filter on the high-cardinality customer_id column, and the filtered dimensions change over time. The engineer wants a strategy that avoids a disruptive full rewrite, reduces sensitivity to skewed partitions, and sustains strong query performance as access patterns evolve.
Which two actions should the data engineer take? (Choose 2)
A. Keep existing partitions and rely on bin-packing OPTIMIZE only; ZORDER and clustering are unnecessary for multi-dimensional filters.
B. Periodically run OPTIMIZE table_name.
C. Disable data skipping statistics to avoid maintenance overhead; rely on adaptive query execution instead.
D. Depend solely on optimized writes; Databricks will automatically replace partitioning with clustering over time.
E. Switch from static partitioning to liquid clustering and select initial clustering keys that reflect common filters such as customer_id.
Answer: B, E
Liquid Clustering replaces traditional partitioning and ZORDER optimization by automatically organizing data according to clustering keys. It supports evolving clustering strategies without requiring a full table rewrite. To maintain cluster balance and improve performance, the OPTIMIZE command should be run periodically. OPTIMIZE groups data files by clustering keys and helps reduce small file overhead.
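As a minimal sketch, assuming an unpartitioned (or already de-partitioned) Delta table with hypothetical table and column names, liquid clustering is enabled and then maintained with periodic OPTIMIZE runs:

# Enable liquid clustering with an initial clustering key that matches common filters.
spark.sql("ALTER TABLE sales.events CLUSTER BY (customer_id)")
# Run periodically: incrementally clusters new data and compacts small files.
spark.sql("OPTIMIZE sales.events")
# Clustering keys can evolve later without rewriting the whole table.
spark.sql("ALTER TABLE sales.events CLUSTER BY (customer_id, country)")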
Reference Source: Databricks Delta Lake Guide – “Use Liquid Clustering for Tables” and “OPTIMIZE Command for File Compaction and Data Layout.”
====================
QUESTION NO: 39
A data engineer needs to provide access to a group named manufacturing-team. The team needs privileges to create tables in the quality schema.
Which set of SQL commands grants the manufacturing-team group the ability to create tables in the quality schema under the manufacturing catalog, using the least privileges?
A. GRANT CREATE TABLE ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT CREATE SCHEMA ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT CREATE CATALOG ON CATALOG manufacturing TO manufacturing-team;
B. GRANT CREATE TABLE ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT CREATE SCHEMA ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT USE CATALOG ON CATALOG manufacturing TO manufacturing-team;
C. GRANT CREATE TABLE ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT USE SCHEMA ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT USE CATALOG ON CATALOG manufacturing TO manufacturing-team;
D. GRANT USE TABLE ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT USE SCHEMA ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT USE CATALOG ON CATALOG manufacturing TO manufacturing-team;
Answer: C
To create a table within a schema, a principal must have CREATE TABLE on the schema, USE SCHEMA on that schema, and USE CATALOG on the parent catalog. This combination ensures the group has just enough privileges to create objects in that schema without excessive permissions like CREATE SCHEMA or CREATE CATALOG.
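Issued directly, the minimal grants from option C look like the following sketch (in practice the hyphenated group name must be quoted with backticks):

# USE CATALOG and USE SCHEMA allow traversal; CREATE TABLE allows object creation.
spark.sql("GRANT USE CATALOG ON CATALOG manufacturing TO `manufacturing-team`")
spark.sql("GRANT USE SCHEMA ON SCHEMA manufacturing.quality TO `manufacturing-team`")
spark.sql("GRANT CREATE TABLE ON SCHEMA manufacturing.quality TO `manufacturing-team`")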
Reference Source: Databricks Unity Catalog Privilege Model – “Privileges Required to Create a Table.”