48 of 55.

Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Question Answer


A data engineer needs to join multiple DataFrames and has written the following code:

from pyspark.sql.functions import broadcast

# `spark` is the active SparkSession (predefined in Databricks notebooks)
data1 = [(1, "A"), (2, "B")]
data2 = [(1, "X"), (2, "Y")]
data3 = [(1, "M"), (2, "N")]

df1 = spark.createDataFrame(data1, ["id", "val1"])
df2 = spark.createDataFrame(data2, ["id", "val2"])
df3 = spark.createDataFrame(data3, ["id", "val3"])

df_joined = df1.join(broadcast(df2), "id", "inner") \
    .join(broadcast(df3), "id", "inner")

What will be the output of this code?

The code will work correctly and perform two broadcast joins simultaneously to join df1 with df2, and then the result with df3.

The code will fail because only one broadcast join can be performed at a time.

The code will fail because the second join condition (df2.id == df3.id) is incorrect.

The code will result in an error because broadcast() must be called before the joins, not inline.
