48 of 55.

A data engineer needs to join multiple DataFrames and has written the following code:

from pyspark.sql.functions import broadcast

data1 = [(1, "A"), (2, "B")]
data2 = [(1, "X"), (2, "Y")]
data3 = [(1, "M"), (2, "N")]

df1 = spark.createDataFrame(data1, ["id", "val1"])
df2 = spark.createDataFrame(data2, ["id", "val2"])
df3 = spark.createDataFrame(data3, ["id", "val3"])

df_joined = df1.join(broadcast(df2), "id", "inner") \
    .join(broadcast(df3), "id", "inner")

What will be the output of this code?

A. The code will work correctly and perform two broadcast joins simultaneously to join df1 with df2, and then the result with df3.

B. The code will fail because only one broadcast join can be performed at a time.

C. The code will fail because the second join condition (df2.id == df3.id) is incorrect.

D. The code will result in an error because broadcast() must be called before the joins, not inline.
