New Year Special - 75% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: ac75sure

An engineer has two DataFrames: df1 (small) and df2 (large).

An engineer has two DataFrames: df1 (small) and df2 (large). A broadcast join is used:

python

CopyEdit

frompyspark.sql.functionsimportbroadcast

result = df2.join(broadcast(df1), on='id', how='inner')

What is the purpose of using broadcast() in this scenario?

Options:

A.

It filters the id values before performing the join.

B.

It increases the partition size for df1 and df2.

C.

It reduces the number of shuffle operations by replicating the smaller DataFrame to all nodes.

D.

It ensures that the join happens only when the id values are identical.

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 PDF/Engine
  • Printable Format
  • Value of Money
  • 100% Pass Assurance
  • Verified Answers
  • Researched by Industry Experts
  • Based on Real Exams Scenarios
  • 100% Real Questions
buy now Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 pdf
Get 75% Discount on All Products, Use Coupon: "ac75sure"