Explanation
This question is hard. Let's assess the different answer options one by one.
Spark will only broadcast DataFrames that are much smaller than the default value.
This is correct. The default value is 10 MB (10485760 bytes). Since spark.sql.autoBroadcastJoinThreshold expects a value in bytes (not megabytes), the code block sets the limit to merely 20 bytes instead of the requested 20 * 1024 * 1024 (= 20971520) bytes.
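As a sketch of the intended fix (assuming a live SparkSession named spark, which the question's code block does not show):

    # Set the broadcast threshold to 20 MB, expressed in bytes
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 20 * 1024 * 1024)  # 20971520 bytes

    # The buggy code block effectively did this instead, so only DataFrames
    # smaller than 20 bytes would ever be broadcast:
    # spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 20)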
The command is evaluated lazily and needs to be followed by an action.
No, this command is evaluated right away! spark.conf.set is not a transformation on a DataFrame; it changes the session configuration immediately and does not need to be followed by an action.
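A quick way to see this (again assuming a session named spark): the new value can be read back immediately, with no action in between.

    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 20)
    print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))  # prints 20 right away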
Spark will only apply the limit to threshold joins and not to other joins.
There are no "threshold joins", so this option does not make any sense.
The correct option to write configurations is through spark.config and not spark.conf.
No, it is indeed spark.conf! That attribute exposes the session's runtime configuration; a SparkSession has no spark.config attribute (config() is a builder method used before the session is created).
The passed limit has the wrong variable type.
The configuration expects the number of bytes, a number, as an input. So, the 20 provided in the code block has an acceptable type; the problem is its magnitude, not its type.
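For illustration (assuming PySpark, where RuntimeConfig.set accepts str, int, and bool values), both of these forms are valid:

    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 20971520)    # integer byte count
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "20971520")  # equivalent string form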