Explanation
Answering this question correctly depends on whether you understand the arguments to the DataFrame.sample() method (link to the documentation below). The arguments are as follows:
DataFrame.sample(withReplacement=None, fraction=None, seed=None).
The first argument, withReplacement, specifies whether a row can be drawn from the DataFrame multiple times. By default, this option is disabled in Spark. We have to enable it here, since the question asks for a row to be able to appear more than once. So, we need to pass True for this argument.
About replacement: "Replacement" is most easily explained with the example of removing random items from a box. Removing items "with replacement" means that after you have taken an item out of the box, you put it back inside. So, if you randomly take 10 items out of a box with 100 items, there is a chance that you take the same item two or more times. "Without replacement" means that you do not put the item back into the box after removing it. So, every time you remove an item, there is one less item in the box, and you can never take the same item twice.
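The box analogy above can be sketched in plain Python. This is only an illustration using the standard library's random module, not Spark's sampling code; the names box, with_replacement, and without_replacement are made up for the example.

```python
import random

rng = random.Random(0)  # fixed seed only so the demo is repeatable
box = list(range(100))  # a box of 100 distinct items

# With replacement: every draw sees the full box, so repeats are possible.
with_replacement = rng.choices(box, k=10)

# Without replacement: a drawn item leaves the box, so repeats are impossible.
without_replacement = rng.sample(box, k=10)

print(len(set(with_replacement)), "distinct items out of 10 draws (can be fewer than 10)")
print(len(set(without_replacement)), "distinct items out of 10 draws (always 10)")
```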
The second argument to the sample() method is fraction. This refers to the fraction of items that should be returned. In the question, we are asked for 150 out of 1000 items, which is a fraction of 0.15.
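As a minimal sketch of how the first two arguments fit together, assuming a DataFrame with 1000 rows: the helper names sample_fraction, draw_sample, and demo are hypothetical, and the DataFrame in demo() is built with spark.range(1000) purely for illustration. The PySpark documentation notes that the result is not guaranteed to contain exactly the requested fraction of rows, so the count will typically be close to 150 rather than exactly 150.

```python
def sample_fraction(n_wanted, n_total):
    """Fraction of rows to request, e.g. 150 of 1000 gives 0.15."""
    return n_wanted / n_total


def draw_sample(df, n_wanted, n_total):
    # df is assumed to be a pyspark.sql.DataFrame with n_total rows.
    # withReplacement=True allows the same row to appear more than once.
    return df.sample(withReplacement=True,
                     fraction=sample_fraction(n_wanted, n_total))


def demo():
    # Requires PySpark; imported here so the helpers above load without it.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.range(1000)  # hypothetical 1000-row DataFrame
    sampled = draw_sample(df, 150, 1000)
    print(sampled.count())  # roughly 150; the exact count varies
    spark.stop()
```

Call demo() in an environment where PySpark is installed to see the sampled row count.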
The last argument is a random seed. A random seed makes a randomized process repeatable. This means that if you re-run the same sample() operation with the same random seed, you get the same rows returned from the sample() command. The question does not specify any behavior around the random seed. The varying random seeds are only there to confuse you!
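The effect of a seed can be illustrated with Python's standard random module. This is an analogy only: Spark uses its own random number generation, and the function draw below is a hypothetical helper, not part of any Spark API.

```python
import random


def draw(seed):
    # A fresh generator with a fixed seed always yields the same sequence.
    rng = random.Random(seed)
    return rng.choices(range(1000), k=5)


first = draw(seed=42)
second = draw(seed=42)
print(first == second)  # True: same seed, same draws
```

In PySpark, the seed is passed as the third argument, for example df.sample(True, 0.15, 42), where df and the value 42 are placeholders.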
More info: pyspark.sql.DataFrame.sample — PySpark 3.1.1 documentation
Static notebook | Dynamic notebook: See test 1, question 49 (Databricks import instructions)