Which of the following code blocks returns about 150 randomly selected rows from the 1000-row...


Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Question Answer

Which of the following code blocks returns about 150 randomly selected rows from the 1000-row DataFrame transactionsDf, assuming that any row can appear more than once in the returned DataFrame?

transactionsDf.resample(0.15, False, 3142)

transactionsDf.sample(0.15, False, 3142)

transactionsDf.sample(0.15)

transactionsDf.sample(0.85, 8429)

transactionsDf.sample(True, 0.15, 8261)

Explanation:

Answering this question correctly depends on understanding the arguments to the DataFrame.sample() method (documentation linked below). The arguments are as follows:

DataFrame.sample(withReplacement=None, fraction=None, seed=None).

The first argument, withReplacement, specifies whether a row can be drawn from the DataFrame multiple times. By default, this option is disabled in Spark. Since the question asks that a row be able to appear more than once, we have to enable it by passing True for this argument.

About replacement: "Replacement" is most easily explained with the example of removing random items from a box. Removing items "with replacement" means that after you have taken an item out of the box, you put it back inside. So if you randomly take 10 items out of a box of 100 items, there is a chance that you take the same item twice or more. "Without replacement" means that you do not put the item back into the box after removing it. Every time you remove an item, there is one item fewer in the box, and you can never take the same item twice.
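The box analogy above can be tried out in plain Python. This is only a sketch using the standard library's random module, not Spark code; the names box and rng and the seed value are made up for illustration.

```python
import random

rng = random.Random(3142)   # arbitrary seed, chosen only so the run is repeatable
box = list(range(100))      # a box holding 100 distinct items

# With replacement: every draw is made from the full box,
# so the same item may be drawn more than once.
with_replacement = rng.choices(box, k=10)

# Without replacement: a drawn item is not put back,
# so all 10 drawn items are guaranteed to be different.
without_replacement = rng.sample(box, k=10)

print(with_replacement)
print(without_replacement)
```

Whether the first list actually contains a repeated item depends on the random draws; the second list never can.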

The second argument to the sample() method is fraction. This refers to the fraction of items that should be returned. The question asks for about 150 out of 1000 items, which is a fraction of 0.15.
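The question says "about 150" because a fraction describes an expected share of rows, not an exact count. The sketch below illustrates this in plain Python under a simplifying assumption: each row is kept independently with probability equal to fraction. It is not Spark's implementation, and keep_rows is a hypothetical helper that exists only for this example.

```python
import random

def keep_rows(rows, fraction, seed):
    """Keep each row independently with probability `fraction` (illustrative only)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]

rows = list(range(1000))              # stand-in for a 1000-row DataFrame
expected = round(0.15 * len(rows))    # 150 rows on average
kept = keep_rows(rows, 0.15, seed=8261)

print(expected)
print(len(kept))                      # close to 150, but usually not exactly 150
```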

The last argument is a random seed. A random seed makes a randomized process repeatable. This means that if you re-run the same sample() operation with the same random seed, you get the same rows returned from the sample() command. The question does not specify any behavior around the random seed. The varying random seeds are only there to confuse you! Only transactionsDf.sample(True, 0.15, 8261) passes True for withReplacement together with a fraction of 0.15, so it is the correct answer.
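The effect of a seed can also be shown without Spark. The following is a minimal sketch with Python's random module, again assuming a simple keep-each-row-with-probability scheme; sample_rows is a hypothetical helper, not part of any Spark API.

```python
import random

def sample_rows(rows, fraction, seed):
    """Illustrative sampler: the seed fully determines which rows are kept."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]

rows = list(range(1000))

first = sample_rows(rows, 0.15, seed=8261)
second = sample_rows(rows, 0.15, seed=8261)   # same seed as `first`
other = sample_rows(rows, 0.15, seed=3142)    # different seed

print(first == second)   # same seed, same rows
print(first == other)    # a different seed will almost certainly pick different rows
```

The seed changes which rows are picked, not how many are expected, which is why it has no bearing on the answer.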

More info: pyspark.sql.DataFrame.sample — PySpark 3.1.1 documentation

Static notebook | Dynamic notebook: See test 1, question 49 (Databricks import instructions)
