
Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Question Answer

A data scientist wants each record in the DataFrame to contain:

The entire contents of a file

The full file path

The first attempt at the code, shown below, does read the text files, but each record contains only a single line. The issue is that the files are read line by line rather than as the full text of each file.

Code:

corpus = spark.read.text("/datasets/raw_txt/*") \
    .select('*', '_metadata.file_path')

Which change will ensure one record per file?

Options:

Add the option wholetext=True to the text() function

Add the option lineSep='\n' to the text() function

Add the option wholetext=False to the text() function

Add the option lineSep=", " to the text() function
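For reference, here is a minimal sketch of the first option applied to the code in the question. In PySpark, `text()` accepts a `wholetext` keyword that defaults to `False`, which gives one row per line. Setting it to `True` loads each file as a single row, while `lineSep` only changes where lines are split. The helper name `read_whole_files` is hypothetical and not part of the question, and an active SparkSession is assumed to be passed in as `spark`.

```python
def read_whole_files(spark, path):
    """Return a DataFrame with one row per file.

    `value` holds the full text of the file and `file_path` holds its location.
    `spark` is assumed to be an active SparkSession (hypothetical helper).
    """
    return (
        spark.read
        # wholetext=True: each file becomes a single row instead of one row per line
        .text(path, wholetext=True)
        # keep the text column and add the path from the hidden _metadata column
        .select("*", "_metadata.file_path")
    )


# Usage, assuming a SparkSession named `spark` already exists:
# corpus = read_whole_files(spark, "/datasets/raw_txt/*")
```

With this change the `select` from the original attempt stays the same; only the read call differs.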

