
Which approach demonstrates a modular and testable way to use DataFrame.transform for ETL code in PySpark?

A.

class Pipeline:
    def transform(self, df):
        return df.withColumn("value_upper", upper(col("value")))

pipeline = Pipeline()
assertDataFrameEqual(pipeline.transform(test_input), expected)

B.

def upper_value(df):
    return df.withColumn("value_upper", upper(col("value")))

def filter_positive(df):
    return df.filter(df["id"] > 0)

pipeline_df = df.transform(upper_value).transform(filter_positive)

C.

def upper_transform(df):
    return df.withColumn("value_upper", upper(col("value")))

actual = test_input.transform(upper_transform)
assertDataFrameEqual(actual, expected)

D.

def transform_data(input_df):
    # transformation logic here
    return output_df

test_input = spark.createDataFrame([(1, "a")], ["id", "value"])
assertDataFrameEqual(transform_data(test_input), expected)
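The pattern behind options B and C is that each transformation is a pure function taking a DataFrame and returning a new one, which DataFrame.transform then chains. The sketch below illustrates why this is modular and testable using a minimal pure-Python stand-in class (here called MiniFrame, a hypothetical name, not a Spark API) so it runs without a Spark session; the transform method mimics pyspark.sql.DataFrame.transform.

```python
# Pure-Python illustration of the composable-transform pattern.
# MiniFrame is a hypothetical stand-in for a Spark DataFrame.
class MiniFrame:
    def __init__(self, rows):
        self.rows = rows  # list of dicts, one dict per row

    def transform(self, func):
        # Mirrors DataFrame.transform: apply a function that takes a
        # frame and returns a new frame, enabling method chaining.
        return func(self)

def upper_value(df):
    # Pure function: no hidden state, so it is easy to unit-test alone.
    return MiniFrame([{**r, "value_upper": r["value"].upper()} for r in df.rows])

def filter_positive(df):
    # Keep only rows whose id is positive.
    return MiniFrame([r for r in df.rows if r["id"] > 0])

source = MiniFrame([{"id": 1, "value": "a"}, {"id": -2, "value": "b"}])
result = source.transform(upper_value).transform(filter_positive)
```

Because upper_value and filter_positive are standalone functions, each can be asserted against a tiny input frame in isolation, which is exactly what makes the equivalent PySpark code unit-testable with assertDataFrameEqual.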
