To ensure that each customer's sales data is written to its own Parquet file within the Parquet folder structure, which data pipeline configuration should you implement?

Prepare for the DP-600 Fabric Analytics Engineer Exam. Study with flashcards and multiple choice questions, each offering hints and detailed explanations. Enhance your chances of success on the exam!

Multiple Choice

Explanation:
Partitioning organizes the output into separate folders based on the values of a chosen column. When the sales data is partitioned on the customer identifier stored in the fact table, the pipeline writes all rows for a given customer under a distinct path such as customer_id=123/, so each customer's rows land in Parquet files that contain no other customer's data. This directly meets the goal of writing each customer's sales data to its own Parquet file within the folder structure.

Partitioning by the date identifier would group the data by date rather than by customer, so different customers would share the same date partition and their rows would not be isolated in per-customer files. Partitioning by a dimension table would not split the sales rows themselves, because those rows are stored in the fact table. Not partitioning at all would leave everything in a single file or a shared set of files, so no customer would get their own file.
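
As a rough illustration of the folder layout described above, the following Python sketch groups a few made-up sales rows into Hive-style partition folders. The sample rows, the column names customer_id and date_id, and the partition_rows helper are all hypothetical, and the sketch only computes folder names rather than writing real Parquet files. In a Spark notebook, the corresponding write would typically use DataFrameWriter.partitionBy, as noted in the comment.

```python
from collections import defaultdict

# Hypothetical fact-table rows; column names are assumptions for illustration.
sales = [
    {"customer_id": 123, "date_id": 20240101, "amount": 50.0},
    {"customer_id": 456, "date_id": 20240101, "amount": 20.0},
    {"customer_id": 123, "date_id": 20240102, "amount": 75.0},
]


def partition_rows(rows, column):
    """Group rows into Hive-style partition folders named '<column>=<value>/'."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"{column}={row[column]}/"].append(row)
    return dict(partitions)


# Partitioning on the customer key: each folder holds one customer's rows.
by_customer = partition_rows(sales, "customer_id")

# Partitioning on the date key: a single folder can mix several customers.
by_date = partition_rows(sales, "date_id")

for folder, rows in by_customer.items():
    print(folder, sorted({r["customer_id"] for r in rows}))

for folder, rows in by_date.items():
    print(folder, sorted({r["customer_id"] for r in rows}))

# In Spark, the per-customer layout would typically come from something like:
#   df.write.partitionBy("customer_id").parquet(output_path)
# where df and output_path are placeholders, not values from this question.
```

In the by_customer result every folder contains exactly one customer, while the date_id=20240101/ folder in by_date contains both customers, which is why the date partition does not satisfy the requirement.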
