
Week 9

In this lesson, we will learn about real-time data streams, message systems, and transactions in distributed systems.

Objectives

After completing this week, you should be able to:

  • Implement scalable stream processing in Spark
  • Explain different approaches to transactions in distributed systems and the associated trade-offs

Readings

  • Read chapter 11 in Designing Data-Intensive Applications
  • (Optional) Read chapter 8 in Designing Data-Intensive Applications

Weekly Resources

Assignment 9

In the second part of the exercise, you will create two streaming dataframes using the accelerations and locations folders.

Assignment 9.1

Start by creating a simple Spark Streaming application that reads data from the accelerations and locations topics and uses the Kafka sink to save the results to the LastnameFirstname-simple topic.
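Spark's Kafka sink writes each output row as one Kafka record. The row must have a `value` column (string or binary) and may have a `key` column. The target topic comes from the `topic` option or from a `topic` column. In PySpark, rows are often serialized with `df.selectExpr("to_json(struct(*)) AS value")` before writing. The plain-Python sketch below only mimics that serialization step so the record shape is easy to see. It does not talk to Kafka, and both the function name `to_kafka_record` and the choice of `ride_id` as the key are illustrative assumptions.

```python
import json
from typing import Any, Dict, Optional, Tuple


def to_kafka_record(
    row: Dict[str, Any], key_field: Optional[str] = "ride_id"
) -> Tuple[Optional[bytes], bytes]:
    """Encode one output row as a Kafka (key, value) pair of bytes.

    Hypothetical helper: the whole row becomes a JSON document in the value
    (like to_json(struct(*)) in Spark), and the optional key is taken from a
    single field of the row.
    """
    key = None
    if key_field is not None and row.get(key_field) is not None:
        key = str(row[key_field]).encode("utf-8")
    # sort_keys makes the JSON output deterministic, which simplifies testing.
    value = json.dumps(row, sort_keys=True).encode("utf-8")
    return key, value
```

Keying records by `ride_id` is a design choice, not a requirement: records with the same key land in the same Kafka partition, which keeps each ride's events in order.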

Assignment 9.2

Define a watermark on the locations dataframe using the timestamp column, and set the watermark threshold to "30 seconds". Using a window of "15 seconds", compute the mean speed of each ride, identified by ride_id. Save the results to the LastnameFirstname-windowed topic and set the output mode to update.
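To see how the watermark and the window in this step interact, here is a simplified single-process simulation in plain Python (not Spark code). It assumes events arrive as hypothetical `(timestamp_seconds, ride_id, speed)` tuples. Each event is assigned to a 15-second tumbling window, and an event is dropped once the watermark (the latest event time seen, minus 30 seconds) has passed the end of its window. Real Spark recomputes the watermark once per micro-batch rather than per event and only guarantees that data within the threshold is kept, so treat this as an illustration of the idea. In PySpark the corresponding pipeline would use `withWatermark("timestamp", "30 seconds")`, then `groupBy(window("timestamp", "15 seconds"), "ride_id")` and `avg("speed")`, assuming the speed column is named `speed`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 15     # window length from Assignment 9.2
WATERMARK_SECONDS = 30  # watermark threshold from Assignment 9.2

Event = Tuple[float, str, float]  # (timestamp_seconds, ride_id, speed)


def windowed_mean_speed(
    events: Iterable[Event],
) -> Tuple[Dict[Tuple[float, str], float], List[Event]]:
    """Mean speed per (window_start, ride_id), plus the events dropped as late."""
    sums: Dict[Tuple[float, str], float] = defaultdict(float)
    counts: Dict[Tuple[float, str], int] = defaultdict(int)
    dropped: List[Event] = []
    max_event_time = float("-inf")

    for ts, ride_id, speed in events:
        # The watermark trails the latest event time seen so far.
        watermark = max_event_time - WATERMARK_SECONDS
        # Tumbling window: [window_start, window_start + WINDOW_SECONDS).
        window_start = ts - (ts % WINDOW_SECONDS)
        window_end = window_start + WINDOW_SECONDS
        if window_end <= watermark:
            # This window is already finalized, so the late event is dropped.
            dropped.append((ts, ride_id, speed))
            continue
        sums[(window_start, ride_id)] += speed
        counts[(window_start, ride_id)] += 1
        max_event_time = max(max_event_time, ts)

    means = {key: sums[key] / counts[key] for key in sums}
    return means, dropped
```

For example, with events `(0, "r1", 10.0)`, `(5, "r1", 20.0)`, `(16, "r1", 30.0)`, `(60, "r2", 5.0)` and then a late `(3, "r1", 100.0)`, the last event is dropped: the watermark has reached 30 seconds, past the end of its [0, 15) window, so ride r1's mean for that window stays 15.0.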

Assignment 9.3

Join the two streams with an inner join on ride_id. Save the results to the LastnameFirstname-joined topic.
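An inner join between two unbounded streams has to remember rows from each side, because a matching row may arrive later on the other stream. The plain-Python sketch below (not Spark's implementation) shows that idea as a symmetric hash join: each arriving row is buffered under its `ride_id` and matched against every row already buffered from the other stream. The `("accelerations" | "locations", row)` input format and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def stream_inner_join(events: Iterable[Tuple[str, Row]]) -> List[Tuple[Row, Row]]:
    """Inner-join two interleaved streams on ride_id.

    events: (side, row) pairs in arrival order, where side is
    "accelerations" or "locations". Returns (acceleration_row, location_row)
    pairs in the order the join would emit them.
    """
    buffers = {"accelerations": defaultdict(list), "locations": defaultdict(list)}
    output: List[Tuple[Row, Row]] = []
    for side, row in events:
        other = "locations" if side == "accelerations" else "accelerations"
        key = row["ride_id"]
        # Match the new row against everything seen so far on the other side.
        for match in buffers[other][key]:
            pair = (row, match) if side == "accelerations" else (match, row)
            output.append(pair)
        # Buffer the new row so later rows from the other side can match it.
        buffers[side][key].append(row)
    return output
```

Because these buffers grow without bound, Spark recommends adding watermarks (and ideally an event-time constraint) to stream-stream joins so that old state can be discarded. In PySpark the join itself can be written as `accelerations.join(locations, "ride_id")`, which defaults to an inner join.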

Submission Instructions

For this assignment, you will submit a zip archive containing the contents of the dsc650/assignments/assignment09/ directory. Use the naming convention of assignment09_LastnameFirstname.zip for the zip archive. You can create this archive in Bash (or a similar Unix shell) using the following commands.

cd dsc650/assignments
zip -r assignment09_DoeJane.zip assignment09

Likewise, you can create a zip archive using Windows PowerShell with the following command.

Compress-Archive -Path assignment09 -DestinationPath 'assignment09_DoeJane.zip'
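If neither `zip` nor PowerShell is convenient, Python's standard library can build an equivalent archive on any platform. This is a minimal sketch using `shutil.make_archive`; the helper name `make_submission_zip` is hypothetical, and it assumes the assignment09 folder sits directly inside the assignments directory you pass in.

```python
import shutil
from pathlib import Path


def make_submission_zip(assignments_dir: str, lastname_firstname: str) -> str:
    """Zip <assignments_dir>/assignment09 into assignment09_<LastnameFirstname>.zip.

    The archive is written inside assignments_dir and, like `zip -r`, keeps the
    top-level assignment09/ folder in the paths stored in the archive.
    """
    base = Path(assignments_dir)
    archive_base = base / f"assignment09_{lastname_firstname}"
    # make_archive appends ".zip" and returns the path of the created archive.
    return shutil.make_archive(
        str(archive_base), "zip", root_dir=str(base), base_dir="assignment09"
    )
```

For example, `make_submission_zip("dsc650/assignments", "DoeJane")` creates dsc650/assignments/assignment09_DoeJane.zip.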

Discussion Board

You are required to make a minimum of 10 posts each week. As in previous courses, any topic counts toward your discussion count: as long as you are active on more than 2 days per week and make at least 10 posts, you will receive full credit. Use the optional topic below as a starting place.

Describe how different database systems handle transactions. Pick three or more different systems to compare and contrast.


Last update: March 12, 2023