Monday, January 12, 2015

Book Review: Learning Concurrent Programming in Scala

The subtitle of Aleksandar Prokopec's Learning Concurrent Programming in Scala (Packt Publishing, November 2014) is, "Learn the art of building intricate, modern, scalable concurrent applications using Scala." Learning Concurrent Programming in Scala consists of 9 chapters and a little over 300 substantive pages.

I was impressed that Martin Odersky, the creator and lead designer of Scala, wrote a Foreword for Learning Concurrent Programming in Scala, but was even more impressed by what Odersky wrote regarding the book:

The book could not have a more expert author. Aleksandar Prokopec contributed to some of the most popular Scala libraries for concurrent and parallel programming. He also invented some of the most intricate data structures and algorithms. With this book, he created a readable tutorial at the same time and an authoritative reference for the area that he had worked in. I believe that Learning Concurrent Programming in Scala will be a mandatory reading for everyone who writes concurrent and parallel programs in Scala.

Preface

I have found the Preface of Packt Publishing books to be a good source of information on what to expect in the book. The 11-page Preface of Learning Concurrent Programming in Scala is full of information about the book. After a few paragraphs on why knowledge of concurrency is important and how this book will help developers learn about concurrency with Scala, the Preface describes how the book is organized. Several paragraphs here describe the book's approach to its coverage and state that the "goal of this book is not to give a comprehensive overview of every dark corner of the Scala concurrency APIs. Instead, this book will teach you the most important concepts of concurrent programming."

The "What this book covers" section of the Preface states that "the book covers the fundamental concurrent APIs that are a part of the Scala runtime, introduces more complex concurrency primitives, and gives an extensive overview of high-level concurrency abstractions." This section then provides brief descriptions of each of the book's nine chapters.

The "What you need for this book" section of the Preface states that the Java Development Kit (JDK) and Simple Build Tool (SBT) are needed for the examples. This section also states that no specific IDE or text editor is assumed. Another section of the Preface explains how to install JDK 7.

The "Installing and using SBT" section of the Preface describes the Simple Build Tool (SBT) as "a command-line build tool used for Scala projects" and explains how to download and install SBT. It then describes and demonstrates creating an SBT project and writing and running a HelloWorld.scala example.

The Preface of Learning Concurrent Programming in Scala contains step-by-step instructions on how to reference and reload external libraries in SBT. The section on using SBT also demonstrates how to ensure that "most of the examples [in the book run] in the same JVM instance as SBT itself." The section of the Preface titled "Using Eclipse, IntelliJ IDEA, or another IDE" briefly discusses the virtues of using a Java IDE but also adds a caveat about running the book's examples in an IDE: "editors such as Eclipse and IntelliJ IDEA run the program inside a separate JVM process."

The "Who this book is for" section of Packt Prefaces is often a good source of information on who the author had in mind when he or she wrote the book. This section of Learning Concurrent Programming in Scala states:

This book is primarily intended for developers who have learned how to write sequential Scala programs, and wish to learn how to write correct concurrent programs. The book assumes that you have a basic knowledge of the Scala programming language.

The Preface of Learning Concurrent Programming in Scala adds, "Even with an elementary knowledge of Scala, you should have no problem understanding various concurrency topics." It also states that "a basic understanding of object-oriented or functional programming should be a sufficient prerequisite" and that "this book is a good introduction to modern concurrent programming in the broader sense."

I spent a relatively large amount of time in this review on the longer-than-normal Preface because I believe it advertises well what potential readers would want to know about Learning Concurrent Programming in Scala.

Chapter 1: Introduction

The initial chapter of Learning Concurrent Programming in Scala "explains the basics of concurrent computing and presents some Scala preliminaries required for this book." The chapter begins with a nice introduction to concurrent computing, what it is, why it is desirable, and how it is different from distributed computing. The chapter looks at some of the issues facing low-level ("traditional") concurrency constructs before the section "Modern concurrency paradigms" blends an introduction to modern concurrency paradigms and their common characteristics with descriptions of which chapters in the book discuss each paradigm as implemented in Scala in more detail.

Chapter 1's section "The Advantages of Scala" explains three reasons that Scala's "support for concurrent programming is rich and powerful." The chapter provides a brief explanation of "how Scala programs are typically executed" before presenting "a Scala primer" in 4 1/2 pages.

Chapter 2: Concurrency on the JVM and the Java Memory Model

Chapter 2 of Learning Concurrent Programming in Scala covers the "lower-level primitives" upon which "most, if not all, higher-level Scala concurrency constructs are implemented." The chapter explains "the cornerstones of concurrency on the JVM" and discusses "how they interact with some Scala-specific features."

The second chapter introduces threads and processes, describes them, and explains how they are related. Another section of the chapter explains JVM threads, how they relate to operating system threads, and how Scala's threading is JVM threading. The section provides an introduction to starting and terminating threads in Scala. It also explains why "most multithreaded programs are nondeterministic".

Learning Concurrent Programming in Scala's second chapter discusses atomic operations, race conditions, and use of the synchronized keyword. There is also good coverage of deadlock, what causes deadlock, and how to avoid deadlock. The chapter also covers other basic low-level Java/Scala concurrency concepts such as guarded blocks, interrupted threads, and graceful shutdown. Chapter 2's coverage of volatile introduces the concept, compares Java's and Scala's use of it, and compares use of volatile to synchronized.

Chapter 2 concludes with coverage of the Java Memory Model (JMM), immutable objects, and final fields. This coverage describes differences in Java's final and Scala's final and looks at some other Scala language design features related to concurrency. The point of this final portion of Chapter 2 is to establish that "the only way to correctly reason about the semantics of a multithreaded program is in terms of happens-before relationships defined by the JMM."

Chapter 3: Traditional Building Blocks of Concurrency

Learning Concurrent Programming in Scala's third chapter begins by explaining that the "concurrency primitives" covered in Chapter 2 are typically avoided because "their low-level nature makes them delicate and prone to errors" and undesirable effects such as "data races, reordering, visibility, deadlocks, and nondeterminism." This introduction explains that the third chapter demonstrates how to use "more advanced building blocks of concurrency" that "capture common patterns in concurrent programs and are a lot safer to use."

Chapter 3 introduces the Executor as an abstraction that "allows programmers to encapsulate the decision of how to run concurrently executable work tasks." It then specifically focuses on the ForkJoinPool implementation of Executor and ExecutorService. This section on declaring concurrent executions includes discussion of Scala's specific ExecutionContext.
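To make the Executor idea a bit more concrete, here is a minimal sketch of my own (it is not taken from the book) that hands a task to Scala's default ExecutionContext.global and waits for it to finish. The ExecutorSketch object and the runOnGlobal helper are hypothetical names used only for illustration.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import scala.concurrent.ExecutionContext

// Hypothetical helper, for illustration only.
object ExecutorSketch {
  // Submits `task` to the global execution context and waits up to
  // five seconds for it to complete. Returns true if it completed in time.
  def runOnGlobal(task: () => Unit): Boolean = {
    val done = new CountDownLatch(1)
    ExecutionContext.global.execute(new Runnable {
      def run(): Unit = {
        try task()
        finally done.countDown()
      }
    })
    done.await(5, TimeUnit.SECONDS)
  }
}
```

The caller only describes the work to run; the decision about which thread runs it is left to the execution context.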

The section of Chapter 3 on working with data in a concurrent environment begins with discussion of "atomic variables that provide basic support for executing multiple memory reads and writes at once." The chapter defines atomic variables as "close cousins" of volatile variables that "are more expressive than volatile variables" and "are used to build complex concurrent operations without relying on the synchronized statement." This section discusses compare-and-set (AKA compare-and-swap) and calls CAS "a fundamental building block for lock-free programming." The Scala-specific @tailrec annotation is also introduced here.
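As a rough illustration of how compare-and-set and @tailrec fit together, the following is my own minimal sketch (not the book's listing) of a lock-free increment built on java.util.concurrent.atomic.AtomicLong. The CasSketch object and its method names are assumptions made for this example.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.annotation.tailrec

// Hypothetical example object, for illustration only.
object CasSketch {
  private val counter = new AtomicLong(0L)

  // Reads the current value and tries to swap in value + 1.
  // If another thread changed the counter in between, the CAS fails
  // and the method retries. @tailrec makes the compiler verify that
  // the retry is compiled as a loop rather than a growing call stack.
  @tailrec
  def increment(): Long = {
    val old = counter.get
    val next = old + 1
    if (counter.compareAndSet(old, next)) next
    else increment()
  }

  def current: Long = counter.get
}
```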

Chapter 3's section "Lock-free programming" discusses the potential advantages realized via lock-free programming, but also explains and demonstrates why it is not always easy to write lock-free code or even prove that code is lock free. For example, the section warns of conditions with implicit locks.

There is a section in Chapter 3 called "Implementing locks explicitly" that begins with the reminder that there are times when "we really do want locks" and points out that "atomic variables allow us to implement locks that do not have to block the caller." To illustrate these points, this portion of the chapter introduces the "concurrent filesystem API" example.

Chapter 3 features a section on the "ABA problem." The author acknowledges that there is "no general technique to avoid the ABA problem," but provides some "guidelines" for "avoiding the ABA problem in a managed runtime."

There is a section of Chapter 3 devoted to Scala's lazy values. The author explains that "Lazy values are extremely useful in practice, and you will often deal with them in Scala," but warns that "using them in concurrent programs can have some unexpected interactions." A couple of important observations are explained and highlighted here:

  1. "Cyclic dependencies between lazy values are unsupported in both sequential and concurrent Scala programs. The difference is that they potentially manifest themselves as deadlocks instead of stack overflows in concurrent programming."
  2. "Never call synchronized on publicly available objects; always use a dedicated, private dummy object for synchronization."
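The second observation can be sketched as follows. This is my own minimal example rather than code from the book, and the SafeCounter class is a hypothetical name. The lock object is private, so no code outside the class can synchronize on it and interfere with the class's locking.

```scala
// Hypothetical example class, for illustration only.
class SafeCounter {
  // Dedicated private lock object used only for synchronization.
  private val lock = new AnyRef
  private var count = 0

  // Both methods synchronize on the private lock, never on `this`.
  def increment(): Int = lock.synchronized {
    count += 1
    count
  }

  def get: Int = lock.synchronized(count)
}
```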

The "Concurrent collections" section of Chapter 3 demonstrates why "predicting how multiple threads affect the collection state in the absence of synchronization is neither recommended nor possible." This section examines a couple of approaches (immutable collections and use of synchronized) and their weaknesses before moving into discussion of concurrent collections. Regarding these concurrent collections, the author states, "Conceptually, the same operations can be achieved using atomic primitives, synchronized statements, and guarded blocks, but concurrent collections ensure far better performance and scalability."

Chapter 3 features subsections of the "Concurrent collections" section that focus on concurrent queues (BlockingQueue) and concurrent sets and maps (introduces asScala). The "Concurrent traversals" subsection introduces Scala's TrieMap for collection iteration in a concurrent environment.

Chapter 3 wraps up with coverage of "creating and handling processes" using the scala.sys.process package to work with processes as a concurrency alternative other than threads. This coverage includes introduction to the ! and !! methods for running a process that returns a return code or its standard output respectively.
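For readers who have not seen these two methods, here is a small sketch of my own (not from the book) showing the difference between them. It assumes a Unix-like system with an echo executable on the PATH; the ProcessSketch object is a hypothetical name.

```scala
import scala.sys.process._

// Hypothetical example object, for illustration only.
// Assumes a Unix-like system where `echo` is available on the PATH.
object ProcessSketch {
  // `!` runs the process, lets its output go to the console,
  // and returns the exit code.
  def exitCode(): Int = Seq("echo", "hello").!

  // `!!` runs the process and returns its standard output as a String.
  def output(): String = Seq("echo", "hello").!!
}
```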

Chapter 4: Asynchronous Programming with Futures and Promises

As Learning Concurrent Programming in Scala's Preface states, Chapter 4 "is the first chapter that deals with a Scala-specific concurrency framework," futures and promises. The chapter describes futures, describes when they are useful, distinguishes between future values and future computations, describes callbacks on futures versus functional composition on futures, and introduces flatMap as a basic example of a Scala combinator.

Chapter 4 introduces the Promise in relation to the Future: "A promise and a future represent two aspects of a single-assignment variable: the promise allows you to assign a value to the future object, whereas the future allows you to read that value." The chapter also describes how to "use promises to bridge the gap between callback-based APIs and futures" and "use promises to extend futures with additional functional combinators."
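The idea of bridging a callback-based API to a future can be sketched roughly as follows. This is my own illustration, not the book's code: fetchAsync stands in for some hypothetical callback-style API, and PromiseSketch and fetch are names I made up for the example.

```scala
import scala.concurrent.{Future, Promise}

// Hypothetical example object, for illustration only.
object PromiseSketch {
  // Stand-in for a callback-based API: invokes `onDone` on another thread.
  def fetchAsync(onDone: String => Unit): Unit = {
    val worker = new Thread(new Runnable {
      def run(): Unit = onDone("payload")
    })
    worker.start()
  }

  // Wraps the callback API in a Future. The promise is the write side
  // (completed once by the callback); the future is the read side.
  def fetch(): Future[String] = {
    val promise = Promise[String]()
    fetchAsync(value => promise.success(value))
    promise.future
  }
}
```

Callers of fetch() never see the promise, so only the callback can complete the result.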

Chapter 4 includes coverage of Scala Async, which is described as "a convenient library for futures and promises that allows expressing chains of asynchronous computations more conveniently." The author adds that Scala Async is "currently not a part of the Scala standard library." The chapter concludes with very brief coverage of some alternative frameworks implementing futures and promises in Scala.

Chapter 5: Data-Parallel Collections

The subject of Chapter 5 is data parallelism. The chapter provides an overview of the Scala Collections framework, differentiates between mutable and immutable collections, and describes using the par method to get parallel collections.

Chapter 5 also looks at characteristics of the JVM and of modern computer hardware that affect concurrency and performance. It discusses why these characteristics can make it difficult to accurately measure performance.

The "Caveats of parallel collections" section of Chapter 5 describes "non-parallelizable collections," "non-parallelizable operations," "Side effects in parallel operations," "nondeterministic parallel operations," and "Commutative and associative operators." There are also sections on "Using parallel and concurrent collections together" and "Implementing custom parallel collections."

The chapter ends with a section on "Alternative data-parallel frameworks" that discusses the issues associated with autoboxing when trying to use Scala collections with primitives. This section introduces Scala Macros and the ScalaBlitz Collections Framework. The author does provide a caveat: "ScalaBlitz was in the early stages of development at the time of writing this book, and macros are partly an experimental feature of Scala."

Chapter 6: Concurrent Programming with Reactive Extensions

The sixth chapter of Learning Concurrent Programming in Scala states that the "one disadvantage of futures is that they can only deal with a single result." It introduces event-driven programming, reactive programming and Reactive Extensions. The chapter covers Observables and Subscribers and some of the nuances of using Observables in a fair amount of detail. The Scheduler is also covered with extra focus on writing custom Schedulers. The chapter concludes with coverage of Subject, which it describes as "simultaneously an Observable object and an Observer object."

Chapter 7: Software Transactional Memory

Learning Concurrent Programming in Scala's seventh chapter states that "the disadvantage of using locks is that they can easily cause deadlocks" before introducing Software Transactional Memory (STM), which it describes as "a concurrency control mechanism for controlling access to shared memory, which greatly reduces the risk of deadlocks and races." The author explains that STM provides the best of atomic variables and synchronized code blocks. The particular STM implementation focused on in Chapter 7 is ScalaSTM and the chapter provides fairly detailed coverage of different issues to consider when working with transactional memory.

Chapter 8: Actors

The author opens Chapter 8 of Learning Concurrent Programming in Scala by explaining that the actor model applies both to applications using shared memory and to distributed applications, whereas the techniques covered in the previous few chapters are limited to shared-memory applications. The implementation of the actor model that Chapter 8 focuses on is Akka's actor model. The chapter covers quite a few considerations when using Akka Actors and provides references to sources of additional information.

Chapter 9: Concurrency in Practice

The stated goal of the final chapter of Learning Concurrent Programming in Scala is "to introduce the big picture of concurrent programming." This includes a summary of the "plethora of different concurrency facilities" covered in the book. This summary presents tables that compare concurrency concepts covered in the book (categorized as "data abstractions" or "concurrency frameworks") in terms of data storage, data access, concurrent computations, and concurrent execution. The author's brief analysis of these tables leads to a bullet-formatted "summary of what different concurrency libraries are good for." This is probably my favorite section of the book and I like the highlighted point made in this section: "There is no one-size-fits-all technology. Use your own best judgment when deciding which concurrency framework to use for a specific programming task."

After Chapter 9's useful summary of the concurrency constructs and frameworks covered earlier in the book, the chapter moves on to a "remote file browser" sample application to demonstrate bringing the book's concepts together. I was happy to see the author explicitly point out that although this particular example brought all of the covered concepts into play intentionally, most realistic applications should not use all of them.

After presenting the summary of topics covered in previous chapters of the book and a demonstrative example of using those topics, Chapter 9 transitions to a section on "debugging concurrent programs." This section describes "some of the typical causes of errors in concurrent programs" and discusses "different methods of dealing with them." The specific areas of focus are deadlocks (including demonstration of VisualVM with color screen snapshots), incorrect output, and performance issues.

General Observations

  • Although Learning Concurrent Programming in Scala is best suited for developers comfortable with Scala in sequential development, it contains details that may appeal to Java developers and developers of other JVM-based languages. In particular, the first three chapters provide useful details that apply generally to JVM-based programming languages with only a few Scala-specific mentions. As evidenced by the length of my review of Chapter 3, I believe this is particularly true of Chapter 3.
  • Learning Concurrent Programming in Scala contains several graphics to illustrate points being made. The focus on these graphics is definitely more on content and substance than on presentation. The graphics tend to be simple drawings in black on white (or grayscale) even in the PDF version, but there are some color screen snapshots.
  • Learning Concurrent Programming in Scala's chapters each tend to end with references to a few other resources (typically other books or Scala's or the framework's online documentation) on the subject covered in the chapter. This is useful because although the book is fairly detailed in its coverage of each framework and approach, there is more information on each framework or approach available than can fit in a single chapter.
  • Each chapter of Learning Concurrent Programming in Scala includes "Exercises" for the reader to evaluate what they've learned from the chapter.
  • Code listings in Learning Concurrent Programming in Scala are black font on white background with no color syntax highlighting and no line numbers.
  • I have found Packt Publishing books to cover a wide spectrum in terms of language clarity and finishing, from very well edited, polished books (such as Java EE 7 with GlassFish 4 Application Server) to some that seem to have had little, if any, editing. Learning Concurrent Programming in Scala is one of the more polished and better edited Packt Publishing books that I've read; although it has a couple of awkward sentences and typos, they are few and far between.
  • I'll quote again from Scala creator and expert Martin Odersky's Foreword regarding Learning Concurrent Programming in Scala because he can obviously judge a book on Scala better than I and because he summarizes my less-informed opinion on this book, "With this book, [Aleksandar Prokopec] created a readable tutorial at the same time and an authoritative reference for the area that he had worked in. I believe that Learning Concurrent Programming in Scala will be a mandatory reading for everyone who writes concurrent and parallel programs in Scala."

Conclusion

Learning Concurrent Programming in Scala delivers on its advertisement in the Preface: "By reading this book, you will gain both a solid theoretical understanding of concurrent programming, and develop a set of useful practical skills that are required to write correct and efficient concurrent programs." The early chapters do provide the introductory material and background needed for a "solid theoretical understanding of concurrent programming" and the middle and later chapters introduce tips and suggestions that help readers to understand the considerations to be made when writing concurrent programs. Although Learning Concurrent Programming in Scala is obviously focused primarily on concurrent programming with the Scala language and Scala frameworks, some of the covered concepts and topics (particularly in the first part of the book) are relevant for Java and JVM developers.
