In this post, we’re going to give an overview of two of the ways to run code in parallel with Node.js:
the child_process module
the worker_threads module
WHAT ARE THREADS AND PROCESSES?
First, let’s quickly define what threads and processes are, and why we might find ourselves needing to employ more than one of them.
In computer science, a process is essentially an actively running instance of a computer program (for example, when you open Chrome or a text editor, at least one new process is created for each). It contains the code needed to run the program and a reserved space in memory for the application’s data. Additionally, it has access to the computer’s processors in order to execute that code.
Every process has at least one thread, which is the entity that actually executes the code, line by line. If a process has multiple threads, they all share that process’s memory.
Node.js is great at handling a large amount of asynchronous I/O on a single thread, thanks to its event loop. However, computationally expensive operations (e.g., cryptography or image processing) block that main thread, because a single thread can only perform one task at any given time: while it is busy computing, nothing else, including incoming requests, gets handled. If we want to handle this kind of load, we’ll need a different solution.
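To see this blocking in action, here’s a small sketch. The busyWait helper and the 500 ms duration are illustrative stand-ins for an expensive computation, not part of any library: a timer scheduled for 0 ms can’t fire until the CPU-bound loop on the main thread has finished.

```javascript
// A synchronous, CPU-bound loop standing in for an expensive computation
// such as hashing or image processing. While it runs, the event loop is stuck.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Spin without yielding; no other callbacks can run in the meantime.
  }
}

const scheduledAt = Date.now();

// Ask for this callback to run "as soon as possible" (0 ms delay).
setTimeout(() => {
  const delay = Date.now() - scheduledAt;
  console.log(`Timer requested 0 ms but fired after ${delay} ms`);
}, 0);

// Block the single main thread for roughly half a second.
busyWait(500);
console.log('Expensive work finished; the event loop can now run the timer');
```

When run, the “finished” line prints first, and the timer reports a delay of at least 500 ms, even though it asked for 0.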
WHAT IS THE CHILD_PROCESS MODULE?
Node.js has provided just that solution, out of the box, for a long time: the child_process module! When we want to perform computationally expensive operations, the child_process module lets us create an entirely separate process with its own memory and thread of execution, which runs code in parallel with the code in the original file.
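The child_process module has several different ways of creating new processes, such as spawn(), exec(), execFile(), and fork(). Here we’ll use fork(), which starts a new Node.js process with a built-in communication channel back to the parent.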
First, we’ll create two files: a parent.js script to orchestrate things, and a child.js script to handle some kind of time-consuming task.
In the parent.js script, we import the child_process module and use its fork method to create a new process.
And in our child.js script, we do some time-consuming “work” (in this example, the script simply waits 2000 milliseconds before logging to the console).
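Here’s a sketch of child.js along those lines. A timer stands in for real work, and the log text is illustrative. Logging process.pid in both scripts makes it easy to see that the parent and child are separate processes.

```javascript
// child.js
// Simulate a time-consuming task by waiting 2000 ms before logging.
const WORK_DURATION_MS = 2000;

console.log(`Child (PID ${process.pid}): starting work`);

setTimeout(() => {
  console.log(`Child (PID ${process.pid}): finished after ${WORK_DURATION_MS} ms`);
}, WORK_DURATION_MS);
```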
When we run the parent script, it executes all of its own code without having to wait for the child to finish.
(Also note that each process has its own process ID.)
GREAT! BUT WHAT IF WE NEED TO SHARE INFORMATION BETWEEN OUR TWO PROCESSES?
As mentioned, different processes are each assigned separate memory by the operating system, so our child process doesn’t have access to variables defined in the parent. This is where inter-process communication (IPC) comes in.
IPC allows us to send data back and forth between our processes: each process sets up event listeners for messages from the other process and sends messages of its own as needed.
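In the parent, we call .send() on the child process object returned by fork() to send data, and use its .on('message', ...) method to listen for replies; in the child, the equivalents are process.send() and process.on('message', ...). This gives each side the ability to respond to any messages it receives.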
Consider a very simple example: the parent sends a name to the child, the child sends a greeting back to the parent, and then the parent logs the greeting to the console.
This is a contrived example, but imagine instead that the parent script were a server accepting uploaded image files, and the child script used an image-processing library to resize them. By offloading this work to a separate process, the parent server can immediately continue responding to other requests without having to wait for the child to finish.
WHAT IS THE WORKER_THREADS MODULE?
While the child_process module is a good solution, it’s not necessarily the most efficient one. In 2018, Node.js introduced a more lightweight way of handling multiple threads: the worker_threads module. Unlike the child_process module (which creates an entirely new system process with its own separate memory), worker_threads creates additional threads within the same process. Not only is this approach less resource-heavy, but it also makes it possible to share memory between threads (for example, through a SharedArrayBuffer), rather than only passing copies of data back and forth.
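Note: when worker_threads first shipped (in Node.js 10.5.0), it was an experimental feature, and using it required the ‘--experimental-worker’ flag when running node. The flag is no longer needed as of Node.js 11.7.0, and the module is now stable.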
In our simple example, a single file contains the code for both the main thread and the worker threads (though this isn’t necessary).
Importing ‘worker_threads’ gives us access to a number of objects and properties. In this example, we are using:
Worker: Instantiating this class allows us to create new threads
workerData: Inside a worker thread, this holds a copy of the value passed to the Worker constructor
isMainThread: This boolean property allows us to keep the main thread code separate from worker thread code
parentPort: Similar to IPC between child processes, this lets a worker thread communicate with the main thread through its postMessage method
Here's how it works:
The main thread creates two new worker threads, passing the string ‘Thing A’ to the first and ‘Thing B’ to the second. Inside the else branch (the worker code), each worker can read the workerData passed to it, and then sends a new string back to the main thread through parentPort’s postMessage method.
Finally, the event listeners set up in the main thread receive these strings and log them to the console! When you execute this code, you’ll see one line per worker; because the workers run in parallel, the two lines may appear in either order.
And that’s it!
Suffice it to say, this post barely scratches the surface of the functionality provided to you by the child_process and worker_threads modules. However, I hope it has sparked your curiosity and given insight into some motivations and solutions for multi-threading in Node.js.