Run at most N tasks at time with bluebird

comments

Bluebird.js is JavaScript library which implements Promises/A+ specification. In fact bluebird is more than this specification and implements additional helper functions. In this post I'll explore bluebird.map function and it's applications.

Table of contents

Task definition

Suppose that you have some array. Suppose that each entry of that array can be processed asynchronously and it doesn't depends on other entries.

Let's name asynchronous processing of such entry as task.

Suppose that array is large and you can't run all the tasks simultaneously. For example, if your tasks are CPU-bound you'd better to have at most N tasks running in parallel where N is a number of CPU cores.

Imagine you have many end-to-end tests based on mocha and phantom.js. You could run N processes of phantom.js in parallel to reduce time to run tests.

.NET Framework concepts

I'm mostly C# developer and I knows how this can be accomplished in .NET world.

In the .NET Framework there is Parallel LINQ. If you have arbitrary IEnumerable<T> type and wish to process it's data in parallel, you could execute AsParallel() extension method to turn on parallel processing:

var tasksData = new int[] { 1000, 3000, 4000, 1000 };
var swg = Stopwatch.StartNew();
var results = tasksData.AsParallel().Select((x, i) => {
    var sw = Stopwatch.StartNew();
    Console.WriteLine($"{i}: starts");
    // CPU-bound function here :)
    Thread.Sleep(x);
    Console.WriteLine($"{i}: end {sw.ElapsedMilliseconds}");
    return x;
}).ToList();
Console.WriteLine(swg.ElapsedMilliseconds);

This code will execute CPU-bound function passed to Select method for each element of tasksData. Functions will be called in different threads from the pool, at most N at time. .NET Framework has empiric algorithms to limit simultaneous work to CPU core count.

Code will print to the console (the order is not guaranteed, my notebook has 2 core):

2: starts
0: starts
0: end 1002
1: starts
2: end 4017
1: end 3015
3: starts
3: end 1018
5044

I'd like to save this semantic and to find replacement in node.js world.

The power of Bluebird.js

Bluebird.js is fast Promises/A+ implementation with additional methods.

I found one of such extensions - Bluebird.map function. - in first argument you can specify array of data required for task execution - in second argument you can specify mapper-function which can actually run your task and return the promise - in third argument you can specify "most N tasks at the time" with { concurrency: N } object.

It doesn't guarantee the order of execution of mapper-function on the array elements, but the order of fulfilled results array will have same order eventually.

Specify concurrency level option in the third argument is a killer-feature that solves the task! Using concurrency level the number of cores in the system, you'll execute N asynchronous tasks at the same-time and when one of them will be finished, next asynchronous task will be added to the pool. Yes, node.js is single-threaded, but you can run N processes from node.js with full control on degree of parallelism.

Let's explore small example:

import bluebird from 'bluebird'
const delays = [1000, 3000, 4000, 1000]
function mapperFunction(delay, index) {
  console.time(index)
  console.log(index + ':', 'starts')
  return new bluebird(resolve => {
    setTimeout(
      () => {
        console.timeEnd(index)
        resolve()
      }, delay)
    })
}
console.time()
bluebird.all(bluebird.map(delays, mapperFunction, { concurrency: 2 }))
  .then(() => console.timeEnd())

Here I've four tasks with different durations. I've specified concurrency level = 2, therefore only two tasks can be executed at the same time. If the task were taken in order, then the example should be completed in 5 seconds. But in fact the order doesn't guaranteed (like .NET's PLINQ) and on my notebook it always tooks 6 seconds to complete.

Here is console output on my system:

0: starts
1: starts
0: 1004.890ms
3: starts
3: 1001.304ms
2: starts
1: 3000.930ms
2: 4002.682ms
default: 6021.021ms

And here is visualized timeline for events execution:

PlantUML SVG diagram

Other solutions

When I started search for solution, I found useful article with examples for Q library. The author, Gleb Bahmutov, also showed hand-written solution for the same task. Worth to try!

Conclusion

Bluebird.js provides power map function which you can use to schedule "at most N tasks at time". The order of tasks execution is not guaranteed, but in fact this is not always a disadvantage. I've reduced time to run all tests based on phantom.js by 3x factor just by using this function with concurrency option set to CPU cores count!

Happy coding!

Comments