Running PHP code in parallel, the easy way
Less is more. You've heard that before, right? Keep it in mind, I'm going to show you something.
There are a few good and robust solutions to run PHP code in parallel already; and yet, we've made our own implementation. I want to explain why. First, let's set the scene: I want to run PHP code in parallel. Here are some of my use cases:
- to test race conditions for our command-bus, when running PHPUnit tests;
- to do a bunch of HTTP requests in parallel; and also
- to generate this blog faster by allowing my static generator to work on multiple processes.
My use cases have two requirements in common: run an arbitrary number of functions in parallel, and wait until all of them are finished. Let's look at the solutions available today.
AmpPHP has a package called parallel-functions. It looks like this:
    use Amp\Promise;
    use function Amp\ParallelFunctions\parallelMap;

    $values = Promise\wait(
        parallelMap([1, 2, 3], function ($time) {
            \sleep($time);

            return $time * $time;
        })
    );
For my use cases, I've got a few problems with this implementation:
- it uses promises, which are very good for more complex async work, but are only overhead for me;
- Amp's API and its use of functions feel very clunky to me, but that's rather subjective, I realise that; and finally
- if you need a framework in your child processes, you'll need to boot it manually.
Moving on to ReactPHP, they don't have an out-of-the-box solution like Amp, but they do offer the low-level components:
    $loop = React\EventLoop\Factory::create();

    $process = new React\ChildProcess\Process('php child-process.php');
    $process->start($loop);

    $process->stdout->on('data', function ($chunk) {
        echo $chunk;
    });

    $process->on('exit', function ($exitCode, $termSignal) {
        echo 'Process exited with code ' . $exitCode . PHP_EOL;
    });

    $loop->run();
A few caveats with this implementation:
- ReactPHP always requires you to manually create an event loop which, again, is overhead for me;
- they also work with promises; and finally
- they only offer the bare infrastructure to run processes in parallel, so there's a lot of manual setup work.
Finally, there's Guzzle with its concurrent requests:
    use GuzzleHttp\Client;
    use GuzzleHttp\Promise;

    $client = new Client(['base_uri' => 'http://httpbin.org/']);

    $promises = [
        'image' => $client->getAsync('/image'),
        'png' => $client->getAsync('/image/png'),
        'jpeg' => $client->getAsync('/image/jpeg'),
        'webp' => $client->getAsync('/image/webp'),
    ];

    $responses = Promise\Utils::unwrap($promises);
- Again, there's the overhead of promises; but more importantly
- Guzzle only works with HTTP requests, which only solves part of my problem.
Of all of the above, Amp's approach would have my preference, were it not that it still has quite a lot of overhead for my simple use cases. Honestly, all I wanted to do was to run some functions in parallel and wait until all of them are finished. I don't want to be bothered by looking up documentation about the particular API a framework is using. Did I have to import a function here? How to unwrap promises? How to wait for everything to finish?
All of the above examples are great solutions for the 10% of cases where people need lots of control, but what about the 90% of cases where you just want to do one thing as simply as possible?
Less is more. We often forget that in software design. We overcomplicate our solution "just in case" someone might need it, and forget about the 90% use case. It leads to frustration because developers have to look up documentation in order to understand how to use a framework, or they have to write lots of boilerplate to get their generic case to work.
So with all of that being said, you now know why I decided to make another library that has one simple goal: run functions in parallel and wait for the result. Here's what it looks like:
    $rssFeeds = Fork::new()
        ->run(
            fn () => file_get_contents('https://stitcher.io/rss'),
            fn () => file_get_contents('https://freek.dev/rss'),
            fn () => file_get_contents('https://spatie.be/rss'),
        );
And that's it. It does one job, and does it well. And don't be mistaken: it's not because there's a simple API that it only offers simple functionality! Let me share a few more examples.
Parallel functions are able to return anything, including objects:
    $dates = Fork::new()
        ->run(
            fn () => new DateTime('2021-01-01'),
            fn () => new DateTime('2021-01-02'),
        );
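A child process doesn't share memory with its parent, so whatever a function returns has to travel back as a string of bytes. Here's a minimal sketch of that round trip, assuming PHP's native serialize() and unserialize() do the work. That's an assumption for illustration; the package's actual transport may differ.

```php
<?php

// Assumption: results cross the process boundary via native serialization.

// What a child process could send to its parent: a string, not an object.
$payload = serialize(new DateTime('2021-01-01'));

// What the parent could do on its side to get a real object back.
$date = unserialize($payload);

echo get_class($date), ' ', $date->format('Y-m-d'), PHP_EOL;
```

Under the same assumption, anything serialize() can't handle, such as closures or open resources, wouldn't survive the trip.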
The functions run in process forks instead of fresh processes, meaning you don't need to manually boot your framework in every child process:
    [$users, $posts, $news] = Fork::new()
        ->run(
            fn () => User::all(),
            fn () => Post::all(),
            fn () => News::all(),
        );
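To see why a fork doesn't need its own boot step, here's a rough sketch with plain pcntl_fork(). It isn't spatie/fork's implementation: App and runInForks() are hypothetical names made up for this example, and passing results through a socket pair with serialize() is an assumption. The point is only that a forked child starts as a copy of the parent, so state set up before the fork is already there.

```php
<?php

// Hypothetical stand-in for a framework: it has to be booted before use.
final class App
{
    private static ?string $name = null;

    public static function boot(string $name): void
    {
        self::$name = $name;
    }

    public static function name(): string
    {
        return self::$name ?? throw new LogicException('App was not booted');
    }
}

// Hypothetical helper: runs every task in its own fork, returns results in order.
function runInForks(callable ...$tasks): array
{
    if (! function_exists('pcntl_fork')) {
        // No pcntl (Windows, web requests): run the tasks sequentially instead.
        return array_map(fn (callable $task) => $task(), $tasks);
    }

    $children = [];

    foreach ($tasks as $index => $task) {
        // A socket pair gives this child a private channel to the parent.
        [$parentEnd, $childEnd] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

        $pid = pcntl_fork();

        if ($pid === -1) {
            throw new RuntimeException('Could not fork');
        }

        if ($pid === 0) {
            // Child: a copy of the parent, booted App included.
            fclose($parentEnd);
            fwrite($childEnd, serialize($task()));
            fclose($childEnd);
            exit(0);
        }

        // Parent: remember the child and keep forking, so all tasks overlap.
        fclose($childEnd);
        $children[$index] = [$pid, $parentEnd];
    }

    $results = [];

    foreach ($children as $index => [$pid, $parentEnd]) {
        // Reading until EOF blocks until that child has finished writing.
        $results[$index] = unserialize(stream_get_contents($parentEnd));
        fclose($parentEnd);
        pcntl_waitpid($pid, $status);
    }

    return $results;
}

App::boot('my-app'); // happens once, in the parent

$results = runInForks(
    fn () => App::name() . ' / child A',
    fn () => App::name() . ' / child B',
);

print_r($results);
```

Both closures can call App::name() even though App::boot() only ran once, in the parent. A fresh process started from the command line would begin with an unbooted App instead. All children are started before any result is read, which is what makes the tasks overlap. When pcntl isn't loaded, the sketch falls back to running the tasks one after another.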
You can also register before and after bindings, in case you need to do a little more setup work. In the previous example, Laravel actually needs to reconnect to the database in the child processes before it will work:
    [$users, $posts, $news] = Fork::new()
        ->before(fn () => DB::connection('mysql')->reconnect())
        ->run(
            fn () => User::all(),
            fn () => Post::all(),
            fn () => News::all(),
        );
And finally, before and after bindings can be run both in the child process and parent process; and also notice how individual function output can be passed as a parameter to these after callbacks:
    Fork::new()
        ->after(
            child: fn () => DB::connection('mysql')->close(),
            parent: fn (int $amountOfPages) => $this->progressBar->advance($amountOfPages),
        )
        ->run(
            fn () => Pages::generate('1-20'),
            fn () => Pages::generate('21-40'),
            fn () => Pages::generate('41-60'),
        );
There are of course a few things this package doesn't do:
- there's no pool managing the number of concurrent processes: if you need a limit, you're in charge of it;
- there are no promises;
- it relies on PHP's pcntl extension, which doesn't work on Windows and doesn't run in web requests;
- there's no behind-the-scenes exception handling: if a child fails, it'll throw an exception and stop the process flow.
In other words: it's the perfect solution for the 90% case where you just want to run some functions in parallel and be done with it. If you need anything more than that, then the solutions listed above are a great start. There's also another package of ours called spatie/async that doesn't work with promises but does offer pool configuration and extensive exception handling.
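If you do need to limit how many processes run at once, the simplest form of being in charge yourself is batching. A minimal sketch: runInBatches() and $runBatch are hypothetical names, not part of any package, and $runBatch stands in for whatever runs one batch in parallel and returns its results in order. Here it's a sequential stand-in, so the example runs without anything installed.

```php
<?php

/**
 * Hypothetical helper: runs $tasks in batches of at most $limit,
 * one batch after another, and returns all results in task order.
 *
 * $runBatch is whatever runs a single batch, for example
 * fn (callable ...$batch) => Fork::new()->run(...$batch).
 */
function runInBatches(array $tasks, int $limit, callable $runBatch): array
{
    $results = [];

    foreach (array_chunk($tasks, $limit) as $batch) {
        // Each batch has to finish before the next one starts.
        array_push($results, ...$runBatch(...$batch));
    }

    return $results;
}

// Sequential stand-in for a parallel runner, so this sketch is self-contained.
$runBatch = fn (callable ...$batch) => array_map(fn (callable $task) => $task(), $batch);

// Five tasks, never more than two in the same batch.
$tasks = array_map(fn (int $page) => fn () => "page {$page}", range(1, 5));

$pages = runInBatches($tasks, 2, $runBatch);

print_r($pages);
```

Batching is cruder than a real pool: one slow task holds up its whole batch, which is exactly the kind of thing a pool takes care of.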
If you want to know more or want to try the package yourself, you can check it out on GitHub: spatie/fork.
Less is more. That's one of my core principles when coding. I prefer code that forces me to do something one way but always works, instead of a highly configurable framework that makes me wonder how to use it every time I look at it. I feel that many developers often get lost in a maze of high configurability and extensibility and forget their original end goal by doing so.
I hope this package can be of help for that group of people who fall in the 90% category.