PHP Generics and why we need them

In today's blog post we'll explore some common problems with arrays in PHP. All the problems and issues listed could be solved with a pending RFC which adds generics to PHP. We won't explore in too much detail what generics are, but at the end of this read, you should have a good idea as to why they are useful, and why we really want them in PHP. So without further ado, let's dive into the subject.

Imagine you have a collection of blog posts, loaded from a data source.

$posts = $blogModel->find();

Now you want to loop over every post, and do something with its data; let's say, the id.

foreach ($posts as $post) {
    $id = $post->getId();
    
    // Do something
}

This is a scenario that happens often. And it's this scenario we'll explore to discuss why generics are awesome, and why the PHP community desperately needs them.

Let's take a look at the problems of the above approach.

# Data integrity

In PHP, an array is a collection ofโ€ฆ things.

$posts = [
    'foo',
    null,
    self::BAR,
    new Post('Lorem'),
];

Looping over this array of posts would result in a fatal error.

PHP Fatal error:  Uncaught Error: 
Call to a member function getId() on string

We're calling ->getId() on the string 'foo'. Not done. When looping over an array, we want to be sure that every value is of a certain type. We could do something like this.

foreach ($posts as $post) {
    if (! $post instanceof Post) {
        continue;
    }

    $id = $post->getId();
    
    // Do something
}

This would work, but if you've written some production PHP code, you know these checks can grow quickly, and pollute the codebase. In our example, we could verify the type of each entry in the ->find() method on $blogModel. However, that's just moving the problem from one place to another. It's a bit better though.

There's another problem with data integrity. Say you have a method which requires an array of Posts.

function handlePosts(array $posts) {
    foreach ($posts as $post) {
        // ...
    }
}

Again, we could add extra checks in this loop, but we could not guarantee that $posts only holds a collection of Post objects.

As of PHP 7.0, you could use the ... operator to work around this issue.

function handlePosts(Post ...$posts) {
    foreach ($posts as $post) {
        // ...
    }
}

But the downside of this approach: you would have to call the function with an unpacked array.

handlePosts(...$posts);

Noticed a tpyo? You can submit a PR to fix it. If you want to stay up to date about what's happening on this blog, you can subscribe to my mailing list: send an email to brendt@stitcher.io, and I'll add you to the list.

# Performance

You can imagine it's better to know beforehand whether an array contains only elements of a certain type, rather than manually checking the types within a loop, every, single, time.

We can't do benchmarks on generics, because they don't exist yet, so it's only guessing as to how they would impact performance. It's not far-fetched to assume though, that PHP's optimised behaviour, written in C; is a better way to solve the problem than to write lots of userland code.

# Code completion

I don't know about you, but I use an IDE when writing PHP code. Code completion increases productivity immensely, so I'd also like to use it here. When looping over posts, we want our IDE to know each $post is an instance of Post. Let's take a look at the plain PHP implementation.

# BlogModel

public function find(): array {
    // return ...
}

As of PHP 7.0, return types were added, and in PHP 7.1 they were refined with nullables and void. But there's no way our IDE can know what's inside the array. So we're falling back to PHPDoc.

/**
 * @return Post[]
 */
public function find(): array {
    // return ...
}

When using a "generic" implementation of e.g. a model class, type hinting the ->find() method might not be possible. So we're stuck with type hinting the $posts variable, in our code.

/** @var Post[] $posts */
$posts = $blogModel->find();

Both the uncertainty of what's exactly in an array, the performance and maintenance impact because of scattered code, and the inconvenience when writing those extra checks, makes me long for a better solution.

That solution, in my opinion is generics. I won't explain in detail what generics do, you can read the RFC to know that. But I will give you an example of how generics could solve these issues, guaranteeing the developer would always have the correct data in a collection.

Big note: generics do not exist in PHP, yet. The RFC targeted PHP 7.1, and has no further information about the future. The following code is based on the the Iterator interface and the ArrayAccess interface, which both exist as of PHP 5.0. At the end, we'll dive into a generics example, which is dummy code.

First we'll create a Collection class which works in PHP 5.0+. This class implements Iterator to be able to loop over its items, and ArrayAccess to be able to use array-like syntax to add and access items in the collection.

class Collection implements Iterator, ArrayAccess
{
    private int $position;

    private array $array = [];

    public function __construct() {
        $this->position = 0;
    }

    public function current(): mixed {
        return $this->array[$this->position];
    }

    public function next(): void {
        ++$this->position;
    }

    public function key(): int {
        return $this->position;
    }

    public function valid(): bool {
        return array_key_exists($this->position, $this->array);
    }

    public function rewind(): void {
        $this->position = 0;
    }

    public function offsetExists($offset): bool {
        return array_key_exists($offset, $this->array);
    }

    public function offsetGet($offset): mixed {
        return $this->array[$offset] ?? null;
    }

    public function offsetSet($offset, $value): void {
        if (is_null($offset)) {
            $this->array[] = $value;
        } else {
            $this->array[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void {
        unset($this->array[$offset]);
    }
}

Now we can use the class like this.

$collection = new Collection();
$collection[] = new Post(1);

foreach ($collection as $item) {
    echo "{$item->getId()}\n";
}

Note that with this simple implementation, there's no guarantee that $collection only holds Post object. For example, adding a string would work fine, but would break our loop.

$collection[] = 'abc';

foreach ($collection as $item) {
    // This fails
    echo "{$item->getId()}\n";
}

With PHP as it is now, we could fix this problem by creating a PostCollection class.

class PostCollection extends Collection
{
    public function current() : ?Post {
        return parent::current();
    }

    public function offsetGet($offset) : ?Post {
        return parent::offsetGet($offset);
    }

    public function offsetSet($offset, $value) {
        if (! $value instanceof Post) {
            throw new InvalidArgumentException("value must be instance of Post.");
        }

        parent::offsetSet($offset, $value);
    }
}

Now only Post objects can be added to our collection.

$collection = new PostCollection();
$collection[] = new Post(1);

$collection[] = 'abc';

foreach ($collection as $item) {
    echo "{$item->getId()}\n";
}

It works! Even without generics! There's only one issue, you might be able to guess it. This is not scalable. You need a separate implementation for every type of collection, even though the only difference between those classes would be the type. Also note that IDEs and static analysers will be able to correctly determine the type, based on the return type of offsetGet in PostCollection.

You could probably make the subclasses even more convenient to create, by "abusing" late static binding and PHP's reflection API. But you'd still need to create a class, for every type available.

# Glorious generics

With all that in mind, let's just take a look at the code we would be able to write if generics were implemented in PHP. This would be one class which could be used for every type. For your convenience, I'll only be writing the changes compared to the previous Collection class, so keep that in mind.

class GenericCollection<T> implements Iterator, ArrayAccess
{
    public function current() : ?T {
        return $this->array[$this->position];
    }

    public function offsetGet($offset) : ?T {
        return $this->array[$offset] ?? null;
    }

    public function offsetSet($offset, $value) {
        if (! $value instanceof T) {
            throw new InvalidArgumentException("value must be instance of {T}.");
        }

        if (is_null($offset)) {
            $this->array[] = $value;
        } else {
            $this->array[$offset] = $value;
        }
    }

    // public function __construct() ...
    // public function next() ...
    // public function key() ...
    // public function valid() ...
    // public function rewind() ...
    // public function offsetExists($offset) ...
}
$collection = new GenericCollection<Post>();
$collection[] = new Post(1);

// This would throw the InvalidArgumentException.
$collection[] = 'abc';

foreach ($collection as $item) {
    echo "{$item->getId()}\n";
}

And that's it! We're using T as a dynamic type, which can be checked before runtime. And again, the GenericCollection class would be usable for every type, always.