HTML 5 support in PHP 8.4

Written on 2024-07-16

Even though HTML 5 has been around for over 16 years, PHP never had proper support for it. PHP does have \DOMDocument, which in theory supports HTML 4, but in practice it isn't fully HTML 4 compliant either.

So, yeah, classic PHP, right 😅? Well, we can laugh all we want, but let's take a moment to highlight a new feature that, albeit late, fixes this quirk: PHP 8.4 is adding an HTML 5 compliant parser! In this post I'll go through the highlights of this new parser, and you can read through the whole RFC as well.

Backwards compatible

One of the core requirements for this new parser is that it should be fully backwards compatible. That's why internals chose to add a completely new class, in a new namespace, to house the HTML 5 parser, and to leave the old \DOMDocument class alone. The new implementation gets its own class hierarchy instead of extending \DOMDocument: it extends the abstract \Dom\Document class, and it's called \Dom\HTMLDocument.

If you want to use PHP's new HTML 5 parser, that's the one you need:

// HTML 5 compliant
$dom = \Dom\HTMLDocument::createFromString($html); 

While the old version is still available as usual:

// HTML 4-ish support
$oldDom = new \DOMDocument(); 
$oldDom->loadHTML($html);
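To see both side by side, here's a minimal sketch that parses the same made-up HTML 5 document with each parser. It assumes PHP 8.4 with the dom extension enabled. The old parser's warnings are collected with libxml_use_internal_errors() instead of being printed, because how many it reports (for example about the <main> tag) depends on your libxml2 version.

```php
<?php
// Assumes PHP 8.4+ with ext-dom; the markup below is made up for this demo.
$html = <<<HTML
<!DOCTYPE html>
<html>
<head><title>Demo</title></head>
<body><main><p>Hello, HTML 5!</p></main></body>
</html>
HTML;

// HTML 5 compliant parser: a static constructor, no `new`
$dom = \Dom\HTMLDocument::createFromString($html);

// Old parser: collect libxml warnings instead of printing them,
// since it may not recognise HTML 5 elements such as <main>
$previous = libxml_use_internal_errors(true);
$oldDom = new \DOMDocument();
$oldDom->loadHTML($html);
$oldErrors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo get_class($dom), PHP_EOL;
echo get_class($oldDom), PHP_EOL;
echo count($oldErrors), ' warning(s) from the old parser', PHP_EOL;
```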

Constructing DOMs

One key difference you'll spot immediately is that the new implementation relies on static constructors, instead of creating an object with new and then calling a method like loadHTML() on it afterward. The new HTMLDocument class has three named constructors available:

HTMLDocument::createEmpty();
HTMLDocument::createFromFile($path);
HTMLDocument::createFromString($html);

These are their full signatures:

public static function createEmpty(string $encoding = "UTF-8"): HTMLDocument;
public static function createFromFile(string $path, int $options = 0, ?string $override_encoding = null): HTMLDocument;
public static function createFromString(string $source, int $options = 0, ?string $override_encoding = null): HTMLDocument;
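Here's a quick sketch that uses all three constructors, assuming PHP 8.4. The temporary file is created only for this demo, and saveHtml() is used to serialize the result; the markup itself is made up.

```php
<?php
// Assumes PHP 8.4+ with ext-dom; the markup and temp file exist only for this demo.
use Dom\HTMLDocument;

$html = '<!DOCTYPE html><html><head><title>Hi</title></head><body><p>Hello</p></body></html>';

// 1. Parse a string
$fromString = HTMLDocument::createFromString($html);

// 2. Parse a file on disk
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, $html);
$fromFile = HTMLDocument::createFromFile($path);
unlink($path);

// 3. Start from an empty document and build the tree yourself
$empty = HTMLDocument::createEmpty();
$root = $empty->createElement('html');
$empty->appendChild($root);

echo $fromFile->saveHtml(), PHP_EOL;
echo $empty->saveHtml(), PHP_EOL;
```

Note there's no new HTMLDocument() anywhere: each way of obtaining a document has its own named constructor.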

For the $options parameter, these flags are available:

  • LIBXML_HTML_NOIMPLIED
  • LIBXML_COMPACT
  • LIBXML_NOERROR
  • \Dom\HTML_NO_DEFAULT_NS
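Options are combined with a bitwise OR, just like with the old libxml-based API. As a sketch (assuming PHP 8.4), this parses a fragment without a <!DOCTYPE html>, which the HTML 5 spec treats as a parse error, while passing LIBXML_NOERROR so no warning should be reported. A temporary error handler counts any warnings that do get through.

```php
<?php
// Assumes PHP 8.4+. The fragment lacks a doctype, which counts as a parse error in HTML 5.
$fragment = '<p>Hello</p>';

// Record any warnings PHP emits while parsing
$warnings = [];
set_error_handler(function (int $errno, string $message) use (&$warnings): bool {
    $warnings[] = $message;
    return true; // handled: don't pass it on to PHP's default handler
});

// Suppress parse error reports and use libxml's compact node storage
$dom = \Dom\HTMLDocument::createFromString($fragment, LIBXML_NOERROR | LIBXML_COMPACT);

restore_error_handler();

echo count($warnings), ' warning(s) reported', PHP_EOL;
```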

The $override_encoding parameter overrides the encoding detection that the HTML parser spec would otherwise perform. This can be useful when you've downloaded the document yourself and already know its encoding, for example from the charset in an HTTP Content-Type header.
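A minimal sketch of that scenario, assuming PHP 8.4: the bytes below are windows-1252 encoded (the byte \xE9 is "é" there, but isn't valid UTF-8 on its own), and we pretend an HTTP header told us so. The encoding is passed positionally as the third argument, and the label "windows-1252" is an assumption taken from the WHATWG Encoding Standard's list of names.

```php
<?php
// Assumes PHP 8.4+. Pretend these bytes came from an HTTP response
// whose Content-Type header said charset=windows-1252.
$bytes = "<!DOCTYPE html><html><head><title>Caf\xE9</title></head><body></body></html>";

// Skip the parser's own encoding detection and decode the input as windows-1252
$dom = \Dom\HTMLDocument::createFromString($bytes, 0, 'windows-1252');

// Once parsed, text is exposed as UTF-8
$title = $dom->getElementsByTagName('title')->item(0)->textContent;
echo bin2hex($title), PHP_EOL;
```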

DOM Objects

Note that the new implementation also creates different node classes. For example, instead of \DOMNode you'll get \Dom\Node, instead of \DOMElement you'll get \Dom\Element, and so on. The RFC originally aimed to keep these classes the same between the old and new implementation, but there turned out to be too many differences. You can read all about them here.
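A small sketch of what that means in practice, assuming PHP 8.4: the same <p> element comes back as a different class depending on which parser produced it. The describe() function is hypothetical; it stands in for existing code that type-hints against the old API and therefore only accepts old-style nodes.

```php
<?php
// Assumes PHP 8.4+ with ext-dom; the markup is made up for this demo.
$html = '<!DOCTYPE html><html><head><title>Demo</title></head><body><p>Hi</p></body></html>';

$new = \Dom\HTMLDocument::createFromString($html);
$old = new \DOMDocument();
$old->loadHTML($html);

// Grab the same <p> element from both trees
$newP = $new->getElementsByTagName('p')->item(0);
$oldP = $old->getElementsByTagName('p')->item(0);

echo get_class($newP), PHP_EOL;
echo get_class($oldP), PHP_EOL;

// Hypothetical helper typed against the old API: it accepts $oldP, but not $newP
function describe(\DOMElement $element): string
{
    return $element->tagName;
}

echo describe($oldP), PHP_EOL;
```

So if you migrate to the new parser, any type hints on \DOMNode or \DOMElement in your own code need to change too.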


Albeit a bit late, I think this is a very nice addition to PHP. I definitely have some use cases for it! What are your thoughts? You can leave them in the comments down below!

© 2025 stitcher.io