HTML5 video


Chaos is a bitrate ladder

Amos explains their choices when it comes to encoding settings for images and for short videos in browsers in 2025.


Show Notes

Episode Sponsor: Depot

Transcript

HTML5 Video

Amos Wenger: My mouth is so dry.

James Munns: How dry is it?

Amos Wenger: So dry. I don't know where that comes from. I've heard this reference before, but I don't even know what it...

James Munns: I don't know where it's from. That's just one of those, it's been... It's like a running meme

Amos Wenger: "It's been..."

James Munns: for commercials for 50 years.

Amanda Majorowicz: "... one week since you looked at me"

Amos Wenger: Yes!

Amanda Majorowicz: I just want to have all these references in that way.

Amos Wenger: Yes. Cool. Well, share screen, that screen. Yes. Yes. Good. Wonderful. Yay.

Amanda Majorowicz: Okay, bye.

Amos Wenger: Bye.

James Munns: OK bye!

Amos Wenger: I have a cat on me, so if I start yelping, it means that Sherlock has planted firmly his... nails?

James Munns: Claws, nails.

Amos Wenger: Claws. Today, we're going to talk about HTML5 video. And then there's a little reference. James, do you know what the subtitle is a reference to?

James Munns: Hold on, I need to get like a... I don't have a cup, so I can do the Bane voice. "Chaos is a ladder."

Amos Wenger: Yes. But I said "Chaos is a bitrate ladder." And a bitrate ladder is a set of settings that you... Like for the... You know how YouTube has a little selector, you can go 4K, 1080p, 720p. They also choose the bitrate target for the video, for the audio, which codecs to use, blah blah blah blah. So it's a little joke for the two people from Vimeo who are going to listen to this. Hi friends. I miss you.
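A bitrate ladder can be sketched as a small table of rungs. The numbers below are purely illustrative, not YouTube's or Vimeo's real settings; the idea is just that a player walks down the ladder until a rung fits the viewer's bandwidth:

```python
# A hypothetical bitrate ladder: each rung pairs a resolution with
# target video and audio bitrates. All numbers are made up.
LADDER = [
    # (height, video_kbps, audio_kbps)
    (2160, 18000, 192),  # "4K"
    (1080, 6000, 160),
    (720, 3000, 128),
    (480, 1200, 96),
    (360, 700, 96),
]

def pick_rung(bandwidth_kbps):
    """Pick the highest rung whose total bitrate fits the available bandwidth."""
    for height, vkbps, akbps in LADDER:
        if vkbps + akbps <= bandwidth_kbps:
            return height
    return LADDER[-1][0]  # fall back to the lowest rung
```

Adaptive streaming players do essentially this continuously, re-measuring throughput and switching rungs between segments.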

James Munns: Probably know them by name.

Amos Wenger: I do. I'm not going to say their name.

James Munns: Wait, did you work at Vimeo?

Amos Wenger: I did not, but I had at least one friend there.

James Munns: Gotcha.

Amos Wenger: Because this ties into slide number two of 37,

James Munns: which you can get by going to sdr-podcast.com/episodes,

Amos Wenger: or on YouTube, if you're watching the podcast on YouTube. The current slide is showing a YouTube video, how meta, called "This is a Video About Video" that I released in September 2022. So I have been at this for quite some time.

James Munns: I remember this episode. This one was super good.

Amos Wenger: People didn't care for it. It only has 13,000 views, as you can tell.

James Munns: Is this the one that you had all the different bitrates and explained how all the chunking and everything works?

Amos Wenger: Yeah.

James Munns: I really like this one.

Amos Wenger: Yeah. So this is exhibit A in Amos cannot scope content to save their life because the end of the video is just me speeding up more and more, me talking through the rest of the script, because it was already 40 minutes. I just realized halfway through the script, nobody cares. But then of course, I got 10 different comments saying, "Oh, it was interesting what you were saying in the outro. You should do a video about that."

James Munns: To the 10 of you that made it to the end, shout out, you're a real one.

Amos Wenger: The initial motivation was I'm making videos on YouTube, and that's cool, and that's great. And I like to earn money with ads and sponsored videos and whatnot. But people who already support me through Patreon or GitHub Sponsors, it feels bad that they have to, in a way, pay again by sitting through these ads and sponsored segments. Well, most of them don't because there's software solutions for that, but I'm not going to name them because I make money off of this. It just felt bad. So I wanted to provide my paying customers, my patrons, the people who make all this possible, with a way to watch my videos directly from my website where they wouldn't get any ads. It's not even about the ads. It's about being around the whole Google surveillance ecosystem. They just don't want to touch it at all.

James Munns: So you did what comes naturally and set up an entire video streaming and CDN platform from scratch.

Amos Wenger: I did.

James Munns: As you do.

Amos Wenger: As you do. And I tore it down because it worked, but V1 was pretty much a Rust program calling the ffmpeg CLI. And then version two was let's link directly against ffmpeg as a library and process every frame ourselves. And this is what the video was about. I was in the middle of doing all that. And it never went anywhere. I worked on it for months. I got some extra help on it from different people. I got my life together in 2024. I was like, "I'm going to archive this project and I'm going to start again from scratch at some point." But I just removed all traces of it from the code base on my website. And that allowed me to focus on the actual... the text, the images, everything else.

So for example, I have a great image pipeline. I've already talked about diagrams, so I'm not going to talk about them here, but let's just talk about bitmaps. I take a lot of screenshots of my screen with CleanShot, which is a great product. This episode is not sponsored by CleanShot, but it should be. You can just essentially pick a little rectangle. It goes to the side of the screen and then you can drag that anywhere. And I write my articles as markdown. There are text editors... I know there are VS Code extensions where you can drag into the editor and it just copies the file next to your markdown file. I don't know. I'm using Zed. I don't think their extension API lets you do that quite yet. And also, I don't want multi-megabyte PNGs, because my articles are stored as markdown in a Git repository and it started getting really big over time. So what I do is drag and drop. I'm going to show a picture of that in a later slide. I drag them onto the browser that I'm currently previewing the article in. I drag them on top of a paragraph.

If I drag on the left side, it replaces the paragraph with the image. If I drag on the right side of the paragraph, it appends after the paragraph. And the way it works is that each paragraph has a byte offset. When rendered to HTML, there's an attribute that says the byte offset in the source. And that's how it knows how to edit the markdown to put the image in the right place. So I'm very happy about that. This is not even the topic of today's presentation, but I'm really happy. And when you do that, there's a little screen that shows how big the image is going to be and encourages you to write alt text.
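A minimal sketch of that byte-offset trick, with all names hypothetical: the renderer stamps each paragraph's starting byte offset into an HTML attribute, and the drop handler splices image markdown into the source at that offset, either replacing the paragraph or appending after it:

```python
def insert_image(markdown: bytes, byte_offset: int, image_md: bytes,
                 replace_paragraph: bool = False) -> bytes:
    """Insert (or substitute) an image at a paragraph's recorded byte offset.

    `byte_offset` is where the paragraph starts in the markdown source,
    as carried by a data attribute on the rendered HTML. Dropping on the
    left replaces the paragraph; dropping on the right appends after it.
    """
    # Find the end of the paragraph: the next blank line, or end of file.
    end = markdown.find(b"\n\n", byte_offset)
    if end == -1:
        end = len(markdown)
    if replace_paragraph:
        return markdown[:byte_offset] + image_md + markdown[end:]
    return markdown[:end] + b"\n\n" + image_md + markdown[end:]
```

Working in bytes rather than characters matters here: the offsets come from the markdown source as stored, and splicing by byte keeps them stable regardless of multi-byte characters elsewhere in the document.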

James Munns: I would love to have a whole episode on that because I don't know if I've ever seen someone use their website to build their website like you do in that you have like all this functionality that's built into the website itself running in like developer mode serving locally and you use essentially your own website or this is what it seems like from the outside is that you use your website to build your website, which is a really cool workflow that I can't even imagine. So I would love to have like the guided tour of what the inside of that process looks like.

Amos Wenger: I'm thinking of how to make a video on it, because there's a big visual component to this, and also it keeps changing. And also, I wish less of my content was about how my website works, about how you make content. Because yeah, half the YouTube channels out there are like: how to get a good camera, how to get good at editing. And I didn't want to do that with mine. But instead I've got a YouTube channel about how to make a website, I don't know.

James Munns: But that's what this entire podcast is. The bizarre things that we've built by ourselves, for ourselves. Now we share the things we've learned.

Amos Wenger: Yeah, exactly. Every time I obsess over a detail, I don't have to make an entire video about it. I can just show you.

James Munns: That's what the support group here is for.

I have a great image pipeline

Amos Wenger: Exactly. It's a... "Tinkerers Anonymous?" Such a better name for the podcast. So I have a great image pipeline. I'm really fond of it. So I drag onto the browser. It modifies the markdown. It transparently converts to JPEG XL with a high enough quality that it's visually lossless to me for screenshots. But it's small enough that I don't mind putting it in a Git repository. I don't have to worry about large file storage, Git LFS or anything like that. And then in the browser, it's served as either that JPEG XL file, the .jxl file, or AVIF or WebP, depending on your browser. We're going to get into which browser supports what.

What you're seeing on the slide right now, if you're watching the slide, is a picture tag. If you didn't know, there's not just the img tag. There's also the picture tag. And picture tags are nice because they take sources. So there are three different sources here in order of my preference: JPEG XL goes first, then AVIF, then WebP. Another feature of the markup that I'm showing on screen is that I set the loading attribute to lazy for the image, which means the image doesn't load until it's within or near the viewport. So I have very long articles that span many, many, many pages. So this will only load the first two, I guess. And it also specifies width and height. That's a recent addition. I've been improving my asset pipeline. And it just grabs the image dimensions when indexing the revision.
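The markup described above looks roughly like this; file names, dimensions, and alt text are made up for illustration. The browser picks the first source whose type it supports, and falls back to the img tag otherwise:

```html
<picture>
  <source srcset="screenshot.jxl" type="image/jxl" />
  <source srcset="screenshot.avif" type="image/avif" />
  <source srcset="screenshot.webp" type="image/webp" />
  <img src="screenshot.webp"
       alt="A terminal window showing cargo build output"
       loading="lazy" width="1280" height="720" />
</picture>
```

The explicit width and height let the browser reserve the right amount of space before the bytes arrive, which is what prevents the layout shifts mentioned later.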

James Munns: Does this mean you can figure out how far people make it in your articles by the percentage of...?

Amos Wenger: I was thinking of that while making the slides. Yes.

James Munns: Yeah.

Amos Wenger: I could. But I don't track that.

James Munns: It's like I don't explicitly take metrics, but I do have the number of downloads for each of my images. And gee, the ones at the top sure get downloaded a lot more than the ones at the bottom.

Amos Wenger: The bounce rate for my website is terrible. It's a nerd snipe. I don't know. It doesn't hit often. When it does, people read the entire thing. But when it doesn't, yeah, most people are like, oh, God, and recoil in horror. There's also alt text. I've started writing alt text for images. I don't like writing it in markdown in my text editor. But because of this nice little drag and drop thingy, it encourages me to give the image a descriptive file name and then also write alt text, and it has a minimum length.

So I need to write at least that much. I tend to dictate it, actually. I use Superwhisper to just dictate the description of the image. I don't know when I started doing that. I didn't go back and add alt text to all the other images. The more backlog I have... not backlog, the more back catalog I have, the more work it is to go back and just bring everything up to code, essentially. But the width and height stuff was also for me, because when the page updates... well, it's only because of videos. Basically, the idea is to avoid layout shifts. You want the browser to know how big the image is going to be before it actually gets to download it.

Options: JXL, AVIF, WebP

Amos Wenger: JPEG XL is so good. Why does Google hate it? Everyone was sad because JPEG XL support didn't make it into Chrome is essentially what happened. But it shouldn't be named JPEG XL because I think it's bad- bad publicity. It is amazing. It's the most modern royalty free open standard for images. It can do everything. It has a lossless mode. It has a lossy mode. It is extremely good. We're going to be looking at some numbers and you will- just... I don't know if you- have you ever used it, James?

James Munns: No. Is this one of those things like JavaScript has nothing to do with Java? It's just popular at the time. So it got the same name or?

Amos Wenger: It's the same design committee. But... people don't think of JPEG as a design committee. They think of it as those artifacts that you grew up on. So even JPEG 2000 was much better than JPEG. But I don't know. Oh, and it's backwards compatible with classic JPEG. But I don't know exactly how that works. I think you can include a version that is decodable by others? I don't know exactly how it works. Or you can... Oh, you can. I think you can convert JPEG to JXL without a loss of quality. So you can convert your entire library. I think that's the feature. The only issue... we're looking at a caniuse screenshot here. James, tell me what you see. So... just Apple? Yeah. Just Safari. Just Safari.

James Munns: Just Safari.

Amos Wenger: Starting from version.... And all of them have a little footnote. Number four. I don't even know what's in there. But from Safari 17, which I think was 2023 something. I don't know. Yeah. Essentially, the only person that sees those JXL images on my website is me. The single Safari user.

James Munns: I would say iPhone users. And you know, true, actually. Yeah. Every engine on iPhone. Well, did they change that or is it still everything is Safari on iPhones? Or did they relax that finally?

Amos Wenger: Something something European laws. I don't know. Let's make an episode about that.

Amanda Majorowicz: I know on the work iPhone, I just use Chrome. I don't know.

James Munns: Yeah, you use Chrome, But they had a thing where-

Amos Wenger: That's still WebKit. Yeah.

Amanda Majorowicz: That's different?

James Munns: It's just a Chrome skin on Safari's browser.

Amos Wenger: Yes.

Amanda Majorowicz: On top of Safari? What lies! OK, bye.

Amos Wenger: So JPEG XL is excellent. It is the best. I love it so much. And it's also supported natively throughout macOS. So you can open JPEG XL images in Preview and everything. You get the little thumbnails in Finder. It's just like any other image format. AVIF is my second favorite choice. It's essentially AV1 in... as a-

James Munns: Amos, what's AV1?

Amos Wenger: What's AV1? We're going to get to that later because we're starting with images. But yeah, AV1 is the current best royalty free format that has gotten some adoption.

James Munns: For videos.

Amos Wenger: For videos. So the insight here is that most of the modern image formats are video codecs where you just encode only one frame or more. If you're trying to replace GIF, you can just encode an animation, because GIF can do animations, because you can tell it to do more stuff later. Like the only reason that GIF is sort of efficient in animation is because it has what we call, in GUI rendering, damage tracking. Essentially, you don't encode the entire size of the image for every frame. You're just like, OK, this rectangle here you redraw, you add a hat or whatever. And then you have...

James Munns: Similar to Delta encoding and stuff like that, where you're only describing the changes, not the entire... entire- every single frame.

Amos Wenger: Yep. Whereas video codecs have much better, much more modern ways of like motion vector detection. And that's just the very basics. I don't- I don't know what modern codecs are. I'm not there yet in my research. So AVIF is great, and the support is actually pretty decent. Again, we're looking at a caniuse.com screenshot here and the browsers that do not support it are: Internet Explorer 11, which has been killed by Microsoft officially; Opera Mini, QQ Browser and KaiOS browser.

James Munns: So basically some archaic embedded browsers and that's because Opera Mini is not even the same as Opera or Opera Mobile.

Amos Wenger: It's not the same as Opera Mobile. Also Opera is dead and buried.

James Munns: Opera Mini is like the feature phone browser.

Amos Wenger: Because Opera GX is just the malware distributor.

James Munns: Chrome, right?

Amos Wenger: Now.

James Munns: Oh, OK.

Amos Wenger: Yeah. Opera has been bought. They're not even the same people. I miss the days where Opera was an actual browser engine. It was competing with Internet Explorer and Firefox and whatnot. But those days are long gone. The Opera brand is dead. Everything's dead. The Opera GX is just... I don't know. It's a scam. They have not sponsored any of my videos.

James Munns: So I was going to say, so we're going to get that Opera Browser sponsorship because they did sponsor a ton of YouTube videos and podcasts and stuff. So I guess we're... I guess we're not getting that sweet Opera Browser sponsorship.

Amos Wenger: No, we're not. I mean, I can make a podcast that is about all the popular sponsors for YouTubers and why most of them are bullshit. The other big one was Honey. A lot of it came... a lot of information came out about that browser extension.

James Munns: Yeah. When I see someone sponsor like one or two podcasts, especially like in a specific niche, I'm like, OK, that makes sense. When every single podcast and YouTube video I watch has the same sponsor, I'm like, they have too much marketing budget. And I don't know- that makes me... that's like negative incentive to me.

Amos Wenger: True. So we've seen JPEG XL, we've seen AVIF. And then because AVIF is not supported everywhere yet, I also include WebP, which is VP8 in a trench coat. So again, it's based on a video codec, a much older one, actually. WebP was announced in September 2010, and the first stable version of its supporting library was released in April 2018. So that took a little while. But basically, Google bought On2 Technologies, who had codecs, and they used those and then extended the format over the next bunch of years. And AVIF is designed to be a successor to WebP.

So WebP has almost universal adoption. This looks like the same screenshot, but it's not. WebP has been supported by a lot of browsers for a very long time. So if you look at the AVIF one, it's only supported since Safari 16.4, whereas WebP is supported since Safari 14 on iOS. So, yeah, for older iPhones.

James Munns: Hey, plus we get the Baidu - excuse me, the QQ Browser back on this one.

Amos Wenger: We do. And the KaiOS Browser and Opera Mini. But I really don't like WebP. We're going to get into that.

PSA: modern image formats are good

Amos Wenger: A little PSA in the middle of this episode. Modern image formats are good. You're wrong. Most people are wrong. I see a lot of people complaining about: oh, I downloaded an image from a website and it's WebP, I just wish they served JPEG or PNG. Get with the times, OK? They saved so much bandwidth with that. They're saving the planet. We cannot afford to serve everything as PNG anymore. We have so many much better formats now. I'm also annoyed, because I have to convert everything to PNG to drag into DaVinci Resolve.

James Munns: That was my question is what makes them good? Is it just they're more efficient for a given quality or?

Amos Wenger: Yes. There are also extra features, but mostly it's that. Yeah, we're going to look at numbers in a second. This is where I chose to somehow put the slide for my image upload workflow, for some reason, which, yeah: I drag onto my website in the browser window. Oh yeah, I wanted to mention something fun. If you drag something onto a browser window on Mac, it does not focus the window. It does not raise the window. So that's really annoying, because I wanted to autofocus the image name input field. And yeah, it is focused, but Safari is not active, so it doesn't work. So I don't know if you remember a topic I've covered recently, James.

James Munns: Is this the automator? Is this the reason that you got the automator stuff working so you can like bring the browser to foreground?

Amos Wenger: It's not the reason, but yeah, then I just sold it like that. I was like, OK, now I'm just calling a script to tell Safari to activate itself. And so whenever an image drops... there's a bunch of things like that that I can do because I'm running a server locally. So anything that is not allowed on a web page for security reasons, like bringing the window to the front, I can do from native code as a Rust binary signed locally. So yeah, it's a really fun, fun tidbit.

James Munns: I'm sure you could pop all the sandboxing and the anti like pop under and pop over and all the stuff is disabled for external content because it's been obnoxious for 20 years. But it's very useful in these kind of cases.

Amos Wenger: Well, no, yeah, I don't think Safari allows you to disable any of the security around running code from localhost. That would be pretty bad. They already don't... like, they consider it mixed content, an insecure origin. I don't know. If you use the samply profiler, they say use Firefox or Chrome, because it's opening profiler.firefox.com to visualize the traces, and the traces are served from localhost. And Safari prevents that, because like, no, an HTTPS page cannot load content from localhost over HTTP. That doesn't work.

James Munns: You get like CORS rules violations and stuff like that?

Amos Wenger: I'm not sure it's even CORS. But yeah, something like that. There's an error that says: just use Firefox. It's the only reason I use Firefox. And then I complain about it because they call themselves an indie browser in 2025, which makes me laugh. It was true once, but then it wasn't. So about images: I didn't just have to pick formats. I had to, like I said, create a bitrate ladder, or like choose a bunch of presets, to decide how big am I comfortable with the images being and whatnot.

And then when you do that, you want to have some number... You want to have the computer compare the quality of the two images. You don't want to eyeball it, because you get tired, it's a long day, blah, blah, blah. It's complicated, because computer scores can sometimes be high and humans can say that it looks bad. But SSIM, which is the structural similarity index, is not the worst. It's a perception-based model. So it's better than PSNR, peak signal-to-noise ratio, or MSE, mean squared error.
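For intuition, here is the SSIM formula over a single window in plain Python. Real tools (ffmpeg's ssim filter, scikit-image) compute it over many sliding windows and average; collapsing to one global window just makes the formula visible:

```python
import math

def ssim(x, y, L=255, k1=0.01, k2=0.03):
    """Single-window SSIM over two equally sized grayscale pixel lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n          # means
    vx = sum((p - mx) ** 2 for p in x) / n   # variances
    vy = sum((p - my) ** 2 for p in y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2    # stabilizing constants
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def ssim_db(score):
    """The log-scale (decibel) form: -10 * log10(1 - SSIM)."""
    return float("inf") if score >= 1.0 else -10 * math.log10(1 - score)
```

Note that the dB form maps an SSIM of 0.99 to exactly 20 dB, which lines up with the "anything above 20 is extremely good" rule of thumb later in the episode.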

James Munns: You got to describe this because it's something I'm vaguely aware of. Most of these tools don't usually have like a fixed, "Ah, we keep 80 percent of the original pixels." They have these whole heuristic systems of like: well, human eyes usually can notice these kind of things. And where we cheat is on the margins of that.

Amos Wenger: Yeah, the biggest example, I think, is chroma subsampling. Essentially, you can afford to throw away some color information more easily. Like, you can get away with that better than you can throwing away brightness, slash luminosity, slash luma information, because we pay attention to contrast more than we do color. And that's the reason certain shades of red appear really poorly on any YouTube video, because most of the stuff we watch is YUV420P and yeah, certain colors just completely die. If you do screen recording in OBS and you watch it back and you're like, "Oh, the syntax highlighting is all wrong." That's why. The colors are changing. You're not crazy.
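The 4:2:0 idea can be shown in a few lines: luma stays at full resolution, but each 2x2 block of a chroma (U or V) plane is averaged into a single sample, keeping only a quarter of the color data. This is a sketch of the concept, not any codec's actual filter:

```python
def subsample_420(chroma_plane):
    """Average each 2x2 block of a chroma plane into one sample (4:2:0).

    `chroma_plane` is a list of rows of U (or V) values. Luma is kept at
    full resolution, which is why contrast survives and colors smear.
    """
    out = []
    for r in range(0, len(chroma_plane), 2):
        row = []
        for c in range(0, len(chroma_plane[r]), 2):
            block = (chroma_plane[r][c] + chroma_plane[r][c + 1] +
                     chroma_plane[r + 1][c] + chroma_plane[r + 1][c + 1])
            row.append(block // 4)
        out.append(row)
    return out
```

Sharp one-pixel color edges, like syntax highlighting on a screen recording, get averaged with their neighbors here, which is exactly the "colors are changing" effect described above.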

Let's look at some numbers

Amos Wenger: So we're looking at a slide that shows a table. The columns are format, parameters, size, and SSIM, the structural similarity index, and the reference image is a PNG. It's a PNG screenshot that I took with CleanShot. And then, as I said, I transcode that... I convert that to JPEG XL when dropping it onto the website. And then from there, because that's my storage format, I convert to AVIF and WebP on demand. And we can see the sizes here. The JPEG XL version is 200 kilobytes. The AVIF version is 270 kilobytes. And the WebP version is 300 kilobytes.

And the SSIM... there are two ways to look at SSIM numbers. There's the number from zero to one, and the decibel number, which is on a log scale. Higher is better. And what's interesting, and I don't know why this is, we're going to get some emails, I guess, is that the SSIM scores for AVIF and WebP, which are re-encoded from JPEG XL, are somehow higher than the score for the JPEG XL one that is encoded directly from the PNG. So I have no idea why this is happening.

James Munns: Luck, I guess.

Amos Wenger: My theory as someone who is a complete fraud in this domain but has played around with a lot of things is that codecs like AVIF and WebP reproduce some artifacts that SSIM is used to. I don't know. This theory is probably completely wrong for SSIM, but not for the next one.

James Munns: It's a 'tagged for future research.'

Amos Wenger: But anyway, as far as decibels go, the JPEG XL one is 20.9 dB, AVIF is 22.9, well, 23, and WebP is 21.4. And anything above 20 is extremely good. So those are almost visually lossless. And we're getting that in just 10 percent of the original size, or even less. The choice I made here is that I'm going to be paying the cost of people not supporting JPEG XL: I'm going to serve slightly larger images for them so that they can get decent quality. But I don't need it to be completely lossless. I used to say, "Oh, for screenshots, I want PNG," but now there's really no need, thanks to JPEG XL. The quality is there.

This is my last slide, so I might have to move. But there's no quality setting for the JPEG XL encoder. There's a distance setting. A distance of zero is lossless. Distance of one is excellent. Two is still very, very good. And it goes all the way up to 15. The distance I chose is 5.5. And then there's also an effort, which is how long... how much computing it's willing to do to try and optimize the quality of the image. And it goes from one to 10 and the default is seven. And they all have nice little names. The number seven is squirrel.
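The knobs described above map to encoder flags roughly like this. This is a sketch: file names are made up, the numbers are the ones from the episode where given, and exact flag spellings can vary by tool version:

```shell
# JPEG XL: -d is distance (0 = lossless, 1 = excellent, up to 15),
# -e is effort (1-10, default 7 "squirrel")
cjxl screenshot.png screenshot.jxl -d 5.5 -e 7

# AVIF: -q is the boring 0-100 quality, -s is speed
# (higher = faster, roughly the reverse of effort)
avifenc -q 60 -s 6 screenshot.png screenshot.avif

# WebP: again, a boring 0-100 quality
cwebp -q 80 screenshot.png -o screenshot.webp
```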

James Munns: Oh, these like preset names or something?

Amos Wenger: Yeah, the effort levels. Effort level one is Falcon, if I remember correctly. They all have different animal names. For AVIF you have the boring quality from zero to 100 and also a speed setting, which I'm assuming is the reverse of the effort. Like, the higher the effort, the longer it takes to encode. But speed is: the higher, the speedier it is. Jesus fucking Christ. My cat is wreaking havoc on this episode. Meow, meow, meow. And then for WebP, again, you have a boring zero to 100 quality. Meow. We need a cat break. Meow in the mic, please. Meow. Exactly.

James Munns: Oh, we got some purrs.

Amos Wenger: Yay! So that's for images. But I've been, as I mentioned in other episodes, I've been doing this dual feature thing where I make a video and an article at the same time. So I make content that works as either. And the video sucks a little bit because you have to look at code and the article sucks a little bit because I can't show visual aspects just as well. And I'm fixing that last part by doing more screencaps or screencasts instead of screenshots. So I can show things that move. I can include little bits of video within the article in line. And that's harder than it seems because essentially what I used to do is just encode as H.264 with a reasonable bitrate. And that's supported basically everywhere.
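The old "just encode H.264 at a reasonable bitrate" step looks roughly like this with the ffmpeg CLI. A sketch with made-up file names; yuv420p and faststart are the usual web-compatibility flags:

```shell
ffmpeg -i capture.mov \
  -c:v libx264 -crf 23 -preset slow -pix_fmt yuv420p \
  -c:a aac -b:a 128k \
  -movflags +faststart \
  clip.mp4
```

CRF mode targets constant quality rather than a fixed bitrate, and +faststart moves the index to the front of the file so the browser can start playback before the download finishes.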

But I was not happy with that because now I have this great image pipeline. So I was like, I want the same experience. I want to be able to grab some video. This slide is a photo I took of my screen with CleanShot X open in video recording mode to show you that you can just choose a rectangle to capture and then you can show clicks. You can record the mic, the computer audio. You can even add a webcam. It has the whole shebang and record a video and that gives you H.264. And again, it goes to a corner of the screen and then you can drag that... For me, you can drag it directly in the browser and it's going to upload. And then the mistake I made when I first built, well, V2 of my video platform is that I linked directly against ffmpeg and I dealt with raw images and I had to think about timestamps and whatnot.

This time around, I'm older. I'm wiser. I have less time. I just use the ffmpeg binary, and there's a Rust crate called ffmpeg-sidecar, which does the hard work for you. It downloads ffmpeg if you don't have it. I do have it. I'm building a Docker container, so I just added it to the build in there. But it also parses the output of ffmpeg into Rust structs. So if you're asking ffmpeg to output raw video frames, for example, it's going to be able to split those. It's going to tell you all the information about the different streams, the input streams, the output streams. It's going to deal with pipes so you can pipe data into ffmpeg. So it's great. Honestly, I don't know where it's been all this time. Maybe it's more recent than my other attempts.
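The shape of the ffmpeg-sidecar API is roughly the following, based on the crate's README. Treat it as a sketch: the input and output file names are made up, and the exact event fields are worth checking against the crate's docs:

```rust
use ffmpeg_sidecar::command::FfmpegCommand;
use ffmpeg_sidecar::event::FfmpegEvent;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The arguments are plain ffmpeg CLI flags, so nothing here
    // links against libav* directly -- it drives the binary.
    let mut child = FfmpegCommand::new()
        .input("capture.mov")                     // hypothetical input
        .args(["-c:v", "libx264", "-crf", "23"])  // ordinary ffmpeg flags
        .output("clip.mp4")                       // hypothetical output
        .spawn()?;

    // ffmpeg's stderr is parsed into typed events instead of raw text.
    for event in child.iter()? {
        match event {
            FfmpegEvent::Progress(p) => println!("progress: {}", p.time),
            FfmpegEvent::Log(_level, msg) => eprintln!("ffmpeg: {msg}"),
            _ => {}
        }
    }
    Ok(())
}
```

Because it is a child process, cancellation really is just sending SIGTERM (or SIGKILL) to it, as described below, and a codec crash takes down the child rather than the whole server.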

James Munns: I've seen something similar for the other big one that gets mentioned a lot in these spaces, which is, uh... cURL. And I think I've seen similar crates: I've seen one which binds to libcurl, and I think I've seen other ones that are like: do you really want all that size in your binary? Because we will just drive the cURL command line interface for things. And if you only need to make one REST request, it might be nicer than compiling a whole bunch into your crate itself.

Amos Wenger: It's funny because that feels nasty. To me, it hurts my delicate sensibilities, because I like to link against everything. But specifically when I think of ffmpeg, I think having it run in a separate process is not the worst thing in the world. Because a lot of codec code is kind of wishy-washy security-wise and crash-resistance-wise. So some of it is basically untrusted user input, if you allow people to upload video, whatever. So if I allowed random people to upload things to my website, I would run that in sandboxes or even a VM. So yeah, a process is a good start. If it crashes, it's not going to bring down the entire website, at least. Oh, and it's super easy to cancel. You just kill dash nine the process. That's it. Send SIGKILL. Well, I send SIGTERM first, because I'm nice. Next up, we have a map of the world. James, can you guess what's on that map?

Servers around the world

James Munns: Is this the fasterthanlime content distribution network?

Amos Wenger: That's exactly right. We have a control plane.

James Munns: So you've got West Coast US, East Coast US, you've got UK, Germany. No, yeah, Germany. And where in Southeast Asia is that?

Amanda Majorowicz: Singapore!

Amos Wenger: Singapore. Yay!

James Munns: OK, excellent.

Amos Wenger: Ooh, Amanda.

Amanda Majorowicz: I was really good at geography.

Amos Wenger: I was not. I had to look all these up. I'm deadly... deathly afraid that I got one of them wrong and we're going to get mail about it. But yeah, the slightly pinkish one is the control plane. It's a dedicated Hetzner server in Falkenstein, Germany. And then the edge nodes: we have one in Hillsboro, Oregon on the West Coast, and Ashburn, Virginia on the East Coast. And then on DigitalOcean Droplets, we have one in London and one in Singapore. And those machines have very different characteristics. The control plane, which is also the Kubernetes cluster... surely they figured out another word than 'master' by now. I don't know. It's the main node. It has 64 gigs of RAM. It has 20 cores. It's a 13th-generation Core i5, so it's an i5-13500. And it has two 512-gigabyte NVMe SSDs that are in RAID 0, because life is short.

James Munns: Got to go fast.

Amos Wenger: Got to go fast. James, can you explain?

James Munns: RAID is the redundant array of inexpensive drives, and RAID 0 means you...

Amos Wenger: Discs, but yes.

James Munns: What is it?

Amos Wenger: I think it's discs, not drives.

James Munns: Oh, is it?

Amos Wenger: No. Did I just... I'm going to...

James Munns: I don't know. I think it's been backronymed anyway, so it's whatever. But yeah, I mean, RAID 0 is the... I have no... Like, RAID 1 means that you have two copies of the same thing. So if one drive fails, you're fine. And it also usually sometimes is faster because it means you can pull from both drives at the same time of the same content. RAID 0 means smash them together as if they were one really big drive. And so if one of them fails, it's like your whole thing fails instead of just losing half or none.

Amos Wenger: I was right, and it is disc, so it makes me feel slightly less bad about interrupting you. And also, RAID 0 is not just about having a device that's twice as large instead of two devices. It's also about striping. So when you're writing, you get — not quite, but almost — double the write speed, because it's writing to both at the same time.

James Munns: Does RAID 0 do striping?

Amos Wenger: Yes, it does.

James Munns: Oh, OK.

Amos Wenger: Last time I checked. Let me check again.

James Munns: My home NAS has two pairs of spinning metal discs. Each pair needed a name, so I named them after bicycle manufacturers: Huffy and VanMoof are the names of my two pairs of spinning metal discs.

Amos Wenger: VanMoof.

James Munns: I don't know if VanMoof is still solvent as a business anymore, but they had cool looking bikes.

Amos Wenger: OK, yeah. RAID 0 is literally called striping. RAID 1 is mirroring. RAID 5 is striping with parity, 6 is striping with double parity, and then RAID 10 combines mirroring and striping. I don't have enough discs to make that work. Hetzner has a lot of these with just two discs, and then you have to choose whether you want redundancy or not. But all the important data is backed up in multiple different places. I do have object storage running — I have MinIO running on that machine — but it's all backed up elsewhere daily, and I have tested recovery. I'm pretty happy about that.
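The striping James and Amos are describing can be sketched as a toy model — purely illustrative, not how a real RAID controller works (real arrays stripe at the block-device level, with stripe sizes like 64 KiB):

```python
# Toy RAID 0 striping: alternate fixed-size chunks across two "drives".
# Illustrative only — real striping happens in the block layer.

STRIPE = 4  # stripe size in bytes (real arrays use e.g. 64 KiB)

def stripe_write(data: bytes) -> tuple[list[bytes], list[bytes]]:
    """Split data into stripes, alternating between drive 0 and drive 1."""
    drives: tuple[list[bytes], list[bytes]] = ([], [])
    for i, off in enumerate(range(0, len(data), STRIPE)):
        drives[i % 2].append(data[off:off + STRIPE])
    return drives

def stripe_read(drives: tuple[list[bytes], list[bytes]]) -> bytes:
    """Reassemble by interleaving stripes — lose either drive, lose everything."""
    a, b = drives
    out = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            out.append(a[i])
        if i < len(b):
            out.append(b[i])
    return b"".join(out)
```

Because consecutive stripes land on alternating drives, sequential reads and writes can hit both devices at once — that's where the "almost double the write speed" comes from, and also why one failed drive destroys the whole array.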

James Munns: The OneVariable website is a single ARM instance running in Germany.

Amos Wenger: Don't tell people that!

James Munns: It's the little baby... It's serving a static site. I don't do any video pipelining or anything.

Amos Wenger: They'll DDoS you, James!

James Munns: OK.

Amos Wenger: They'll rip your nginx to shreds!

James Munns: I'll buy a second four-euro-a-month instance.

Amos Wenger: Bankrupt you in no time. So, yeah, the control node is pretty beefy: 64 gigs of RAM, 20 cores. I like it. And then the edge nodes only have two gigs of RAM, two vCPUs, and 60 gigs of SSD — quote unquote, because I don't trust VPS providers. OK! I've looked at I/O performance for a bunch of them and they say, "Yeah, it's SSD storage!" but it's very different from a local SSD. Some providers actually distinguish local SSD versus SSD via cloud storage — virtualized file systems, whatever. So that's why SSD is in scare quotes.

James Munns: It's an SSD, but there are 30 people using it at the exact same time.

Amos Wenger: Exactly. So I do not want to be running video transcoding at the edge. Actually, up until now — I was still fixing this right before we recorded — image transcoding was running at the edge, on those tiny little machines. Because it's image transcoding, how bad could it be? Pretty bad. Pretty bad. The modern codecs need a bunch of computing power if you want both quality and small sizes. But video transcoding is even worse — even just a 20-second video encoding to VP9... libvpx is so slow compared to SVT-AV1. So I want to run this on the central node. I'm not going to bore you with the details, but yeah: you need a job queue, you need to ferry data back and forth.
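The "job queue, ferry data back and forth" setup can be sketched minimally. Everything here — names, fields, structure — is hypothetical, since the episode doesn't describe the actual implementation; it just shows the shape of the idea: edge nodes enqueue work, the beefy control node chews through it.

```python
# Hypothetical sketch: edge nodes enqueue transcode jobs, the control node
# works through them and ships results back. Not Amos's actual code.
import queue
from dataclasses import dataclass

@dataclass
class TranscodeJob:
    source_url: str  # where the control node can fetch the original
    codec: str       # target codec, e.g. "av1" or "vp9"

jobs: "queue.Queue[TranscodeJob]" = queue.Queue()

def enqueue_from_edge(url: str, codec: str) -> None:
    """Called by an edge node when a viewer first requests a video."""
    jobs.put(TranscodeJob(url, codec))

def work_one() -> str:
    """Run on the control node: take one job and process it."""
    job = jobs.get()
    # ...fetch the source, run the encoder, upload the result to the edge...
    return f"transcoded {job.source_url} to {job.codec}"
```

In practice the queue would need to be durable (a database table, Redis, or similar) rather than in-process, since the edge and control nodes are separate machines.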

Which "storage" format?

Amos Wenger: And also I need to, again, choose which storage format I'm going to use. So for images, I decided not to use PNG because it's much too large. I went for something lossy, but still visually good enough. And same question for video. There are lossless video formats. There's a bunch of them. Some of the most well-known ones are FFV1, HuffYUV, and Lagarith. Some are designed for storage.

James Munns: Wow, I have heard of none of those.

Amos Wenger: Well, FFV1 is the ffmpeg one, and Lagarith is actually derived from HuffYUV, so it's derived from something you know. There are the variants of ProRes that are visually lossless, kind of, and H.265 and AV1 have lossless modes. They're for different use cases: whether you want to store them, or use them for editing — so you want fast seeking and whatnot — or you have forever to spend compressing them. I don't know. Consumers ask for lossless streaming; very few platforms offer lossless anything. I don't remember what the story was with TIDAL — does it actually do it? Spotify said it would. Does TIDAL do video at all? No, no, no, no, no — for audio.

James Munns: The only one I can think of is Bandcamp, but that's not streaming. That's downloading lossless files.

Amos Wenger: Yeah. So what storage format do we want for this? AVC is the name you should use instead of H.264. Yes, but no, is my answer to this. It is universally supported, it's hardware accelerated everywhere — your phone has an H.264 hardware decoder, and has had one for years. But some patents haven't expired yet. Some have and some haven't; some are going to expire in 2030 in the US; it differs by country. Some of the companies holding patents in the pool have refused to release them. It's unclear. There's OpenH264 — Cisco has paid the royalties. There's a bunch of things happening. I just want none of this. Also, there's a large allowance: if you make less than this amount, or have less than this many subscribers, or all the video is free, then it's fine. But I have patrons that get early access to my articles.

So is that a paid subscription? Is it free? Is it web streaming? Who knows? I just don't want to touch it. Also, it's pretty old now. HEVC is the successor to AVC — it's also known as H.265. Super no. HEVC came out and it was better than AVC in every aspect, except the royalties were much, much worse. Much more expensive. It was so bad that the entire industry came together and made AV1: they all donated patents and new technologies, they put all their research projects together, and they made AV1, because they were like, no, there's no way we're paying the royalties for HEVC. More on that later. VP9? No, I just don't like it. Also, it's old. I'm not going to use it as a storage format; it's just bad. So I'm using AV1. Like I said, the HEVC royalty terms were announced in July 2015, and then in September 2015, Amazon, Cisco, Google, Intel, Microsoft, Mozilla and Netflix announced the creation of the Alliance for Open Media. They all immediately got on the phone and were like: we are not paying this. Nobody is paying this.

James Munns: Who's H.264? Is that Fraunhofer and all them or is that a different alliance?

Amos Wenger: It's the MPEG LA.

James Munns: Okay.

Amos Wenger: So I'm assuming it stands for Los Angeles. No, I don't ...

James Munns: Licensing authority, maybe. I don't know.

Amos Wenger: Probably.

James Munns: That reminds me a lot of stuff like Let's Encrypt where you have like this industry that charges a lot of money and a big group of companies just go, nuh-uh. Like, we're kind of done with this. We're going to be doing this one. It's free.

Amos Wenger: Exactly. Do you know Xiph? X-I-P-H? Do you remember them?

James Munns: I know them. Yeah, they're the people who do like Ogg, Vorbis and Speex and--

Amos Wenger: And Opus and Theora and Daala, which was based on wavelet transforms, which I will talk about in like six videos. Geez. So they donated Daala to that effort. Google donated their VP10 draft. Cisco donated Thor — I don't know what that is. And they just kind of mixed everything up and, ta-da, you get AV1. And AV1 is very good.

Let's look at some numbers again

Amos Wenger: Let's look at some numbers again. One is a slow screen capture — it's just me dragging something from Finder into one of my apps. The original file from CleanShot is 2.1 megabytes. And this time, we're not using SSIM, because that's for static images. We're using VMAF, which is Video Multimethod Assessment Fusion. It's been developed by the University of Southern California, a lab from Nantes in France, and another lab at the University of Texas at Austin. Fun fact: it's the second time in 20 years that universities got an Emmy Award, and the first time a French university got one. They got an Emmy Award in 2021 for their work.

James Munns: Interesting.

Amos Wenger: It's a totally different way of measuring video quality that is supposed to be much closer to what humans think of video quality. So it's a model. I don't know if there's weights you have to download. I know it's a model. And it has surprising results because if you're looking at the table right now on the slide, you will see that the input footage compared against itself only gets a 97.4, which is surprising. I expected a 100, but I looked it up online. And that makes sense. No, the best score is not necessarily 100. It's all probabilistic stuff. You can see that AV1, which comes in at 750 kilobytes against the 2.1 megabyte original, is at 97.15. And VP9 at 1.1 megabytes is coming up at 96.57. So they're all getting pretty good scores. Again, AV1 is more compact with a higher image quality. It's really good.

James Munns: I'm still using MP4s everywhere, which I guess is just-- it's not even a container format. So I guess I'm just still--

Amos Wenger: MP4 is the container, yeah.

James Munns: Yeah. I don't know what codec it's actually using. Whatever OBS puts out, I guess.

Amos Wenger: It likely is AVC, yeah. I haven't even talked about containers in this presentation, in this episode. But basically, yeah, I shove AV1 in MP4, and I shove VP9 in WebM, because that's pretty much the only option for browsers. You can technically do the two other combinations, but browsers go nah. Also, MP4 is a misnomer: MP4 is based off of MOV, and the proper term now is ISOBMFF — the ISO Base Media File Format. It's complicated.

As far as AV1 goes, the reference encoder, called libaom, from the Alliance for Open Media, is really, really slow. It was really, really slow — much slower than even HEVC, so H.265, encoders. And that was an issue. Luckily, there have been competing implementations. There's rav1e in Rust — rav1e with the i being a 1 — and there's SVT-AV1 from Intel, which I'm using here because it's super fast and super good. I'm using CRF 35, and it's divided in tiles — tile columns 2, tile rows 2, which means it's divided in a 2 by 2 grid — and tiles are encoded separately.

James Munns: Is that so you can have things like if you have screen graphics in one, you might have higher density of pixels where some screen graphics are, or if there's more action or something in one corner or something like that?

Amos Wenger: That's a good theory. But in this case, it's just so that it encodes faster. You can be more parallel, because a lot of encoding is otherwise very sequential. It's like you're encoding four different videos that each have a quarter of the resolution. For VP9, it's kind of the same story. I'm also using CRF, because when you're encoding video, you can aim for a certain bitrate or you can aim for a certain quality. In this case, for VP9, I'm specifying a video bitrate of 0, which switches to constant quality mode, and setting CRF to 24. And those are pretty fast settings... like, I don't like to wait.
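The settings described above, expressed as ffmpeg argument lists. The flag names are standard ffmpeg options for libsvtav1 and libvpx-vp9, but treat the exact values as a sketch — CRF and tile counts are things you tune per footage, and encoders differ in how they interpret tile parameters.

```python
# The encoder settings from the episode, as ffmpeg invocations.
# Values are a starting point, not a recommendation for all footage.

def av1_args(src: str, dst: str) -> list[str]:
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libsvtav1",  # Intel's SVT-AV1 encoder
        "-crf", "35",         # constant-quality mode
        "-svtav1-params", "tile-columns=2:tile-rows=2",  # parallel tiles
        dst,
    ]

def vp9_args(src: str, dst: str) -> list[str]:
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libvpx-vp9",
        "-b:v", "0",          # bitrate 0 switches libvpx-vp9 to...
        "-crf", "24",         # ...constant-quality (CRF) mode
        dst,
    ]
```

You'd hand either list to `subprocess.run(...)`. Note the VP9 quirk Amos mentions: with libvpx-vp9, `-crf` alone is not enough — you must also pass `-b:v 0` to get true constant-quality mode.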

I should do better. I should have these quick encodes done like this, and then schedule a better one with more effort — a slower preset, a lower CRF if you want — so it spends more time and optimizes them. I suspect YouTube does this. They might. Because it makes sense: at first, when a video just comes out, you want the encode to be finished quickly, so maybe you're willing for it to be a little bigger or the quality to be a little worse. And then you do a second encode that might take longer, but it's going to save you a lot of money in the long run — save on bandwidth. So I might do that. I just nerd sniped myself. Damn it. I need to be finished with this already.

James Munns: Congrats.

Amos Wenger: Second sample: it's a fast screen capture, of FastTracker 2. So look out for that in my video. This time the original against itself gets a 98.9. Pretty good. And the other two as well. The original is 1.5 megabytes; AV1 is only 372 kilobytes; and VP9 is 928 kilobytes. And then my third sample is iPhone slow-motion footage of me strumming a guitar. This time the source footage is HEVC, because that's what my iPhone gives me. The original is 26 megabytes with a VMAF score of 99.3. And this time we can see the compression: AV1 goes to one tenth of the size, 2.8 megabytes, and only gets a score of 93.8. And VP9 doesn't reduce the size nearly as much — it's 7.4 megabytes versus 2.8 for AV1 — and it only gets a score of 92.8. So I hope from this you can see why I don't like VP9. It's just older. It's from a different time — not as different a time as, you know, LaTeX, but a different time nonetheless. It's bigger and worse. We are reaching the end of my presentation soon.
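The size reductions quoted for the guitar sample, as arithmetic — "one tenth of the size" is a slight rounding of the actual ratio:

```python
# Compression ratios for the iPhone guitar footage, from the sizes above.
original_mb = 26.0  # HEVC straight off the iPhone
av1_mb = 2.8
vp9_mb = 7.4

av1_ratio = original_mb / av1_mb  # roughly 9.3x smaller
vp9_ratio = original_mb / vp9_mb  # roughly 3.5x smaller
```

So AV1 shrinks this clip about 9x while VP9 manages about 3.5x, and AV1 still scores a point higher on VMAF — which is the whole argument for AV1 as the storage format.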

I wanted to show you something funny...

Amos Wenger: I wanted to show you something funny. So when you embed video on the web, you use the video tag. I've shown the picture tag with source sub-tags, so you can choose from different formats; same thing with video — you have video, and then source elements inside. But what shows before you click play? The answer depends on what you put in the markup. You can have preload="auto", and then it's going to download enough of the video to render the first frame, and show it.

You can have preload="metadata", and then it's just going to fetch enough of the video to know the duration and the dimensions, but not necessarily the first frame, I believe. Or you can have preload="none", which I chose, and then it shows nothing — an empty box. So it's a good thing that I'm grabbing the dimensions of the video and adding them to the markup, so the hole is at least the right size. But I wasn't happy with that. I wanted a cover image: the first frame, without actually having to download the video, because when people click play, it actually requests the video and triggers transcoding. So: easy, right? You just tell ffmpeg to grab the first frame and encode it as something modern — I had already picked my favorite image formats — so just do that, right? Well, wrong. James, what do you see in that image?

James Munns: I see tearing?

Amos Wenger: Not quite.

James Munns: What am I seeing in this image?

Amos Wenger: So you're seeing two halves of the player in different states. It is not tearing. It is divided down the middle. The right-hand side of the image is the cover image. It's the first frame encoded into an image format. And the left-hand side is the first frame of the video as rendered as a video, after you hit play.

James Munns: Ah, I gotcha.

Amos Wenger: And as you can see, the background color of the cover is much darker than the background color of the video player. At least I hope you can see that. And if you can't, I'm going to buy you a new screen.

James Munns: I thought they were just like two different-- I thought it was a screen re-rendering or something like that. But it's just two renders at the same image.

Amos Wenger: No, I had to do a little image editing to stitch those together. Basically, I had the video player open in Safari, took the screenshots, and then brought those together. So I have more examples of this. This is with Finder: you can see the grays are just all wrong. And because this is Safari, the video is VP9 and the poster is JPEG XL — Safari does support AV1, but only on devices that have hardware-accelerated AV1 decoding. So that means, I think, the iPhone 15 Pro.

James Munns: How far back? Yeah, I was going to say, how far back is that?

Amos Wenger: Yeah, and some recent MacBooks — the M3, something like that. So on my Mac Studio M2, which is more than capable of decoding AV1 in software, it doesn't play AV1, and so I have to use VP9. And you can see the colors are really wrong. This is the FastTracker 2 clone screenshot — again, everything is slightly wrong. And the guitar footage — this one is not aligned as well; it's probably just the next frame or whatever.

James Munns: So this is one of those things that I don't think I would-- if you showed me one and then you made me close my eyes and then showed another one, I don't think I would pick this up. It's very, very clear when you put them side by side.

Amos Wenger: Exactly.

James Munns: But I don't think-- you'd have to be Alt-tabbing between two different versions of it for me to be able to notice this.

Amos Wenger: Well, you know the reason it's noticeable is because it's the freaking cover image, and then you click Play and it instantly switches to the video — you can see the change in brightness almost immediately. And I wish this episode ended with me telling you how I fixed it. But I didn't fix it, James. I did not. I did my best. It used to be worse: this is a slide of things being much worse in Firefox. And it's hard to see, because it's a tile — a 2 by 2 grid. But I have Digital Color Meter, which is built into macOS, showing that the RGB values for the background of my text editor in the video are 51, 51, 51. Those are red, green, and blue values in the sRGB color space, for Firefox.

But for all the other ones, it's 35, 35, 35 — which is not in the picture, I just remember it off the top of my head. And I initially thought it was Firefox's fault, because — in my defense — Firefox has been wrong about color a lot in the past, and this is macOS, so probably not the platform Firefox cares the most about. But then I actually looked at the files and noticed something strange: if I generated a thumbnail from the VP9 file, the colors were more correct than if I generated it from the AV1 file, which is my storage format. Which doesn't make sense at all.

Don't trust ffmpeg with AVIF

Amos Wenger: And pretty much the lesson here is: don't use ffmpeg to generate static images. It is not made for that, because AVIF images and AV1 videos have different ways of storing color information — color profile, color space, color primaries, et cetera. You need that information to decode and display the image accurately. In a video, it's part of the MP4 — in an MP4 box, I'm assuming. In an image, it's in a different container that ffmpeg barely supports: it technically can export static images, but it's not really made for that. And the slide we're looking at is using avifdec --info, showing that the file generated with ffmpeg from AV1 has color primaries, transfer characteristics, and matrix coefficients set to 1 1 1, and the other one has 2 2 5.
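For reference, those triplets are CICP code points from ITU-T H.273, and they decode as follows — notably, 1 1 1 isn't "unset": it explicitly tags everything as BT.709, while 2 is the code for "unspecified" and 5 is a BT.601-family matrix. This partial lookup table covers just the values seen in the avifdec output:

```python
# A few CICP code points from ITU-T H.273 — enough to decode the
# "1 1 1" vs "2 2 5" triplets avifdec reports. Partial table only;
# the spec defines many more values.
COLOUR_PRIMARIES = {1: "BT.709", 2: "unspecified", 5: "BT.601 (625-line)"}
TRANSFER = {1: "BT.709", 2: "unspecified", 13: "sRGB"}
MATRIX = {1: "BT.709", 2: "unspecified", 5: "BT.601"}

def describe(primaries: int, transfer: int, matrix: int) -> str:
    """Human-readable reading of a CICP triplet."""
    return " / ".join([
        COLOUR_PRIMARIES.get(primaries, "?"),
        TRANSFER.get(transfer, "?"),
        MATRIX.get(matrix, "?"),
    ])
```

So the ffmpeg-generated file claims full BT.709 color, while the correct file leaves primaries and transfer unspecified and uses a BT.601 matrix — two different decoding interpretations, hence the shifted grays.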

So that explains it, apparently. But I didn't find a way to fix it. So I ended up having ffmpeg output lossless JXL, and just reusing my existing setup to encode to AVIF and WebP. And I'm happy now. Nice. Just more fun things before we go. Everything I've shown — the picture tag and the video tag — supports sources, so you can serve different formats. But the poster attribute on the video tag does not. So for the image I show before the video loads, you cannot specify which image format you want. That's why, if you go to my website after this goes live and I actually have some videos up, you will see that I used the extension .thumb, which is not an image format. It's something that your browser will make a request to, with a certain Accept header, which lists all the formats your browser supports.

And I could do an entire episode just about how to parse, interpret, tie-break, and negotiate the content type of what you're serving to a browser. I had avoided doing it for a very long time, but I had no choice. Basically, it lets me serve JPEG XL to Safari, and AVIF or WebP to everyone else, from one URL. But at least I do redirect, so you get the real URL after you click — that's what people complain about, that's where it comes from: most sites just have a .png and then serve WebP to browsers because they accept it. And that's my last showcase. Do you have any idea what this is, James?
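The Accept-header negotiation behind the .thumb endpoint can be sketched like this — a deliberately minimal version that ignores the q-value parsing and tie-breaking Amos alludes to (RFC 9110 specifies the real rules), and whose preference order is an assumption:

```python
# Minimal content negotiation for a poster-image endpoint.
# Real negotiation needs q-values and tie-breaking per RFC 9110;
# this sketch just checks for exact media-type matches, best first.

PREFERENCE = ["image/jxl", "image/avif", "image/webp"]  # assumed order

def pick_format(accept_header: str) -> str:
    """Return the best image format the client advertises."""
    accepted = {part.split(";")[0].strip() for part in accept_header.split(",")}
    for mime in PREFERENCE:
        if mime in accepted:
            return mime
    return "image/jpeg"  # universal fallback
```

A Safari-like client advertising `image/jxl` gets JPEG XL; a client advertising `image/avif,image/webp` gets AVIF; anything else falls back to JPEG.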

A confusing graph

James Munns: The x-axis is hats and the y-axis is hats. Oh, no, it's going the wrong way. It's going down. So it can't be the hats by hats graph. No, I don't.

Amos Wenger: So-- Is it something you wear in your hat? What are you talking about?

James Munns: Oh, no, there's the Valve one. It's the old joke — we'll either paste the picture here or cut this entirely — where other companies go: "Piracy! Sales go down. Oh, no." But at Valve, it's a graph where the line's just going up, and the x-axis and the y-axis are both hats, because this is from when they were selling — they were like the first ones selling game cosmetics. We're probably going to cut this whole thing. If not, it's a very funny image. And it's what I think of whenever someone shows me a graph with no units: "MORE HATS!"

Amos Wenger: Classic encoders like JPEG, the first of its name, have a quality setting from 0 to 100. And so tools like ImageMagick also have a quality knob that goes from 0 to 100. But more modern formats like JPEG XL do not take a quality setting from 0 to 100. Instead, JPEG XL takes a distance — a Butteraugli distance, I don't know how to pronounce that — which is a certain way to measure the perceptual distance between two images. You give it a target, essentially. It accepts values from 1 to 15.

But now that I'm looking at this graph, it doesn't really make sense, does it? Anyway, this is how ImageMagick maps its 0-to-100 quality parameter to the distance it gives to libjxl. There's a hard-coded case mapping to distance 1, but later in the code that changes to just toggling on the lossless flag. And it looks so funny: it's logarithmic until 30, then it goes linear, then it flattens out at 100. It's weird. I saw this image and I was like, "Ah... I need a nap." Because I am using ImageMagick for the initial JPEG XL conversion, and I was trying to figure out: OK, what's the distance parameter it's actually passing to the encoder? So I went and looked at the sources of ImageMagick, found the function they use to convert quality to distance, and then someone graphed it online, because I was like: does that look weird to anyone else? Why does it go all the way up to 20 or 40? That's not even a valid input to the encoder!

Wish me luck

Amos Wenger: Oh, well. Anyway, that's my last slide. Wish me luck. I'm not even sure... The problem I have now is that you know how I have H.264 that CleanShot gives me, and then I convert it to AV1? Now I'm doubting this. I'm not sure this is not messing with the colors. That would make a lot of sense.

James Munns: Next, you're going to go work at CleanFeed, implement AV1 direct recording, and then quit immediately.

Amos Wenger: James, I have emailed them. They've already replied. They're going to say, "We'll consider it."

James Munns: OK.

Amos Wenger: I've already told them: please let me pick a codec. I know I'm the only one who cares about this.

James Munns: That's the corporate version of "PRs welcome."

Amos Wenger: "I like you really much. I like you a lot. Please add this feature." And they were like, "We are so glad you like it. We'll think about it."

James Munns: Well, Amos: Bonne chance.

Amos Wenger: Merci.

Episode Sponsor

This episode is sponsored by Depot: the build acceleration platform that's on a mission to make all builds near instant. If you're tired of watching your builds in GitHub Actions crawl like the modern-day equivalent of paint drying, give Depot's GitHub Actions runners a try. They’re up to 10x faster, with unlimited concurrency, faster caching, support for Linux, macOS, and Windows, and they plug right into other Depot optimizations like accelerated container image builds and remote caching for Bazel, Turborepo, Gradle, and more.

Depot was built by developers who were tired of wasting time waiting on builds instead of shipping. It's made for teams that want to move faster and stay focused on what actually matters.

That’s why companies like PostHog use Depot to cut build times from over 3 hours to just 3 minutes, saving tens of thousands of build hours every week.

Start your free 7-day trial at depot.dev and let them know we sent you.