automating macOS

The quest for transparent DOM element screenshots

Join us as Amos falls down the rabbit hole of automation tools to bring you, the viewer, the absolute highest quality screenshots and code snippets.

Show Notes

Episode Sponsor: Depot

Transcript

Automating macOS

Amos Wenger: Hello everyone, how you doing today? Hi, Amos. Hi, James. I'm doing pretty good. Hi, Amanda, thanks for the thumbs up and the wave. Today we're gonna talk about automating macOS, subtitle: the quest for transparent DOM element screenshots. So as you can see, I'm chasing cheap clicks for this season. I'm shooting for something that I know everyone will be into.

James Munns: Does this mean we get a poggers face on the--

Amos Wenger: We absolutely do, James.

James Munns: Excellent,

Amos Wenger: I'll get right on it. So this is a thinly veiled promotion for myself. I've been making articles and videos at the same time lately because I have two audiences. I have people who read my articles, I have people who watch me on YouTube. And so far I've had to choose, whenever there's something interesting, whether I'm gonna make an article about it or a video about it. And making videos takes forever. I need to do the research anyway, and for the articles I usually need to make visuals, diagrams and everything. So it's annoying. And also I used to write video scripts in something like Google Drive so I can share them and get feedback inline and then I can collaborate with other people. But I really missed just writing the way I write my articles, which is with a code editor like Zed, just writing Markdown. And I made a really nice server for myself. You can drag and drop images directly in the browser. There's this nice syntax highlighting that I've already talked about in a recent video. And there's gonna be a bunch of visuals for this podcast episode. I'm sorry. I apologize to the people who think podcasts are mostly audio. They're not. You should watch--

James Munns: Where can they find those visuals?

Amos Wenger: They can find those visuals on sdr-podcast.com slash episodes if you want to. There's a link. All right, I don't need to say slash episodes all the time. But the point is yeah, you can get the PDF, you can get the slides, or you can watch those on YouTube where we might get ad money at some point. And you'll have those timed for you so you won't even have to flip them. The first visual is of course, a picture of my blog, which I redesigned a couple months ago, I think. And it's the article version of Catching up with Async Rust, which you need to be a patron to read fully right now. But then also that particular one was a double feature. I released a video at the same time that I released the article, and the video is on YouTube. So it's available for everyone to watch right now. And so yeah, for six months, the article is only for sponsors, and that's kind of my new business model. And it's been kind of working for me. But it's a lot of work to do both, not quite twice as much.

James Munns: Especially when you put so much production into both of them. I would say you have some of the cleanest videos and articles that I've seen. You're really,

Amos Wenger: James.

James Munns: Making me feel bad. Cause for me, writings are usually in a fever dream. If someone poked my brain in just the right way that all the neurons lined up and I just vomit all the ideas into a blog post at once, and then I have to get it all out in one sitting, or otherwise I sit on the idea for too long. Whereas yours are like a crafted work of art.

Amos Wenger: It's, well this is completely off topic and I have 61 slides. But yeah, it's been hard to get into a mode where I come back to older drafts and I'm able to iterate on them. Actually all the content I'm releasing early, like in Q1 2025, are old pieces that I've been drafting throughout 2024. But due to unforeseen life stuff happening, I couldn't finish any of them. And I just had these chains of dependencies. Like if I improve my blog to do this, then I can do that. Oh, but to talk about this, I need to talk about that. And I just had like five or six different drafts chained up to each other. I got out of the hole recently and I started releasing stuff again. And now I have kind of a monthly schedule.

James Munns: It's like the opposite of bankruptcy where now you're just delivering everything at once.

Amos Wenger: Yes, but I'm excited to finish cleaning up my old drafts and actually write some fresh material again. Because for me, it's old news. Some of those drafts were 11 months old, so I'm bored of them. But anyway, so this is the article, this is the video. And how do you do that video? If you'll notice there's a little bit of code in the video. It's kind of on purpose, I show code in videos because yeah, it's awkward. So pay for the article. That's the whole thing. You can still pause, you can zoom in. You can like, I try to make it big enough. But I want it to be clean. The gray background that you see is not a gray background. It's a black piece of clothing behind me. And with the color grading, it's almost uniform, but not quite. And so the code that's on there is composited. It's a transparent PNG. It's composited with the background. You'll notice there's no outline, there's no background around the code sample.

Strategy 1: Rectangle screenshots

Amos Wenger: It's not just the rectangle screenshot, which is what I used to do. So I used to just use initially the built-in macOS screenshot tool and then CleanShot X, which is, "Hey, CleanShot, please sponsor me. I will sell the frick out of your product because I love it."

Amanda Majorowicz: Well, yeah, you already did to me. So good job.

Amos Wenger: I did. Do you like it?

Amanda Majorowicz: I mean, yeah, I think it's nice. Right. I can do stuff. I can record things.

Amos Wenger: So many nice little apps on Mac. It's like 20 bucks a pop. So you take a rectangle and then you open it in an app like Affinity Photo, for example, Affinity sponsor m — No, I don't like their product that much. They're just not subscription like Adobe. Then you just remove the background, but it's always bad. It's always bad because the text is already composited with some solid color. If it's something dark and you're putting it back on something dark, it's kind of okay. But you can see in this visual, you can choose the chroma tolerance. There's a bunch of settings you can adjust, but you're still gonna have some of those darker pixels because it had a gray background. We could see some red text and you can see the pixels on the outer edge are not just more transparent because of anti-aliasing. They're also darker and kind of gray instead of being red.

James Munns: I think the industry term for these are jaggies.

Amos Wenger: Yes, jagged edges. I think so too. Another technique you can do is directly inside of DaVinci Resolve. DaVinci Resolve integrates Fusion, which is their After Effects competitor. Their, like, what is it even called? Special effects suite. It's a node-based compositing solution and it has a thing called a 3D keyer. The way it works is actually funny. I'm just now realizing, because they make a color cube and then they measure the distance to the color you give it. And like, you define an area within the cube that is like, I wanna remove that. And then it goes, okay. And so you can select a color space. Colors are fascinating. I'm gonna write about it eventually. I've tried, but the draft is like two hours long and I'm like, I'm not sure about half of this. Yeah, you can do it directly in DaVinci Resolve, but then you have to do this for everything. And back then I didn't know you could copy a Fusion composition from one clip to the other. I didn't know about a lot of things. So it was a lot of just like waiting for the tab to open, adding a new node, dragging on the background, changing the sensitivity, dragging the Despill slider up, noticing that if you do that on a screenshot of GitHub, for example, it'll make the buttons hollow. Like if you go too far with the slider, it'll start removing. So yeah, so this only really works if you take screenshots of dark backgrounds and put it back on another dark background. And even then, if you look closely, which I always do, then it's not great. I have a bunch of, well, the results are not that bad, as you can see on the slides that you can find on sdr-podcast.com slash episodes. But as you can see, if you put it on a white background, there's a lot of, not just jagged edges. You can see it gets darker near the edge of the letter and then goes straight to pure white background and it's pretty bad.
So this is not what we want, but eventually I discovered, in my journey of just replacing everything with Apple products because I can finally afford them and they work most of the time, that in Safari, you can right-click, Inspect Element, and from the dev tools, right-click an element and choose Capture Screenshot. I'm not sure if Chrome offers the same thing. It probably does. Their web engines have kind of diverged, so I don't know who has what feature. But what it does is give you a clean PNG that's not composited with any background. And then I did the compositing test on this slide as well. You can see it on a white background and on a dark background. Well, it's a dark syntax highlighting scheme, so it doesn't really work on white, but there's none of those edges that we saw before. So if this weren't an auditory medium, this would be a great demonstration. Again, you should really just go on YouTube or find the slides for yourself to see the difference because it's quite striking.
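
The darker, grayer edge pixels described above fall out of ordinary alpha compositing arithmetic. Here is a minimal Python sketch of one anti-aliased edge pixel, not taken from the episode: the colors and the 50% coverage value are assumptions picked for illustration, and the "keying" step is simplified to "keep the baked color, assign it the original coverage as alpha".

```python
def over(fg, alpha, bg):
    """Composite a straight-alpha foreground color over an opaque background."""
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(fg, bg))


RED = (255, 0, 0)          # text color
DARK_GRAY = (40, 40, 40)   # assumed background of the rectangle screenshot
WHITE = (255, 255, 255)    # new background used in the video
COVERAGE = 0.5             # assumed anti-aliasing coverage of one edge pixel

# A rectangle screenshot stores the edge pixel already mixed with gray.
baked = over(RED, COVERAGE, DARK_GRAY)

# Simplified keying: the pixel becomes 50% transparent again,
# but it keeps the gray-contaminated color.
keyed_on_white = over(baked, COVERAGE, WHITE)

# A capture with real transparency still has the pure red color.
clean_on_white = over(RED, COVERAGE, WHITE)

print("baked edge pixel:    ", baked)
print("keyed, over white:   ", keyed_on_white)
print("transparent capture: ", clean_on_white)
```

Under these assumptions the keyed pixel ends up less red and grayer than the one from the transparent capture, which is the fringe visible when the screenshot is placed on a light background.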

Can we automate this?

Amos Wenger: But then the question is, can we automate this? Because it's neat that I found a way. What I ended up doing is that I now have a production tracker. So I have columns for everything I need to do for an article. And the first column is research and write everything, and then shoot everything, and then do a rough cut, and then collect material, visuals that I'm going to put into a video. So I have so many different columns. At the end, I have manual subtitling. I have, well, correcting auto subtitles. Done with Whisper, whatever. And now I just had a column for collecting all of those. I would open up the article of my blog and just scroll through the page. And then just go through the dev tool and click capture screenshot one by one and put them all into a folder. That's not great, but it's still better than switching back and forth between being in timeline editing mode in DaVinci Resolve and going back to Safari to capture more things.

James Munns: It's a very sit-down-with-a-beverage-and-a-podcast kind of thing. Okay, the next 20 minutes of my life is to just sit down and do this. Absolutely, yes.

Amos Wenger: And you can mess up, because if you accidentally select part of it, then the selection, the highlights, will also be part of the screenshot, which has happened to me many times. Or if you select slightly the wrong node, just like the parent or a child, you're gonna get only one word of code or you're gonna get the whole page. Have you ever tried to import a PNG that's several tens of thousands of pixels high into DaVinci Resolve? It's very unhappy.

James Munns: I can imagine.

Amos Wenger: It's pissy about it too. I don't know, it shows a dialogue with the GPU error 72 or something. It doesn't even tell you what's wrong. It's just

James Munns: Computer said no.

Amos Wenger: You done goofed.

James Munns: Yeah.

Amos Wenger: You know what you did, mister. It's really bad. So can we automate this is the question I eventually asked myself because I was bored. I get bored. This is my issue. This is why I couldn't get things done is because I started doing manual work and I'm like, I hate this. This sucks. I cannot afford to pay someone to do this for me yet. And this is what saved me. Because if I had paid people to do the workflow that I used to do, say two years ago, I would have a team of 12 people doing repetitive work. That's bad. But instead, because I couldn't afford to, I had to just make the workflow better, which is just like when you're chasing a bug and you're looking at the wrong places, you're making the rest of the code so much nicer and more auditable, more debuggable, more tested, even if the bug was not there in the first place. So I iterated a bunch on this workflow. And one of my thoughts was, can we just make a Safari extension? Surely if you make a Safari extension, I understand that the webpage itself cannot call into the dev tools. That would be bad. Because through the dev tools, you can probably access private user data and like, control part of the OS. I understand there's sandboxing going on here. But as an extension, surely you're able to, because you're able to add a whole different tab to the dev tools, you can make your own dev tools for something like React or whatever front-end framework. So surely there's a way to control that screenshot functionality, but I couldn't find anything. Documentation around Safari DevTools is really sparse. Maybe I searched wrong, but I searched for half an hour and I kept finding like, no, there's nothing, good luck. My only solution so far was just the right click on a DOM node and choose Capture Screenshot. There is the only thing I could see in the public API, because I did eventually find the API docs is get screenshots of visible area with a completion handler or as an async function. 
I don't know how Swift works. But obviously this actually grabs the background as well, because it's a screenshot of the visible area. You give it just a rectangle and then it gives you everything that's in there. So that's not super helpful. That's not what I want. Then I thought about WebDriver. I don't know if you've ever written tests for the front-end of a web application. It's not a good time, but you have tools. It used to be a super hack, just like, well, the Chrome DevTools, we kind of know what the protocol is. We've kind of reversed it or something, or it's all open source anyway. Someone's read the Chromium sources. What if we have some test suite send the same thing that the Chrome DevTools would, and it kind of simulates clicking around the webpage and typing text and whatnot and waiting for elements to appear and clicking on them. I think it's still a bit of a mess. But there's a standard now, so you have a reason to be mad about it now. The WebDriver standard includes a Take Element Screenshot HTTP endpoint. But again, it takes a rectangle, a region to capture, and so it does include the background. And it's not like it's a bug. I can't report this to the browser maker. They're just following the standard. They can't unilaterally decide to make that transparent.
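
For reference, the WebDriver endpoint mentioned above is a plain HTTP GET that returns a base64-encoded PNG inside a JSON body. A minimal Python sketch follows, using only the standard library. The base URL, session id, and element id are placeholders that would come from an already-running driver session, and the helper names are made up for this example. The image it returns is the rectangular, background-included capture discussed above, not a transparent one.

```python
import base64
import json
import urllib.request


def element_screenshot_url(base_url, session_id, element_id):
    """Build the W3C WebDriver 'Take Element Screenshot' URL."""
    return f"{base_url.rstrip('/')}/session/{session_id}/element/{element_id}/screenshot"


def decode_screenshot(response_body):
    """Extract the PNG bytes from a WebDriver JSON response body."""
    payload = json.loads(response_body)
    return base64.b64decode(payload["value"])


def take_element_screenshot(base_url, session_id, element_id):
    """Send the GET request to a running driver and return PNG bytes.

    Assumes a WebDriver server is already listening at base_url and that
    session_id and element_id were obtained from it beforehand.
    """
    url = element_screenshot_url(base_url, session_id, element_id)
    with urllib.request.urlopen(url) as response:
        return decode_screenshot(response.read())
```

Usage would look like `png = take_element_screenshot("http://localhost:4444", session_id, element_id)`, where the port is an assumption about how the driver was started.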

James Munns: I was gonna say, for most people, that's usually what they want. The way I've seen this before is for regression testing and things like that, where usually that's what you want in a bug report, is you're like, "I want this chunk of the thing where there's supposed to be a login element, and there's not one, or it's sideways or upside down now, and I wanna see the whole thing where, I don't know."

Amos Wenger: I know, I'm the weird one out.

James Munns: I can't think of anything else that you would want, you know what I mean? Other than the fact that that was, you found, you happened to find in one minute, you were like, "It does exactly what I want, almost accidentally."

Amos Wenger: This is probably the stage of the episode where people are asking, "Wait a minute, Amos is using this to export some code blocks, essentially." So I'm using Safari to render just colored text, essentially. I could do that elsewhere, right? I could write a bit of code that takes the thing, runs tree-sitter, gives me the colors and whatnot.

James Munns: Well-- Amos, where should people send their suggestions of how they think your workflow should be? Definitely.

(Both laughing)

Amos Wenger: But I'm also using it for diagrams. Rendering SVG is not as easy as it sounds, especially because of rich text formatting, it's actually HTML elements inside of SVG. So the short version is I'm not just doing code, I'm also doing things that I could technically do outside of browsers, but I really don't want to. And I like things looking exactly the way they look on the web. So, next option, build WebKit slash Chromium yourself. The problem is--

James Munns: You know, I was gonna give you credit because I said it's probably easier to figure out how to make the browser do something that you wanna do than write your whole engine. Like I totally understand, you know, what kind of yak shaving would that be to build your whole rendering engine? I know other people in Rust who do that, but I was gonna give you huge credit, like you gotta draw the line somewhere and just drive the thing that exists. And then you drop a slide that says, build WebKit or Chromium yourself.

Amos Wenger: No, no, no, no, no, no, no, no.

James Munns: I'm gonna--

Amos Wenger: You misunderstood. I mean compile from their sources. I mean not implement.

James Munns: No, I still, I feel like there's patches coming in that I'm about to have opinions on.

Amos Wenger: There aren't, I don't know. Cause I don't even know if they have the functionality. I booted up Chromium, I looked for APIs, there wasn't anything automatable. So like, even if you can do it from their dev tools, I'm not interested cause I'm at the same point essentially. But yeah, building Chromium takes a long time. It's worse than building LLVM. Working on Chromium is one of the reasons that Google has a bunch of different build tools and build farms and whatnot. It's just gigantic. There's so many dependencies. I'm not sure I have enough RAM in this house to build it honestly. And the average human life expectancy is like 80 something. So I, you know, this is just not an option for me. If it was for a commercial project, maybe. But for me, I looked for something else. And then I remembered that I'm on Apple now. I'm not switching back and forth between Windows and my MacBook. Everything's on macOS. So they have a tool called Automator, which looks like this. You can find this slide, say it, no, I'll stop. sdr-podcast.com slash episodes. It's a little app that lets you build workflows. And I recently actually built a simple workflow that I've been wanting forever. I didn't realize you could just use Automator to do that. It's a quick action so I can right-click a PDF in the Finder and render each page as a separate PNG. That's actually something I do a lot while editing videos because I made a video where I talk about the NSA's Memory Safety Report, for example. It's just seven pages. What I used to do is open it in Preview, make it big enough so that the resolution is kind of OK, take screenshots, and then cut everything. It's just bad. Or you can just figure out what the command line is and go through several tools and blah, blah, blah, blah. I hated that. And also, Preview has an Export as PNG option, but it exports an animated PNG, an APNG. It does. James, you're shaking your head, but I swear to God, it does.

James Munns: No, I believe you. It's a confused face. Like, oh, God, why?

Amos Wenger: I believe you. I expected an option like export each page as an individual PNG, but no. However, they do have a built-in Render PDF Pages as Images, hidden in Automator, which everyone keeps forgetting about. Even on Apple forums, they're like, oh, right. That's the thing. Forgot. So Automator is fun. You, the listener, may not have heard of Automator before, but you may have heard of Shortcuts, which Apple teaches you about every major version of iOS. So this is that, but for macOS, pretty much. And you can record something. So you can hit the Record button, then do something on your computer, and then play it back, and it'll do the thing. But that doesn't work for me, really, because for me, it's opening the DevTools and then scrolling through each element that matches the selector. Like, I don't think-- I don't know exactly what it records. Is it going through the element hierarchy? Is it based on screen position? It just doesn't work for me.

AppleScript!

Amos Wenger: Enter AppleScript. Yay. So initially, what motivated me to actually make that slide, make that presentation here, was that AppleScript is freaking weird. So we're going to talk about AppleScript for a hot minute. It's a language, a natural language, developed by Apple, that first appeared 32 years ago. I'm older than AppleScript, but not by much. I was born in 1990. It first appeared in 1993. Typing discipline is weak. I'm reading from the Wikipedia info box. Typing discipline is weak, of course. It appeared in System 7, apparently. I have never, in my life, run System 7. I've never run Mac OS 8. I've seen screenshots of Mac OS 9. I'm pretty sure it ran in my browser or something. No. Yeah. And it is extremely weird. Let's go through some of the things. AppleScript was released in October 1993 as part of System 7.1.1, the first major upgrade to System 7. QuarkXPress-- I don't know if-- James, you recognize that product name?

James Munns: It rings a bell, but I never used it.

Amos Wenger: It's a layout thing. When you make a magazine, you need to do the layout of the thing? You would use that.

James Munns: I should ask Amanda, although if it's a journalism thing, maybe. But it's very long ago. I don't think you were doing journalism in '93, Amanda.

Amanda Majorowicz: I definitely was not. And so I got--

Amos Wenger: I'm sure it's still relevant.

Amanda Majorowicz: Journalism, that is me. Wait. Oh, no. Not yet.

Amos Wenger: QuarkXPress was one of the first major software applications that supported AppleScript. This, in turn, led to AppleScript being widely adopted within the publishing and pre-press world, often tying together complex workflows. This was a key factor in retaining the Macintosh's dominant position in publishing and pre-press, even after QuarkXPress and other publishing applications were ported to Microsoft Windows. And I had no idea. I remember wondering as a kid, why do people buy Macs? Because it's so expensive. And now I have two decent reasons, at least. There's more. But one thing I love is you just click the Help menu and you start typing something out. And it tells you every menu item that exists in the application. That's pretty good. I used to complain that there's a global menu, essentially, depending on which application is focused, the global menu bar at the top changes. But I think it's great now, because you can just discover functionality. And DaVinci Resolve all the time, I'm clicking the element and going to Help and being like, is there a thing to do this? And it's either showing you the menu item or directing you to the Help page. So yeah, AppleScript lets you tell applications to do things. And apparently, that was enough for the publishing and pre-press industry. And instead of writing code to do their thing, they wrote this weird natural scripting language to press buttons and move files around. I don't know if you've ever seen the AppleScript editor before. But it looks like this. So James, can you describe what you're seeing on screen right now?

James Munns: Tell application Safari activate and tell. Oh, man, this is from a time.

Amos Wenger: It is.

James Munns: System event Safari, JoJo. So you go to Safari, go to the JavaScript console.

Amos Wenger: What stands out the most? Can you see a difference between the first block and the second block?

James Munns: One's purple?

Amos Wenger: Exactly. One's purple and monospace. As it turns out-- I have a slide about this-- the purple text is uncompiled. It's using Menlo regular, which is a monospace font. Everything's in size 12. And the other things are Verdana. So actually, the compiled code is sans serif, but it's not monospace. They're not kidding around with the natural language thing. They're like, no, it's just a document with some colors and italics. Don't worry about it. It's easy. It says, set split group to first UI element of target window whose role description is split group. It's not programming. It's just English. It's fine. You can do this.

James Munns: This is of that era of HyperCard. And there's all those other ones where it was the people who took the right amount of drugs were programmers at that point, where they're like, everything is-- everything was sort of modeled as like, OK, we want interactive programs. And people aren't going to write programs and run them. It's going to be an interactive conversation with your computer. And there was this whole genre of--

Amos Wenger: It was low code before low code, yeah.

James Munns: Yeah. And Oberon and all of those kind of things came out of this same kind of era. And then it just kind of lost to most regular compile your application and run it and those traditional scripting type things. But there's this whole 90s genre of-- especially on Macs, where there was just this software that was like, people are going to write human language and it's going to work. And you should make sure that everything you can do with the GUI can be automated through this kind of thing.

Amos Wenger: I'm going to stop here because you're going to go through my entire slide. Just talking about how excited you are. I know nothing about HyperCard, by the way. But I think it's one of the ancestors of AppleScript or like the whole Automator thing. Yeah. So yeah, so you write code and it shows up as monospace purple. And then you hit Save. And if you didn't make any mistakes, it highlights it. This is just-- what's the format named? I'm going to-- I want to say PDF, but that's not it. RTF. This is just RTF. You can copy and paste it into Safari, and you'll get the colors. I know because this is how I made the next slides. So how do you write AppleScript? You tell applications to do things. In this case, you can tell application Safari to activate. What is activating an application? It's bringing its windows to the front, essentially, and having the top menu bar. It's focusing it, basically. But focusing is different because it's for windows. So in my case, first I want to activate Safari. And oh, also, everything you can do with an application is actually-- there's metadata for that. There's definitions for that that are in the app bundle. Because in case you didn't know, on macOS, .app-- Safari.app, for example-- are folders. They're app bundles, or folders. So you can just go in there and look at what they have. And in the Contents/Resources/Safari.sdef file, it's just an XML file that defines-- lists everything you can do. So if, for example, they've bound that to other languages, and you can generate types and methods and whatnot, there's different definitions for everything you can do with any applications from those XML files, which is absolutely great. The one thing you can do is you can click menus. For example, I have here, tell application System Events. This is how you do synthetic inputs, keyboard and mouse.
Tell application process Safari, click menu item, quote, Show JavaScript Console, of menu, quote, Develop, of menu bar one. So there's a lot of menu bar one, menu bar 37. It gets complicated. We'll get into it quickly. But this is how it works. I don't know what happens if you switch the system language, and then suddenly it's "Montrez la console de développement JavaScript" instead of Show JavaScript Console. I don't know where we have to go.
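
Written out, the statements read aloud above look roughly like the AppleScript lines in this Python sketch, which assembles them and hands them to the `osascript` command-line tool one `-e` argument per line. This is a reconstruction from the spoken description, not the script shown on the slide. The function names are made up for the example, and running it for real assumes macOS, Safari's Develop menu being enabled, and Accessibility permission for whichever app launches the script.

```python
import subprocess


def show_js_console_lines(app="Safari", menu="Develop",
                          item="Show JavaScript Console"):
    """Return AppleScript lines that activate the app and click a menu item."""
    return [
        f'tell application "{app}" to activate',
        'tell application "System Events"',
        f'    tell application process "{app}"',
        f'        click menu item "{item}" of menu "{menu}" of menu bar 1',
        '    end tell',
        'end tell',
    ]


def osascript_args(lines):
    """Turn script lines into an osascript argument list (one -e per line)."""
    args = ["osascript"]
    for line in lines:
        args += ["-e", line]
    return args


def run_applescript(lines):
    """Run the script; only meaningful on macOS with the right permissions."""
    return subprocess.run(osascript_args(lines), check=True,
                          capture_output=True, text=True)
```

Because the menu item is looked up by its title, the `menu` and `item` parameters are where a different system language would have to be accounted for.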

James Munns: You have to try this now.

Amos Wenger: I will not. So that's left as an exercise to the dear listener.

Amanda Majorowicz: Maybe, yeah, maybe somebody else will try it.

Amos Wenger: Please report back as a YouTube comment. We need the engagement. So what else can you do? You can send keystrokes. There's a keystroke command, which in this case, I'm using to type JavaScript code into the DevTools. How very meta. I'm using one scripting language to type another scripting language into a DevTool of a browser. You can do key code to simulate a key press that isn't a string. So keystroke is for strings, for typing text. Key code 36, for example, is enter. And then you can wait because welcome to GUI automation. You will wait. You will do some things and try to estimate how soon after that it is safe to do the next thing. If you're lucky, you can actually query the state of the GUI and wait for something to pop up or something. But there's a lot of looping and waiting for something to pop up. And if it doesn't, try the action again and undo and whatever, there's a lot of cleanup code. There's going to be a link to the code I ended up with. And it's very defensive.
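
The "loop, wait, and retry" pattern described above can be sketched generically. The Python helper below is illustrative only and is not the code linked from the episode: `wait_until`, its parameters, and the default timings are all invented for this example. In a real script, `check` would query the GUI state and `retry_action` would redo the click or keystroke.

```python
import time


def wait_until(check, timeout=5.0, interval=0.1, retry_action=None):
    """Poll check() until it returns something truthy or the timeout expires.

    If retry_action is given, it is called after every failed check,
    which models "if it doesn't pop up, try the action again".
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        if retry_action is not None:
            retry_action()
        time.sleep(interval)
```

Raising on timeout, rather than silently continuing, is one way to keep a long automation run from typing into the wrong window after a step has failed.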

a11y (accessibility)

Amos Wenger: And the thing that makes all this possible is accessibility. I have a slide that says A11y. If that's the first time you see it, it's just accessibility. It's A, 11 letters, and then Y. Just like we have I18n for internationalization and L10n for localization, which are different things. All of this is possible because-- I don't know which caused which. They both benefit from every user interface element being part of a tree that you can query. There are accessibility APIs where you can list what's going on with an application. So what we have here on this slide is the Accessibility Inspector that's built into macOS. And you have this little crosshair button that you can click. And then you can point at any UI element anywhere. And it's going to tell you exactly where it is in the UI hierarchy.

James Munns: Yeah, I've definitely seen similar things for Selenium tests, working with some people who are doing front-end testing but also accessibility things at companies I've worked at, where they said, usually if you're doing the right things for accessibility, it also makes automation and testing much easier because there's already anchors and metadata and stuff for all the things you would need for this WebDriver testing as well.

Amos Wenger: Yes, and everything has to have labels and roles and some description that can be used by a screen reader. There's a button on that Accessibility Inspector thing that is the Play button. And this reads what VoiceOver, the built-in macOS screen reader, would read if that element were active. And then there's also little arrows, which is what happens when you navigate to the next or the previous element. And similarly, there's actions. So for a button, there's the press action. For a window, there's a raise or lower. Everything that you can do without a visual pointing device or something like that, without actually looking at the screen, you can control all of this by just listening and saying, yes, no, maybe, click, whatever. And so actually, what we can see in the Accessibility Inspector, you're missing out if you're not looking at the slides, where you can see the whole hierarchy starting from Safari, which is an application, and then Catching up with Async Rust, the standard window. And then split group, tab group, group, group, scroll area, HTML content, console, tab panel, log, group, and then finally text. And it's 10 as the result of evaluating elems.length.

James Munns: Are you using Safari because it works the best with Apple Script, or does this work-- I assume probably everything works mostly the same in Firefox or Chrome as well?

Amos Wenger: I would hope so. I didn't check. I think all browsers should be pretty good with that, at least on Mac. I know the situation is complicated on Linux. But at least on Mac and Windows, I would assume that all browsers expose the DOM as this accessibility hierarchy as well, because you need to be able to navigate what's in the web page using the built-in OS screen reader. So I haven't checked. I'm using Safari because-- and I'm ashamed to admit that Safari is actually my primary browser, James.

James Munns: Nothing wrong with that, especially if you like battery life.

Amos Wenger: But coming out, y'all have something to see. I'm using Safari. I don't know. I don't like Chrome anymore. Chrome was cool when it came out. I'm going to move on. So you have that whole hierarchy that you see in the Accessibility Inspector. And then you can turn it into AppleScript with great effort and, in my case, the help of one LLM. So you would set target window to first window, set split group to first UI element of target window whose role description is split group. And then you keep going. You go down the hierarchy like that, hoping that it is actually the first UI element whose role description is split group. Or if you have multiple split groups, then you need to count them or you need to filter them by something else. And there are only so many fields you can filter by. Role description is a good one. But some just don't have a lot-- there are two groups, and I don't know what those groups are for. It's just kind of fragile. If Safari releases a major update, I will most likely have to update my scripts. And then finally, at the end, we have log value of text element as integer. And you can see that the Script Editor is kind of an IDE. So when you hit Save, it compiles. As we've seen, it formats the code. And then also, it shows messages as you run the script. And it's kind of not great to go from the Accessibility Inspector to writing that code by hand. And there used to be a better way. It was called UI Browser. It was a third-party application made by Bill Cheeseman, which-- OK. Yeah, Bill Cheeseman. Kind of famous independent macOS developer. He did a bunch of apps. And he made an app called UI Browser. It kind of overlaps with the Accessibility Inspector. You can still-- you can inspect elements. You see the whole hierarchy. It's displayed in a slightly different format. But then there's a button where you can just generate AppleScript from the thing that you pointed to. So in this case, it's get value of static text, blah, blah, blah. But you can see "of group 24 of group 1", so you still have to tune it a bit.
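
The walk down the hierarchy described above can be sketched in plain JavaScript. This is only an illustration under assumptions: the tree is a hypothetical single-branch stand-in for what the Accessibility Inspector shows, and `node`, `chain`, and `descend` are made-up helper names, not part of AppleScript or any Apple API. It shows the "first child whose role description matches" lookup at each level, and why a changed hierarchy makes the script fail.

```javascript
// Hypothetical stand-in for one accessibility element.
function node(roleDescription, children = [], value = null) {
  return { roleDescription, children, value };
}

// Build a single-branch tree from a list of role descriptions,
// innermost element first. Only the leaf carries a value.
function chain(roles, leafValue) {
  let child = null;
  for (let i = roles.length - 1; i >= 0; i--) {
    const isLeaf = child === null;
    child = node(roles[i], isLeaf ? [] : [child], isLeaf ? leafValue : null);
  }
  return child;
}

// At each level, take the FIRST child whose role description matches.
// This mirrors "first UI element of ... whose role description is ...".
function descend(root, path) {
  let current = root;
  for (const role of path) {
    const next = current.children.find((c) => c.roleDescription === role);
    if (!next) {
      throw new Error(`no "${role}" under "${current.roleDescription}"`);
    }
    current = next;
  }
  return current;
}

// The role descriptions read out in the episode, below the window.
const path = [
  "split group", "tab group", "group", "group", "scroll area",
  "HTML content", "console", "tab panel", "log", "group", "text",
];
const mockWindow = node("standard window", [chain(path, "10")]);

// Comparable to "log value of text element as integer".
console.log(Number(descend(mockWindow, path).value));
```

If a browser update inserted or removed one level, `descend` would throw at that level, which is the fragility being described.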

James Munns: On that last slide, that's very Smalltalk. There's the Smalltalk graphical browser. This is another one of those from the same era, where all of your functions would be compiled objects. And you get this browser of all of your functions. So if you wanted to call functions, you could browse through it almost like a docs.rs. But you could click on it and drag it into your program and stuff like that. So it's-- Everything was inspectable. It smells very much of that age. It does.

Amos Wenger: Unfortunately, UI Browser version 3 was end-of-lifed on October 17, 2022. So I missed the boat. You cannot buy it anymore. That's why my screenshot says free trial, because I don't have a license. I have the statement here: "The current release will not be updated. The website will close. It will no longer be possible to download or purchase UI Browser, and product support will no longer be available. UI Browser has been a labor of love for me, its sole developer, for almost 20 years. Now that I'm 79 years old, it is time to bring this good work to a conclusion." Bill Cheeseman. It's hard to be mad at Bill, honestly. 79? OK, you get to retire. Fine. From open source maintenance. Oh, it was not even open source, actually. Well, OK. So there's a UI Browser for--

James Munns: That means he was 60 when he started making it. That's correct. And then retired at 79.

Amos Wenger: You know what? It's never too late. If you think you're too old, no, you're not. Except Bill. Bill, no. Bill gets to retire. You don't. As he retired, like, when he stopped working on UI Browser 3, he also released a partially complete port of UI Browser to Swift, from Objective-C, I imagine, called UI Browser 4. And I actually compiled it and tested it. It's not functional. It's missing pretty important things. And from the day that it was open sourced, I haven't seen any major activity on forks or whatever. So I don't know what people are doing now. Actually, I do. That's a lie. Actually, what they're doing is they're just pirating UI Browser 3, because it still kind of works. So there's a file in Library/Logs-- that's called k6 whatever-- that has the ROT-13 version of your full name in an XML dictionary. And you can just essentially remove that file. And it's like you never started the trial. And that's what the forums say to do, because what are you going to do? You cannot buy the software anymore. The newer version doesn't work, and there's no support anyway. So it's not like they're missing out on sales. They're not willing to sell it anymore. So people have turned to piracy. Or they just don't care about accessibility anymore. But UI Browser 3 works on Sequoia 15.2, although it's all crashy. So yeah, a bunch of slides showing UI Browser 4. It looks nicer, but most of the buttons don't do anything.

Safari devtools "secrets"

Amos Wenger: It is at this point in my journey that I discovered that actually some things are exposed in Safari DevTools. There is, for example, a screenshot method that I didn't know about. Because I had a problem in my automation. You still need to somehow scroll through the DOM and right click on the node and select the menu item that's capture a screenshot. But actually, no, you don't have to do that. You can just type screenshot and pass a reference to a DOM element right there in the DevTools console. It gives an image with a checkerboard background, so you know it's transparent, which is great. But then you still have to right click it and save it as an image. But at least it's predictable. You can now write some JavaScript that is going to go-- what are we looking at?

James Munns: You can write some AppleScript that writes some JavaScript.

Amos Wenger: That's exactly it. That's exactly it. You write-- well, so what I ended up doing is the JavaScript part I just have as part of the JavaScript sources for my website, so that I get type checking and all the nice things. Also, it's just slow to have AppleScript emit keyboard events to enter JavaScript code character by character in your DevTools. So it is part of my bundle. And the AppleScript just calls a function and sets some globals, essentially. And then we just iterate through every element that has to be screenshotted. And this is the AppleScript that ends up doing it. You can see a bunch of syntax. Like string concatenation is using the & operator, for example. The most striking thing for me is that parentheses are only used for precedence or grouping or priority of-- for instance, there's the set last log item to last UI element of blah, blah, and then the parentheses. These actually don't do anything, as far as I can tell. But it's supposed to read as a sentence. It's like last UI element of log element whose role description is-- whose role description? Role description is two words. There's a space between them. But it's the same color. You can see it's just the name of a property that happens to have a space in there. This is not the final version of the code. It performs the action AXShowMenu to trigger the context menu, then adds a little delay, and then does the down arrow key code and then Enter. I found a better way after finishing that slide deck. But it's a lot of that. If you can find that menu and click Save Image, then the dialog shows up. And it's already focused on the file name. So you can just type the file name and press Enter and hope that everything works. It took me forever to debug this. OK, I spent an entire day running it. And it goes through 75 images. And then it breaks. Because it got in some weird state. Something took a little longer than it usually does.
I had to do a lot more accessibility inspection just to make sure that the UI state is what I think it is. A lot of looping and delaying and retrying. And then after that, after all of that, I figured out that I never needed to write AppleScript in the first place. Because there's a thing called JXA, which is JavaScript for Automation. Because of course, there's not just AppleScript, right? There's also JavaScript. There's also Perl, Python, Ruby, and Tcl, from Tcl/Tk. Yeah. I'm quoting from Wikipedia here. "The Macintosh versions of Perl, Python, Ruby, and Tcl all support native means of working with Apple events without being OSA components. Also, JXA, so JavaScript for Automation, also provides an Objective-C and C language foreign language interface." James, do you realize what that means?
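
The "looping and delaying and retrying" mentioned above can be sketched as a small polling helper. This is a minimal sketch, not the code from the episode: `waitFor` and its options are hypothetical names. Under JXA the `pause` callback could call the global `delay(seconds)` function; here it defaults to doing nothing so the sketch runs anywhere.

```javascript
// Poll `check` until it returns something truthy, pausing between attempts.
// Throws after `attempts` tries so a stuck UI state fails loudly instead of
// letting later keystrokes land in the wrong window.
function waitFor(check, { attempts = 20, pause = () => {} } = {}) {
  for (let i = 0; i < attempts; i++) {
    const result = check();
    if (result) return result;
    pause(); // under JXA this could be: () => delay(0.2)
  }
  throw new Error(`gave up after ${attempts} attempts`);
}

// Hypothetical example: a save dialog that only "appears" on the third poll.
let polls = 0;
const dialog = waitFor(() => (++polls >= 3 ? { title: "Save" } : null));
console.log(dialog.title, polls);
```

Checking for the expected state before every keystroke is slower than a fixed delay, but it is what guards against the "something took a little longer than usual" failure.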

James Munns: You're going to write Rust that FFI's into JXA?

Amos Wenger: No. No. But that means you can write an entire native macOS application in AppleScript. And people have. And they've sold them. You can write the entire thing. Because you can call out to any Mac API from AppleScript. Because there's this Objective-C bridge.

James Munns: Are they all like Automator tools and things like that?

Amos Wenger: Just a regular app. Because you can turn AppleScripts into apps. And you can call any macOS API from them. So you know. Because Objective-C is closer to a scripting language than C. Let's say they already had this concept of objects and sending messages and being able to do reflection and iterating on properties and whatever. So you can do that from AppleScript as well. So they just made a bridge. Isn't that something?

James Munns: So when we publish this, what would you be most excited by? Like someone from the Safari team who tells you exactly how to do this out of the box? Someone who's like the last person maintaining AppleScript inside of Apple? Or Bill Cheeseman?

Amos Wenger: Oh, I would love an autograph from Bill. I don't know if he's still around. But Bill, if you're out there, hit me up.

James Munns: Let's not jinx it.

Amos Wenger: We can chat. Well, the reason I had this slide is that you would think that JXA is this old deprecated thing. Just like on Microsoft, you still have JScript, which is not exactly JavaScript.

James Munns: OK.

Amos Wenger: I don't know if you've ever-- sometimes you see-- when you run old-ish applications, you see dialogs pop up. And it's a script error. But that's because it's an old, weird version of JavaScript being run by a very old, outdated engine built into Windows that they ship because they have to-- backward compatibility.

James Munns: I think I've heard of that. I've never had to debug that before, so--

Amos Wenger: I assumed JXA was like this, because AppleScript seems somewhat maintained. I think, essentially, they have to. Because again, the freaking pre-press and publishing industry is reliant on it. It's a selling point of Macs. They can never retire it now. It's too late. But I thought JXA was just like, oh, they tried it in OS X Yosemite. Didn't really take off because it's annoying. You lose a lot of the niceties-- we're going to get into that-- of writing AppleScript. There's no IDE integration or anything. You can write JavaScript. You get access to the same thing. But you're completely on your own. But it does give you the latest JavaScript feature set that Safari has. So you can run exactly the same code that you can run in Safari, which is actually great. So you get to do modern JavaScript. This is an example of part of the script written as JavaScript. You can see I tried to get some typing. There are some TypeScript types for the--

James Munns: Yeah, I was going to say those look like TypeScript.

Amos Wenger: Yeah. Yeah. You can see there's a shebang up top to run it using osascript, which is the command line interface. Because I've shown the Script Editor, which is a GUI. But you can also just have a plain text file somewhere and then pass it to osascript. And it still works. It just compiles on demand, I guess. Because when you save from the Script Editor, it saves the compiled version alongside the sources. And the point is, when you execute it, it's faster. It doesn't have to recompile it every time you execute it.

James Munns: It's like, what are they, Python .pyo files, that kind of thing? The first time you run it, it compiles it. And then it holds on to it. So you don't have to compile it again the second time you run it.

Amos Wenger: Something like that. Yeah, I guess so.

James Munns: Yeah. But 30 years before, I guess.

Amos Wenger: I'm assuming compiling AppleScript made a huge difference back when computers were a lot slower. And so you can see the difference. You're looking at this JavaScript. You can see we're doing essentially the same thing. You're clicking on a menu. You're just accessing properties. What's aggravating is that you cannot actually enumerate properties like you would on regular JavaScript objects. Because these are not JavaScript objects that exist. They're just proxies for something that exists. You can't go through all properties. You have to know what the properties are named. And you have to try accessing something. And then it just throws if it doesn't exist. So you have a bunch of try-catch blocks. But you have all the same stuff, like keystroke, key code, delay, clicking on menus. You're just indexing into things instead of saying item zero or item one. Oh, yeah, because AppleScript is also one-based, one-based indexing. But in JavaScript, it's zero-based, because of course. And yeah, you don't get DevTools at all if you use JXA. It just says error -1728, can't get object. It doesn't even tell you the line. I was not expecting a full backtrace. I'm not greedy. I just wanted the line of the script that actually barfed.
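
The pile of try-catch blocks described above is often folded into one helper. A minimal sketch, assuming the JXA convention that a property is read by calling it like a function (for example `element.title()`): `tryGet` and `mockElement` are hypothetical names, and the mock only imitates the "throws when the property doesn't exist" behavior so the sketch runs outside macOS.

```javascript
// Run a property read and fall back instead of throwing.
function tryGet(read, fallback = null) {
  try {
    return read();
  } catch (err) {
    return fallback;
  }
}

// Hypothetical mock of a JXA object specifier: every property access yields
// a function, and calling it throws unless the property was declared.
function mockElement(props) {
  return new Proxy({}, {
    get(_target, name) {
      return () => {
        if (name in props) return props[name];
        throw new Error("Error -1728: Can't get object.");
      };
    },
  });
}

const button = mockElement({ roleDescription: "button", title: "Save" });

console.log(tryGet(() => button.title(), "(none)"));
console.log(tryGet(() => button.value(), "(none)"));
```

Passing a closure rather than the value matters here: the read has to happen inside the `try`, otherwise the exception escapes before `tryGet` is even called.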

James Munns: Just a crumb of context, please.

Amos Wenger: It does not. Even if you run it directly from the Script Editor, which you can, you can actually open JavaScript in the Script Editor. Doesn't work. Doesn't tell you anything. My last couple of slides are just funny, cursed things that people are doing. Some people are running osascript from JXA to run AppleScript, because there are some things that AppleScript can do but JXA cannot. So if you're writing JXA, you can tell System Events to do shell script, osascript, AppleScript, something. Basically, yeah, I don't know. It's apparently impossible in JXA to get the front window of the frontmost application. You can iterate through all the windows of the application, but you don't know which one is in front. So you can run osascript as an external process to run some AppleScript, have it evaluate to something, and read the result back in. It's a terrible hack. This is a Stack Overflow answer that has zero upvotes, as it should. Oh, this is the thing I was talking about. It's AppleScriptObjC. It's a Cocoa development software framework, also called AppleScript/Objective-C or ASOC. It's been part of the Xcode package since Mac OS X Snow Leopard. It allows AppleScripts to use Cocoa classes and methods directly. So yeah, you can make an entire app just with AppleScript. Should you? I don't think so, but you can. This is a comment on some forum that I found very soothing as I spent the entire day debugging my automation. Someone says, "I started programming in 1962 and since then have written in about 15 languages, but this GUI scripting is the most vexing I've ever encountered." So I don't think this is the actual Tim Burton. That's just a pseudonym. But my heart goes out to you. You can also send me an autograph if you're listening to this.
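
The "run osascript from JXA" hack described above comes down to building a shell command and reading back its output. A minimal sketch under assumptions: `shellQuote` and `runAppleScript` are hypothetical helpers, and the `app` argument is assumed to be something like `Application.currentApplication()` with `includeStandardAdditions = true`, which provides `doShellScript` under JXA. A mock stands in for it here so the sketch runs outside macOS, and the AppleScript source is purely illustrative.

```javascript
// Wrap text in single quotes for the shell, escaping embedded single quotes.
function shellQuote(text) {
  return "'" + text.replace(/'/g, "'\\''") + "'";
}

// Run AppleScript source through the osascript command line tool and
// return whatever the shell call returns.
function runAppleScript(app, source) {
  return app.doShellScript("osascript -e " + shellQuote(source));
}

// Hypothetical mock so the sketch runs anywhere; it records the command
// and returns a canned window title instead of spawning a process.
const calls = [];
const mockApp = {
  doShellScript(command) {
    calls.push(command);
    return "Catching up with Async Rust";
  },
};

// Illustrative AppleScript only, not the script from the episode.
const source =
  'tell application "System Events" to get name of front window of ' +
  "(first process whose frontmost is true)";

console.log(runAppleScript(mockApp, source));
console.log(calls[0]);
```

Every call spawns a new process and recompiles the AppleScript, so this is best kept for the few lookups that JXA cannot do directly.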

I got it working

Amos Wenger: Anyway, I got it working. I'm going to have a link to my code just so you can see what it looks like, because it's really weird. I had just never heard of it before. I forgot automation was a thing. The accessibility thing is super powerful, and I'm excited that so much of macOS is accessible. I haven't tried the full screen reader experience yet, like actually trying to get things done with my eyes completely closed or something. But it gives me hope. I hope to get more into that, because I've been meaning to get into accessibility stuff for a long time. And so this was kind of a soft intro for me, for something that actually saves me a lot of time. So I got it working. It captured all the images. And then of course I noticed some typos in some diagrams. So I had to manually grab a bunch again. But it was super helpful. So yay.

James Munns: I was going to say, have you improved the accessibility of your website now to make it easier for your own scripting and automation purposes?

Amos Wenger: No, I think the accessibility of my website is not terrible. I've gotten some feedback, because I have some readers who are using screen readers. So they give me feedback sometimes. I had this part switcher component for series that was not accessible. So I changed it. I added titles and descriptions and whatnot. When you prototype a website, you use the default browser controls for a bunch of things like buttons and dropdowns and whatnot. And then if you replace that with React components, they might not be accessible. There are some components, like the whole React Aria component suite, that are designed to be accessible from the ground up. But if you're just throwing something together quickly, it's really easy to lose track of things that matter to blind or low-vision users.

James Munns: Very cool. It speaks to AppleScript. It strikes me as one of those tools that they've tried to kill for 20 years now. And none of the successors have reached the power of the original. It was just the right amount of imperfect but useful that it has existed since '93 or whatever.

Amos Wenger: It's hard for me to hate on it because it lowered the bar, which means more people get to do stuff. And that's beautiful. I think that's a great note to end on. I'm so sorry. I keep trying. I will just do fewer slides. I'm so sorry. I also have not hit the download button once during this whole recording. But I did now. So we're good.

Episode Sponsor

This episode is sponsored by Depot: the build acceleration platform that's on a mission to make all builds near instant. If you're tired of watching your builds in GitHub Actions crawl like the modern-day equivalent of paint drying, give Depot's GitHub Actions runners a try. They’re up to 10x faster, with unlimited concurrency, faster caching, support for Linux, macOS, and Windows, and they plug right into other Depot optimizations like accelerated container image builds and remote caching for Bazel, Turborepo, Gradle, and more.

Depot was built by developers who were tired of wasting time waiting on builds instead of shipping. It's made for teams that want to move faster and stay focused on what actually matters.

That’s why companies like PostHog use Depot to cut build times from over 3 hours to just 3 minutes, saving tens of thousands of build hours every week.

Start your free 7-day trial at depot.dev and let them know we sent you.