The Embedded Buddy System


James' cheat codes for low/mid volume + rapid embedded development

James shares why you might want to design embedded systems as a network of devices, instead of trying to cram everything into a single chip

View the presentation


Show Notes

Episode Sponsor: Poststation

Transcript

James Munns:

This week I invented a whole new term for something that I've been doing for years. It seemed like a good name. So, this week we're going to talk about "The Embedded Buddy System."

Amos Wenger: Ooh...

James Munns: The alternative title here is actually

"James' Cheat Codes for Low to Mid Volume and Rapid Embedded Development" because it's basically how I've done almost every customer project for certainly the last three years but probably more likely the last five or six years because I end up helping a lot of companies that are either startups so they're making their first batch of hardware where they're shipping the first thousand units or they're shipping the first 10 units so they can get feedback from people and things like that. So they're not necessarily at the point where they're making a million items yet.

And a lot of what they're doing is either trying to figure out what their device should do, or for more specialized people, they're never going to ship a million units because it's some huge piece of scientific or industrial equipment.

And if they sell 20 a year, they're going to be pretty pumped because they cost six or seven digits a piece. I help out a lot with those kinds of problems.

If you're building something embedded today, you're designing it from scratch, you're not iterating on something, and you don't have some really ridiculous constraint on size or anything like that,

you should probably consider this buddy system. Because I know you're going to immediately have some- I'm talking to the embedded developers out there- you're going to immediately have some pushback like, "Ah, this certainly doesn't address my needs." And for a very few of you, you will be very, very right. But it applies to more of you than maybe you'd think. So keep an open mind on this one, embedded developers, and stick with me on this.

What is the buddy system?

James Munns: What I mean by the buddy system is we have some more powerful embedded Linux SoC. This is like Raspberry Pi class- something, you know, ARM Cortex-A that has a fair amount of processing power. These days they're getting faster and more powerful, and you can get them in fairly tiny sizes. But they're running Linux; they are more like a computer than you would think. And on the other side, we've got our bare metal microcontroller. This is more Raspberry Pi Pico or RP2040 class- you know, we're talking ARM Cortex-M or other 32-bit microcontrollers these days- and we have them linked up by some interface. I have USB in this one, but it could be whatever you happen to have available. But have them working together as buddies.

There's a real push to try and pick one. You go: well, is this a microcontroller sized project, or is this an embedded Linux SoC sized problem? You go: well, obviously I want it to be cheap. I don't wanna have a bunch of things going on. I wanna try and do it all in one chip, and I think that is a foot gun. Maybe not on day one, but certainly as the project grows, it can very quickly become a foot gun.

Amos Wenger: James, I think this is going to be a very short episode because I wholeheartedly agree with you. I mean, I've never done this professionally, unlike you, but it does seem to make sense to me. Like, don't try to do everything on a system that is only suited for half of it. Just have both. It's actually affordable nowadays.

You talk about Raspberry Pi, the computer-like ones? The, like, Raspberry Pi 4, 5. Aren't they always out of stock though? Isn't that an issue?

James Munns: Well, if you pick the newest model, yeah, but this is sort of my placeholder for, it's not just Raspberry Pi that makes SoCs. And in fact, they're usually the more high end, low volume kind of thing. There's a lot of very cheap SoCs, especially now that RISC-V is a thing. You can get like, surprisingly powerful quad core 64 bit RISC-V processors for basically nothing at this point.

Amos Wenger: So RISC-V is actually getting adoption right now...

James Munns: They're selling a lot of chips. I think, especially coming out of China, you see a ton of Allwinner parts that are getting used for set-top boxes. Depending on when you listen to this, it will either be different or not. But especially for things that are not super cutting edge- not high-end phones, but things like set-top boxes or signage or appliance-type stuff- where, yeah, it's like 40 percent slower than the equivalent ARM device or something like that.

But if it's a set-top box and it has the DSP that you need, and you know, you've hit the minimum threshold, you might be able to hit that minimum threshold for way cheaper, but it could be an ARM Cortex-A-

Amos Wenger: So that is why so many lower end TVs have UI that freaking lags so much.

James Munns: Minimum viable product for the price point.

Amos Wenger: They have hardware H.264 decoding. So that works, but everything else is so slow. It takes seconds to get to where you want to watch. It's incredible.

James Munns: You can't just blame that on RISC-V because that's been the case for 20 years.

Amos Wenger: That's true...

James Munns: That's just one of those, like we're hitting a price point with the compute that we can afford to put in all of these set-top boxes and stuff like that.

Amos Wenger: I mean, meanwhile, people are complaining about Apple TVs and saying you should just use the newer Mac Mini M4 instead. Because reasons? I don't know. I follow too many weird people on BlueSky

James Munns: yeah.

Embedded Linux for the BIG stuff

James Munns: But the thing is, embedded Linux is great for this big stuff. If we're talking about video encoding or decoding, if we're talking about crunching numbers, Linux is just easier for doing these things, especially when you want to have an operating system.

You want networking, you want file systems, you want databases, you want to be able to do over-the-air updates without having to really dangerously re-flash the whole device. You just want, like, apt-get update or whatever BitBake tool you're using.

And the other thing that's not even a technical thing is hiring. You would probably be comfortable, Amos, sitting down on an embedded Linux device because it's still Linux and it feels very comfortable. So even if you don't necessarily fully consider yourself an embedded developer, it's still a pool you're much more comfortable in, especially if someone else has done the setup and you're like, "Okay, now I can SSH in," and it's not markedly different. It might even be faster than a super cheap VPS instance, which is also, you know, itty bitty in terms of performance and capabilities.

Amos Wenger: Yeah, I was thinking if you have some sort of web UI, which a lot of connected gadgets do: it would definitely be served from like the big, the big buddy.

James Munns: So these kinds of things are all things that are super off the shelf and easy, both in terms of like actually doing it and what exists out there for doing it. So these are all things that are very easy to do on Linux, even if it's a relatively small Linux system.

Amos Wenger: Actually, isn't that what Tesla does? Like they run Ubuntu in their cars?

James Munns: A lot of gauge clusters these days are running embedded Linux and things like that. They have some way of hypervisoring away the safety-critical stuff from the non-safety-critical stuff, and you have- exactly- that kind of 'not on the critical path' stuff, because it just makes iteration way easier. And if you can segment your product in a way where a little bit of lag on the user interface isn't going to cause problems with your brakes, then it's way easier to not do everything in the super safety-critical world.

So exactly this kind of thing.

Amos Wenger: And this is why I like that you're my co host James! Because- because people get scared, Oh, they hear about Kubernetes and the cars or whatever, and they go like, "No, I don't want my brakes to stop working!" Because- whatever. Because they have their own experience of like, someone tried to roll out Kubernetes in their internet startup, and it took eight months and everyone was lost.

But actually, yeah, it is just for some stuff. And the rest of the stuff can still keep working. It's very important, like isolating- reducing the blast radius is also a thing in web services.

Bare metal for the little stuff

James Munns: Then you can do bare metal. And when I say bare metal, I mean this is stuff that runs on microcontrollers. It could be on a real-time operating system, but those are still fairly limited- it's not the full capabilities of something like Linux. Or it could be a totally bare metal program, or just an async executor, which is very common with, like, Embassy and things like that today.

But we're using bare metal for the little stuff, because it's good at a lot of things too. If you're building up custom hardware, a microcontroller is usually pretty self-contained. They're cheap to put on a circuit board- you don't have to add DRAM and an SSD and all of the cooling and all of these kinds of things. Usually you can drop a single chip, or a chip and its storage, on a circuit board, which means that board is very easy to design and also very cheap to make.

So if you go, "I just want something that's a little custom," you don't have to build a motherboard, basically. You can just build like a little accessory card fairly quickly. It lowers the threshold of how skilled of an electrical designer you either need to hire or pay for.

Amos Wenger: Right, just for the rest of us: so DRAM would be actually chips that you would solder onto the board?

James Munns: It can be, the same way Apple solders their memory onto the board. You'll have, like, DDR2 or DDR3- that's fairly common for embedded devices. They're behind what your top-tier Android phone would have. But it's still more complicated to have external memory. There are some chips that have memory built in- on the same chip as the CPU- but usually DRAM takes a lot of space.

So that's usually why it's outside of the main CPU even though you want it to be as close as possible but just physically the transistors and everything that you need, you need space for that.

Amos Wenger: Right. And the D stands for dynamic because it's refreshed...

James Munns: Exactly. Yeah. So SRAM is usually what you have in microcontrollers, and it can be fairly large there. But it's comparatively expensive, electrically and space-wise, compared to DRAM. Main CPUs use SRAM as cache because it's the fastest thing and you put it right next to your CPU. But usually when you want big bulk DRAM- which is your DDR memory, like you have on your desktop or your laptop- that's a separate chip. On desktops it might be on a card that you plug in just for convenience, but on your laptop or your phone it's going to be soldered on right next to your chip.

Amos Wenger: I remember upgrading old laptops, they still had the, whatever it's called... you slot them in.

James Munns: I think ThinkPads might still have it like the bigger ones, but not the like ultrabook style ones.

Amos Wenger: I don't know, because ThinkPads is no longer IBM, so I just assume they're crap now. I don't know.

James Munns: So the other stuff that microcontrollers are great for is real-time. When you don't have an operating system, or you have a really purpose-built operating system, you can make guarantees like: if this signal comes in, I will respond to it and finish responding to it within 10 microseconds. On a bigger system you can sometimes overcome the problem with power- if your CPU is running fast enough, you'll probably get to it quickly- but it can be challenging to really, like, objectively guarantee. And-

Amos Wenger: It's this kind of thinking that leads to audio glitches in consumer hardware all the time because they're designed with like it's probably gonna be fine and then sometimes like glitching sound.

James Munns: So audio is usually called soft real-time because no one's day's really messed up if it goes wrong.

Amos Wenger: Except mine.

James Munns: Hard real-time is usually, like: you stick your arm into a machine, and the sensor detects that and kills the motor within 10 microseconds. Like the SawStop, for example- that table saw. It's basically a cap-touch sensor, like your phone.

And if you touch it within a certain amount of time, it fires an explosive, which drops the blade down. Like you don't want that to lag by like 30 milliseconds, because that might be the difference between having your finger and not. So that one's pretty important.

Amos Wenger: True. I wanted to mention that the real-time patch set for the Linux kernel has been merged recently. They had a whole celebration of that and I didn't know. So to give you an idea, right now I'm using all Apple hardware because I unified everything and it makes my life so much simpler and I'm able to get so much more work done.

But back in the days, I was an early Linux adopter. I had like Mandrake, I used Red Hat before it was renamed to Fedora, I installed Slackware from floppies, I don't know, I'm older than I look, people assume I'm a zoomer when they look at my YouTube videos, but I'm 34, and I grew up with 10 years of lag on technology.

Why was I saying all that? Because I remember building my own version of the Ubuntu kernel with the real-time patch set to be able to run a digital audio workstation and do audio work on Linux because that was the only way to do it. It's crazy to me that it took, I don't know, two decades? I don't know exactly how long to get this merged into the main kernel. I have no idea. This is probably a very good reason. So my question to you James is, is it still as relevant now that real-time Linux is in the main line kernel?

James Munns: Yes, because there are still limitations on how granular you want your real-time guarantees to be. That's one of those things where real-time doesn't mean faster, it means deterministic. So the scheduler might be way less efficient, but you're guaranteed the chance to swap tasks often enough.

But there's still some limit where a microcontroller is likely to be able to respond faster. Sometimes that's down to the CPU architecture. For example, on Cortex-M there's a bounded, guaranteed number of cycles from when an interrupt occurs to when you start handling it. On Linux, the limit might not even be at the operating system level- Cortex-A processors as an architecture don't make the same guarantee- so it might be down to the architectural level of the CPU, because Cortex-M CPUs are designed to be real-time. If we're talking about like a millisecond, both of them will be able to do that fine.

A hundred microseconds, probably fine? But when we're talking like one microsecond or 10 microseconds, you're really counting CPU cycles and you want the actual electrical hardware to guarantee how long it's going to take before you start executing those cycles that turn the motor off.
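To put rough numbers on "counting CPU cycles": the Cortex-M3/M4 documentation gives a 12-cycle worst-case exception entry with zero-wait-state memory, and from there the arithmetic is simple. The clock speed and handler length below are assumed examples, not figures from the episode:

```rust
// Back-of-the-envelope worst-case interrupt latency. The 12-cycle
// exception-entry figure is documented for Cortex-M3/M4; the 100 MHz
// clock and the 400-cycle handler body are assumed for illustration.
fn worst_case_ns(clock_hz: u64, cycles: u64) -> u64 {
    cycles * 1_000_000_000 / clock_hz
}

fn main() {
    let clock_hz = 100_000_000; // assumed MCU clock: 100 MHz, 10 ns/cycle
    let entry_cycles = 12;      // Cortex-M3/M4 hardware exception entry
    let handler_cycles = 400;   // assumed: cycles your ISR body needs

    let total = worst_case_ns(clock_hz, entry_cycles + handler_cycles);
    println!("worst case: {} ns", total); // prints "worst case: 4120 ns"
    assert!(total < 10_000); // comfortably inside a 10 microsecond budget
}
```

The point is that every term here is bounded by hardware, so the sum is a guarantee, not a measurement.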

Amos Wenger: Yeah, love the way you phrased it, like 'real-time doesn't mean faster,' because I'm thinking about tokio as well. A bunch of people, you know, benchmark reading a large file from synchronous Rust and then they do it with tokio and go, "tokio is slower." Well, first of all, yes, for file system IO, yes, because it's actually using blocking syscalls in a thread pool.

But even if it wasn't, it would still probably be slower, because there's overhead involved in making sure you can actually multiplex that on a single thread, and that costs CPU cycles. But then you can service multiple requests at once. It's better to have all the customers get their response in some bounded time than to be blocked on only handling one customer, one response.

James Munns: Yeah. I know I've talked about this with Eliza, but in the same way that you care about tail latency on a server, where you say: well, yeah, okay, 99 percent are handled, but what about the next nine, or the next nine, if that becomes ridiculous? In hard real-time, you want to say it will never- literally never, no matter what's going on, I don't care if I'm in the middle of a database transaction- take more than two and a half microseconds from 'we noticed a touch' to 'we're firing the explosive.' And the explosive takes 20 microseconds to fire. Which means we know that from when your finger touches it, it will never be more than 22 and a half microseconds before that blade is dropping and no longer touching your hand.

It's one of those things where engineering is all about 'good enough for what I'm doing.' But sometimes you really care about that. And honestly- and this is what I'll get into later- sometimes it could be possible with Linux, but it's not necessarily easy. We're on a microcontroller. It's like, "Okay, yeah, I analyzed it. It's fine." Versus on Linux. You're like, "I'm... I'm pretty sure it's probably fine-" you know...

And all of this stuff that I'm recommending is not saying that you couldn't do it another way. But if you use the tools for what they're good for, you're going to have a better time.

Amos Wenger: So that's why you're saying it's not for very large productions. You don't say productions... runs?

James Munns: Yeah. We'll get into that a little bit later. But then there's other things like the amount of IO you have. And this can be just like, "Hey, I need to talk to 600 switches cause I'm a big panel," Or I need to talk to 27 different weird serial ports and things like that.

These embedded Linux devices- especially the Raspberry Pi has popularized it- you see people plugging them straight into sensors and buttons and stuff like that. And they do work. They just usually have a fairly limited palette of things they can do, or a limited pin count, where it's challenging to get super high pin counts on one chip.

And so you end up having to have like expanders or adapters and things like that. And it ends up complicating your design. And you end up having to like, "Oh, well, I can only scan these things so fast because I have to talk to 10 of them," and you can get into real trouble when it comes to doing what you'd like to do.

Amos Wenger: Well, and then if you use an adapter, aren't you using the buddy system anyway, because you have big Raspberry Pi and then something over USB that has all the IO pins that you need?

James Munns: There's something to be said about paying for less software, if that makes sense. In that hardware it acts the way it acts and you don't have to think about software. So there is something to be said about that. By the time you've added a couple adapters and things like that, it ends up costing more than a microcontroller in a lot of cases.

And the other thing is low power. Your phone is relatively low power, but you are charging it every day. A lot of these, especially the low-cost embedded Linux SoCs, don't have nearly as good power management as a nice phone would. And that matters if you're trying to deploy something that's going to last for a year with no solar or no landline or anything like that.

If you have, like, a Raspberry Pi, unless you have a car battery it's going to drain your battery in a day or a week or maybe a month, but you want something that's going to last years. What you really need to be able to do is not just sleep, but shut down the power-hungry parts- and it's not even just the CPU. One of the differences between SRAM (static RAM) and DRAM (dynamic RAM) is that you have to refresh dynamic RAM constantly, which takes electricity to recharge the capacitors in the memory so it doesn't get corrupted. That takes energy. So if you can't shut off your memory and you can't shut off your CPU, you just have this baseline power usage. Most- not all, but most- microcontrollers will be able to go down to, like, micro- or nanoamps.

Whereas a lot of these embedded Linux SoCs are orders of magnitude higher- in milliamps- even when they're in a sort of low-power mode.
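The battery math makes that gap concrete. These draw figures are assumed round numbers for illustration, not measurements of any particular board:

```rust
// Battery life is just capacity divided by average draw. The 2000 mAh
// capacity (roughly a couple of AA cells), the 50 mA SoC idle draw, and
// the 20 microamp MCU sleep draw are all assumed example numbers.
fn battery_life_hours(capacity_mah: f64, avg_draw_ma: f64) -> f64 {
    capacity_mah / avg_draw_ma
}

fn main() {
    let capacity_mah = 2000.0;

    // Embedded Linux SoC idling in "low power" mode: tens of milliamps
    let soc_hours = battery_life_hours(capacity_mah, 50.0);
    // Microcontroller sleeping at microamp level between wake-ups
    let mcu_hours = battery_life_hours(capacity_mah, 0.02); // 20 uA

    println!("SoC: {:.0} h (~{:.1} days)", soc_hours, soc_hours / 24.0);
    println!("MCU: {:.0} h (~{:.1} years)", mcu_hours, mcu_hours / (24.0 * 365.0));

    assert!(soc_hours / 24.0 < 2.0);              // dead within days
    assert!(mcu_hours / (24.0 * 365.0) > 10.0);   // sleeps into the years
}
```

Same battery, three orders of magnitude difference in lifetime- which is the whole argument for letting the microcontroller own the always-on part of the system.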

Because I am who I am... use Rust for both

James Munns: And I'm going to tell you to use Rust for both because I am who I am, but it allows you to do a lot of very cool things because we don't have to pretend that our microcontrollers are archaic little systems anymore.

They are 32-bit processors, and that's not the same as the 64-bit processor you're using, but it's much closer than the little 8-bit microcontrollers with weird architectures,

which means you can share code, tools, workflows, and even developers. It gets to the point where if I sat you down and showed you an Embassy project that's using Rust async, there might be some learning curve of like, "Oh, I need to learn some different habits, 'cause not everything's great here." But the big picture would make sense. You'd go, "Yeah, I'm going to put a timeout on it. I'm going to await." It's all going to make sense fairly quickly.

And that's super valuable because a lot of embedded teams that I work with, there's usually like five or 10 backend and front end people. There's like one embedded developer and one hardware person, and that's like the whole team, which means if you ever need peer review or getting people to work together, if you're not sharing tools and language, it becomes very challenging to ever get good code review or just a good second set of eyes on what you're building.
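One concrete version of that code sharing, as a hedged sketch- the type and field names here are made up, not from any real project: define your wire types once using only core-friendly fixed-width integers and no allocation, and compile the same file into both the firmware and the Linux-side program.

```rust
// Hypothetical shared protocol type. Because it uses only fixed-width
// integers and fixed-size buffers, the same definition compiles for a
// 32-bit no_std Cortex-M target and a 64-bit Linux host alike.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SensorReading {
    pub channel: u8,
    pub millivolts: i16,
    pub timestamp_ms: u32,
}

impl SensorReading {
    pub const WIRE_SIZE: usize = 7;

    /// Fixed little-endian layout, identical on both ends of the link.
    pub fn to_bytes(&self) -> [u8; Self::WIRE_SIZE] {
        let mut buf = [0u8; Self::WIRE_SIZE];
        buf[0] = self.channel;
        buf[1..3].copy_from_slice(&self.millivolts.to_le_bytes());
        buf[3..7].copy_from_slice(&self.timestamp_ms.to_le_bytes());
        buf
    }

    pub fn from_bytes(buf: &[u8; Self::WIRE_SIZE]) -> Self {
        Self {
            channel: buf[0],
            millivolts: i16::from_le_bytes([buf[1], buf[2]]),
            timestamp_ms: u32::from_le_bytes([buf[3], buf[4], buf[5], buf[6]]),
        }
    }
}

fn main() {
    let r = SensorReading { channel: 2, millivolts: -1250, timestamp_ms: 123_456 };
    // Both buddies agree on the bytes, so a round trip is lossless.
    assert_eq!(SensorReading::from_bytes(&r.to_bytes()), r);
    println!("round-trip ok: {:?}", r);
}
```

In practice you'd derive this with serde and postcard rather than hand-rolling it, but the manual version shows why it works: nothing in the type depends on the word size of the machine compiling it.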

Amos Wenger: Yeah, and I mean, this is context for the entire first season of SDR, because it's like: why do you care about serialization/deserialization, why do you care about it running on embedded?

James Munns: That's true!

Amos Wenger: The nice thing of like, if it runs on embedded, it will definitely for sure work on desktop or like a Raspberry Pi 4 or 5 Cortex-A, I guess.

James Munns: Yeah, exactly. And that's what I'm going to recommend is you tie it together with a library like Postcard RPC or a tool that I'm working on called Poststation, which is sort of like a reverse proxy for embedded devices. But the thing is it's a way of communicating where we treat them like a real set of peers that communicate with each other. In the same way that we treat a browser and a server as peers, like the client and the server, they are equivalent to each other. We're not just treating them like totally different machines. They are both computers.

Using something like Postcard RPC also lets you use whatever communication link you have between these devices. Like the Raspberry Pi: it's got a USB port, it's got UART or a serial port, it's got SPI or I2C- all these different kinds of serial ports that are better or worse at certain things- but you can use whatever links you have. Because Postcard RPC is very simple, and Poststation is very flexible in what it will proxy to, instead of just going over the network you can throw it over essentially any interface that you can make frames on, which is every serial interface I have here. So it becomes very easy to proxy over whatever link you have, even if it doesn't look like Ethernet.
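"Any interface you can make frames on" usually means a byte-stuffing scheme like COBS on raw serial links. Here's a hedged sketch of zero-delimited framing- an illustration of the idea, not the actual postcard-rpc wire format:

```rust
// COBS-style framing: rewrite a packet so it contains no zero bytes,
// then use 0x00 as the frame delimiter. Any byte pipe (UART, SPI, USB
// bulk) can then carry packets, and a receiver can resync on zeros.

/// COBS-encode `data` and append the 0x00 frame delimiter.
fn cobs_encode(data: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8]; // placeholder for the first code byte
    let mut code_idx = 0;    // where the current code byte lives
    let mut code = 1u8;      // distance to next zero, incl. the code byte
    for &b in data {
        if b == 0 {
            out[code_idx] = code;
            code_idx = out.len();
            out.push(0);
            code = 1;
        } else {
            out.push(b);
            code += 1;
            if code == 0xFF {
                // maximum block length reached: start a new block
                out[code_idx] = code;
                code_idx = out.len();
                out.push(0);
                code = 1;
            }
        }
    }
    out[code_idx] = code;
    out.push(0x00); // frame delimiter: the only zero on the wire
    out
}

/// Decode one frame ending in the 0x00 delimiter. Returns None if the
/// frame is malformed, e.g. truncated by a flaky link.
fn cobs_decode(frame: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < frame.len() {
        let code = frame[i] as usize;
        if code == 0 {
            return Some(out); // hit the delimiter: frame complete
        }
        if i + code > frame.len() {
            return None; // truncated frame
        }
        out.extend_from_slice(&frame[i + 1..i + code]);
        i += code;
        // a non-maximal block implies a zero byte in the original data
        if code < 0xFF && i < frame.len() && frame[i] != 0 {
            out.push(0);
        }
    }
    None // ran out of bytes without seeing the delimiter
}

fn main() {
    let packet = [0x11, 0x00, 0x22, 0x00, 0x33];
    let framed = cobs_encode(&packet);
    // only the final delimiter byte is zero
    assert_eq!(framed.iter().filter(|&&b| b == 0).count(), 1);
    assert_eq!(cobs_decode(&framed).unwrap(), packet);
    println!("framed {} bytes as {} on the wire", packet.len(), framed.len());
}
```

The overhead is at most one byte per 254, which is why this kind of framing is cheap enough to run on the microcontroller side of the link.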

The best of both worlds

James Munns: And really my goal here is to have the best of both worlds. You don't have to pick and choose and be awkward on some things. Just lean into it: design your system in a way where you go, look, all the real-time stuff- the microcontroller is going to be in charge of that.

And I'm going to make it sort of the authoritative one on anything safety or real-time or signal processing or IO expansion and things like that. And for all the big number crunching and networking and deciding user interface things, I'm going to put that on an embedded Linux and it's going to be in charge of that.

And when you think of your system, you're going to go, who's in charge of what? In a network cluster, you're like: that's my database server and that's my front end server and that's my caching server. They're specialized to different tasks, but I still treat them as peers that are working together on like the bigger common goal.

Amos Wenger: I keep thinking of how you said you can use Rust on the big one and on the little one, and the equivalent in my head I have is using JavaScript in the browser and on server. And it's much more awkward to do that.

James Munns: What's the word for that? Isomorphic.

Amos Wenger: Yeah, I think so. But with Rust, it's so interesting because I've been mostly in touch with people who write Rust not on embedded and they look at async Rust and they're like, "Ah, it's, it's too complicated," and they don't understand the design constraints. But then if you actually are working on both sides, you see why it was designed that way, because then you just get to write async Rust code on those tiny, well, tiny ish devices.

James Munns: Yeah, for sure.

Amos Wenger: And it's the same language. It's something I really appreciate with Rust, is that you have so much range. You can build abstractions, you're not stuck to the ground like with C, where it's like: nowhere is safe, structs are all you get and that's it. You don't get anything else. Or C++, where I'm not sure what's happening with rustc. You get this whole range and you can go down into the library. And, yeah, there's some unsafe code. Okay. Unsafe- it needs to be there, but it's all the same language. You can dig, you can understand, there's no VM boundary, no high-level code versus low-level code. It's all the same language.

James Munns: Yeah. And I think the isomorphic JavaScript speaks to that: there's a tangible benefit, when you have multiple parts in your system, to having common tooling and common language, because there are developer benefits- you're able to jump between the two of them. This is something that I recommend to people where they're like, "Rust is really good at FFI!" and I go, "Yes, it is, if you have to." But if you can, there's a real benefit to only having one language, because anytime you jump that gap, you need to be an expert in the semantics of both of them. That's possible, but it's way easier to get someone to really understand Rust and where things start and stop there, or even just someone to be familiar with C and where things start and stop there. Whenever you transfer over that boundary, you have to think in both.

And that's true over the network as well, because if you have some Go server that does verification of data in one way, but JavaScript or Node does it in a different way, you get something that mostly works until that one day when they don't agree. And then you need someone who knows enough about both of them to really figure out where that just like tiny little slip in the seam is.

Amos Wenger: Yeah. Again, that's also a place where Rust shines, is that you- specifically you in Postcard RPC, are leveraging the type system a lot, so there's just a lot of mistakes you cannot make. When I try to teach Rust, I don't really try to advocate for it, I just try to teach it, because I learned it, I thought it was neat, and so I've been teaching it, and people are advocating for it. What you're taking away from them is the pride of being in very dangerous waters and not dying, like having all those mistakes you can make, but being smart and clever enough that you don't actually make them.

And then switching to something like Rust is like: okay, occasionally you're going to have the compiler be mad at you and it's going to take some time to understand why. But most of the time, there's so many mistakes you can never make and it's so relaxing. You just kind of: if it compiles it probably works, if you design your types correctly which is a big if. But you know you get a sense for it after a while.
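A hedged sketch of what "design your types correctly" can look like in Rust- a hypothetical motor-controller API (the names are invented for illustration) where the typestate pattern turns a dangerous call order into a compile error instead of a runtime bug:

```rust
// Typestate sketch: a Motor can only be armed once calibrated, and can
// only be driven once armed. Calling set_rpm on an un-armed motor is
// not a bug you can ship; it simply doesn't compile.
use std::marker::PhantomData;

struct Calibrated;
struct Armed;

struct Motor<State> {
    rpm_limit: u32,
    _state: PhantomData<State>,
}

impl Motor<Calibrated> {
    fn new_calibrated(rpm_limit: u32) -> Self {
        Motor { rpm_limit, _state: PhantomData }
    }

    /// Consumes the calibrated motor and returns an armed one.
    fn arm(self) -> Motor<Armed> {
        Motor { rpm_limit: self.rpm_limit, _state: PhantomData }
    }
}

impl Motor<Armed> {
    /// Clamp requested speed to the limit found during calibration.
    fn set_rpm(&self, rpm: u32) -> u32 {
        rpm.min(self.rpm_limit)
    }
}

fn main() {
    let motor = Motor::new_calibrated(3000);
    // motor.set_rpm(500); // would not compile: the motor isn't armed yet
    let armed = motor.arm();
    assert_eq!(armed.set_rpm(5000), 3000); // over-request gets clamped
    println!("clamped to {}", armed.set_rpm(5000));
}
```

This costs nothing at runtime- the state types are zero-sized- which is exactly why the same trick works on a microcontroller.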

And that's even more valuable in the embedded world, where you don't have all the niceties of the operating system having your back: there, it's no big deal to blow up the stack, no big deal to do a lot of things, you don't even have to free your memory- if you exit the process, it's just going to reclaim everything. Virtual memory is really nice. I don't know if you get that on MCUs.

James Munns: No, nope. We have no MMU, so there's no virtual memory.

Amos Wenger: Exactly.

James Munns: But my threshold for this is usually around one to 10,000 units. This number seems higher than most people's, where I'm like: below that, what you do in terms of the hardware you design surprisingly doesn't matter. That's probably more like one to ten thousand units per year. If you're below that threshold... you mentioned Kubernetes- embedded sort of has the same problem some people complain about with Kubernetes: you're designing for a scale that you don't have. It's not that it's a bad solution, it's a solution to a problem you don't have yet. So when I see people really optimizing the per-unit cost: if you're not making 10,000 units, yeah, don't do something super egregious, but- I'll get to this in a little bit- the buddy system is probably going to be cheaper in the big picture for what you're doing.

Amos Wenger: I see what you're doing with the Kubernetes comparison and I agree with your point, but I think Kubernetes makes sense for a lot more people just because it's standardized.

James Munns: I think we'll have to save that chat for another day.

Amos Wenger: It breaks down at this point! Because you brought up Kubernetes: there is a nice article that I read recently called, "Dear friend, I'm afraid you've just written a Kubernetes," or something like that? We'll have it in the show notes, so that we don't have to discuss it right now.

James Munns: Sure.

The buddy system is probably cheaper, developer time is expensive

James Munns: And so my pitch is if you're under this sort of like one to 10,000 unit threshold, the buddy system is probably cheaper. Even though I'm telling you to use two chips, you know, a slightly more complex design, because you've got two systems that you're writing software for and things like that. It will likely end up being cheaper because

development time is expensive, and being able to do two easy things is often much faster for a developer than trying to pack both of those easy things into one thing, which ends up being a very complex problem. It's not that you can't, it's that: you might have to write kernel patches, you might have to figure out a lot of external hardware that you need as adapters, you might have to really pack things in and optimize because you're like, "I really have to hit that timing thing."

If you just use two systems, then you can let them both play to their strengths, and you can even pick versions of each of those two buddies that are really specialized for that. And usually, things that are specialized for one feature are much cheaper than one chip that's specialized in three different things, because you need to do all of that in one chip. So it's either the per-unit cost or just the amount of time you spend on development, because developers are very expensive per hour.

Amos Wenger: I agree.

James Munns: And yeah, like I said, doing microcontroller things on Linux sucks. You can do it. There's the real-time patches. Down to some threshold it will be fine. If you're building something that needs one serial port, ignore me, you'll be fine.

As soon as that starts chafing, as soon as it starts being uncomfortable, that should be your sign: instead of trying harder and committing harder and harder to optimize things, just give up and use the buddy system. Because, you know, for one or two things, it's fine. When you get to five, it's uncomfortable. When you get to 10, it's egregious. It's one of those things where you don't necessarily have to do it on day one, unless you think you might need it. But just trying to make the big system do little system things is painful.

Amos Wenger: I'm imagining the Floppotron?

James Munns: Okay.

Amos Wenger: The thing that plays music with like a hundred floppy drives or whatever, and it's taking MIDI files? That's something you would need an MCU for, right? To drive all of those.

James Munns: Or like a lot of expanders. You need stepper motor controller drivers times a hundred.

Amos Wenger: The timing has to be very precise. Yeah. Yeah.

James Munns: Yeah. You won't find any embedded Linux system that can run a hundred of those. So it needs to be a special driver IC, or you need some way of managing that.

And in the same way, doing Linux things on a microcontroller sucks. There are libraries that are nice these days where you can have a TCP/IP stack on an embedded device. You can have HTTPS on an embedded device and it's async and it's nice, but it still means you need a really big microcontroller.

It still means that you're running a network stack. What happens when you want to update your certificate chain? On Linux it's just an apt-get update. And on the microcontroller, you go: we need to be really careful, because we want to do our update over SSL. If we ever send a corrupted set of SSL certificates or TLS certificates- oops, now our firmware updates don't work anymore!

Amos Wenger: I was thinking Linux people get uppity when you do static linking, like bring your own libraries. Cause they're like, "No, I want you to use my distribution's version of LibSSL!" So that they can patch vulnerabilities. But now imagine your entire TCP stack is baked into-

James Munns: Baked into the firmware. Yep, exactly. Yeah. A firmware is just one application. It's an application that also does operating system jobs; that's all a firmware is.

Off the shelf & simple boards are cheap!

James Munns: And the thing is, off the shelf boards are cheap, like I said. It's way easier to buy a high end off the shelf board than to design a custom optimized low cost board, because economies of scale are what they are, and it's cheaper, in a lot of cases, to get a Raspberry Pi than a correctly sized thing. And this is one of those things that people on the internet lose their mind over. They're like, "Ah! Why are you using a whole Raspberry Pi to do that?!" And it's like, because my time is expensive and that costs 40 bucks. And by the time that I have to buy three cheap things and wire them together, even just in parts, I've probably paid more. But then when I factor in my time, we're way, way over these things.

Do your hobbies as you like; if you like retrocomputing, yes. But you should not approach a company project like this, unless you have to.

That's the other thing: when you do want to design something of your own, the simpler it can be, the cheaper it is. If you can have someone throw together a two layer circuit board, or use off the shelf parts that have fairly big pins that you can throw on a board, I can send that design over to JLC, and in two weeks I'll have the boards for, like, five bucks a board.

Or I can get it made locally for a bit more, but I'll have it in, like, two days, because I don't need an eight layer graphics card design sort of thing going on. So even when you do design something custom, cause you want it to be smaller or fit whatever you'd like, that's cheaper too. As long as you can have two simple things instead of one really complex thing. Cause if you need the 800 pin count CPU to get all of your IO, that's going to be an expensive board. And every time you get it wrong...

Circuit board design is like when you hit compile, it takes two weeks to figure out if the compile worked, and it also costs like a couple thousand dollars every time you compile, because you have to get these boards made, you have to bring them in, someone has to do the assembly or you have to do the assembly, and then you might find you shorted two pins together and the whole power supply fries, and you go, "Well, I guess we're going to wait two more weeks." Versus a simple design, you go: yeah, we'll just slap it on. It's more likely to work the first time. And even if it does not work, it's faster to cycle and cheaper to cycle.

Amos Wenger: If you go even deeper in designing your own hardware and you're writing VHDL or whatever the other one is, you can simulate those, kind of, but synthesizing takes forever. The whole tooling around that is horrible. I've done a little bit of it at university, so it dates back a while. I hope things have improved, but I'm not holding my breath.

James Munns: FPGAs are a whole conversation for another day.

And that's the other thing about Rust: you get a lot of stuff off the shelf. Like I said, I talked about postcard, but we also get libraries for stuff. And this is both on the embedded Linux side and the microcontroller side. Sometimes the same library, like postcard, but if not, when you're doing your microcontroller stuff, we have embassy. You don't have to build your own OS or scheduler or whatever. There are drivers for all of these things. They're fairly portable. We have the whole trait system, which allows you to do portable drivers and stuff, even to the point where you could port from the Raspberry Pi to the Raspberry Pi Pico and use the exact same display driver on both of them.

So like being able to use pieces off the shelf, either public ones or ones that you've built at your company where you go: well, sometimes we just have the Raspberry Pi, but sometimes we have the Raspberry Pi with a buddy system. You just write the driver once and use it in house and reuse things.
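The trait-based portability James mentions is the embedded-hal idea: the driver depends only on a trait, and each board supplies its own implementation. Here's a rough sketch of that pattern; the `SpiBus` trait, `Display` driver, and command bytes are all invented for illustration (the real embedded-hal traits look different), and the "bus" is an in-memory mock so the example runs anywhere:

```rust
// Sketch of the embedded-hal pattern: the driver only knows a bus trait,
// so the same driver code can run on a Raspberry Pi (Linux) or a Pico
// (bare metal). All names here are illustrative, not real APIs.

/// The only thing the driver requires from the platform.
trait SpiBus {
    fn write(&mut self, bytes: &[u8]);
}

/// A display driver, generic over whatever bus the board provides.
/// This logic is written once, against the trait.
struct Display<B: SpiBus> {
    bus: B,
}

impl<B: SpiBus> Display<B> {
    fn new(bus: B) -> Self {
        Display { bus }
    }

    fn draw_pixel(&mut self, x: u8, y: u8) {
        // Made-up command bytes for illustration.
        self.bus.write(&[0x2A, x, 0x2B, y]);
    }
}

/// On Linux this impl might wrap /dev/spidev0.0; on a Pico, the RP2040's
/// SPI peripheral. Here it just records writes so the example is runnable.
struct MockSpi {
    log: Vec<u8>,
}

impl SpiBus for MockSpi {
    fn write(&mut self, bytes: &[u8]) {
        self.log.extend_from_slice(bytes);
    }
}

fn main() {
    let mut display = Display::new(MockSpi { log: Vec::new() });
    display.draw_pixel(3, 7);
    println!("{:?}", display.bus.log); // the bytes the driver "sent"
}
```

Swapping `MockSpi` for a board-specific implementation is the whole porting story: the `Display` driver itself never changes.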

Amos Wenger: Yeah, I guess a buddy system with a language like C, an ecosystem like C-

James Munns: It's hard.

Amos Wenger: -without a package manager or a standardized thing, yeah, it would be a lot more work.

James Munns: Yeah, it's possible. And people have done it. It's just, it's challenging. Like the language doesn't fight you, but it doesn't help you. It can be done.

Amos Wenger: Yeah, you have to enforce convention inside of the project...

James Munns: And be really careful, yeah.

Amos Wenger: So Postcard RPC is an off the shelf comm stack. You just plug stuff together, whether it's over USB or serial or whatever. And you just have client and server libraries for both of them.

James Munns: And Poststation, the tool that I'm building now, also has a bunch of tools and APIs and SDKs for it. So when you plug your device in, you get a GUI for it, or you get history data and things like that. Whether you're plugging it into your laptop or the actual embedded SoC, you can just mess around, because there are different stages to designing hardware. First, you design the circuit boards, and usually you're sitting there making sure all the motors spin and you can read raw sensor data; you're not really doing business logic stuff yet.

You're just poking it slowly to make sure it does all work. And then you sort of iterate from there where you go: okay, now I'm going to make it automatically do this at 100 hertz or whatever, and then aggregate or filter the data. And then I get to sort of the shape that I want. Being able to not have to build custom stuff, not having to build jigs yourself where you can just go: okay, I'm just gonna, can I poke, poke, poke? Cool. Done. Move on. And then I can hand that tool to the factory people to make sure that whenever we assemble one, that it wasn't miswired or not calibrated or whatever you need.

Amos Wenger: We've seen you do that a bit on your YouTube live streaming channel? You've been -

James Munns: Yeah.

Amos Wenger: -iterating on the real life version...

James Munns: Of our logo? Yeah.

Amos Wenger: of the SDR logo.

James Munns: Yeah. That was a fun one, because a lot of that is like the first stage of any project: just wiring it up and then poking it to make sure that you wired it up right, before you start spending a lot of time writing a lot of software that adds more confusion to the pile.

And the other thing it lets you do is scale horizontally. This is another one of those server terms: scaling vertically means you just buy a bigger server. You get the big machine with 96 gigs of RAM instead of a bunch of little machines, because for some things that's just easier, because then you're not worrying about a bunch of machines, but

Amos Wenger: James, they make so much bigger servers.

James Munns: Yeah, they do. Yeah, they do.

Amos Wenger: You can have terabytes of RAM.

James Munns: Hell yeah. But scaling horizontally lets you say: look... if I'm doing something really weird and I need a thousand input pins, instead of trying to find a chip that has a thousand input pins, you go: I'm just going to buy 10 chips that have a hundred input pins each. So then I get my thousand input pins, and that's 10 USB connections. Linux is like, whatever, that's a USB hub. It's not a big deal.

So we can go from our embedded Linux SoC with one microcontroller.

We stick a little USB hub in front of it. And all of a sudden we now have triple the capacity for the thing that we're doing. We can have triple the motors, triple the sensors, whatever. And this was cheap to do. You can buy a single chip USB hub from WCH for like 60 cents and it's four port USB high speed. And all of a sudden you can talk to four devices now for an extra buck.

Amos Wenger: What is the maximum number of devices you can hook onto a USB hub?

James Munns: There's, uh- I don't remember either limit off the top of my head. The answer is you'll run into some limit in your operating system and you'll run into some limit in the version of USB that you are using, like full speed and high speed (the USB spec itself caps a bus at 127 device addresses). Up to like a hundred or so, everything will probably work if you have enough power and things like that, though you might have some degradation in how fast you can talk to all of them. You'll hit some other thresholds first, but like tens: no problem. Fifties: probably fine. Hundreds... well, you'll probably start having problems.

Amos Wenger: James, I need to pitch you a video idea to your company. It's how many USB devices can we hook up to a single computer before things start going very wrong?

James Munns: What was it? LTT did that. Yeah, they did it with like keyboards or flash drives or something. We can put that in the show notes, but they did exactly that. They even talk about the theoretical USB protocol limit, and then there's when Windows just falls over. They all have power, it should be fine, but at some point the operating system's like, "I just can't anymore," and it drops like 60 of them immediately. Things just time out, probably.

Amos Wenger: Then the next obvious step is to write a driver so that you can have virtual USB devices that are actually proxied over ethernet to a mega hub.

James Munns: Yeah.

Amos Wenger: And I'm not sure Linus did that one, James. Did he?

James Munns: By the time that you just have that- probably Ethernet's the answer.

Amos Wenger: Yeah.

James Munns: But that's again, another conversation...

Amos Wenger: Mac Minis are going the other way, you're replacing Ethernet with USB4, then you get,

James Munns: Or Thunderbolt, specifically. Yeah.

Amos Wenger: Isn't USB4 and Thunderbolt basically the same thing now?

James Munns: Uh, kinda. They're on the same connector. They have similar capabilities. Thunderbolt is slightly different. I don't know off the top of my head.

Amos Wenger: That's very confusing.

More buddies, more better

James Munns: My opinion is more buddies, more better, because if you can make a firmware that does one thing and it's a hundred lines of code and it is obviously correct, you will never need to touch that. If you write a firmware that does 10 things and it's 10,000 lines of code and you have to worry about interactions between the different ones, that's not 10 times harder, it's a hundred times harder versus having 10 really simple things where you go: yeah, I can see it all on one page. It's obviously just going to receive a message, set the motor value. I don't have to worry about synchronization. It's all fine.
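That "hundred lines, obviously correct" firmware shape might look something like the sketch below: receive a message, act on it, reply. The request and response names and the fake sensor math are invented for illustration; a real firmware would use postcard-serialized types over USB or serial, and the motor write would hit a PWM register instead of a field:

```rust
// Sketch of a single-purpose, obviously-correct firmware: one message in,
// one action, one reply. No shared state, nothing to synchronize.

#[derive(Debug, PartialEq)]
enum Request {
    SetMotor { speed: u16 },
    ReadSensor,
}

#[derive(Debug, PartialEq)]
enum Response {
    Ok,
    SensorValue(u16),
}

struct Device {
    motor_speed: u16,
}

impl Device {
    /// The entire "business logic" fits on one page.
    fn handle(&mut self, req: Request) -> Response {
        match req {
            Request::SetMotor { speed } => {
                // A real firmware would write a PWM duty-cycle register here.
                self.motor_speed = speed;
                Response::Ok
            }
            // Fake reading, just so the example is self-contained.
            Request::ReadSensor => Response::SensorValue(self.motor_speed / 2),
        }
    }
}

fn main() {
    let mut dev = Device { motor_speed: 0 };
    dev.handle(Request::SetMotor { speed: 1000 });
    println!("{:?}", dev.handle(Request::ReadSensor)); // SensorValue(500)
}
```

Ten firmwares of this shape stay ten small, independent things; one firmware doing all ten jobs has to reason about every interaction between them.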

Amos Wenger: You don't want to talk about Kubernetes, cause you're trying to keep it tight, but you're just selling us a microservice architecture.

James Munns: Yeah. Oh, I absolutely am. I absolutely am. It's the same kind of vibe. It's just learning lessons from other people. Sometimes, not all the time, but sometimes it's easier to have one thought in your brain at a time, if you can design it in a way where it's clear how you separate those and how they interact with each other and you don't run into any other limits.

Amos Wenger: Much like in real life, it is all about choosing boundaries.

James Munns: Yeah, it's engineering!

And the other thing is you can iterate faster. When you have these big complex circuit board designs, iteration takes a while. Like I said, designing the board, updating the board, making any fixes... different people have to synchronize on it because you know, different people have opinions. Versus if you have like relatively siloed functionality, it's like, "Okay, we switched to a higher power motor driver," or "Hey, we switched to three sensors instead of one sensor," or something like that.

Because, in the beginning, when you're bringing things up, like I was doing on my stream, you don't even need embedded Linux to be working. You don't need to worry about the image for that.

You don't worry about flashing it, updating it, provisioning it, whatever: your laptop is the SoC. And at least with Rust, you can run Rust on Windows, Mac, and Linux. If it's over USB, who cares? And if you don't have USB, you buy a little USB to serial adapter or whatever.

And all of a sudden you're testing not that far from things. The first prototypes might be thousands of dollars because you're making them in small batches. If you have five devs on your team, you go, "I don't really want to make 10 boards..." and then someone fried one of them because they plugged it in backwards.

It's way easier to just have this collection of cheap things and everyone gets a version that they can plug in. You can hack on that quickly because you can go from your embedded Linux SoC to a laptop: whatever! No big deal.

Amos Wenger: I was thinking about the other way around. You can also mock the microcontroller. Like, it's just a bunch of messages. Yeah, yeah.

James Munns: We're going to get there. We're going to get there. You can replace either half as the design evolves. You don't necessarily have to worry about breaking your motor control circuit because you updated your embedded Linux SoC. You can sort of tick tock between the two pieces. And like you were saying, you can test either half in isolation.

If we have our normal system, which is our SoC talking to the microcontroller, if we want to test the microcontroller system,

we can have a hardware test rack that is just exercising it with a bunch of sensors on it. And there's no embedded Linux. It's just a CI server chugging on that.

And if we want to test just our application, like you said, we can mock that. Postcard RPC even has a generic interface, and I have a version that, instead of using the USB port, uses tokio channels. So you can just have another task that's pretending to be the device. Poststation actually has that built in, where it will simulate devices for you that just, like, ping and show data. So you can test that you can poke them.
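The pattern James describes, swapping the real transport for in-process channels so the application half can be tested against a simulated device, can be sketched with plain std threads and channels. This is not the actual postcard-rpc or Poststation API, just the shape of the idea:

```rust
// Sketch: the application half talks over channels exactly as it would
// talk over USB, while a thread plays the part of the microcontroller.

use std::sync::mpsc;
use std::thread;

/// Spin up a fake "device" on a thread and ask it one question.
fn ping_simulated_device() -> String {
    let (to_dev, dev_rx) = mpsc::channel::<String>();
    let (dev_tx, from_dev) = mpsc::channel::<String>();

    // The simulated device: answers requests like the microcontroller
    // buddy would over the real transport.
    let sim = thread::spawn(move || {
        for msg in dev_rx {
            let reply = if msg == "ping" {
                "pong".to_string()
            } else {
                format!("unknown: {msg}")
            };
            if dev_tx.send(reply).is_err() {
                break;
            }
        }
    });

    // The application half, unchanged whether the other end is real or fake.
    to_dev.send("ping".to_string()).unwrap();
    let reply = from_dev.recv().unwrap();

    drop(to_dev); // closing the channel ends the simulated device
    sim.join().unwrap();
    reply
}

fn main() {
    println!("{}", ping_simulated_device()); // prints "pong"
}
```

Because the application only sees the channel interface, the same code runs against the simulator in CI and against real hardware on the bench.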

Buy yourself time with things that work enough

James Munns: This is the same in software. You can buy yourself time with things that work enough. Even if this design is too big to be shipping to your customers, you can make a hundred units for internal usage. You can take them to demos. You can take them to user testing where you're buying yourself time where you don't have to do the waterfally kind of design, where you have to wait until everything's in its final plastics before people can touch it.

You can start getting that feedback early and iterating on that design. And you can buy yourself a lot of development time by unsticking that critical path in the design, and worry about optimizing for cost or these kinds of things after shipping V1,

because a lot of the customers I work with don't need V2, or they have the V1 that they shipped, which is based on this buddy system, and they went, "You know, it's fine. It's okay. It's like 10 cents more a unit, a dollar more a unit..." Usually the way you translate from hardware to retail is: for every dollar more it costs you to make something, you should be charging three more at retail. So if it costs three bucks more, it's going to increase your product's price probably by like 10 bucks.

But in a lot of these low volume cases or initial prototypes and things like that, just having something that works well today is worth charging an extra 10 bucks for.

And I'm going to use a couple terms here. There's BOM and NRE. So BOM is our bill of materials.

Amos Wenger: Byte order mark...

James Munns: Byte order mark. Exactly.

Our bill of materials is what it costs us to make everything. It's the checklist of every bit that gets assembled into our product: a whole list, and what it costs us per unit to make it.

And the NREs, which are non-recurring engineering expenses, which is you and me. That's engineering and design time, the stuff you pay for once. You have to include that cost in what you're selling, but it's not a cost you have to keep paying when you're going from a thousand units to a million units. But it's one of those things where if you're only making 10 units and you have to pay 10 developers for a year, each of those units needs to carry a whole developer salary in its cost.

Which is why with this industrial stuff or the experimental stuff, you buy one unit and you go: it's like a Windows computer and a couple of things, why are you charging me a million for it? And it's like: because I only make 10 a year, and you need this, and I'm the only one that makes this kind of thing.

Yeah, so this is our per unit cost versus our design and dev time; that's where that threshold comes from. When you have more than 10,000, more than a hundred thousand units, that fraction gets divided really significantly, and it only adds pennies or dollars to your bill of materials.

When you're making a thousand, if you have one dev for one year and they're making 50 or a hundred thousand dollars a year, like not even San Francisco salaries, you divide that by a thousand. That is a non-trivial amount of what you have to charge per unit for one developer, and you probably have more than a couple.
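The amortization arithmetic works out like this; the salary and unit counts below are the illustrative numbers from the episode:

```rust
// NRE amortization: how much of a one-time engineering cost each
// unit has to carry. Numbers are illustrative, from the episode.

fn nre_per_unit(nre_dollars: f64, units: f64) -> f64 {
    nre_dollars / units
}

fn main() {
    let dev_year = 100_000.0; // one dev for one year, "not even SF salaries"

    // At 1,000 units, that one dev-year adds $100 to every unit sold:
    println!("${}/unit", nre_per_unit(dev_year, 1_000.0)); // $100/unit

    // At 100,000 units, the same dev-year is a buck a unit:
    println!("${}/unit", nre_per_unit(dev_year, 100_000.0)); // $1/unit
}
```

That two-orders-of-magnitude spread is the whole argument: below roughly 10,000 units, developer time dominates and the buddy system's extra dollar of BOM is noise.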

Treat your buddy as a partner, not a black box

James Munns: The whole goal here is you should be treating your buddy as a partner

instead of as a black box. Because that's the history of embedded: you go, well, I make my little embedded system and then we pretend that it's not running firmware. It is just a thing that works and it's a card. Whoever writes the driver has to think about it, but then we never think about it again.

My pitch is to treat it more like a partner in the same way on servers where you go: look, I'm going to consciously acknowledge these are all computers and they can talk to each other. And we can treat them not as equals, but as like reasonable partners in this whole thing.

Episode Sponsor

This episode is sponsored by Poststation, a tool from OneVariable that makes it easy to set up communication between your desktop, laptop, or an embedded Linux system and as many connected microcontrollers as you need.

If you're a company building a product around multiple devices, and would like to have all of the "plumbing", tooling, and device management handled out of the box, send an email to contact@onevariable.com for early access.

Check out onevariable.com/poststation for more information.