[PLUG] Netbooting device needs NFSv2

Russell Senior russell at personaltelco.net
Thu Dec 26 01:11:21 UTC 2024


On Wed, Dec 25, 2024 at 10:14 AM Ted Mittelstaedt
<tedm at portlandia-it.com> wrote:
> [...]
> Does this apply to the net4801 or any other more commonly available used Soekris models that use the same CPU?
> [....]

I would expect so, but I don't have anything else in the net48xx
series to confirm with. I used to have some net45xx-series boards, but
those used a different (486-compatible) SoC and weren't worth the
effort for me, so they got recycled. Someone recently reported that
WRAP boards (from PCEngines, not Soekris) didn't seem to be affected
by the bug I had experienced.

> -----Original Message-----
> From: PLUG <plug-bounces at lists.pdxlinux.org> On Behalf Of Russell Senior
> Sent: Tuesday, December 24, 2024 4:59 PM
> To: Portland Linux/Unix Group <plug at lists.pdxlinux.org>
> Subject: Re: [PLUG] Netbooting device needs NFSv2
>
> On Wed, Dec 18, 2024 at 5:48 AM Russell Senior <russell at personaltelco.net> wrote:
> >
> > The right solution is probably just to retire the one in the field and
> > put the whole lot of them into a "museum box", but hey, it's the
> > holidays. What better period to waste a bunch of time keeping creaking
> > hardware alive? And anyway, the museum curators will be more thrilled
> > to create an exhibit if they have working firmware.
>
> I am happy to report that I was able to use the periodic builds I had made historically to narrow the introduction of the breakage down to a few months in 2019, between late February and late May. Then I used classic git bisection, in half a dozen or so iterations, to narrow the breakage down to a single commit. To bisect a five-year-old project that is constantly changing, I had to set up a "period correct" build environment, because the project as it stood in 2019 could not anticipate later changes in the build host environment (things like new compiler and toolchain versions, in particular gcc, g++ and python).
> That meant I had to find a "spare" machine that I could commit to an old OS version. I ended up with Ubuntu 18.04.6, which would have been extant in 2019. I tried a Debian version, but it didn't have the non-free firmware blobs needed to get my spare laptop connected to a network.
>
> The single commit was a kernel bump from v4.14.112 to v4.14.113. So, I looked at the commits involved in that transition and spotted one that changed how the Cyrix chips were supported. So, I took v4.14.113 and reverted that single change, and *boom* my breakage was fixed. So, I reported that upstream to the Linux kernel people who were involved in that commit. While waiting for a response from them, an OpenWrt guy and I (mostly following his reasonable suggestions and intuition) narrowed the problem down even further. The root cause appears to be that the SC1100 chip does *NOT* want its SUSP# pin enabled. This pin allows an external device (part of the chipset) to stop and start the CPU. Apparently, during warm boots, that pin gets pulled low and the CPU dutifully stops. So, I have a patch that works for my specific context, although it probably breaks in some other contexts, so upstream will need to determine how to deal with that. My same local fix works in modern OpenWrt with a v6.6.67 kernel. So, my field-deployed Soekris net4826 *can* be updated to modern firmware.
>
>   https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/datasheets/goede_gx1_databook-rev5.pdf
>
> In the Geode GX1 family, there is a set of CPU registers that are accessed by first writing a register index to port 0x22 and then reading or writing port 0x23. The "fix" that broke the SC1100 was to make the getting/setting actually happen correctly, in the right order. I *think* the reason it was breaking is that the Old Method was trying to set the SUSP# enable bit, but actually failing, so the pin was not enabled and my warm boot succeeded. When the v4.14.113 changes fixed the getter/setter functions, the kernel did the Wrong Thing successfully. So, the right fix is just to not do the Wrong Thing at all.
>
> Merry Christmas,
>
> --
> Russell Senior
> russell at personaltelco.net
>
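
To illustrate the index/data port mechanism described in the quoted
message, here is a minimal sketch in C. It is a toy model and not the
kernel's code: the port_out/port_in helpers, the EXAMPLE_REG index and
the EXAMPLE_BIT mask are all hypothetical, and the rule that a data
port access only counts when it immediately follows an index write is
an assumption made for the sake of the model. It shows how a
misordered read-modify-write can silently fail to set a bit, while a
correctly ordered one succeeds.

```c
#include <assert.h>
#include <stdint.h>

#define INDEX_PORT  0x22  /* index port, as named in the message */
#define DATA_PORT   0x23  /* data port, as named in the message */
#define EXAMPLE_REG 0xC2  /* hypothetical register index */
#define EXAMPLE_BIT 0x80  /* hypothetical enable bit */

/* Toy register file standing in for the CPU configuration registers. */
static uint8_t regs[256];
static uint8_t index_latch;
static int index_armed;   /* assumption: data access needs a fresh index write */

static void model_reset(void)
{
    for (int i = 0; i < 256; i++)
        regs[i] = 0;
    index_latch = 0;
    index_armed = 0;
}

static void port_out(uint16_t port, uint8_t val)
{
    if (port == INDEX_PORT) {
        index_latch = val;
        index_armed = 1;
    } else if (port == DATA_PORT) {
        if (index_armed)
            regs[index_latch] = val;  /* otherwise the write is dropped */
        index_armed = 0;
    }
}

static uint8_t port_in(uint16_t port)
{
    uint8_t val = 0xFF;               /* value seen when no index is armed */
    if (port == DATA_PORT && index_armed)
        val = regs[index_latch];
    index_armed = 0;
    return val;
}

static uint8_t reg_read(uint8_t idx)
{
    port_out(INDEX_PORT, idx);
    return port_in(DATA_PORT);
}

static void reg_write(uint8_t idx, uint8_t val)
{
    port_out(INDEX_PORT, idx);
    port_out(DATA_PORT, val);
}

/* Misordered: the nested read consumes the index, so the final data
 * write arrives with no index armed and is dropped. */
static void set_bits_misordered(uint8_t idx, uint8_t mask)
{
    uint8_t val;
    port_out(INDEX_PORT, idx);
    val = reg_read(idx) | mask;
    port_out(DATA_PORT, val);
}

/* Ordered: finish the read first, then do a complete index+data write. */
static void set_bits_ordered(uint8_t idx, uint8_t mask)
{
    uint8_t val = reg_read(idx) | mask;
    reg_write(idx, val);
}
```

In this model, set_bits_misordered() leaves the register unchanged and
set_bits_ordered() sets the bit, which matches the hypothesis in the
message: code that tried and failed to enable something starts
"working" once the accessors are fixed.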

