Ubuntu-Ryzen-UEFI Madness (Solved!)

I have probably installed some variant of Linux at least 70 times in my lifetime, probably many more. I had just purchased two sets of components for my home-made render server farm. The motherboards were Biostar and the CPUs were AMD A10-9700G AM4 parts. The initial installs of Ubuntu Server 20.04 went without a hitch. Then it got dicey when I went to install a desktop (Xubuntu) so I could use Blender. The performance was horrible. The problem was that the machines were using the default drivers that were installed with the Xorg X server. I needed to tap into OpenCL and hopefully get the GPU working for renders.
I must have spent 3 or 4 days installing, deleting, and re-installing various vintages of the amdgpu driver from the AMD driver site. I have to say, AMD treats Linux like a second-class citizen as far as drivers go. When you select your processor and version, the site only shows Windows operating systems. You have to use ‘search’ to get Linux drivers (I put either amdgpu or the Ubuntu version in the search field). My issue was that even though I bought the CPUs new earlier this year, AMD had dropped support for Radeon R7 about 4 or 5 versions before the latest driver. The latest driver that still supported it was AMD version 20.10, which supported Ubuntu 18.04 HWE. I suppose I could have dropped back to that Ubuntu version, but I had already installed a GIS server on one of the units and didn’t want to re-install everything. I forged ahead trying all kinds of combinations. I even tried copying amdgpu-pin packages from a newer driver into an older driver folder to fool the installer... it didn’t work. No matter what, I ended up with a Mesa driver that did actually work and enabled OpenCL on the GPU, and installing the AMD-APP driver enabled OpenCL on the CPU as well. The problem was that it only showed OpenCL version 1.2 and not the 2.0 that the processor was capable of. And Blender did not recognize OpenCL on the processor... there goes using the GPU for rendering.
Then I did some benchmarks... I used a few Linux benchmark tools, like GLMark2, but really got depressed when I ran the Blender Open Data benchmarks. My main desktop PC with Windows 10 Pro has a Ryzen 5 with 6 cores, and the benchmarks put it in the top 50% of all results (just barely). The render farm units? Only in the top 90%... total crap! My wife’s PC was in the top 99%, but it’s just an email/websurf machine.

Time for new CPU

Even though I had just bought the A10 CPUs, they were woefully underpowered for rendering 3D in a reasonable time. I wasn’t up to shelling out for a Threadripper, so I decided to get some Ryzen 5s (this time the 5600, not the 1600 in my desktop). That is when the nightmare started. Starting with the unit that did not have the GIS server installed, I put in the new Ryzen and fired up the board. Nada... no display. I bet I needed to update my motherboard BIOS. Sure enough, I went to the Biostar driver site and found that my current firmware supported the Ryzen 3 series, and perhaps the Ryzen 5 1000 series, but not the 5000 series. I noticed that the later firmware that did support the 5000 series also dropped support for the A10 series. Interesting. I guess the two architectures cannot reside in the same BIOS? So I downloaded the new firmware and put it on a small USB stick. I had to put the A10 processor back in to get into the BIOS. I then updated the firmware with the built-in tool (F12) and shut down the board. After re-installing the Ryzen 5, it took a long time for anything to show on the monitor. I thought I had a bogus CPU. The BIOS screen finally popped up and all seemed fine (the display never took that long to come up again). What didn’t come up was the Ubuntu 20.04 install that had worked (albeit slowly) with the A10 CPU. I had read about all kinds of issues with AMD graphics drivers and Linux, and my guess is that the same reason I needed new BIOS firmware is the reason the graphics weren’t getting through to the display. Keep in mind, I had not altered any settings in the BIOS. The firmware update somehow remembered my settings. So I figured I’d try Ubuntu 21.10 and see if the newer 5.13 kernel (vs. the 5.11 of 20.04) would do the trick. I put it on a thumb drive and began the install. I could see the progress, but then the screen just went off. I had to take a video with my phone to read the output because it scrolled by too fast.
The display cut out after a message about not finding /dev/loop2 (‘can’t open blockdev’), and the last line was ‘Watchdog hardware is disabled’. Technically, those messages had nothing to do with the missing display, but that was all I had to go on. What is interesting is that after adding ‘nomodeset’ to the kernel arguments via GRUB, I got the ‘VGACON disables amdgpu’ error message and still lost the display a few lines after that. So 21.10 wasn’t much better at working with this AMD CPU. Back to 20.04, as it is an LTS.

If at first you don’t succeed, try, try again……

By this time, 2 days had gone by and I was still no further along than seeing just a BIOS screen. I had to start throwing more things at the kernel arguments to get the boot as bare-bones as possible. First of all, I wanted to see as much as possible during the install, so I removed ‘quiet’ and ‘splash’ from the arguments and added ‘ignore_loglevel’. This time during the install I saw hundreds of lines whizzing by, but it still bailed and killed the video to the monitor. I took another video with my phone and saw that the last line was ‘[OK] reached target System Time set’, and then nothing but the ‘No Signal’ message from the monitor. I had to use more arguments. I had read various stories online about issues with Ryzen, in particular with iommu and B450 motherboards, so I added ‘iommu=soft’ to the kernel arguments. Since Ubuntu seemed to be shutting off the display, and the only argument I could find that affected power was the ACPI parameter, I figured I’d give that a shot and added ‘acpi=off’ to the arguments. I hit F10 and the board continued to boot. This time it booted all the way to the Subiquity screen. There is hope yet!
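For anyone retracing this, the argument changes above can be sketched as a small shell helper that rewrites a kernel command line string. This is only an illustration under assumptions: debug_cmdline is a made-up name, the sample input is invented, and in practice the edit is done by hand in the GRUB editor (press ‘e’ on the menu entry, change the ‘linux’ line, then F10). The spelling ignore_loglevel is the one documented for the kernel parameter.

```shell
#!/bin/sh
# Sketch only: debug_cmdline is a hypothetical helper, not part of GRUB or Ubuntu.
# It takes a kernel command line as one string and prints the modified version.
debug_cmdline() {
    out=""
    # Drop the flags that hide boot messages; keep everything else in order.
    for arg in $1; do
        case "$arg" in
            quiet|splash) ;;
            *) out="${out:+$out }$arg" ;;
        esac
    done
    # Append the arguments used in the post, skipping any already present.
    for extra in ignore_loglevel iommu=soft acpi=off; do
        case " $out " in
            *" $extra "*) ;;
            *) out="${out:+$out }$extra" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Example with an invented starting command line.
debug_cmdline "ro quiet splash"
```

With the sample input, the helper prints ‘ro ignore_loglevel iommu=soft acpi=off’. Keep in mind that acpi=off is a blunt instrument for getting through an install and is probably not something to leave on permanently.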

UEFI…Should be WTF!

Then I proceeded with the install, which goes fairly fast for a server setup. I got to the point where I remove the USB thumb drive and hit Enter. It booted straight to the BIOS screen. What? I then looked at the drive order options and there were no drives! It had clearly installed onto something. The BIOS recognized the M.2 256GB SSD, but just didn’t show it as a bootable drive at all. What’s interesting is that an unformatted M.2 SSD shows as a bootable option (it did with the USB thumb inserted). Something happened with GRUB that somehow marked the SSD as a legacy BIOS drive. I reinserted the USB thumb and rebooted by resetting the board. I pulled up the drive boot order (F9 on Biostar) and did not see the SSD, but I did see 3 options for the thumb: UEFI: Sandisk USB, UEFI: Sandisk-Partition 1 USB, and Generic: Sandisk USB. From this I assume that the first install booted from the legacy-mode generic USB entry, so GRUB never set the SSD up as a UEFI boot. In my BIOS, the UEFI settings are in the CSM menu, and both ‘Legacy and UEFI’ were enabled. The sub-settings were labeled as ‘legacy’. Changing these to UEFI and rebooting made no difference: the M.2 SSD still did not show up.
So next, I tried the install using the UEFI entry for the USB thumb. Using the same kernel arguments as before, I was able to get to the Subiquity install menu again. The display resolution was higher and sharper with the UEFI boot. Problem... at the end of the install, when it was writing GRUB, the installer reported a failure writing to /boot/efi. I looked at the detailed log: the failure came after ‘curtin’ processing, and it couldn’t find the partition. I exited the install with the reboot choice and waited to see what happened. Well, it actually booted and showed a tty with a ‘localhost.localdomain login:’ prompt. It would not take the user and password I had supplied. Research revealed that because of that final GRUB error during the install, some of the settings did not get applied... like the added user and password. Interestingly enough, the network settings I put in stuck, but OpenSSH-Server was never installed.
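A quick way to tell which mode a running Linux system (or live installer shell) was booted in is to look for the /sys/firmware/efi directory, which the kernel exposes only on a UEFI boot. The sketch below wraps that check; boot_mode is a hypothetical helper name, and its optional argument is a root prefix added here only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only: boot_mode is a hypothetical helper name.
# Optional argument: a root prefix (defaults to the real root) for testing.
boot_mode() {
    root="${1:-}"
    # The kernel creates /sys/firmware/efi only when booted through UEFI.
    if [ -d "$root/sys/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "Legacy BIOS"
    fi
}

boot_mode
```

Running this from a shell in the installer before starting would have shown whether the thumb drive had come up through the UEFI entry or the generic legacy one. On a UEFI boot, efibootmgr -v should also list the firmware boot entries, which may help confirm whether an entry for the SSD was ever written.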

The Final Stretch

I didn’t want to try yet another install, so I decided to go the GRUB rescue route. As the recent semi-successful install booted without showing the GRUB menu, it may have defaulted to a timeout of zero. So I rebooted, and as soon as I saw the BIOS page I started tapping the ESC key. Sure enough, the GRUB menu showed, and I quickly went to the advanced options before it timed out and booted. I chose the recovery menu, and then the ‘Root Console’ choice. I was root and could fix things up.
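If the hidden menu really was down to a zero timeout (that part is a guess), it could be made visible for future boots by changing GRUB_TIMEOUT and GRUB_TIMEOUT_STYLE in /etc/default/grub and then running update-grub. A minimal sketch, assuming those two settings are what controls it on this install; set_grub_timeout is a hypothetical helper name, and it takes the file path as an argument so it can be tried on a copy first.

```shell
#!/bin/sh
# Sketch only: set_grub_timeout is a hypothetical helper name.
# Usage: set_grub_timeout FILE SECONDS
set_grub_timeout() {
    file="$1"
    seconds="$2"
    tmp="$file.tmp.$$"
    # Drop any existing timeout settings, keep every other line as it was.
    grep -v -e '^GRUB_TIMEOUT=' -e '^GRUB_TIMEOUT_STYLE=' "$file" > "$tmp"
    # Append settings that show the menu and wait the given number of seconds.
    printf 'GRUB_TIMEOUT_STYLE=menu\nGRUB_TIMEOUT=%s\n' "$seconds" >> "$tmp"
    mv "$tmp" "$file"
}

# Intended use from the root console (not run here):
#   cp /etc/default/grub /etc/default/grub.bak
#   set_grub_timeout /etc/default/grub 5
#   update-grub
```

Files under /etc/default/grub.d may override these values on some installs, so treat this as a starting point. Back in the root console, the actual fixes were these: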

  1. adduser username (then create password)
  2. chsh -s /bin/bash username
  3. adduser username sudo
  4. adduser username adm
  5. apt update
  6. apt install openssh-server
  7. apt install tasksel
  8. apt install samba
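
The eight steps above could also be collected into one script for the second render unit. This is a sketch rather than what was actually run: NEW_USER, DRY_RUN, run and recover are names introduced here for illustration, and by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: NEW_USER, DRY_RUN, run and recover are hypothetical names.
NEW_USER="${NEW_USER:-username}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = really run them

# Either echo a command or execute it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The same eight steps as the list, in the same order.
recover() {
    run adduser "$NEW_USER"              # prompts for the password
    run chsh -s /bin/bash "$NEW_USER"
    run adduser "$NEW_USER" sudo
    run adduser "$NEW_USER" adm
    run apt update
    run apt install openssh-server
    run apt install tasksel
    run apt install samba
}

recover
```

Run it as is to review the commands, then as root with DRY_RUN=0 and NEW_USER set to the real account name once they look right.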

I was now at a point where I could remote in and finish package installs and get xrdp on there to access from Windows. I hope I will have luck with the AMD graphics driver for this install, but at least after 4 days, I have a working server I can work with.

[See next post for Update]