Troubleshooting a Mysterious Networking Issue in Windows 11 (NOT!)

Networking issues can be frustrating and time-consuming to troubleshoot. Here is one of my more interesting experiences: a network problem that took me a while to track down.

The Problem: One day, I noticed that my computer’s network connection was acting up. The network interface card (NIC) was sending packets just fine, but it was receiving very few, and eventually it would stop receiving packets altogether. At first, I suspected the Insider Preview of Windows 11 I had recently installed, so I reset Windows. I also updated the Realtek NIC driver to the latest version, hoping that might help. The problem persisted.

The Troubleshooting: Next, I decided to reinstall Windows 11 from scratch, thinking that it might fix the issue. The problem persisted even after the fresh install, so at that point I figured the issue was likely hardware.

I booted into Linux from a USB drive. To my surprise, the issue persisted there as well, which ruled out any Windows software or driver problems.

The Solution: I started to suspect that the issue might be with my Wi-Fi access point. I have a TP-Link Deco 6E mesh Wi-Fi system, and one of the access points acts as the main router. I decided to swap the problematic access point with another one, and to my relief, the issue disappeared instantly. My NIC was now sending and receiving packets normally, and I was back online.

Conclusion: Networking issues can be tricky to troubleshoot, and it’s easy to get lost in a sea of software and driver issues. Sometimes, the problem might not even be with your computer at all, but with your network equipment. If you’re experiencing a similar networking issue, try ruling out all software and driver issues first, and then focus on your network equipment. Hopefully, my experience will save you some time and frustration.

Watch out for AppArmor!

I’ve been hit by AppArmor a couple of times now, first with Samba, then with OpenLDAP. AppArmor is a mandatory access control (MAC) system that restricts the capabilities of applications on a Linux system. While it can enhance security, it can also break certain applications. Here are some apps that AppArmor can interfere with and a workaround for each.
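Before editing any profiles, it helps to confirm that AppArmor really is the culprit. These are the standard apparmor-utils commands I reach for (swap in whatever profile you are debugging):

# List loaded profiles and whether they are in enforce or complain mode
sudo aa-status

# Look for recent denials in the kernel log
sudo dmesg | grep -i 'apparmor="DENIED"'

# Temporarily switch a profile to complain mode while you test
sudo aa-complain /etc/apparmor.d/usr.sbin.smbd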

  1. Docker

Docker is a popular containerization technology that allows users to package and run applications in isolated environments. AppArmor can cause issues with Docker by blocking access to certain system resources required by Docker containers. To work around this issue, you can create a custom AppArmor profile for Docker that allows it to access the necessary resources.

To create a custom AppArmor profile for Docker, you can create a new profile file in the /etc/apparmor.d/ directory with the following contents:

# Profile for Docker
profile docker-container {
  # Allow access to necessary system resources
  /var/lib/docker/** rw,
  /var/run/docker.sock rw,
  /sys/fs/cgroup/** rw,
  /proc/sys/** rw,
  /etc/hostname r,
  /etc/hosts r,
  /etc/resolv.conf r,
  /etc/passwd r,
  /etc/group r,
  /etc/shadow r,
  /etc/gshadow r,
}

After creating the profile file, you can load it into the kernel by running the following command:

sudo apparmor_parser -r /etc/apparmor.d/docker-container
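Note that a named profile like this one is not applied to anything automatically; you tell Docker to use it per container with the --security-opt flag (the image name here is just a placeholder):

docker run --security-opt apparmor=docker-container -it ubuntu /bin/bash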
  2. Apache

Apache is a widely used web server that can also be affected by AppArmor. If Apache is running in a restricted environment, it may not be able to access certain files or directories. To resolve this issue, you can modify the AppArmor profile for Apache to allow access to the necessary resources.

To modify the AppArmor profile for Apache, you can edit the existing profile file located in /etc/apparmor.d/usr.sbin.apache2 and add the necessary permissions. For example, to allow Apache to access the /var/www/html/ directory, you can add the following line to the profile:

/var/www/html/** r,

After making the necessary changes, you can reload the AppArmor profile by running the following command:

sudo service apparmor reload
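Reloading the whole AppArmor service is not strictly necessary; you can also reload just the one profile you changed:

sudo apparmor_parser -r /etc/apparmor.d/usr.sbin.apache2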
  3. MySQL

MySQL is a popular open-source relational database management system that can be affected by AppArmor. If AppArmor is blocking access to MySQL, you may experience issues with database connectivity. To work around this issue, you can modify the AppArmor profile for MySQL to allow access to the necessary resources.

To modify the AppArmor profile for MySQL, you can edit the existing profile file located in /etc/apparmor.d/usr.sbin.mysqld and add the necessary permissions. For example, to allow MySQL to access the /var/lib/mysql/ directory, you can add the following line to the profile:

/var/lib/mysql/** rwk,

After making the necessary changes, you can reload the AppArmor profile by running the following command:

sudo service apparmor reload
  4. Nginx

Nginx is a high-performance web server that can also be affected by AppArmor. If Nginx is running in a restricted environment, it may not be able to access certain files or directories required for its operation. To resolve this issue, you can modify the AppArmor profile for Nginx to allow access to the necessary resources.

To modify the AppArmor profile for Nginx, you can edit the existing profile file located in /etc/apparmor.d/usr.sbin.nginx and add the necessary permissions. For example, to allow Nginx to access the /var/www/html/ directory, you can add the following line to the profile:

/var/www/html/** r,

After making the necessary changes, you can reload the AppArmor profile by running the following command:

sudo service apparmor reload
  5. OpenSSH

OpenSSH is a widely used remote access tool that can also be affected by AppArmor. If AppArmor is blocking access to OpenSSH, you may not be able to establish a remote connection to your Linux system. To work around this issue, you can modify the AppArmor profile for OpenSSH to allow access to the necessary resources.

To modify the AppArmor profile for OpenSSH, you can edit the existing profile file located in /etc/apparmor.d/usr.sbin.sshd and add the necessary permissions. For example, to allow OpenSSH to access the /var/log/auth.log file, you can add the following line to the profile:

/var/log/auth.log rw,

After making the necessary changes, you can reload the AppArmor profile by running the following command:

sudo service apparmor reload
  6. Samba

Samba file sharing can also trip over AppArmor; in fact, it was the first thing that bit me. To modify the AppArmor profile for Samba, you can edit the existing profile file located in /etc/apparmor.d/usr.sbin.smbd and add the necessary permissions. For example, to allow Samba to access the /mnt/share/ directory, you can add the following line to the profile:

/mnt/share/** rw,

After making the necessary changes, you can reload the AppArmor profile by running the following command:

sudo service apparmor reload
  7. OpenLDAP

OpenLDAP is the other one that bit me personally. To modify the AppArmor profile for OpenLDAP, you can create a new profile file in the /etc/apparmor.d/ directory with the following contents:

# Profile for OpenLDAP
profile slapd {
  # Allow access to necessary system resources
  /var/lib/ldap/ r,
  /var/lib/ldap/** rw,
  /var/run/slapd/** rw,
  /etc/ldap/slapd.conf r,
  /etc/ldap/slapd.d/ r,
  /etc/ldap/slapd.d/** r,
  /usr/sbin/slapd mr,
  /usr/sbin/slapd.debug mr,
  /usr/sbin/slapd-{slave,monitor} ix,
  /usr/sbin/slapd.dbg mr,
  /usr/sbin/slapd-sock rw,
  /usr/sbin/slapd-sock-debug rw,
  /usr/sbin/slaptest mr,
}

After creating the profile file, you can load it into the kernel by running the following command:

sudo apparmor_parser -r /etc/apparmor.d/slapd
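To confirm the profile actually loaded and that slapd still starts with it in place, something like this works:

sudo aa-status | grep slapd
sudo systemctl restart slapd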

AppArmor can cause issues with various applications on a Linux system, but these issues can usually be resolved by modifying the AppArmor profile for the affected application. By following the steps outlined above, you can give your applications the permissions they need to function correctly while still keeping the security benefits of AppArmor.

Loading up Active Directory with lots of groups

Loading up Active Directory with lots of groups can be a tedious task, but a little scripting makes it easier. I recently had to do this to test whether a product could handle a large amount of data. I started with a list of job titles, but that didn’t give me enough groups, so I ended up feeding a list of animal names into a script that automates creating the groups in Active Directory.

First, let’s assume that you already have Active Directory set up and that you have the necessary permissions to create groups. We will use a simple LDIF template with a {groupname} placeholder to create the groups in Active Directory.

Here is the step-by-step process to load up Active Directory with lots of groups:

  1. Prepare the list of groups: In my case this was a list of animal names, one per line. You can create your own list of groups based on your requirements.
  2. Create an LDIF file: Create an LDIF template that contains the group details, with a {groupname} placeholder where the group name goes (a minimal example template is shown at the end of this section).
  3. Run a loop: To automate the process of creating the groups, we can use a while loop that reads the list of groups and creates each one in Active Directory using the LDIF file. Here’s an example script:
#!/bin/bash

# Read the list of groups from a file, one group name per line
while read -r group; do
  # Replace {groupname} in the ldif template with the actual group name
  # (use > rather than >> so temp.ldif only ever holds the current group)
  sed "s/{groupname}/$group/" group.ldif > temp.ldif
  # Create the group in Active Directory using the ldapadd command
  # (add -H ldap://<your-domain-controller> if you are not running it on the DC)
  ldapadd -x -D "CN=Administrator,CN=Users,DC=mydomain,DC=com" -w password -f temp.ldif
done < groups.txt

In the above script, replace the following:

  • group.ldif with the name of the ldif file that you created in step 2.
  • groups.txt with the name of the file that contains the list of groups.
  • CN=Administrator,CN=Users,DC=mydomain,DC=com with the actual Distinguished Name (DN) of the user account that you want to use to create the groups.
  • password with the password for the user account.
  4. Run the script: Save the script to a file (e.g., create-groups.sh) and make it executable using the command chmod +x create-groups.sh. Then run the script using the command ./create-groups.sh.

That’s it! The script will create all the groups in the list and add them to Active Directory. You can modify the ldif template and the script as per your requirements to create groups with different attributes and properties.
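For reference, here is roughly what the group.ldif template could look like. This is a minimal sketch that creates a global security group under a hypothetical OU=TestGroups container in the mydomain.com domain; adjust the DN and attributes to match your environment:

dn: CN={groupname},OU=TestGroups,DC=mydomain,DC=com
objectClass: top
objectClass: group
cn: {groupname}
sAMAccountName: {groupname}
groupType: -2147483646

The groupType value of -2147483646 marks the group as a global security group.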

irqbalance or set_irq_affinity – an interesting cause of a network performance issue.

When it comes to high-performance computing, squeezing every bit of performance out of the system is crucial. One of the critical factors in achieving high performance is reducing system latency. Interrupt requests (IRQs) are a type of signal generated by hardware devices that require attention from the CPU. By default, IRQs can be delivered to any CPU core in a multi-core system. This can lead to cache misses and contention, ultimately leading to increased latency. Fortunately, there are tools available to help manage IRQ affinity and reduce latency, such as irqbalance and set_irq_affinity. https://github.com/majek/ixgbe/blob/master/scripts/set_irq_affinity

irqbalance is a Linux daemon that helps to balance IRQs across multiple CPU cores to reduce latency. By default, irqbalance distributes IRQs across all CPU cores, which is a good starting point. However, depending on the system configuration, it may be necessary to adjust IRQ affinity further to optimize performance.
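One caveat: if you pin IRQs by hand, a running irqbalance daemon may move them again later. On most systemd-based distributions you can check for it and, if you have decided to manage affinity yourself, stop it:

# Is irqbalance running?
systemctl status irqbalance

# Stop and disable it if you are going to manage IRQ affinity manually
sudo systemctl stop irqbalance
sudo systemctl disable irqbalance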

set_irq_affinity is a script that lets you set IRQ affinity for a specific hardware device, pinning its interrupts to particular CPU cores to reduce the chance of cache misses and contention. It requires root access and must be run for each device you want to pin.

To see which IRQs belong to a device, check the output of the “cat /proc/interrupts” command. The set_irq_affinity script then takes a core specification and the network interface to operate on; the exact arguments vary between versions of the script, so check the usage text at the top of the copy you downloaded. You can also pin a single IRQ by hand through /proc. For example, to route IRQ 16 to CPU cores 0 and 1:

sudo sh -c 'echo 0-1 > /proc/irq/16/smp_affinity_list'

This tells the kernel to deliver IRQ 16 only to cores 0 and 1.
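To verify the change took effect, read the affinity back and watch the per-CPU counters for that IRQ (16 here is just the number from the example above):

cat /proc/irq/16/smp_affinity_list
watch -n1 "grep ' 16:' /proc/interrupts"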

Keep in mind that setting IRQ affinity is a delicate balance. Setting IRQ affinity for too few CPU cores can result in increased latency due to increased contention for those cores. On the other hand, setting IRQ affinity for too many CPU cores can result in inefficient cache usage and increased latency due to cache misses.

In summary, managing IRQ affinity is an important aspect of optimizing system performance, particularly in high-performance computing environments. The irqbalance daemon can help to balance IRQs across multiple CPU cores, while set_irq_affinity allows users to specify the IRQ affinity for specific hardware devices. By carefully managing IRQ affinity, users can reduce latency and achieve better system performance.

Clean up your old Kubernetes persistent data!

If you have ever removed a node from a Kubernetes cluster and then added it back, you may have encountered some issues with persistent data. Persistent data is any data that survives beyond the lifecycle of a pod, such as databases, logs, or configuration files. Kubernetes uses persistent volumes (PVs) and persistent volume claims (PVCs) to manage persistent data across the cluster.

However, sometimes these resources may not be cleaned up properly when a node is deleted or drained. This can cause problems when you try to reuse the node for another cluster or add it back to the same cluster. For example, you may see errors like:

  • Failed to attach volume “pvc-1234” on node “node1”: volume is already attached to node “node2”
  • Failed to mount volume “pvc-5678” on pod “pod1”: mount failed: exit status 32
  • Failed to create subPath directory for volumeMount “data” of container “db”: mkdir /var/lib/kubelet/pods/abcd-efgh/volumes/kubernetes.io~nfs/data: file exists

To avoid these issues, you need to clean up your old Kubernetes persistent data before adding a node back to a cluster. Here are some steps you can follow:

Step 1: Delete or unbind any PVCs associated with the node

The first step is to delete or unbind any PVCs that are associated with the node you want to remove. A PVC is a request for storage by a user or a pod. It binds to a PV that provides the actual storage backend. When you delete a PVC, it also releases the PV that it was bound to, unless the PV has a reclaim policy of Retain.

To list all the PVCs in your cluster, you can use the command:

kubectl get pvc --all-namespaces

To delete a PVC, you can use the command:

kubectl delete pvc <pvc-name> -n <namespace>

Alternatively, you can unbind a PVC from a PV without deleting it by editing the PVC spec and removing the volumeName field. This will make the PVC available for binding to another PV.

To edit a PVC, you can use the command:

kubectl edit pvc <pvc-name> -n <namespace>

Step 2: Delete any PVs that are not bound to any PVCs

The next step is to delete any PVs that are not bound to any PVCs. A PV is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using storage classes. It is a resource in the cluster just like a node. PVs have a lifecycle independent of any pod that uses them.

To list all the PVs in your cluster, you can use the command:

kubectl get pv

To delete a PV, you can use the command:

kubectl delete pv <pv-name>

Note that deleting a PV does not necessarily delete the underlying storage device or volume. Depending on the type of storage and the reclaim policy of the PV, you may need to manually delete the storage device or volume from your cloud provider or storage server.
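A related tip from my own experience (not specific to removing a node): if a PV with a Retain reclaim policy is stuck in the Released state after its PVC was deleted, and you want to reuse it instead of deleting it, clearing its claimRef makes it Available for binding again:

kubectl patch pv <pv-name> -p '{"spec":{"claimRef":null}}'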

Step 3: Delete any leftover data on the node

The final step is to delete any leftover data on the node that you want to remove. This may include directories or files that were created by Kubernetes or by your applications. For example, you may need to delete:

  • The /etc/cni/net.d directory that contains CNI (Container Network Interface) configuration files
  • The /var/lib/kubelet directory that contains kubelet data such as pods, volumes, plugins, etc.
  • The /var/lib/etcd directory that contains etcd data if the node was running an etcd member
  • The /var/lib/docker directory that contains docker data such as images, containers, volumes, etc.
  • Any other application-specific data directories or files that were mounted or created on the node

To delete these directories or files, you can use commands like:

sudo rm -rf /etc/cni/net.d
sudo rm -rf /var/lib/kubelet
sudo rm -rf /var/lib/etcd
sudo rm -rf /var/lib/docker
sudo rm -rf /path/to/your/application/data

Be careful when using these commands and make sure you are deleting only what you intend to delete.
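If the node was originally set up with kubeadm, running a reset first takes care of a good chunk of this cleanup (kubelet data, static pod manifests, local etcd data), although it will remind you that CNI configuration and iptables rules are left for you to remove manually:

sudo kubeadm reset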


Help! OpenLDAP won’t start

slapd[5472]: main: TLS init def ctx failed: -1

I borrowed some information from here: https://apple.stackexchange.com/questions/107130/slapd-daemon-cant-start-tls-init-def-ctx-failed-1. Basically, just run slapd -d1 and see where the certificate is having trouble.
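On CentOS 7 the debug invocation that shows the TLS failure looks roughly like this (run as root; the -u and -h arguments mirror what the systemd unit normally passes, so adjust them to your setup):

slapd -u ldap -h "ldap:/// ldaps:///" -d 1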

Crazily, before I bothered to check that, I just wiped my entire LDAP server and rebuilt it. What’s even crazier is that after reinstalling, it never started either! On CentOS 7, I removed the openldap-servers package and deleted the /var/lib/ldap and /etc/openldap directories. Installing the RPMs recreated those directories, but did not rebuild the self-signed certificates in /etc/openldap/certs. I ended up finding this: 0006945: CentOS 6.5: /etc/openldap/certs/* missing – CentOS Bug Tracker. Apparently there’s a post-install script, /usr/libexec/openldap/create-certdb.sh, that should run when the openldap-servers package is installed. Running it did create some certificates, but those didn’t allow the LDAP server to start either.
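If you want to see what is (or is not) actually in that NSS certificate database, certutil from the nss-tools package will list it (this assumes the Mozilla NSS-backed build of OpenLDAP that CentOS 7 ships):

certutil -L -d /etc/openldap/certs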

Finally, I disabled SSL to fix it. These were the steps.

  1. Edit the /etc/openldap/slapd.d/cn=config.ldif file. Remove anything that starts with olcTLS. There should be only a couple of such lines.
  2. Then stop the server from listening for TLS connections. You may or may not need to do this. In /etc/sysconfig/slapd, if the SLAPD_URLS line includes ldaps:///, remove that part so the server won’t try to start an LDAPS listener.
  3. Finally, when you’re done with that, the LDAP server will start.

If you want to re-enable TLS later, you can follow these instructions: Configure OpenLDAP over SSL/TLS [Step-by-Step] Rocky Linux 8 | GoLinuxCloud

You can also potentially run into this problem with SELinux or AppArmor. With Ubuntu and AppArmor, here’s how to get around it. https://askubuntu.com/questions/499164/slapd-tls-not-working-apparmor-errors

Hope this helps you!

Use yum to manage your packages and stop using rpm!

I hate seeing the RPMDB altered message when doing yum updates!

Transaction Summary
=======================================================================================================================
Install 1 Package
Upgrade 1 Package

Total size: 309 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.

For that reason, when installing or upgrading RPMs, I tell sysadmins to use:

yum -y install <rpm file>

and to remove packages with:

yum -y remove <rpm name>

rather than calling rpm directly.

Solaris pkg upgrade fails with “maximum number of instances of the package which may be supported at one time on the same system has already been met” message

This message is pretty awesome, isn’t it? You can get this message when trying to upgrade a package. At least that’s what happened to me.

username# pkgadd -d .

The following packages are available:

  1  pkgname          pkgname

                          (sparc) version.sol5.sparc

Select package(s) you wish to process (or 'all' to process all packages). (default: all) [?,??,q]: 1

Processing package instance <pkgname> from </tmp/ven/solaris>

pkgname(sparc) version.sol5.sparc Illumio

Current administration requires that a unique instance of the <pkgname> package be created.  However, the maximum number of instances of the package which may be supported at one time on the same system has already been met.

 No changes were made to the system.

This issue is pretty easy to get around. You just need to point pkgadd at an admin file that has the right options. In my case, my admin file needed instance=overwrite:

mail=
instance=overwrite
partial=ask
runlevel=ask
# Require that our dependencies are met when installing.
idepend=quit
# However, if someone tries to uninstall us but another package depends on us,
# we should just warn them & ask if they want to proceed anyway.
rdepend=ask
space=ask
setuid=ask
conflict=ask
action=nocheck
networktimeout=60
networkretries=3
authentication=quit
keystore=/var/sadm/security
proxy=
basedir=default

If you’re using instance=ask, it works also. It’ll just ask you before overwriting.
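For completeness, you hand the admin file to pkgadd with the -a flag. Assuming the file above was saved as /var/tmp/admin and the package is in the current directory:

pkgadd -a /var/tmp/admin -d . pkgname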

Fooling around with pkgadd (Solaris packages)

I basically had a Solaris SVR4 package that I needed to install. I didn’t care if the package worked or not after it installed. This is what happened when it first failed.

pkgadd: ERROR: checkinstall script did not complete successfully

The installer said that I was missing a prerequisite package, so I went into the pkgname/install/checkinstall script and commented those checks out. After doing that, this happened.

root@alton-solaris:/tmp# pkgadd -d .
The following packages are available:
1 pkgname pkgname
(i386) version
Select package(s) you wish to process (or 'all' to process
all packages). (default: all) [?,??,q]:
Processing package instance <pkgname> from </tmp>
pkgname(i386) version
company
Executing checkinstall script.
OS Release = 11.4
Processing package information.
Processing system information.
pkgadd: ERROR: packaging file is corrupt
file cksum <26912> expected <26914> actual
Installation of <pkgname> failed (internal error).
No changes were made to the system.

Obviously, there’s some sort of integrity check on the file. To get around that, I went in and edited the pkgname/pkgmap file, changing 26912 to 26914.
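If you would rather not hunt for the field in an editor, a quick and dirty way to make the same change is below. It keeps a backup and assumes the checksum value appears only on that one line of pkgmap:

cp pkgname/pkgmap pkgname/pkgmap.orig
sed 's/26912/26914/' pkgname/pkgmap.orig > pkgname/pkgmap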

After doing this, the package magically installed. Fun!

Changing root password on Solaris 11.3 x86

I’m posting this only because the process has changed a bit since I last did it back in the day. Because I’m not using Solaris on SPARC, there’s no STOP-A, boot -s, etc.

Just like back in the day with Solaris 8, you boot from a CD-ROM, PXE, or Jumpstart, whatever method you have, and then choose the shell option. Obviously, you’re not actually installing the OS.

Solaris 11 uses ZFS, not UFS, so you can’t directly mount a partition. You need to import the root pool instead.

mkdir /b
zpool import -f -R /b rpool
zfs set mountpoint=legacy rpool/ROOT/solaris
mount -F zfs rpool/ROOT/solaris /b
vi /b/etc/shadow

Edit the shadow file: find your user’s entry and remove the password hash.

Change the entry from

username:whateverthehashis:12345::::::23456

to

username::12345::::::23456

Then we will need to allow empty passwords at login

vi /b/etc/default/login
Change the line:
PASSREQ=YES
to
PASSREQ=NO

umount /b
zfs set mountpoint=/ rpool/ROOT/solaris
zpool export rpool
init 6

When the system boots, you should be able to log in as root on the console and just press Enter at the password prompt. Logging in over SSH with an empty password will most likely still be refused, since sshd defaults to PermitEmptyPasswords no. Set a real root password again right away with passwd.

Hope this saved you some time!