Just Another Massive Fuckup


Anyone who has ever had to manage Apple devices at scale will know that a Mobile Device Management (MDM) solution is a must. I do not manage Apple devices at scale, so...

...The end.

MDM: Because ‘Turn It Off and On Again’ Doesn't Scale

Unfortunately, I do manage the server that the MDM sits on, so not quite the end.
Our MDM of choice for all things Apple (Macs, iPhones, iPads, etc...) is the industry-standard Jamf. There are others, such as Kandji, Microsoft Intune, Apple Business Manager and Mosyle, but the general consensus is that Jamf is the most powerful and feature-complete.

Jamf is an enormous beast with many complex components...
It's also written in Java and runs on Apache Tomcat. What a perfect storm.

Having said that, it does have some pretty cool features, one of which is a connector to Active Directory Certificate Services (ADCS). I bet you're super excited to hear this.
Despite ADCS being a topic as dry as dysprosium, and usually just one component of a standard Windows domain controller setup, it provides a multitude of services that make operating a PKI easier. One such service is "device certificates": these can be used to authenticate a device to various network-based infrastructure using 802.1X. Such services range from Wi-Fi to port-based Ethernet and dynamic VLAN assignment, but could theoretically extend to anything that can validate a device is authorized using a certificate.

So why does Jamf need a connector?

The ADCS mechanism to retrieve certificates is a combination of RPC and HTTP. Windows clients set up the necessary connectivity to the ADCS server automatically if a specific GPO is enabled; Macs don't use GPO and currently cannot start an RPC session with ADCS.
For Windows clients the process looks something like this from a very high level:
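(A simplified sketch only, since the real exchange, MS-WCCE over DCOM/RPC, has a few more moving parts:)

# 1. The "Certificate Services Client - Auto-Enrollment" GPO is enabled
# 2. The client locates the enterprise CA via Active Directory
# 3. The client opens an RPC/DCOM session to the CA and submits a CSR
# 4. The CA issues the certificate and the client installs it in its machine store
# Autoenrollment can also be nudged manually on a Windows client:
certutil -pulse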

Even if a Mac is domain-joined in the same way as a Windows client, it cannot establish the RPC session, and macOS doesn't speak DCOM, even though implementations of the protocol do exist for Unix systems.

Given this limitation, some sort of helper that can talk DCOM is needed to bridge the client device to ADCS. This is where the Jamf ADCS Connector comes in.
The device certificate acquisition process then looks like this:

A device asks Jamf for a certificate, Jamf forwards the request to the ADCS Connector (which runs on a Windows server), the connector talks DCOM to the ADCS server, and the issued certificate is handed back to Jamf, which passes it on to the client. The connector is essentially a web application that runs in IIS and makes DCOM calls over RPC to the ADCS server on behalf of Jamf.

What next?

Normally you run the setup process on a fresh Windows server that can reach the ADCS server, point Jamf at it, and you're done. Except in our environment we run two Jamf servers: one internal and one in the DMZ for external devices to talk to. The reasons for this setup are not entirely clear to me since it was created before I started at {org}; I suspect it was handled by people not overly familiar with Linux or networking. They seem to have followed the Jamf install documentation to the letter without any thought for {org's} infrastructure.
Since Macs can only be enrolled in one Jamf server, the internal server is "clustered" (I use that term very loosely) with the DMZ server, and split-horizon DNS is used to redirect external clients to the DMZ server. Gross.
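
Split-horizon here just means the same name gets a different answer depending on where you ask from; the hostname and resolver addresses below are made up for illustration:

# from inside the network the record points at the internal Jamf server
dig +short jamf.example.org @10.0.0.53
# from the internet the same name resolves to the DMZ server's public address
dig +short jamf.example.org @1.1.1.1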

If I were to redesign it (and I plan to), there would be no DMZ server and the "internal" server would sit behind the load balancers. In addition, I wouldn't expose Tomcat's standard port of 8443 to everyone; rather, I'd have the load balancers forward from 443 to that port internally. This is more personal preference, however; I hate having application servers directly exposed.

Preamble aside, the ADCS Connector was set up and working fine for the internal server; the DMZ server, not so much.

Yeet?

The temptation to blow away the DMZ server and redesign the setup then and there was high.

Essentially, when a client performed a cert request against the DMZ server it would time out. The JAMFSoftwareServer.log files showed nothing of note other than "The thing you wanted to happen, didn't. ENJOY!"

The guys that administer Jamf opened a support case to try and get some additional insight as to why this could be happening on one server but not the other. Meanwhile, I was asked to confirm that the two servers were identical. Since we use Puppet for config management of the core OS, there would be no difference there. The only differences that might exist would be related to the Jamf software stack itself and possibly its bundled version of Tomcat. Not willing to pore over every single config file (there are hundreds) to find any differences between the two servers, I had a look at the core configuration files and confirmed they were the same. If Jamf support wanted to take a look at any of the others I would be happy to oblige; however, without guidance or some idea as to why the certificate requests were failing, looking for an issue would be like finding a needle in a haystack.
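
For what it's worth, checking a handful of core configs for drift is quick enough with a diff over SSH; the hostnames and paths below are placeholders:

# compare the main Tomcat configs between the two Jamf servers
for f in server.xml web.xml context.xml; do
  diff <(ssh jamf-internal cat /path/to/tomcat/conf/"$f") \
       <(ssh jamf-dmz cat /path/to/tomcat/conf/"$f") && echo "$f: identical"
done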

(Lack of) Support

Jamf support came back with some questions about the environment and configuration such as:

  • Jamf server OS version: RHEL 7
  • ADCS server OS version: Server 2022
  • ADCS Connector server OS version: Server 2022

...and then they went away again.

I thought I'd better take this into my own hands and do some packet captures on the DMZ server.

Good old tcpdump and a request from a client device to the rescue. Loading the pcap file into Wireshark, I saw:

Delicious, just what I wanted to see. TCP RST from the far end...
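
The capture itself was nothing fancy; something along these lines, where the interface name, connector hostname and port are placeholders:

# capture traffic between the DMZ Jamf server and the ADCS Connector
tcpdump -i eth0 -w dmz-to-connector.pcap host adcs-connector.example.org and port 443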

When I went back to Jamf support with this information, they immediately blamed the network: "It must be the network or the hypervisor or the firewall. Yes the firewall! It must be the firewall because it's always the firewall! Check the firewall!"

How about, piss off.

As soon as any external vendor gets wind of a WAF or general-purpose firewall in an environment, they blame it. It's such an easy scapegoat because firewalls are ✨mysterious✨ and the vendor always assumes that the client doesn't know how to set them up properly, or at least that's how it comes across. There is also the fact that the person you're communicating with at the vendor is likely NOT a network engineer.

Around and around we go

Trying to convince Jamf support that we do in fact know how to configure our firewalls was like pulling teeth.
We had a video call where I shared the firewall monitoring page and several terminal windows to show in real time that the firewall was not the issue. We even created an Any/Any rule between the DMZ server and the ADCS Connector server.
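
A couple of basic reachability checks along these lines are usually enough to rule a firewall in or out; the hostname is a placeholder:

# plain TCP connectivity from the DMZ Jamf server to the connector (bash's /dev/tcp)
timeout 3 bash -c 'cat < /dev/null > /dev/tcp/adcs-connector.example.org/443' && echo "port open"
# and a full HTTPS request, ignoring certificate validation
curl -vk https://adcs-connector.example.org/ -o /dev/null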

They eventually admitted that it must be something else going wrong. They had us check the certificate thumbprints that the Jamf server presents to the ADCS Connector server, multiple times. This was getting beyond tedious.

I requested an engineering escalation, since the support team appeared to be clutching at straws by this point.
Whilst waiting for the escalation team to come back to me I had a thought: I should compare packet captures from the internal server and the DMZ server.

Oh yeah we don't support that...

Looking at the two packet captures, I could see the DMZ server attempting TLS 1.3 with the ADCS Connector IIS server and then being sent back a big "Fuck you".
Conversely, the internal server was negotiating TLS 1.2...
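
This sort of thing is easy to confirm with openssl s_client; note the -tls1_3 flag needs OpenSSL 1.1.1 or newer (not the stock RHEL 7 build), and the hostname is a placeholder:

# force TLS 1.2, which the stock RHEL 7 OpenSSL can do
openssl s_client -connect adcs-connector.example.org:443 -tls1_2 < /dev/null
# force TLS 1.3, run from a box with OpenSSL 1.1.1+
openssl s_client -connect adcs-connector.example.org:443 -tls1_3 < /dev/null

If the second handshake gets reset while the first completes, that lines up with what the captures were showing.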

WHY?!

Since Jamf is a Java application running on Tomcat, the only two things likely to influence which TLS version it uses were the OpenSSL config or Tomcat's Java startup arguments. Since I knew OpenSSL was config-managed by Puppet from our Red Hat Satellite Puppet master, it couldn't have drifted from the defined config. It must be Tomcat.
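
A quick way to see what the running Tomcat JVM was actually started with, rather than what the startup scripts claim:

# list the running Tomcat JVM along with its full command line
pgrep -af org.apache.catalina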

Comparing the Java startup arguments on the internal machine to the DMZ machine revealed the issue.

export JAVA_OPTS="$JAVA_OPTS -Djdk.tls.client.protocols=TLSv1.2"

At some point in the distant past, before I'd ever even seen a Jamf server, Tomcat was configured differently on the internal server than on the DMZ server.

Essentially, the ADCS Connector IIS application doesn't support TLS 1.3 and won't fall back to a lower protocol version. Once the Tomcat Java startup arguments were modified and Jamf restarted, everything started working.
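
On the DMZ server the fix amounted to adding the same flag the internal server already had and bouncing Tomcat; the setenv.sh path and service name below are illustrative, since Jamf bundles its own Tomcat:

# pin the outbound TLS client protocol to 1.2 in Tomcat's startup options
echo 'export JAVA_OPTS="$JAVA_OPTS -Djdk.tls.client.protocols=TLSv1.2"' | sudo tee -a /path/to/tomcat/bin/setenv.sh
# restart the Jamf Tomcat service (name varies by install)
sudo systemctl restart jamf.tomcat8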

MAGIC

When we went back to Jamf support with the solution, we got this response:

Oh yeah, we usually tell people to switch TLS 1.3 off because we don't support that...

That👏 would👏 have👏 been👏 good👏 to👏 know👏


I probably should have looked more closely at the Tomcat config, but I made an assumption that it was a Jamf issue. Joke's on me.