I spent a good part of yesterday tracking down a problem with the staging deployment of a feature I first started back in October of last year. (That it has taken this long to get it to staging has everything to do with how this organization manages work.) When you have such a extended period between implementation and deployment you rarely retain the feature's context and even rarer an environment within which to debug problems. It took me some time to regain that context and environment. (I should have left better notes for my future self.) Once I had that it became obvious that the feature worked and that the problem lay in the deployment.
The deployment is a small Kubernetes cluster. Each service has 2 or 3 container in several pods. I figured out the pod ids and container names (why are they all different!) and opened a terminal window for each and streamed the logs. I used a small script to highlight text that I was expecting to find. I then used the feature and discovered the problem was due to a corrupted private key stored in the deployment's database.
The organization uses Honeybadger.io to record exceptions and Elasticsearch to aggregate logs. These tools are intended to improve access to the details needed to debug issues. Each tool has its own user interface and mechanisms for accessing and searching. To use them you obviously need to understand these mechanisms and, more significantly, you need to know how the organization has configured them. That is, no two organizations use the same data model for how it records exceptions and log details.
The developer needs documentation about the configuration and there was none. Well, that is not quite true. This organization has thousands of incomplete, unmaintained, and contradictory Confluence pages. The "information" available to the developer is actually worse than none at all as they will waste time trying to piece together some semblance of a coherent (partial) picture. What I eventually concluded was that it could not be done and my best path forward was to look at the raw container logs.
I understand that at this organization I am a contractor and so just developer meat. But what I have seen is that this global, financial, highly profitable organization does not do any better for their developer employees. Perhaps all industries are like this. I have only experienced the software development industry and here they are mostly the same. It makes me sad and mad to see and experience such unnecessary frustration and toil.