Pro Git

When learning a new topic it has been my experience that you read several explanations and tutorials and finally, the last one you read makes the topic crystal clear. I don't think it did. It just was there when you finally annealed all that information into those crystals. Anyway, if you are learning git, as I am, even though I have been using it for years, the book Pro Git by Scott Chacon is working for me. I have not finished it yet, but so much is clearer now.

Future of programming environments for professionals & kids

If you are interested in the future of programming environments for the professionals and the kids I recommend these Strange Loop 2022 talks

"Stop Writing Dead Programs" by Jack Rusher (Strange Loop 2022) - YouTube
https://www.youtube.com/watch?v=8Ab3ArE8W3s

"Hedy: A Gradual programming language" by Felienne Hermans (Strange Loop 2022) - YouTube
https://www.youtube.com/watch?v=fmF7HpU_-9k

What I particually agree with Jack Rusher is the need to move on from what he calls "batch", ie edit-compile-run loops, and toward what might be called "surgery", ie working within the running environment.

Who does have a second job?

Recently Equifax fired some of its employees that had second full-time jobs. Equifax knew about this because it has the data on some 100M employees. It is unsettling that Equifax could actually make this into a service for other employers. Nevertheless, it is an interesting SQL question. Given this schema and data

create table employees ( ssn char(1), ein char(1) );

insert into employees (ssn, ein) values
 ('1','a'),
 ('1','b'),
 ('2','b'),
 ('3','c'),
 ('3','d'),
 ('3','e'),
 ('4','b'),
 ('4','d'),
 ('5','b');

What is the query needed to discover who at company EIN has a second job. My solution is to first find all the people with 2 or more jobs

select e1.ssn as ssn, e1.ein as ein1, e2.ein as ein2 
from employees e1, employees e2 
where e1.ssn = e2.ssn and e1.ein <> e2.ein;

And in this set find the people employed by 'b'

select * from ( 
    select e1.ssn as ssn, e1.ein as ein1, e2.ein as ein2 
    from employees e1, employees e2 
    where e1.ssn = e2.ssn and e1.ein <> e2.ein ) t 
where t.ein1 = 'b'

I suspect there is a more elegant way, but this was good enough to know it is easy to figure out.

Jira is software management's Pakistan’s truck art

Everyone picks on Jira and the criticism is well deserved. Working with it recently I came to see Jira as this
when what us practitioners want is this

Back to coding.

Emoji imparied

I don't understand any of the dozens of emoji the folks around me use to provide feedback or exclamation.

Yet another Obsidian, Dataview, and GTD exploration

I have been looking for an "external brain" for many years. I was working at Brown University when tools like Intermedia were being developed and my friends were actively discussing and building Ted Nelson's Xanadu. A consequence is that my standard for these tools is very high.

I am always happy to find a tool that satisfies 90% of my needs and offers a plugin API that someone has created a programatic binding for. Prior to the web, in a time when desktop applications ruled, I learned of Tcl. Years later when wiki's were new I wrote plugins for Jspwiki for server side rendering using Tcl and JavaScript. More recently we have seen the rise of programmable notebooks starting with Jupyter, or, perhaps, earlier with Microsoft Word and Google Docs scripting.

These two threads came together recently as I was exploring Obsidian. Specifically, Obsidian has the Dataview plugin that, more or less, treats the Markdown notes as a queryable and navigable repository. I wanted to use Obsidian to help collect my projects under one interface using a loose GTD approach. Each project is a note and that note lists the project's next action and what it is waiting on as tasks. And there would be a "dashboard" note that automatically enumerates all next actions and waiting ons from all projects.

There are lots of ways of handing this in Obsidian and its plugins -- especially using the Checklist plugin. I think Nicole van der Hoeven's Actually getting things done with Obsidian // Checklist plugin is one of the best. However, I did not like how it was forcing an unnatural encoding and display of next actions and waiting on. Since I am in the exploration phase of learning Obsidian I let my perfectionism override my pragmatism.

A result of the exploration was to use Dataview to achieve my ends. I wanted to encode my project like the following

Note the annotation on the next action and waiting on tasks. The dashboard should look like

The key feature for this to work is the Dataview annotations it adds to the Obsidian tasks. The annotations are [next-action::] and [waiting-on::]. For the dashboard I can then use the annotations with a Dataview JavaScript code block to select the next actions and waiting ons across projects. Here is the GTD dashboard note

## Next Actions
```dataviewjs
let tasks = dv
	.pages('"projects"')
	.sort((a,b) => dv.compare(a.file.name, b.file.name))
	.file
	.tasks
	.filter(t => t.annotated && t.hasOwnProperty("next-action"));
if(tasks.length) {
	dv.taskList(tasks);
}
else {
	dv.paragraph("None")
}
```
## Waiting On
```dataviewjs
let tasks = dv
	.pages('"projects"')
	.sort((a,b) => dv.compare(a.file.name, b.file.name))
	.file
	.tasks
	.filter( t => t.annotated && t.hasOwnProperty("waiting-on"));
if(tasks.length) {
	dv.taskList(tasks);
}
else {
	dv.paragraph("None");
}
```
## Projects
```dataviewjs
dv
	.pages('"projects"')
	.sort((a,b)=>dv.compare(a.file.name,b.file.name))
	.forEach(p=>dv.paragraph(p.file.link))
```

END			

The end result is

The result is not exactly what I want. I don't want the annotations and the links to be display. I have not figures out how to eliminate them yet. It is a good start and I did learn much about Dataview and Obsidian. (Oh, the next step to enhance Dataview or write my own plugin. Maybe not.)

Spring and checked & unchecked exceptions

A few weeks ago a colleague asked about checked and unchecked exceptions and I mentioned offhand that it is useful to understand Spring's exception design and choices. This is a better response...

The Spring exception library has been around a long time and it has survived because it matches the semantics of servicing problems rather than categorizing technical failings. In particular, I am addressing the org.springframework.dao.DataAccessException hierarchy of exceptions. It is worth the time to read Chapter 9 of Expert One-On-One J2EE Design and Development to better understand Spring's exceptions  

The first question we need to ask is why do we use exceptions? For me an exception is due to an unanticipated or unforeseen problem that MUST be handled outside of the normal call chain. If we have a method that is expected to return a value and it can't then this is an exception. If we have a method that can be expected to not return a value then that is not an exception. For example, if the method "int getFoo(int bar)" is expected to have a valid return value for every value of bar then any problems must raise an exception. However, if the method does not have a valid return value for every value of bar then the method is badly specified. The method would be better specified as "Optional<Integer> getFoo(int bar)" or, better yet, named "findFoo". Once you have a well specified method then you can consider how to use exceptions.

What I like about Spring's data access exceptions is that they derive from three base classes RecoverableDataAccessException, NonTransientDataAccessException, and TransientDataAccessException. These base classes let the caller know how to respond to the exception -- and this is important -- if the caller wants to. For example, a method raising NonTransientDataAccessException (or its subclasses) can't be "retried" to get a different result. Whereas, a method raising TransientDataAccessException could  be retried, and a method raising RecoverableDataAccessException could be retried once some mitigation has been undertaken. For the example, "int getFoo(int)" could throw a NonTransientDataAccessException (well, a subclass like NotFoundException) if the given "bar" does not have a corresponding "foo".  

You can also see how we could have a similar set of base exceptions for process failures, eg RecoverableProcessException, NonTransientProcessingException, and TransientProcessingException. 

As to whether to use checked exceptions or not, I think there are two factors to consider. The first factor is how likely can intermediaries in the call chain practically respond to the exception? The second consideration is how important is it for the caller to know about the exceptions thrown by the method? I think understanding how to respond to exceptions is critical to building stable, recoverable applications. However, in a K8 world where failed applications have a small functional scope and can be restarted automatically stability and recoverability are less important. So, these days I am comfortable with unchecked exceptions BUT the exceptions should be declared on the method signature -- doing so better documents the method.

The value of logging

With the rise of logging software-as-a-service products (SaaS) the monetary cost of logging has increased. If the organization has not been able to recoup some of the previous costs of managing their own logging management in staffing or infrastructure then this cost is a real budget increase. Since the SaaS cost is related to logging volume there are departmental or company mandates to log less. Specifically, only log at the error and warning levels. I think this has been a mistake.

To state the obvious, logs are there to aid problem resolution. (I am not here concerned with APM.) Logs provide the context for the resolution, i.e. data values and time of occurrence. Not all problems are found in the logs; some come from user reports. However, all problem contexts can be found in the logs.

The problems are either consistent or intermittent. Consistent problems occur on every similar user action or API request. Some consistent problems occur on a wider set of user actions or API requests. 

Intermittent problems occur with variability over time or consistently over time. Some intermittent problems occur on a wider set of user actions or API requests. Intermittent problems within the application are usually the result of state change as a secondary activity of the response. Intermittent problems within a distributed architecture are usually due to one or more of the 8 fallacies of distributed computing.

The logging needs for consistent and intermittent problems are different. Logging for consistent problems can often be adequately initiated when returning up the call-chain. That is, an exceptional situation has occurred, and the response is following the error path. Logging for intermittent problems does not have this advantage and so logging must be initiated down the call-chain. 

The context to log is often just the inputs to a method/API and the outputs from a method/API, but only across packages or services. The goal of logging is not to trace the request and response, but to provide enough detail to initiate debugging at more than one point in the request’s response call-chain. 

It follows that logging must include the error messages and the context before (and after) the error. Generally, the purpose of the log levels are:

  • INFO for context – data values and time of occurrence;
  • WARN for nearing design limits (eg, capacity, duration, and absolutes) and so for expected but unwanted response (eg 401 and 5xx HTTP statuses); and
  • ERROR for unexpected responses.

Log messaging must be examined during code reviews as much as the implementation does. Logging can quickly become voluminous as developers tend towards CYA logging. A good senior developer or architect in conjunction with operations and product support can establish rules of thumb for logging that work well with everyone’s needs.

As to the costs of using a logging SaaS, consider not keeping the logs there for very long. (Keep all the logs locally for a long time, however. Local disk and AWS’s S3 are cheap.) Within the SaaS product for

  • older applications that are stable keep all logs for 48 hours;

  • newer applications that are unstable keep all logs for 48 hours; and

  • everything else keep all logs for 2 release or support cycles.

Note that the old vs new application qualifier can also relate to staff experience and longevity. The newer the staff it can take a while to recognize and debug the problem so keep the logs longer.

One last note, I have found it very useful to get a daily report of error and warning messages. Many of the messages are summarized along with an occurrence count. It is your daily health check on the application where you viscerally experience the ebb and flow of the application’s seasonal and instantaneous problems.