Never having uncommitted changes

The other day I was commenting on how quickly a colleague seems to be able to create a new PR for a change. For me, I need to

  1. stash my working branch's uncommitted changes, 
  2. switch to main, 
  3. pull down new changes,
  4. make the new branch, 
  5. make the actual change, 
  6. commit the change,
  7. push the branch, 
  8. open PR,
  9. switch back to my working branch, and 
  10. un-stash the uncommitted changes. 

He said, in effect, "I never have uncommitted changes." He also mentioned heavy reuse of his command line history, and I know of his highly customized git config [*]. But it was the idea of never having uncommitted changes that struck me. I think I have finally figured out that I should always be making small, tactical commits rather than waiting for a semantic commit. Then, when ready for the PR, use rebase not just for small reorganizations, but for wholesale reorganization into semantic commits.

* My use of IntelliJ for most git operations precludes these efficiencies.

Two civic tech books

I am currently working with a US state's education department to move its infrastructure and applications to the cloud. This is my first time working directly on a government project and it has been informative. To gain a broader understanding of the field of civic tech I read the books

A civic technologist's practice guide
by Cyd Harrell
cydharrell.com

The service organization
by Kate Tarling
theserviceorg.com

Harrell is writing in the context of the US and Tarling the UK. In many ways their persuasive styles seem to reflect the two broader cultures too.

Harrell's book contains a broad introduction to US government (all levels) and how work is accomplished there. It is a good guide to both and offers effective strategies for success, whether working from inside or outside government. The book begins with the important topic of reckoning with privilege and ends with the need for self-care in what can be an intellectually frustrating and emotionally exhausting environment. I am pleased these were included. The resources at the back of the book look to be well considered (as are the few footnotes within). There is no index. I recommend this book if you are considering participating in civic tech.

Tarling's book is less about the current context of the work and more about the means to change that context: to move from stovepiped departments to cross-disciplinary teams focused on providing the whole service, that is, product oriented rather than platform oriented. The national context is the UK rather than the US; nevertheless, it is helpful to see the kinds of tactics and artifacts needed to facilitate the transition. The book has a generous collection of resources at the back. There is no index. I would recommend this book if you have decided to participate in civic tech.

The first rule of a second brain is to not lose any content

The first rule of a second brain is to not lose any content. People, of which I am one, make mistakes. Those mistakes should be correctable, even if the correction process is clumsy. When the tool fails at this, our confidence in it is lost or greatly diminished. This is what happened to me with Obsidian and iCloud.

After several months of use it was clear that I did not use folders in Obsidian. I found that if I named notes well then the open command's search feature was often all I needed to locate the note I wanted. To locate others a full text search with a broad context tag (like a project tag) worked well. So I eliminated folders.

Before removing the folders I went through the notes to improve file names, add tags, and sometimes add useful search terms. I then moved the files out of the folders into a "notes" folder. (I do still have "notes", "attachments", "daily", and "templates" folders.) Once all the notes were removed from a folder I deleted it.

All the movement and deletion was done within Obsidian. The reason I used Obsidian to do this, rather than the Finder or the command line, was that I was unsure whether Obsidian needed to "know" about these changes. Its internal workings are unknown to me, and so this seemed like the responsible method.

I made a mistake and deleted one folder that was not yet empty. I did not discover this until a week later when, back at work, I needed a note that happened to be in that folder. It was gone. I was horrified as this note had the details of the sequence of intricate steps needed to build, configure, deploy, and use an internal server. It was the only copy I had. (That this information was in my personal notes is a story for another time.)

You would think a software developer with decades of experience with version control systems would never let this happen. But I did. And I did because I have become perfunctory about some matters of personal file storage. A file deleted on the Mac goes into the Trash, Dropbox retains deleted files for 30 days, and my local storage is backed up at Backblaze. Losing files is kind of hard. Unfortunately, my Obsidian vault was on iCloud.

iCloud has a 30 day retention period for deleted files. For deleted files to be retained, the deletion must use Mac-specific SDK methods. (I am speculating here based on behavior.) Obsidian does not use these methods. It manipulates the file system as any other POSIX application would. Once a file is deleted it is gone, effectively, forever.

This loss shocked me more than I would have expected. I think part of the reason is that I had been considering moving my local storage to iCloud. That is no longer a consideration.

As to Obsidian, I have grown less excited by it over these months. Its document editing and linking are rudimentary. Its plugin community is very active right now, but that is unlikely to continue over the long term needed by a second brain. Lastly, Markdown is an intentionally limited and ultimately weak markup language. Still, until I find an alternative, Obsidian is the better of the free solutions.

Blue bird exit

I have started the process to delete my Twitter account. My decision has little to do with its new owner. In fact, apart from the volume of angsty tweets, my timeline was largely undisturbed. What changed for me was my interest in the variety of information that was always available. I am just not interested in a shallow understanding of a broad array of topics any more. Better to use my time learning in depth and mulling over stuff.

Update: I canceled the deletion. The month away has been refreshing, and it is satisfying to have stopped a bad habit. I don't intend to return to the daily doom-scrolling, but I will drop in, from time to time, to have a look-see. (2023-01-11)

Pro Git

When learning a new topic it has been my experience that you read several explanations and tutorials and, finally, the last one you read makes the topic crystal clear. Except I don't think it did; it just happened to be there when you finally annealed all that information into those crystals. Anyway, if you are learning git, as I am even though I have been using it for years, the book Pro Git by Scott Chacon is working for me. I have not finished it yet, but so much is clearer now.

Future of programming environments for professionals & kids

If you are interested in the future of programming environments for the professionals and the kids I recommend these Strange Loop 2022 talks

"Stop Writing Dead Programs" by Jack Rusher (Strange Loop 2022) - YouTube
https://www.youtube.com/watch?v=8Ab3ArE8W3s

"Hedy: A Gradual programming language" by Felienne Hermans (Strange Loop 2022) - YouTube
https://www.youtube.com/watch?v=fmF7HpU_-9k

Where I particularly agree with Jack Rusher is on the need to move on from what he calls "batch", ie edit-compile-run loops, and toward what might be called "surgery", ie working within the running environment.

Who does have a second job?

Recently Equifax fired some of its employees who had second full-time jobs. Equifax knew about this because it has the employment data of some 100M workers. It is unsettling that Equifax could actually make this into a service for other employers. Nevertheless, it is an interesting SQL question. Given this schema and data

create table employees ( ssn char(1), ein char(1) );

insert into employees (ssn, ein) values
 ('1','a'),
 ('1','b'),
 ('2','b'),
 ('3','c'),
 ('3','d'),
 ('3','e'),
 ('4','b'),
 ('4','d'),
 ('5','b');

What is the query needed to discover who at a given company (EIN) has a second job? My solution is to first find all the people with 2 or more jobs

select e1.ssn as ssn, e1.ein as ein1, e2.ein as ein2 
from employees e1, employees e2 
where e1.ssn = e2.ssn and e1.ein <> e2.ein;

And in this set find the people employed by 'b'

select * from ( 
    select e1.ssn as ssn, e1.ein as ein1, e2.ein as ein2 
    from employees e1, employees e2 
    where e1.ssn = e2.ssn and e1.ein <> e2.ein ) t 
where t.ein1 = 'b'

I suspect there is a more elegant way, but this was good enough to know that it is easy to figure out.

Jira is software management's Pakistani truck art

Everyone picks on Jira and the criticism is well deserved. Working with it recently I came to see Jira as this
when what we practitioners want is this

Back to coding.

Emoji impaired

I don't understand any of the dozens of emoji the folks around me use to provide feedback or exclamation.

Yet another Obsidian, Dataview, and GTD exploration

I have been looking for an "external brain" for many years. I was working at Brown University when tools like Intermedia were being developed and my friends were actively discussing and building Ted Nelson's Xanadu. A consequence is that my standard for these tools is very high.

I am always happy to find a tool that satisfies 90% of my needs and offers a plugin API that someone has created a programmatic binding for. Prior to the web, in a time when desktop applications ruled, I learned of Tcl. Years later, when wikis were new, I wrote plugins for JSPWiki for server-side rendering using Tcl and JavaScript. More recently we have seen the rise of programmable notebooks, starting with Jupyter, or, perhaps, earlier with Microsoft Word and Google Docs scripting.

These two threads came together recently as I was exploring Obsidian. Specifically, Obsidian has the Dataview plugin that, more or less, treats the Markdown notes as a queryable and navigable repository. I wanted to use Obsidian to help collect my projects under one interface using a loose GTD approach. Each project is a note and that note lists the project's next action and what it is waiting on as tasks. And there would be a "dashboard" note that automatically enumerates all next actions and waiting ons from all projects.

There are lots of ways of handling this in Obsidian and its plugins -- especially using the Checklist plugin. I think Nicole van der Hoeven's Actually getting things done with Obsidian // Checklist plugin is one of the best. However, I did not like how it forced an unnatural encoding and display of next actions and waiting ons. Since I am in the exploration phase of learning Obsidian I let my perfectionism override my pragmatism.

A result of the exploration was to use Dataview to achieve my ends. I wanted to encode my project like the following

Note the annotation on the next action and waiting on tasks. The dashboard should look like

The key feature that makes this work is the annotations Dataview adds to Obsidian tasks. The annotations are [next-action::] and [waiting-on::]. For the dashboard I can then use the annotations with a Dataview JavaScript code block to select the next actions and waiting ons across projects. Here is the GTD dashboard note

## Next Actions
```dataviewjs
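// Gather the tasks from every note in the "projects" folder, sorted by file
// name, keeping only the tasks annotated with [next-action::].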
let tasks = dv
	.pages('"projects"')
	.sort((a,b) => dv.compare(a.file.name, b.file.name))
	.file
	.tasks
	.filter(t => t.annotated && t.hasOwnProperty("next-action"));
if(tasks.length) {
	dv.taskList(tasks);
}
else {
	dv.paragraph("None")
}
```
## Waiting On
```dataviewjs
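// Same as above, but keep the tasks annotated with [waiting-on::].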
let tasks = dv
	.pages('"projects"')
	.sort((a,b) => dv.compare(a.file.name, b.file.name))
	.file
	.tasks
	.filter( t => t.annotated && t.hasOwnProperty("waiting-on"));
if(tasks.length) {
	dv.taskList(tasks);
}
else {
	dv.paragraph("None");
}
```
## Projects
```dataviewjs
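// List a link to each project note, sorted by file name.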
dv
	.pages('"projects"')
	.sort((a,b)=>dv.compare(a.file.name,b.file.name))
	.forEach(p=>dv.paragraph(p.file.link))
```


The end result is

The result is not exactly what I want. I don't want the annotations and the links to be displayed, and I have not figured out how to eliminate them yet. Still, it is a good start and I did learn much about Dataview and Obsidian. (Oh, and the next step is to enhance Dataview or write my own plugin. Maybe not.)

Spring and checked & unchecked exceptions

A few weeks ago a colleague asked about checked and unchecked exceptions and I mentioned offhand that it is useful to understand Spring's exception design and choices. This is a better response...

The Spring exception library has been around a long time and it has survived because it matches the semantics of servicing problems rather than categorizing technical failings. In particular, I am addressing the org.springframework.dao.DataAccessException hierarchy of exceptions. It is worth the time to read Chapter 9 of Expert One-On-One J2EE Design and Development to better understand Spring's exceptions.

The first question we need to ask is: why do we use exceptions? For me an exception is due to an unanticipated or unforeseen problem that MUST be handled outside of the normal call chain. If we have a method that is expected to return a value and it can't, then this is an exception. If we have a method that can be expected to not return a value, then that is not an exception. For example, if the method "int getFoo(int bar)" is expected to have a valid return value for every value of bar, then any problem must raise an exception. However, if the method does not have a valid return value for every value of bar, then the method is badly specified. The method would be better specified as "Optional<Integer> getFoo(int bar)" or, better yet, named "findFoo". Once you have a well-specified method you can consider how to use exceptions.

What I like about Spring's data access exceptions is that they derive from three base classes: RecoverableDataAccessException, NonTransientDataAccessException, and TransientDataAccessException. These base classes let the caller know how to respond to the exception -- and this is important -- if the caller wants to. For example, a method raising NonTransientDataAccessException (or its subclasses) can't be "retried" to get a different result, whereas a method raising TransientDataAccessException could be retried, and a method raising RecoverableDataAccessException could be retried once some mitigation has been undertaken. Returning to the example, "int getFoo(int)" could throw a NonTransientDataAccessException (well, a subclass like NotFoundException) if the given "bar" does not have a corresponding "foo".
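To make this concrete, here is a minimal sketch of a caller responding to the three base classes. The repository interface and its findFoo method are hypothetical; only the Spring exception types are real.

```java
import java.util.Optional;

import org.springframework.dao.NonTransientDataAccessException;
import org.springframework.dao.RecoverableDataAccessException;
import org.springframework.dao.TransientDataAccessException;

// Hypothetical repository; only the Spring exception types above are real.
interface FooRepository {
    Optional<Integer> findFoo(int bar);

    void reconnect(); // some mitigation, eg re-establish a connection
}

class FooService {
    private final FooRepository repository;

    FooService(FooRepository repository) {
        this.repository = repository;
    }

    Optional<Integer> findFoo(int bar) {
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                return repository.findFoo(bar);
            } catch (TransientDataAccessException e) {
                // Retrying may yield a different result; loop and try again.
            } catch (RecoverableDataAccessException e) {
                // Retry only after some mitigation has been undertaken.
                repository.reconnect();
            } catch (NonTransientDataAccessException e) {
                // Retrying cannot change the outcome; report "no foo" to the caller.
                return Optional.empty();
            }
        }
        return Optional.empty();
    }
}
```

The point is not the retry loop itself, but that the base class alone tells the caller which of these responses is even worth attempting.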

You can also see how we could have a similar set of base exceptions for processing failures, eg RecoverableProcessingException, NonTransientProcessingException, and TransientProcessingException.

As to whether to use checked exceptions or not, I think there are two factors to consider. The first factor is how likely it is that intermediaries in the call chain can practically respond to the exception. The second factor is how important it is for the caller to know about the exceptions thrown by the method. I think understanding how to respond to exceptions is critical to building stable, recoverable applications. However, in a K8s world, where failed applications have a small functional scope and can be restarted automatically, stability and recoverability are less important. So, these days I am comfortable with unchecked exceptions BUT the exceptions should be declared on the method signature -- doing so better documents the method.

The value of logging

With the rise of logging software-as-a-service (SaaS) products, the monetary cost of logging has increased. If the organization has not been able to recoup some of the previous staffing or infrastructure costs of managing its own logging, then this cost is a real budget increase. Since the SaaS cost is related to logging volume, there are departmental or company mandates to log less -- specifically, to only log at the error and warning levels. I think this has been a mistake.

To state the obvious, logs are there to aid problem resolution. (I am not here concerned with APM.) Logs provide the context for the resolution, i.e. data values and time of occurrence. Not all problems are found in the logs; some come from user reports. However, all problem contexts can be found in the logs.

The problems are either consistent or intermittent. Consistent problems occur on every similar user action or API request. Some consistent problems occur on a wider set of user actions or API requests. 

Intermittent problems occur with variability over time or consistently over time. Some intermittent problems occur on a wider set of user actions or API requests. Intermittent problems within the application are usually the result of state change as a secondary activity of the response. Intermittent problems within a distributed architecture are usually due to one or more of the 8 fallacies of distributed computing.

The logging needs for consistent and intermittent problems are different. Logging for consistent problems can often be adequately initiated when returning up the call-chain. That is, an exceptional situation has occurred, and the response is following the error path. Logging for intermittent problems does not have this advantage and so logging must be initiated down the call-chain. 

The context to log is often just the inputs to a method/API and the outputs from a method/API, but only across packages or services. The goal of logging is not to trace the request and response, but to provide enough detail to initiate debugging at more than one point in the request’s response call-chain. 
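As a minimal sketch (assuming SLF4J; the client and its parameters are made up), boundary logging might look like this:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical client at a service boundary; the names and fields are illustrative.
class PaymentClient {
    private static final Logger log = LoggerFactory.getLogger(PaymentClient.class);

    String charge(String accountId, long amountCents) {
        // INFO: the inputs at the boundary -- enough context to begin debugging
        // here even when the failure surfaces elsewhere in the call-chain.
        log.info("charge request: accountId={}, amountCents={}", accountId, amountCents);
        try {
            String receiptId = callPaymentGateway(accountId, amountCents);
            // INFO: the output at the boundary.
            log.info("charge response: accountId={}, receiptId={}", accountId, receiptId);
            return receiptId;
        } catch (RuntimeException e) {
            // ERROR: unexpected response, with its context, on the error path.
            log.error("charge failed: accountId={}, amountCents={}", accountId, amountCents, e);
            throw e;
        }
    }

    private String callPaymentGateway(String accountId, long amountCents) {
        return "r-1"; // stand-in for the real gateway call
    }
}
```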

It follows that logging must include the error messages and the context before (and after) the error. Generally, the purposes of the log levels are:

  • INFO for context – data values and time of occurrence;
  • WARN for nearing design limits (eg, capacity, duration, and absolutes) and for expected but unwanted responses (eg, 401 and 5xx HTTP statuses); and
  • ERROR for unexpected responses.

Log messaging must be examined during code reviews as much as the implementation is. Logging can quickly become voluminous as developers tend towards CYA logging. A good senior developer or architect, in conjunction with operations and product support, can establish rules of thumb for logging that work well with everyone’s needs.

As to the costs of using a logging SaaS, consider not keeping the logs there for very long. (Keep all the logs locally for a long time, however. Local disk and AWS’s S3 are cheap.) Within the SaaS product for

  • older applications that are stable keep all logs for 48 hours;

  • newer applications that are unstable keep all logs for 48 hours; and

  • everything else keep all logs for 2 release or support cycles.

Note that the old vs new application qualifier can also relate to staff experience and longevity. The newer the staff, the longer it can take to recognize and debug a problem, so keep the logs longer.

One last note: I have found it very useful to get a daily report of error and warning messages, with many of the messages summarized along with an occurrence count. It is your daily health check on the application, where you viscerally experience the ebb and flow of the application’s seasonal and instantaneous problems.

There is no "documentation"

There is no "documentation". Instead, there are

Reference: This includes both the public REST API and the library API.

Examples: These are short, heavily annotated, working programs that show how to use aspects of the APIs. They are easier to create than tutorials.

Tutorials: These are stepwise guides to the APIs. These are aimed at developers and testers new to the APIs, or APIs that are difficult to understand.

Operation: These detail the deployment of the product and its supporting tools (monitors, logging, alerts, etc).

Question and Answer Knowledge base: This is an ongoing collection of questions and answers from staff.

What is missing from this list are the aspirational and functional design documents. Both are important at the early stages of development (and, sometimes, for bringing on senior staff) but they represent the plan and not the outcome. Maintaining them, even with "as built" annotations, is rarely done and so they cause confusion instead of aiding understanding. Consider them ephemeral.

Few organizations can afford to create and maintain all these kinds of documents. Pick the ones that have vitality in your daily work. For example, if you are hiring or have a less experienced staff then focus on tutorials, examples, and Q&A; if you have a growing customer base then focus on operations and Q&A.

Maybe I was aiming too high ...


"Take the proficiency of fungi at problem-solving. Fungi are used to searching out food by exploring complex three-dimensional environments such as soil, so maybe it’s no surprise that fungal mycelium solves maze puzzles so accurately. It is also very good at finding the most economical route between points of interest. The mycologist Lynne Boddy once made a scale model of Britain out of soil, placing blocks of fungus-colonised wood at the points of the major cities; the blocks were sized proportionately to the places they represented. Mycelial networks quickly grew between the blocks: the web they created reproduced the pattern of the UK’s motorways (‘You could see the M5, M4, M1, M6’)."

Entangled Life: How Fungi Make Our Worlds, Change Our Minds and Shape Our Futures, by Merlin Sheldrake

Writing at work. Not writing here.

I haven't posted anything here in a long time. At the end of July I started a new job at a big company and have been busy learning how to be effective there. As with many big companies there is not much that I am allowed to share, especially when that company operates in a highly regulated industry. As I settle in and better understand the boundaries I am sure I will start writing here again. Not that anyone really cares; a blog is a vanity project after all.

Small set of guidelines for using Slack

I have been using Slack for a long time. Here are a few guidelines based on that experience:
  • Slack is part of your communications toolkit. It has limits. When you feel that you are typing too much or getting frustrated by misunderstandings then switch to a call. A consequence of this is that you should be prepared to make or accept a call — earbuds at the ready.
  • Slack is great for singular topic discussions. The channel clearly displays the discussion’s history. Even if the discussion spans many days the context is always available for a quick catch up.
  • Slack is terrible for overlapping topic discussions. The channel quickly becomes a hodgepodge of utterances and fragments that have little reference to the topical context. Overlapping topic channels are needed, however. For example, in one small company I worked at the #operations channel was primarily used to give notice of infrastructure changes. When using such a channel you need to remember that others will not have the same topical context as you. Where possible, include some reference to the topic — "Re X, we did Y." For example, "Re loud bang, we have flogged the responsible employee." Here "loud bang" is the strong reminder of the previous topical messages. (See also the notes on threads and links below.)
  • Use pinned messages to define the purpose of the channel along with links to supporting materials. Messages are editable, so do improve and update the pinned message. Pinned messages can be quickly accessed via the "push pin" icon in the channel header.
  • Slack has discussion threads. Any message can be the start of a thread. Threads provide a natural grouping of related messages. The problem with them is that it is hard to have everyone on the channel use them consistently. For example, Jane started a thread to continue the discussion, but Jack, in haste, posted a message instead. Now we have 3 messages connected by two different mechanisms. Don’t use threads unless you can get everyone on the channel to use them consistently. (Actually, just don't use threads.)
  • When you want to clearly reference a previous message use a Slack link. It is long and ugly, but it is by far the most reliable reference.
  • We reuse text from many different applications and written languages every day. It is common to copy from one place and paste it into Slack, and vice versa. When you do this you bring along a lot of unseen cruft -- formatting, odd visible characters, odd invisible characters, etc. And Slack does not help, in that it wants to pretty up the text for you; emoji interpretation being the most blatant example. When pasting into Slack always use Paste without formatting or Paste and Match Style. And if you are pasting something technical like an email address, an access token, my salary, etc, then use Slack's inline code formatting, eg "The email is `me@there.com`".
  • Use the sidebar as a presence indicator, making sure your immediate team members are favorited.

Macroservices, a middle ground between the monolith and microservices.

There is a middle ground between the monolith and microservices and that is "macroservices." Macroservices are distinguished by having one executable and many compositions. Each composition exercises a portion of the executable's internal components. Some of the components are exercised by all compositions (eg, identity management) and others by a single composition (eg, a specialized data store). Composition deployments communicate with each other over REST or gRPC. Orchestration for resilience and scaling is accomplished using the same tools as for microservices.

A composition is nothing more than a declaration of a set of components to activate, their interdependencies, and their configurations. Configurations are generally properties, ie name and value pairs, accessed from the environment. Much as when you build the executable and draw all the benefits of type checking, unit testing, static analysis, etc, you can build the composition and draw similar correctness assurances.
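As a sketch of the idea, assuming a Spring Boot style component model (all class names here are illustrative, not a prescribed layout), a composition can be as small as:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Import;

// Components built into the one executable; the names are illustrative.
@Configuration
class IdentityManagementConfig { /* exercised by every composition */ }

@Configuration
class CatalogSearchConfig { /* exercised only by this composition */ }

// The composition: a declaration of the components to activate for this deployment.
// Its configuration (name and value pairs) is read from the environment at startup.
@SpringBootApplication
@Import({ IdentityManagementConfig.class, CatalogSearchConfig.class })
public class CatalogComposition {
    public static void main(String[] args) {
        SpringApplication.run(CatalogComposition.class, args);
    }
}
```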

A macroservice allows your small team of developers to focus on what matters to your business, ie what your customers value. The development infrastructure is simple. The deployment infrastructure is flexible. Troubleshooting is comprehensible.

Written in response to Microservices are for companies with 500+ engineers.

Update: Perhaps a better name is "polyservices".

Update: I am reading Sam Newman's new book Monolith to Microservices: Evolutionary Patterns to Transform Your Monolith. My macroservice is more akin to his distributed monolith except that the services contained within do have stronger data independence than is typical of a monolith.

RI's Innovation Vouchers to help fund product development

I recently learned about RI's Innovation Voucher program. The vouchers are $5,000-$50,000 grants for small companies to buy the expertise needed to develop a new product or process. Vouchers can be redeemed for services at a research institution or to fund an in-house research & development project.

More information at RI Commerce Innovation Incentives.

Lucene, shadow query classes, and the visitor pattern

A good Lucene result is achieved from an index and a query working together.

Defining the schema for the index is mostly an upfront design task: What are the fields? What fields are stored and what are indexed? How are terms in fields parsed? Are found terms supplemented? Are multiple fields combined? Etc.

Once you have made these decisions, revisiting them can be prohibitive without planning. That is, reindexing is found to be too expensive in time or resources, or, worse, the source documents are no longer available. Most Lucene users err on the side of a cautious schema where a lot more is stored and indexed than is needed now, with the hope of a successful schema refactoring in the future. My experience has been that you don't know enough now about your data, how to index it, how to query it, and how users want to access it to have a viable cautious schema. It is better to store all your sources so you can reindex when you know better how the indexes and queries can cooperate. This is just a cost of using a new technology (and, as always, it can be mitigated).

The upshot in the near term is that the index schema is static. You have to get your flexibility from the query.

As mentioned in a previous posting, there is a tendency to think of Lucene queries like SQL queries. That is, that there is a single, correct rendition. Discard that thinking. There are no correct results; there are only better results. To achieve better results you need to watch what your users search for and work out how the queries need to adjust. For example, perhaps you discover that there is a shift in vocabulary happening. What was once "OS X" is now "macOS". When a user queries for OS X you need to also include macOS in the query.

The Lucene API contains a number of query subtypes. These are combined to construct an expression that characterizes the user's search intent. This nested data structure should be considered a starting point. It will be augmented and reformed to reflect your current understanding of how to best use the indexes. In the example above, you want to include macOS when OS X is used.

The Lucene API query subclasses are too rigid for direct augmentation and reforming. In the past the API was downright unbending, and so I developed a set of shadow query classes that are amenable to use with the Visitor pattern. For example, this shows two visitors: the first adds the macOS variant and the second converts the query to a Solr expression:

Query q = ... new TermQuery(10.0f, "f", "osx") ...
...
Map<String, List<String>> variants = new HashMap<>();
variants.put("a", Arrays.asList("osx, "macos"));
VariantsQueryVistor vistor = new VariantsQueryVistor(0.0001f, variants);
...
q = vistor.visitQuery(q);
...
s = new SolrLuceneQueryVistor().visitQuery(q).toString();

and s is

( "osx" OR "macos" ^ 0.0001 ) ^ 10.0