Everyone picks on Jira and the criticism is well deserved. Working with it recently I came to see Jira as this
when what we practitioners want is this
Back to coding.
I have been looking for an "external brain" for many years. I was working at Brown University when tools like Intermedia were being developed and my friends were actively discussing and building Ted Nelson's Xanadu. A consequence is that my standard for these tools is very high.
I am always happy to find a tool that satisfies 90% of my needs and offers a plugin API that someone has created a programmatic binding for. Prior to the web, in a time when desktop applications ruled, I learned of Tcl. Years later when wikis were new I wrote plugins for JSPWiki for server side rendering using Tcl and JavaScript. More recently we have seen the rise of programmable notebooks starting with Jupyter, or, perhaps, earlier with Microsoft Word and Google Docs scripting.
These two threads came together recently as I was exploring Obsidian. Specifically, Obsidian has the Dataview plugin that, more or less, treats the Markdown notes as a queryable and navigable repository. I wanted to use Obsidian to help collect my projects under one interface using a loose GTD approach. Each project is a note and that note lists the project's next action and what it is waiting on as tasks. And there would be a "dashboard" note that automatically enumerates all next actions and waiting ons from all projects.
There are lots of ways of handling this in Obsidian and its plugins -- especially using the Checklist plugin. I think Nicole van der Hoeven's "Actually getting things done with Obsidian // Checklist plugin" is one of the best. However, I did not like how it was forcing an unnatural encoding and display of next actions and waiting ons. Since I am in the exploration phase of learning Obsidian I let my perfectionism override my pragmatism.
A result of the exploration was to use Dataview to achieve my ends. I wanted to encode my project like the following
Note the annotation on the next action and waiting on tasks. The dashboard should look like
The key feature for this to work is the Dataview annotations it adds to the Obsidian tasks. The annotations are [next-action::] and [waiting-on::]. For the dashboard I can then use the annotations with a Dataview JavaScript code block to select the next actions and waiting ons across projects. Here is the GTD dashboard note
## Next Actions
```dataviewjs
let tasks = dv
    .pages('"projects"')
    .sort((a,b) => dv.compare(a.file.name, b.file.name))
    .file
    .tasks
    .filter(t => t.annotated && t.hasOwnProperty("next-action"));
if(tasks.length) {
    dv.taskList(tasks);
}
else {
    dv.paragraph("None")
}
```
## Waiting On
```dataviewjs
let tasks = dv
    .pages('"projects"')
    .sort((a,b) => dv.compare(a.file.name, b.file.name))
    .file
    .tasks
    .filter( t => t.annotated && t.hasOwnProperty("waiting-on"));
if(tasks.length) {
    dv.taskList(tasks);
}
else {
    dv.paragraph("None");
}
```
## Projects
```dataviewjs
dv
    .pages('"projects"')
    .sort((a,b)=>dv.compare(a.file.name,b.file.name))
    .forEach(p=>dv.paragraph(p.file.link))
```
END
The end result is
The result is not exactly what I want. I don't want the annotations and the links to be displayed. I have not figured out how to eliminate them yet. It is a good start and I did learn much about Dataview and Obsidian. (Oh, the next step is to enhance Dataview or write my own plugin. Maybe not.)
A few weeks ago a colleague asked about checked and unchecked exceptions and I mentioned offhand that it is useful to understand Spring's exception design and choices. This is a better response...
The Spring exception library has been around a long time and it has survived because it matches the semantics of servicing problems rather than categorizing technical failings. In particular, I am addressing the org.springframework.dao.DataAccessException hierarchy of exceptions. It is worth the time to read Chapter 9 of Expert One-On-One J2EE Design and Development to better understand Spring's exceptions.
The first question we need to ask is why do we use exceptions? For me an exception is due to an unanticipated or unforeseen problem that MUST be handled outside of the normal call chain. If we have a method that is expected to return a value and it can't then this is an exception. If we have a method that can be expected to not return a value then that is not an exception. For example, if the method "int getFoo(int bar)" is expected to have a valid return value for every value of bar then any problems must raise an exception. However, if the method does not have a valid return value for every value of bar then the method is badly specified. The method would be better specified as "Optional<Integer> getFoo(int bar)" or, better yet, named "findFoo". Once you have a well specified method then you can consider how to use exceptions.
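The difference between the two specifications can be sketched as follows. This is only an illustration: the FooLookup class, its sample data, and the choice of IllegalStateException are assumptions, not anything from a real codebase.

```java
import java.util.Map;
import java.util.Optional;

final class FooLookup {
    // Hypothetical data: the "bar" values that have a "foo".
    private static final Map<Integer, Integer> FOOS = Map.of(1, 100, 2, 200);

    private FooLookup() {}

    // Every bar is expected to have a foo, so a miss is exceptional.
    static int getFoo(int bar) {
        Integer foo = FOOS.get(bar);
        if (foo == null) {
            throw new IllegalStateException("no foo for bar " + bar);
        }
        return foo;
    }

    // A miss is an ordinary outcome, so it is part of the return type.
    static Optional<Integer> findFoo(int bar) {
        return Optional.ofNullable(FOOS.get(bar));
    }
}
```

A caller of findFoo has to decide what an absent value means right at the call site, while a caller of getFoo can assume success and leave failures to whoever handles the exception.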
What I like about Spring's data access exceptions is that they derive from three base classes RecoverableDataAccessException, NonTransientDataAccessException, and TransientDataAccessException. These base classes let the caller know how to respond to the exception -- and this is important -- if the caller wants to. For example, a method raising NonTransientDataAccessException (or its subclasses) can't be "retried" to get a different result. Whereas, a method raising TransientDataAccessException could be retried, and a method raising RecoverableDataAccessException could be retried once some mitigation has been undertaken. For the example, "int getFoo(int)" could throw a NonTransientDataAccessException (well, a subclass like NotFoundException) if the given "bar" does not have a corresponding "foo".
You can also see how we could have a similar set of base exceptions for process failures, eg RecoverableProcessingException, NonTransientProcessingException, and TransientProcessingException.
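Here is a minimal sketch of how a caller might use such categories, assuming a hand-rolled hierarchy for process failures. All of the class names and the Retry helper are hypothetical and do not come from Spring. A recoverable category would sit beside these two and is left out to keep the sketch short.

```java
import java.util.function.Supplier;

// Hypothetical base class for process failures.
class ProcessingException extends RuntimeException {
    ProcessingException(String message) { super(message); }
}

// Trying again later may give a different result.
class TransientProcessingException extends ProcessingException {
    TransientProcessingException(String message) { super(message); }
}

// Trying again will fail the same way.
class NonTransientProcessingException extends ProcessingException {
    NonTransientProcessingException(String message) { super(message); }
}

final class Retry {
    private Retry() {}

    // Retries only transient failures; every other exception propagates immediately.
    static <T> T withRetry(int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TransientProcessingException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (TransientProcessingException e) {
                last = e; // worth another attempt
            }
        }
        throw last; // all attempts failed transiently
    }
}
```

The caller never inspects the technical cause. The exception's category alone tells it whether another attempt is worthwhile, which is the property I like in Spring's design.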
As to whether to use checked exceptions or not, I think there are two factors to consider. The first is how likely it is that intermediaries in the call chain can practically respond to the exception. The second is how important it is for the caller to know about the exceptions thrown by the method. I think understanding how to respond to exceptions is critical to building stable, recoverable applications. However, in a Kubernetes (K8s) world where failed applications have a small functional scope and can be restarted automatically, stability and recoverability are less important. So, these days I am comfortable with unchecked exceptions BUT the exceptions should be declared on the method signature -- doing so better documents the method.
With the rise of logging software-as-a-service (SaaS) products the monetary cost of logging has increased. If the organization has not been able to recoup some of the previous costs of managing its own logging, in staffing or infrastructure, then this cost is a real budget increase. Since the SaaS cost is related to logging volume there are departmental or company mandates to log less. Specifically, only log at the error and warning levels. I think this has been a mistake.
To state the obvious, logs are there to aid problem resolution. (I am not here concerned with APM.) Logs provide the context for the resolution, i.e. data values and time of occurrence. Not all problems are found in the logs; some come from user reports. However, all problem contexts can be found in the logs.
The problems are either consistent or intermittent. Consistent problems occur on every similar user action or API request. Some consistent problems occur on a wider set of user actions or API requests.
Intermittent problems occur with variability over time or consistently over time. Some intermittent problems occur on a wider set of user actions or API requests. Intermittent problems within the application are usually the result of state change as a secondary activity of the response. Intermittent problems within a distributed architecture are usually due to one or more of the 8 fallacies of distributed computing.
The logging needs for consistent and intermittent problems are different. Logging for consistent problems can often be adequately initiated when returning up the call-chain. That is, an exceptional situation has occurred, and the response is following the error path. Logging for intermittent problems does not have this advantage and so logging must be initiated down the call-chain.
The context to log is often just the inputs to a method/API and the outputs from a method/API, but only across packages or services. The goal of logging is not to trace the request and response, but to provide enough detail to initiate debugging at more than one point in the request’s response call-chain.
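As a rough sketch of logging only at a boundary, the class below logs the inputs and output of a package's public entry point and nothing inside it. The BillingFacade class, its method, and its arithmetic are invented for illustration. java.util.logging is used only so the example has no dependencies.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class BillingFacade {
    private static final Logger LOG = Logger.getLogger(BillingFacade.class.getName());

    private BillingFacade() {}

    // Hypothetical public entry point of a package: log what came in and what went out.
    static long invoiceTotalCents(String customerId, int month) {
        LOG.log(Level.FINE, "invoiceTotalCents in: customerId={0} month={1}",
                new Object[] {customerId, month});
        try {
            long total = computeTotal(customerId, month);
            LOG.log(Level.FINE, "invoiceTotalCents out: total={0}", total);
            return total;
        } catch (RuntimeException e) {
            // On the error path, repeat the inputs so the failure has its context.
            LOG.log(Level.WARNING,
                    "invoiceTotalCents failed: customerId=" + customerId + " month=" + month, e);
            throw e;
        }
    }

    // Internal helper: deliberately not logged. The arithmetic is a stand-in.
    private static long computeTotal(String customerId, int month) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        return customerId.length() * 100L * month;
    }
}
```

The entry and exit messages are written on the way down the call-chain, so they exist even when no exception is ever raised, which is what an intermittent problem needs.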
It follows that logging must include the error messages and the context before (and after) the error. Generally, the purposes of the log levels are:
Log messages must be examined during code reviews as much as the implementation is. Logging can quickly become voluminous as developers tend towards CYA logging. A good senior developer or architect in conjunction with operations and product support can establish rules of thumb for logging that work well with everyone’s needs.
As to the costs of using a logging SaaS, consider not keeping the logs there for very long. (Keep all the logs locally for a long time, however. Local disk and AWS’s S3 are cheap.) Within the SaaS product for
older applications that are stable keep all logs for 48 hours;
newer applications that are unstable keep all logs for 48 hours; and
everything else keep all logs for 2 release or support cycles.
Note that the old vs new application qualifier can also relate to staff experience and longevity. Newer staff can take a while to recognize and debug a problem, so keep the logs longer.
One last note, I have found it very useful to get a daily report of error and warning messages. Many of the messages are summarized along with an occurrence count. It is your daily health check on the application where you viscerally experience the ebb and flow of the application’s seasonal and instantaneous problems.
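A report like this can be approximated with very little code. The sketch below assumes log lines of the hypothetical form "<timestamp> <LEVEL> <message>" with levels named ERROR and WARN. It groups messages that differ only in their numbers, and both the format and the grouping rule are assumptions to adapt to your own logs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DailyLogSummary {
    private DailyLogSummary() {}

    // Counts ERROR and WARN messages, keyed by level plus a normalized message.
    static Map<String, Integer> summarize(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(" ", 3); // timestamp, level, message
            if (parts.length < 3) {
                continue; // not in the assumed format
            }
            String level = parts[1];
            if (!level.equals("ERROR") && !level.equals("WARN")) {
                continue;
            }
            // Replace digit runs so messages differing only by ids or durations group together.
            String key = level + " " + parts[2].replaceAll("\\d+", "#");
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```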
There is no "documentation". Instead, there are
Reference: This includes both the public REST API and the library API.
Examples: These are short, heavily annotated, working programs that show how to use aspects of the APIs. They are easier to create than tutorials.
Tutorials: These are stepwise guides to the APIs. These are aimed at developers and testers new to the APIs, or APIs that are difficult to understand.
Operation: These detail the deployment of the product and its supporting tools (monitors, logging, alerts, etc).
Question and Answer Knowledge base: This is an ongoing collection of questions and answers from staff.
What is missing from this list are the aspirational and functional design documents. Both are important at the early stages of development (and, sometimes, for bringing on senior staff) but they represent the plan and not the outcome. Maintaining them, even with "as built" annotations, is rarely done and so they cause confusion instead of aiding understanding. Consider them ephemeral.
Few organizations can afford to create and maintain all these kinds of documents. Pick the ones that have vitality in your daily work. For example, if you are hiring or have a less experienced staff then focus on tutorials, examples, and Q&A; if you have a growing customer base then focus on operations and Q&A.
Maybe I was aiming too high ...
"Take the proficiency of fungi at problem-solving. Fungi are used to searching out food by exploring complex three-dimensional environments such as soil, so maybe it’s no surprise that fungal mycelium solves maze puzzles so accurately. It is also very good at finding the most economical route between points of interest. The mycologist Lynne Boddy once made a scale model of Britain out of soil, placing blocks of fungus-colonised wood at the points of the major cities; the blocks were sized proportionately to the places they represented. Mycelial networks quickly grew between the blocks: the web they created reproduced the pattern of the UK’s motorways (‘You could see the M5, M4, M1, M6’)."
Entangled Life: How Fungi Make Our Worlds, Change Our Minds and Shape Our Futures
I haven't posted anything here in a long time. At the end of July I started a new job at a big company and have been busy learning how to be effective there. As with many big companies there is not much that I am allowed to share, especially when that company operates in a highly regulated industry. As I settle in and better understand the boundaries I am sure I will start writing here again. Not that anyone really cares; a blog is a vanity project after all.
I recently learned about RI's Innovation Voucher program. The vouchers are $5,000-$50,000 grants for small companies to buy the expertise needed to develop a new product or process. Vouchers can be redeemed for services at a research institution or to fund an in-house research & development project.
More information at RI Commerce Innovation Incentives.
Query q = ... new TermQuery(10.0f, "f", "osx") ...
...
Map<String, List<String>> variants = new HashMap<>();
variants.put("a", Arrays.asList("osx", "macos"));
VariantsQueryVistor vistor = new VariantsQueryVistor(0.0001f, variants);
...
q = vistor.visitQuery(q);
...
s = new SolrLuceneQueryVistor().visitQuery(q).toString();
and s is
( "osx" OR "macos" ^ 0.0001 ) ^ 10.0
I don't like Git and this posting is not going to change anyone's mind about Git and I will continue to use Git as it has become the one version control system to rule them all.
Git does not work the way I want a version control tool to work and especially in a busy development shop. I work on multiple branches concurrently. Does anyone have serialized bug and feature work? I don't need the whole version control tree available all the time. I only need the trunk and the branches I am working on and I need them to be available simultaneously. Git's stash, which allows pausing development on one branch to work on another, is useless to me. Using it breaks my work flow and destabilizes my mental model of my file system. That is, when I am in ~/src/foo/branches/issue1/
I know everything below is work being done on issue1, and when I am in ~/src/foo/branches/feature2/
I know everything below is work being done on feature2. There is never any confusion as to what I am looking at. With Git, however, I need to frequently confirm that ~/src/foo/
is currently checked-out for issue1 or feature2. (Cloning to specific issue1 and feature2 directories is not a solution. See following.)
The other issue with Git is with centralized repos. Git's development model assumes many developers, each with their own full copy of the repo, and exchanging updates via patches. I understand this model and see its value. Introducing a centralized repo into this model adds complexity and collateral problems. It is not that the problems are unique to Git, but that a centralized Git repo intensifies them. For example, developers A and B work on the same branch and each performs one or more checkins and some number of pushes to the origin (ie, the centralized repo). Since we have 3 repos in play -- A's repo, B's repo, and the centralized repo -- the chances of a conflict after checkin are very high. Compare this with Subversion where there is only one repo. A and B can work as much as they like and will be stopped to resolve conflicts at checkin and not afterwards. I argue that resolving conflicts at checkin is a far better time to do this work as one's mind is in "differences" mode. When using Git I find myself performing a checkin immediately followed by a push so as to avoid this mess. Doing so loses a key feature of Git, ie distributed version control that works offline.
Update: When I wrote the following I was not aware of interactive rebase. An interactive rebase is used to reorder and combine commits so as to provide a list of changes where each change is functionally complete and the list is in a logical order. The video Interactive git rebase for code reviews and profit is a useful guide to doing that.
I hate Git's rebasing. File A is branched from master. A is changed N times on the branch and a few times on the master. Rebasing A is the equivalent of branching A from the current master and automatically applying the N changes. Doing this ruins the version history and reviewing the version history is the first step in fixing regressions. Other version history fraud comes from deleting branches and so losing detail about who introduced a change to the branch and when, details that are often critical to recontextualizing the "fix". Git can be used without recourse to rebasing and deleting branches, but it seems to be common practice.
Back to coding...
Lucene, Solr, and Elasticsearch have a powerful query language and a convenient textual representation. I have seen some API providers allow users to directly use this representation within their search API. Unfortunately, when you do so you lose significant control over your development and operations. You have exposed how you process your source data for full-text searching and so can no longer make behind-the-scenes schema changes. All your queryable data must be indexed in Lucene even if doing so is questionable and/or searchable elsewhere. You are locked into using Lucene to execute the query and at scale this can be exorbitantly expensive. To fix any of these issues you end up having to break backwards compatibility. How often will your API users accept this?
Instead, design your own query language. You can base it on Lucene's syntax if you like or an S-expression as it is trivial to parse in any programming language. No matter how you express the syntax, however, you have control over the semantics and the execution. For example, perhaps one of the searchable fields is not in your full-text index, but in a relational database and its presence in the query signals a programmatic join of the indexed and relational results. You can do that. Even if your queries are easily handled by Lucene at scale you are still better off translating your syntax, like this old-school Google query
-a +b c d
to the Lucene equivalent
not a and ( b and (c d) )
because you remain in control.
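A translator for the example above can be sketched in a few lines. The QueryTranslator class is hypothetical, it handles only bare terms with an optional "+" or "-" prefix, and the target form is simply the one written in this post rather than something checked against a Lucene parser.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryTranslator {
    private QueryTranslator() {}

    // Translates "-a +b c d" into "not a and ( b and (c d) )".
    static String translate(String query) {
        List<String> prohibited = new ArrayList<>();
        List<String> required = new ArrayList<>();
        List<String> optional = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (token.startsWith("-") && token.length() > 1) {
                prohibited.add(token.substring(1));
            } else if (token.startsWith("+") && token.length() > 1) {
                required.add(token.substring(1));
            } else {
                optional.add(token);
            }
        }
        // Required terms are and-ed together, followed by the group of optional terms.
        List<String> inner = new ArrayList<>(required);
        if (!optional.isEmpty()) {
            inner.add("(" + String.join(" ", optional) + ")");
        }
        // Prohibited terms come first, each negated.
        List<String> clauses = new ArrayList<>();
        for (String term : prohibited) {
            clauses.add("not " + term);
        }
        if (!inner.isEmpty()) {
            clauses.add("( " + String.join(" and ", inner) + " )");
        }
        return String.join(" and ", clauses);
    }
}
```

Because the translation lives in your code, the three term lists are also the place where you could route a field to a different store or rewrite a term before anything reaches Lucene.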
Update: I fixed the term transposition in the Lucene equivalent of the Google expression. Doh!
Lucene has a powerful query language. The same language is used by Solr and Elasticsearch. I have found that users often don't utilize it well as they mistakenly apply their SQL experiences to it. With SQL your queries return exact results. Nothing in the result set is irrelevant. SQL query performance rewards tight queries consisting of few terms, few indexes, and few joins. With Lucene your queries return ranked results. Much in the result set is actually peripheral. In fact, unless you limit the result set Lucene will return all the matching documents. A Lucene query is about getting a good ranking of results rather than exact results. To this end your query and indexes need to work together. In general this means you need to make good use of broadening and narrowing terms, and boosting matches.
For example, if your query simply looked for the term "mouse" and you indexed your documents verbatim you should not expect to find any "mice". (Recall that Lucene sees your words as numbers and so "mouse" might be #23 and "mice" might be #6078.) It is therefore better to search for
mouse mice
When searching for "mouse mice" your results will be ordered so that either term gives equal weight to the document's rank. This is unlikely the correct course. The query was for "mouse" and you broadened it to include "mice". Documents matching "mouse" should be ranked higher than documents matching "mice". In Lucene's query language you do this by boosting the weight of terms. Documents matching "mouse" should be boosted orders of magnitude higher than "mice", ie
mouse^1000 mice
You will often see small boost values in other people's examples. My experience has been that small boosts do not adequately differentiate documents. Big boosts do.
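If you generate queries in code, broadening and boosting can be done in one small step. In this sketch the BroadenedQuery class and its parameters are hypothetical, and the variants are assumed to come from your own synonym or inflection list.

```java
import java.util.List;

final class BroadenedQuery {
    private BroadenedQuery() {}

    // Boosts the term the user asked for and appends the unboosted broadening terms.
    static String broaden(String term, int boost, List<String> variants) {
        StringBuilder query = new StringBuilder(term).append('^').append(boost);
        for (String variant : variants) {
            query.append(' ').append(variant);
        }
        return query.toString();
    }
}
```

For example, broaden("mouse", 1000, List.of("mice")) yields the query shown above.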
You likely noticed that my query was for "mouse mice" and not "mouse or mice". With Lucene, as with SQL, a boolean "or" kills performance. By not using "or" in your Lucene query you are allowing it to rank higher documents that contain both terms over documents that contain only one of them. Since the higher documents do contain the wanted term, "mouse", I don't see a need to repress ranking them equally to documents containing only "mouse" (ie, no use of "mice"). The performance cost is usually not worth it, especially when your queries will be more complex than this simple example.
The budgeting phrase "structural deficit" is a euphemism for spending more than you earn or spending less than necessary. That is, avoiding spending on maintenance or other costs that can be deferred or refinanced. The South Kingstown School District (SD) has clearly been running structural deficits. Implementing programs it can not afford. Delaying maintenance until unavoidable. And that this has been going on for years shows that the School Committee has failed us and the Town Council has inadequately performed their budgeting oversight.
The SD has a long history of providing impenetrable budget documents. Often the documents consist of a few printed spreadsheets and a mountain of slide decks, both of which are incomprehensible without the oral narrative given at workshop meetings. Further, when you ask for the actual data you get files of accounts and values without any description of structure or purpose. This situation is unacceptable.
Not until the SD can provide an intelligible budget document that can be understood by us at our kitchen table should they be authorized to bond for any monies. Creating such a document is not a light undertaking, but luckily the SD can follow the exemplars found at Association of School Business Officials International (ASBO) and the Government Finance Officers Association (GFOA).
If you run a small data center and have capped bandwidth you don't want to be delivering bulk data to customers. It is better to place the data in the cloud and redirect your customers to get the data there. Amazon's S3 is a good place for that as creating a public URL is trivial. If the data is not public then S3 has a simple mechanism for enabling you to authenticate access. To do this you run your own authentication service; this service prepares a signed, time limited URL that you give to the client to use to download the data from S3. The network interaction is all done within SSL and so you don't need to worry about the URL escaping into the wild and even if it did the loss is time limited.
The AWS S3 service calls this a non-anonymous request URL. For example, if your data is in the "2019-Q4.tsv" item in the "com.andrewgilmartin.bucket1" bucket the URL is
https://s3.amazonaws.com/com.andrewgilmartin.bucket1/2019-Q4.tsv
Your authentication service will (after authenticating the user) redirect the user's HTTP client to the URL
https://s3.amazonaws.com/com.andrewgilmartin.bucket1/2019-Q4.tsv
    ?AWSAccessKeyId=<<AWS_ACCESS_KEY>>
    &Expires=<<EXPIRES>>
    &Signature=<<SIGNATURE>>
This is the non-anonymous request URL. The <<SIGNATURE>> is the URL-encoded base64 encoding of an HMAC-SHA1 digest of the HTTP method ("GET"), the path ("/com.andrewgilmartin.bucket1/2019-Q4.tsv"), and the expiration time (<<EXPIRES>>, in seconds since the epoch). The secret key corresponding to <<AWS_ACCESS_KEY>> is used as the HMAC key. An example Java implementation is at S3RestAuthenticationUrlFactory.
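For readers who want to see the signing step in isolation, here is a minimal sketch using only the JDK. It is not the S3RestAuthenticationUrlFactory code, and the S3SignedUrl class and its method names are made up for this example. It follows the older query string authentication scheme (Signature Version 2) described in this post, which AWS has since superseded with Signature Version 4, so check what your bucket's region accepts before relying on it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class S3SignedUrl {
    private S3SignedUrl() {}

    // Base64 of the HMAC-SHA1, keyed with the secret key, over the string to sign.
    static String sign(String secretKey, String method, String path, long expiresEpochSeconds) {
        // The two empty lines stand for the unused Content-MD5 and Content-Type values.
        String stringToSign = method + "\n\n\n" + expiresEpochSeconds + "\n" + path;
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 is unavailable or the key is invalid", e);
        }
    }

    // Builds the time limited GET URL for a path of the form "/bucket/key".
    static String url(String accessKeyId, String secretKey, String path, long expiresEpochSeconds) {
        String signature = sign(secretKey, "GET", path, expiresEpochSeconds);
        return "https://s3.amazonaws.com" + path
                + "?AWSAccessKeyId=" + URLEncoder.encode(accessKeyId, StandardCharsets.UTF_8)
                + "&Expires=" + expiresEpochSeconds
                + "&Signature=" + URLEncoder.encode(signature, StandardCharsets.UTF_8);
    }
}
```

The signature has to be URL-encoded because base64 output can contain "+", "/", and "=". The path is used as-is here, so a key with characters that need escaping would require extra care.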
For any of this to work you will need an AWS access key id and secret key that is associated with an IAM user with a policy to access the S3 bucket. If you have not done this before the video AWS S3 Bucket Security, Restrict Privileges to User using IAM Policy is a good tutorial. If you only want to allow read access then remove the "s3:PutObject" and "s3:DeleteObject" actions from the example policy.
This posting continues the series on moving from an Ant to a Maven build.
The last stage is to actually move the Ant build to Maven. Your source tree is now quite spartan. It contains the web application, lots of configuration files, servlets or controllers, and non-core supporting classes. As before, you will create a new Maven project, establish dependencies, copy your files, and build and test until complete.
This project is a combination of webapp and Java but Maven does not have an automated way of creating this. Instead, you need to first create the webapp project and then create the Java tree. Create the webapp project
mvn archetype:generate \
    -DarchetypeGroupId=org.apache.maven.archetypes \
    -DarchetypeArtifactId=maven-archetype-webapp \
    -DarchetypeVersion=1.4 \
    -DinteractiveMode=false \
    -DgroupId=com.andrewgilmartin \
    -DartifactId=system-application \
    -Dversion=1.0-SNAPSHOT
Now create the Java tree
cd system-application
mkdir -p \
    src/main/java \
    src/main/resources \
    src/test/java \
    src/test/resources
The result is
.
├── pom.xml
└── src
    ├── main
    │   ├── java
    │   ├── resources
    │   └── webapp
    │       ├── WEB-INF
    │       │   └── web.xml
    │       └── index.jsp
    └── test
        ├── java
        └── resources
The pom.xml file is little different from those created before. The significant change is the <packaging/> element
<packaging>war</packaging>
The "war" value directs Maven to create the war instead of a jar (the default). Now add to the pom.xml the common and system-core dependencies, and any other dependencies specific to the application.
Your web application runs within a servlet container and that container provides some of your dependencies. You need these dependencies for compilation, but they should not be bundled into your war. Maven calls these "provided" dependencies. For these dependencies add a <scope/> element to your <dependency/> element, eg
<dependency>
    <groupId>org.apache.tomcat</groupId>
    <artifactId>tomcat-servlet-api</artifactId>
    <version>8.0.15</version>
    <scope>provided</scope>
</dependency>
Copy the application's code, configuration, and webapp from the system source tree to this project. Build and test as normal until you have clean results.
If you are interested in my help with your Ant to Maven transition contact me at andrew@andrewgilmartin.com.
This posting continues the series on moving from an Ant to a Maven build.
With your common packages now having their own Maven build you can move on to the system itself. For this series I am assuming that your system is composed of a web application with several command line tools. The web application is likely a large set of servlets or Spring controllers. It's a monolith and it is going to stay that way for the near future. The command line tools are used for nightly batch operations or ad hoc reports, etc. What they have in common is that they require some of the system's packages to function. Eg, they depend on its data access packages, protocol facilitation packages, billing logic packages, etc. The next stage is to separate the system's core code, the command line tools, and the application code and its configuration.
Create a new Maven project for the system core code
mvn archetype:generate \
    -DgroupId=com.andrewgilmartin \
    -DartifactId=system-core \
    -DarchetypeArtifactId=maven-archetype-quickstart \
    -DarchetypeVersion=1.4 \
    -DinteractiveMode=false
Replace the groupId and artifactId as appropriate.
Copy all the system core code to this project much like you did when extracting the common code. You will likely again find that the core code has entanglements with non-core code that you are going to have to work out. That can be very difficult and require some refactoring; hopefully not significant enough to abandon the whole effort.
As you are assembling the system-core project you may discover that it tries to come to life. You have the Java equivalent of archaea and bacteria, ie a self configuring class or sets of classes. These are classes with static blocks, eg
public class Archaea {
    static {
        /* do some configuration */
    }
}
That static block is executed as the class is used. Normally this has not been an issue as the classes were always used in the context of the whole system. Now they are isolated. If they depended on external resources or files that are no longer available then their initialization failures leave them in undefined states. You will need to work this out. Can the static block be eliminated or replaced with initialization upon first instance use? Maybe a Design Patterns refactoring is needed.
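One possible shape for "initialization upon first instance use" is sketched below. It is an illustration under assumptions, not a prescription: the loader parameter and the settings methods are hypothetical, and your class may need a different seam.

```java
import java.util.Properties;
import java.util.function.Supplier;

final class Archaea {
    private final Supplier<Properties> loader; // where the configuration comes from
    private Properties settings;               // stays null until first use

    // The caller supplies the configuration source instead of a static block finding it.
    Archaea(Supplier<Properties> loader) {
        this.loader = loader;
    }

    // Configuration is read the first time it is needed, not when the class is loaded.
    synchronized Properties settings() {
        if (settings == null) {
            settings = loader.get();
        }
        return settings;
    }

    String setting(String key) {
        return settings().getProperty(key);
    }
}
```

With this shape, merely loading the class in the isolated system-core project does nothing, and a test can pass in a loader that needs no external files.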
Build and test as normal until you have clean results.
Once your system-core project is complete remove its code from the system's source tree, remove unneeded dependencies from the Ant build.xml, and add the new dependency to the <mvn-dependencies/> element in build.xml. Build and test the system as normal until you have clean results.
Now extract the command line tools from the system into their own Maven projects. These projects will depend on the system-common and system-core projects. The Maven build will also need to create an "uberjar", that is a single jar that bundles all the classes and jars needed to run the tool.
Pick a command line tool and create a new Maven project for it as you would normally. Eg, for the gizmo command line tool use
mvn archetype:generate \
    -DgroupId=com.andrewgilmartin \
    -DartifactId=gizmo \
    -DarchetypeArtifactId=maven-archetype-quickstart \
    -DarchetypeVersion=1.4 \
    -DinteractiveMode=false
Replace the groupId and artifactId as appropriate. Add to the pom.xml the system-common and system-core dependencies, and any other dependencies specific to the tool. Copy the tool's code from the system source tree to this project. Build and test as normal until you have clean results.
To create the "uberjar" update pom.xml and replace the whole <plugins/> with
<plugins>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.1.0</version>
        <configuration>
            <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
            <archive>
                <manifest>
                    <addClasspath>true</addClasspath>
                    <mainClass>com.andrewgilmartin.gizmo.App</mainClass>
                </manifest>
            </archive>
        </configuration>
        <executions>
            <execution>
                <id>assemble-all</id>
                <phase>package</phase>
                <goals>
                    <goal>single</goal>
                </goals>
            </execution>
        </executions>
    </plugin>
</plugins>
Replace "com.andrewgilmartin.gizmo.App" with the fully qualified class name of the tool. When you now build the Maven project you will see "maven-assembly-plugin" log
--- maven-assembly-plugin:3.1.0:single (assemble-all) @ gizmo ---
Building jar: /home/ajg/src/gizmo/target/gizmo-1.0-SNAPSHOT-jar-with-dependencies.jar
The file "gizmo-1.0-SNAPSHOT-jar-with-dependencies.jar" is the uberjar. To trial run your command line tool use
java -jar target/gizmo-1.0-SNAPSHOT-jar-with-dependencies.jar
Don't forget to add whatever command line options prevent the tool from doing any actual work!
Once your tool is complete remove its code from the system's source tree and remove unneeded dependencies from the Ant build.xml.
Continue this procedure for each of your command line tools.
At this point you have
The remaining system is just the web application with its configuration, servlets or controllers, and the odd ball classes that don't fit in system-common or system-core. The next stage is to refactor the system Ant project itself.