Small set of guidelines for using Slack

I have been using Slack for a long time. Here are a few guidelines based on that experience:
  • Slack is part of your communications toolkit. It has limits. When you feel that you are typing too much or getting frustrated by misunderstandings then switch to a call. A consequence of this is that you should be prepared to make or accept a call — earbuds at the ready.
  • Slack is great for singular topic discussions. The channel clearly displays the discussion’s history. Even if the discussion spans many days the context is always available for a quick catch up.
  • Slack is terrible for overlapping topic discussions. The channel quickly becomes a hodgepodge of utterances and fragments that have little reference to the topical context. Overlapping topic channels are needed, however. For example, in one small company I worked at, the #operations channel is primarily used to give notice of infrastructure changes. When using such a channel, remember that others will not have the same topical context as you. Where possible, include some reference to the topic, eg "Re X, we did Y." For example, "Re loud bang, we have flogged the responsible employee." Here "loud bang" is the strong reminder of the previous topical messages. See Threads. See Links. Use pinned messages to define the purpose of the channel along with links to supporting materials. Messages are editable, so do improve and update the pinned message over time. Pinned messages can be quickly accessed via the "push pin" icon in the channel header.
  • Slack has discussion threads. Any message can be the start of a thread. Threads provide a natural grouping of related messages. The problem with them is that it is hard to have everyone on the channel use them consistently. For example, Jane started a thread to continue the discussion, but Jack, in haste, posted a message instead. Now we have 3 messages connected by two different mechanisms. Don’t use threads unless you can get everyone on the channel to use them consistently. (Actually, just don't use threads.)
  • When you want to clearly reference a previous message use a Slack link. It is long and ugly, but it is by far the most reliable reference.
  • We reuse text from many different applications and written languages every day. It is common to copy from one place and paste it into Slack, and vice versa. When you do this you bring along a lot of unseen cruft: formatting, odd visible characters, odd invisible characters, etc. And Slack does not help, in that it wants to pretty up the text for you; emoji interpretation being the most blatant example. When pasting into Slack always use Paste without formatting or Paste and Match Style. And if you are pasting something technical like an email address, an access token, my salary, etc, then use Slack's inline code formatting, eg write "The email is `me@there.com`" so the address is shown verbatim.
  • Use the sidebar as a presence indicator, making sure your immediate team members are favorited.

Macroservices, a middle ground between the monolith and microservices.

There is a middle ground between the monolith and microservices and that is "macroservices." Macroservices are distinguished by having one executable and many compositions. Each composition exercises a portion of the executable's internal components. Some of the components are exercised by all compositions (eg, identity management) and others by a single composition (eg, a specialized data store). Composition deployments communicate with each other over REST or gRPC. Orchestration for resilience and scaling is accomplished using the same tools as for microservices.

A composition is nothing more than a declaration of a set of components to activate, their interdependencies, and their configurations. Configurations are generally properties, ie name and value pairs, accessed from the environment. Much as when you build the executable and draw all the benefits of type checking, unit testing, static analysis, etc you can build the composition and draw similar correctness assurances.
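
As a rough illustration, here is a minimal sketch of what a composition might boil down to, using hypothetical component names and configuration; in practice the declaration could just as well be a configuration file checked by the build.

import java.util.Arrays;
import java.util.List;

// A hypothetical composition: which of the executable's components to
// activate and their environment-sourced configuration.
public class BillingComposition {

    // The subset of the executable's components this composition activates.
    static final List<String> COMPONENTS = Arrays.asList("identity", "billing", "billing-store");

    public static void main(String[] args) {
        // Configuration is name and value pairs read from the environment.
        String storeUrl = System.getenv().getOrDefault("BILLING_STORE_URL", "jdbc:postgresql://localhost/billing");
        System.out.printf("activating %s (store at %s)%n", COMPONENTS, storeUrl);
    }
}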

A macroservice allows your small team of developers to focus on what matters to your business, ie what your customers value. The development infrastructure is simple. The deployment infrastructure is flexible. Troubleshooting is comprehensible.

Written in response to Microservices are for companies with 500+ engineers.

Update: Perhaps a better name is "polyservices".

Update: I am reading Sam Newman's new book Monolith to Microservices: Evolutionary Patterns to Transform Your Monolith. My macroservice is more akin to his distributed monolith except that the services contained within do have stronger data independence than is typical of a monolith.

RI's Innovation Vouchers to help fund product development

I recently learned about RI's Innovation Voucher program. The vouchers are $5,000-$50,000 grants for small companies to buy the expertise needed to develop a new product or process. Vouchers can be redeemed for services at a research institution or to fund an in-house research & development project.

More information at RI Commerce Innovation Incentives.

The Trojan Dragon?



Found at Old Book Illustrations site.

Lucene, shadow query classes, and the visitor pattern

A good Lucene result is achieved from an index and a query working together.

Defining the schema for the index is mostly an upfront design task: What are the fields? What fields are stored and what are indexed? How are terms in fields parsed? Are found terms supplemented? Are multiple fields combined? Etc.

Once you have made these decisions revisiting them can be prohibitive without planning. Ie, reindexing is found to be too expensive in time or resources, or, worse, the source documents are no longer available. Most Lucene users err on the side of a cautious schema where a lot more is stored and indexed than is needed now, with the hope of a successful schema refactoring in the future. My experience has been that you don't know enough now about your data, how to index it, how to query it, and how users want to access it to have a viable cautious schema. It is better to store all your sources so you can reindex when you know better how the indexes and queries can cooperate. This is just a cost of using a new technology (and, as always, it can be mitigated).

The upshot in the near term is that the index schema is static. You have to apply flexibility with the query.

As mentioned in a previous posting, there is a tendency to think of Lucene queries like SQL queries. That is, that there is a single, correct rendition. Discard that thinking. There are no correct results; there are only better results. To achieve better results you need to watch what your users search for and work out how the queries need to adjust. For example, perhaps you discover that there is a shift in vocabulary happening. What was once "OS X" is now "macOS". When a user queries for OS X you need to also include macOS in the query.

The Lucene API contains a number of query subtypes. These are combined to construct an expression that characterizes the user's search intent. This nested data structure should be considered a starting point. This data structure will be augmented and reformed to reflect your current understanding of how to best use the indexes. In the example above, you want to include macOS when OS X is used.
The Lucene API query subclasses are too rigid for direct augmentation and reforming. In the past the API was downright unbending, and so I developed a set of shadow query classes that were amenable to use with the Visitor pattern. For example, this shows two visitors: the first adds the macOS variant and the second converts the query to a Solr expression:

Query q = ... new TermQuery(10.0f, "f", "osx") ...
...
Map<String, List<String>> variants = new HashMap<>();
variants.put("a", Arrays.asList("osx, "macos"));
VariantsQueryVistor vistor = new VariantsQueryVistor(0.0001f, variants);
...
q = vistor.visitQuery(q);
...
s = new SolrLuceneQueryVistor().visitQuery(q).toString();

and s is

( "osx" OR "macos" ^ 0.0001 ) ^ 10.0

Git and the busy development shop

I don't like Git. This posting is not going to change anyone's mind about Git, and I will continue to use Git as it has become the one version control system to rule them all.

Git does not work the way I want a version control tool to work and especially in a busy development shop. I work on multiple branches concurrently. Does anyone have serialized bug and feature work? I don't need the whole version control tree available all the time. I only need the trunk and the branches I am working on and I need them to be available simultaneously. Git's stash, which allows pausing development on one branch to work on another, is useless to me. Using it breaks my work flow and destabilizes my mental model of my file system. That is, when I am in ~/src/foo/branches/issue1/ I know everything below is work being done on issue1, and when I am in ~/src/foo/branches/feature2/ I know everything below is work being done on feature2. There is never any confusion as to what I am looking at. With Git, however, I need to frequently confirm that ~/src/foo/ is currently checked-out for issue1 or feature2. (Cloning to specific issue1 and feature2 directories is not a solution. See following.)

The other issue with Git is with centralized repos. Git's development model assumes many developers, each with their own full copy of the repo, exchanging updates via patches. I understand this model and see its value. Introducing a centralized repo into this model adds complexity and collateral problems. It is not that the problems are unique to Git, but that a centralized Git repo intensifies them. For example, developers A and B work on the same branch and each performs one or more checkins and some number of pushes to the origin (ie, the centralized repo). Since we have 3 repos in play -- A's repo, B's repo, and the centralized repo -- the chances of a conflict after checkin are very high. Compare this with Subversion where there is only one repo. A and B can work as much as they like and will be stopped to resolve conflicts at checkin and not afterwards. I argue that checkin is a far better time to do this work as one's mind is in "differences" mode. When using Git I find myself performing a checkin immediately followed by a push so as to avoid this mess. Doing so loses a key feature of Git, ie distributed version control that works offline.

I hate Git's rebasing. File A is branched from master. A is changed N times on the branch and a few times on the master. Rebasing A is the equivalent of branching A from the current master and automatically applying the N changes. Doing this ruins the version history, and reviewing the version history is the first step in fixing regressions. Other version history fraud comes from deleting branches and so losing detail about who introduced a change to the branch and when, details that are often critical to recontextualizing the "fix". Git can be used without recourse to rebasing and deleting branches, but it seems to be common practice.

Back to coding...

Remain in control of your search

Lucene, Solr, and Elasticsearch have a powerful query language and a convenient textual representation. I have seen some API providers allow users to directly use this representation within their search API. Unfortunately, when you do so you lose significant control over your development and operations. You have exposed how you process your source data for full-text searching and so can no longer make behind-the-scenes schema changes. All your queryable data must be indexed in Lucene even if doing so is questionable or the data is already searchable elsewhere. You are locked into using Lucene to execute the query, and at scale this can be exorbitantly expensive. To fix any of these issues you end up having to break backwards compatibility. How often will your API users accept this?

Instead, design your own query language. You can base it on Lucene's syntax if you like, or on S-expressions, which are trivial to parse in any programming language. No matter how you express the syntax, however, you have control over the semantics and the execution. For example, perhaps one of the searchable fields is not in your full-text index but in a relational database, and its presence in the query signals a programmatic join of the indexed and relational results. You can do that. Even if your queries are easily handled by Lucene at scale you are still better off translating your syntax, like this old-school Google query

 -a +b c d

to the Lucene equivalent

 not a and ( b and (c d) )

because you remain in control.
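
A minimal sketch of such a translation, assuming Lucene's Java API and a single hypothetical "body" field (no phrases, quoting, or analysis), might look like

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Translate the old-school "-a +b c d" syntax into the Lucene query we
// want to run today. Because we own the translation we can change it at
// any time without breaking the API's callers.
public class GoogleStyleQueryTranslator {
    public Query translate(String input) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        for (String token : input.trim().split("\\s+")) {
            if (token.startsWith("-")) {
                builder.add(new TermQuery(new Term("body", token.substring(1))), BooleanClause.Occur.MUST_NOT);
            } else if (token.startsWith("+")) {
                builder.add(new TermQuery(new Term("body", token.substring(1))), BooleanClause.Occur.MUST);
            } else {
                builder.add(new TermQuery(new Term("body", token)), BooleanClause.Occur.SHOULD);
            }
        }
        return builder.build();
    }
}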

Update: I fixed the term transposition in the Lucene equivalent of the Google expression. Doh!

Lucene boosting

Lucene has a powerful query language. The same language is used by Solr and Elasticsearch. I have found that users often don't utilize it well as they mistakenly apply their SQL experience to it. With SQL your queries return exact results. Nothing in the result set is irrelevant. SQL query performance rewards tight queries consisting of few terms, few indexes, and few joins. With Lucene your queries return ranked results. Much in the result set is actually peripheral. In fact, unless you limit the result set Lucene will return all the documents. A Lucene query is about getting a good ranking of results rather than exact results. To this end your query and indexes need to work together. In general this means you need to make good use of broadening and narrowing terms, and of boosting matches.

For example, if your query simply looked for the term "mouse" and you indexed your documents verbatim you should not expect to find any "mice". (Recall that Lucene sees your words as numbers and so "mouse" might be #23 and "mice" might be #6078.) It is therefore better to search for

mouse mice

When searching for "mouse mice" your results will be ordered so that either term gives equal weight to the document's rank. This is unlikely the correct course. The query was for "mouse" and you broadened it to include "mice". Documents matching "mouse" should be ranked higher than documents matching "mice". In Lucene's query language you do this by boosting the weight of terms. Documents matching "mouse" should be boosted orders of magnitude higher than "mice", ie

mouse^1000 mice

You will often see small boost values in other people's examples. My experience has been that small boosts do not adequately differentiate documents. Big boosts do.
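
If you are building the query programmatically rather than through the query parser, a minimal sketch of the same "mouse^1000 mice" query, assuming a hypothetical "body" field, is

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Both terms are optional (SHOULD); the original term "mouse" carries a
// large boost while the broadening term "mice" keeps its default weight.
public class MouseQuery {
    public static Query build() {
        return new BooleanQuery.Builder()
                .add(new BoostQuery(new TermQuery(new Term("body", "mouse")), 1000f), BooleanClause.Occur.SHOULD)
                .add(new TermQuery(new Term("body", "mice")), BooleanClause.Occur.SHOULD)
                .build();
    }
}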

You likely noticed that my query was for "mouse mice" and not "mouse or mice". With Lucene, as with SQL, a boolean "or" kills performance. By not using "or" in your Lucene query you are allowing it to rank documents that contain both terms higher than documents that contain only one of them. Since those higher ranked documents do contain the wanted term, "mouse", I don't see a need to suppress that and rank them equally with documents containing only "mouse" (ie, no use of "mice"). The performance cost is usually not worth it, especially when your queries will be more complex than this simple example.

South Kingstown's impenetrable budget documents

The budgeting phrase "structural deficit" is a euphemism for spending more than you earn or spending less than necessary. That is, avoiding spending on maintenance or other costs that can be deferred or refinanced. The South Kingstown School District (SD) has clearly been running structural deficits. Implementing programs it can not afford. Delaying maintenance until unavoidable. And that this has been going on for years shows that the School Committee has failed us and the Town Council has inadequately performed their budgeting oversight.

The SD has a long history of providing impenetrable budget documents. Often the documents consist of a few printed spreadsheets and a mountain of slide decks, both of which are incomprehensible without the oral narrative given at workshop meetings. Further, when you ask for the actual data you get files of accounts and values without any description of structure or purpose. This situation is unacceptable.

Not until the SD can provide an intelligible budget document that can be understood by us at our kitchen table should they be authorized to bond for any monies. Creating such a document is not a light undertaking, but luckily the SD can follow the exemplars found at Association of School Business Officials International (ASBO) and the Government Finance Officers Association (GFOA).

Authenticating S3 access using non-anonymous request URLs

If you run a small data center and have capped bandwidth you don't want to be delivering bulk data to customers. It is better to place the data in the cloud and redirect your customers to get the data there. Amazon's S3 is a good place for that as creating a public URL is trivial. If the data is not public then S3 has a simple mechanism for enabling you to authenticate access. To do this you run your own authentication service; this service prepares a signed, time limited URL that you give to the client to use to download the data from S3. The network interaction is all done within SSL, so you don't need to worry about the URL escaping into the wild, and even if it did the loss is time limited.

The AWS S3 service calls this a non-anonymous request URL. For example, if your data is in the "2019-Q4.tsv" item in the "com.andrewgilmartin.bucket1" bucket the URL is

https://s3.amazonaws.com/com.andrewgilmartin.bucket1/2019-Q4.tsv

Your authentication service will (after authenticating the user) redirect the user's HTTP client to the URL

https://s3.amazonaws.com/com.andrewgilmartin.bucket1/2019-Q4.tsv
    ?AWSAccessKeyId=<<AWS_ACCESS_KEY>>
    &Expires=<<EXPIRES>>
    &Signature=<<SIGNATURE>>

This is the non-anonymous request URL. The <<SIGNATURE>> is a base64 encoding of an HMAC-SHA1 signature over the HTTP method ("GET"), the path ("/com.andrewgilmartin.bucket1/2019-Q4.tsv"), and the expiration time (<<EXPIRES>>). The secret key corresponding to <<AWS_ACCESS_KEY>> is used as the signing key. An example Java implementation is at S3RestAuthenticationUrlFactory.
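
A minimal sketch of the signing step, assuming the legacy signature version 2 query-string scheme described above (see S3RestAuthenticationUrlFactory for a fuller implementation), is

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class S3SignedUrl {

    // Builds a time limited URL for a GET of the given path, eg
    // "/com.andrewgilmartin.bucket1/2019-Q4.tsv".
    public static String sign(String accessKey, String secretKey, String path, long expiresEpochSeconds) throws Exception {
        // The string to sign is the method, (empty) content headers, expiration, and path.
        String stringToSign = "GET\n\n\n" + expiresEpochSeconds + "\n" + path;
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        String signature = Base64.getEncoder().encodeToString(mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8)));
        return "https://s3.amazonaws.com" + path
                + "?AWSAccessKeyId=" + URLEncoder.encode(accessKey, StandardCharsets.UTF_8.name())
                + "&Expires=" + expiresEpochSeconds
                + "&Signature=" + URLEncoder.encode(signature, StandardCharsets.UTF_8.name());
    }
}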

For any of this to work you will need an AWS access key id and secret key that is associated with an IAM user with a policy to access the S3 bucket. If you have not done this before the video AWS S3 Bucket Security, Restrict Privileges to User using IAM Policy is a good tutorial. If you only want to allow read access then remove the "s3:PutObject" and "s3:DeleteObject" actions from the example policy.

Creating a Maven project for your web application

This posting continues the series on moving from an Ant to a Maven build. If you are interested in my help with your Ant to Maven transition contact me at andrew@andrewgilmartin.com.

The last stage is to actually move the Ant build to Maven. Your source tree is now quite spartan. It contains the web application, lots of configuration files, servlets or controllers, and non-core supporting classes. As before, you will create a new Maven project, establish dependencies, copy your files, and build and test until complete.

This project is a combination of webapp and Java, but Maven does not have an automated way of creating this combination. Instead, you need to first create the webapp project and then create the Java tree. Create the webapp project

mvn archetype:generate \
  -DarchetypeGroupId=org.apache.maven.archetypes \
  -DarchetypeArtifactId=maven-archetype-webapp \
  -DarchetypeVersion=1.4 \
  -DinteractiveMode=false \
  -DgroupId=com.andrewgilmartin \
  -DartifactId=system-application \
  -Dversion=1.0-SNAPSHOT

Now create the Java tree

cd system-application
mkdir -p \
  src/main/java \
  src/main/resources \
  src/test/java \
  src/test/resources

The result is

.
├── pom.xml
└── src
    ├── main
    │   ├── java
    │   ├── resources
    │   └── webapp
    │       ├── WEB-INF
    │       │   └── web.xml
    │       └── index.jsp
    └── test
        ├── java
        └── resources

The pom.xml file is little different from those created before. The significant change is the <packaging/> element

<packaging>war</packaging>

The "war" value directs Maven to create the war instead of a jar (the default). Now add to the pom.xml the common and system-core dependencies, and any other dependencies specific to the application.

Your web application runs within a servlet container and that container provides some of your dependencies. You need these dependencies for compilation, but they should not be bundled into your war. Maven calls these "provided" dependencies. For these dependencies add a <scope/> element to your <dependency/> element, eg

<dependency>
    <groupId>org.apache.tomcat</groupId>
    <artifactId>tomcat-servlet-api</artifactId>
    <version>8.0.15</version>
    <scope>provided</scope>
</dependency>        

Copy the application's code, configuration, and webapp from the system source tree to this project. Build and test as normal until you have clean results.

If you are interested in my help with your Ant to Maven transition contact me at andrew@andrewgilmartin.com.

Creating Maven projects for the core packages and command line tools

This posting continues the series on moving from an Ant to a Maven build. If you are interested in my help with your Ant to Maven transition contact me at andrew@andrewgilmartin.com.

With your common packages now having their own Maven build you can move on to the system itself. For this series I am assuming that your system is composed of a web application with several command line tools. The web application is likely a large set of servlets or Spring controllers. It's a monolith and it is going to stay that way for the near future. The command line tools are used for nightly batch operations or ad hoc reports, etc. What they have in common is that they require some of the system's packages to function. Eg, they depend on its data access packages, protocol facilitation packages, billing logic packages, etc. The next stage is to separate the system's core code, the command line tools, and the application code and its configuration.

System Core Project

Create a new Maven project for the system core code

mvn archetype:generate \
  -DgroupId=com.andrewgilmartin \
  -DartifactId=system-core \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DarchetypeVersion=1.4 \
  -DinteractiveMode=false

Replace the groupId and artifactId as appropriate.

Copy all the system core code to this project much like you did when extracting the common code. You will likely again find that the core code has entanglements with non-core code that you are going to have to work out. That can be very difficult and require some refactoring; hopefully not significant enough to abandon the whole effort.

As you are assembling the system-core project you may discover that it tries to come to life. You have the Java equivalent of archaea and bacteria, ie a self-configuring class or set of classes. These are classes with static blocks, eg

public class Archaea {
    static { /* do some configuration */ }
}

That static block is executed as the class is used. Normally this has not been an issue as the classes were always used in the context of the whole system. Now they are isolated. If they depended on external resources or files that are no longer available then their initialization failures leave them in undefined states. You will need to work this out. Can the static block be eliminated or replaced with initialization upon first instance use? Maybe a Design Patterns refactoring is needed.
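
One option, sketched below with an illustrative Configuration stand-in, is to replace the static block with lazy initialization on first use so the class no longer configures itself at load time.

public class Archaea {

    // Illustrative stand-in for whatever the static block used to set up.
    static class Configuration {
        final String value;
        Configuration(String value) { this.value = value; }
        static Configuration loadFromEnvironment() {
            return new Configuration(System.getenv().getOrDefault("ARCHAEA_CONFIG", "default"));
        }
    }

    private static volatile Configuration configuration;

    // Configuration now happens on first use rather than at class load time.
    static Configuration configuration() {
        if (configuration == null) {
            synchronized (Archaea.class) {
                if (configuration == null) {
                    configuration = Configuration.loadFromEnvironment();
                }
            }
        }
        return configuration;
    }
}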

Build and test as normal until you have clean results.

Once your system-core project is complete remove its code from the system's source tree, remove unneeded dependencies from the Ant build.xml, and add the new dependency to the <mvn-dependencies/> element in build.xml. Build and test the system as normal until you have clean results.

Command Line Tool Projects

Now extract the command line tools from the system into their own Maven projects. These projects will depend on the system-common and system-core projects. The Maven build will also need to create an "uberjar", that is a single jar that bundles all the classes and jars needed to run the tool.

Pick a command line tool and create a new Maven project for it as you would normally. Eg, for the gizmo command line tool use

mvn archetype:generate \
  -DgroupId=com.andrewgilmartin \
  -DartifactId=gizmo \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DarchetypeVersion=1.4 \
  -DinteractiveMode=false

Replace the groupId and artifactId as appropriate. Add to the pom.xml the system-common and system-core dependencies, and any other dependencies specific to the tool. Copy the tool's code from the system source tree to this project. Build and test as normal until you have clean results.

To create the "uberjar" update pom.xml and replace the whole <plugins/> with

<plugins>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.1.0</version>
        <configuration>
            <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
            <archive>
                <manifest>
                    <addClasspath>true</addClasspath>
                    <mainClass>com.andrewgilmartin.gizmo.App</mainClass>
                </manifest>
            </archive>
        </configuration>
        <executions>
            <execution>
                <id>assemble-all</id>
                <phase>package</phase>
                <goals>
                    <goal>single</goal>
                </goals>
            </execution>
        </executions>
    </plugin>
</plugins>

Replace "com.andrewgilmartin.gizmo.App" with the fully qualified class name of the tool. When you now build the Maven project you will see "maven-assembly-plugin" log

--- maven-assembly-plugin:3.1.0:single (assemble-all) @ gizmo ---
Building jar: /home/ajg/src/gizmo/target/gizmo-1.0-SNAPSHOT-jar-with-dependencies.jar

The file "gizmo-1.0-SNAPSHOT-jar-with-dependencies.jar" is the uberjar. To trial run your command line tool use

java -jar target/gizmo-1.0-SNAPSHOT-jar-with-dependencies.jar

Don't forget to add whatever command line options prevent the tool from doing any actual work!

Once your tool is complete remove its code from the system's source tree and remove unneeded dependencies from the Ant build.xml.

Continue this procedure for each of your command line tools.

Where are we

At this point you have

  1. System common code Maven project
  2. System core code Maven project
  3. Command line tools Maven projects
  4. Remaining system Ant project

The remaining system is just the web application with its configuration, servlets or controllers, and the oddball classes that don't fit in system-common or system-core. The next stage is to refactor the system Ant project itself.

Creating a Maven project for the common packages

This posting continues the series on moving from an Ant to a Maven build. If you are interested in my help with your Ant to Maven transition contact me at andrew@andrewgilmartin.com.

When you have replaced all your external libraries with Maven dependencies you can move on to the next stage in moving from Ant to Maven. This stage is to separate out one or more of your Java packages into their own Maven projects. There are a number of good reasons to separate out packages, but the practical one is that Maven is best used to build one deliverable.

Every organization has a "common" package that contains classes adding general purpose functionality. There are likely several "utility" packages that augment the standard Java libraries for IO, networking, text, collections, functions, streams, etc. There are packages for data structures and algorithms that were sufficiently independent of the primary application to be have been separated out from its packages. These common packages are also likely used by supplementary tools to your primary application. If you already build separate jars for your common packages this step is not that much work. If you don't then you are likely to find some unexpected and unwanted entanglements with application code that needs to be worked out.

A note about version control. Separating out packages does not mean you also have to move from a monorepo if that is what you are currently using. Moving away from a monorepo does simplify assuring that your common packages are isolated from your application code, however. If you have been building separate jars for your common packages then you have already established build mechanisms to maintain the isolation. If you haven't then this is something you will need to do in your monorepo. This tutorial, however, assumes you are not using a monorepo.

The common packages will be your first Maven project. Create an empty Maven project using archetype:generate

mvn archetype:generate \
  -DgroupId=com.andrewgilmartin \
  -DartifactId=system-common \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DarchetypeVersion=1.4 \
  -DinteractiveMode=false

Change the "groupId" and "artifactId" appropriately. The command will create the tree

.
└── system-common
    ├── pom.xml
    └── src
        ├── main
        │   └── java
        │       └── com
        │           └── andrewgilmartin
        │               └── App.java
        └── test
            └── java
                └── com
                    └── andrewgilmartin
                        └── AppTest.java

Move your common packages into the src/main/java tree. Move any tests you have for these packages into the src/test/java tree. (Delete the generated App.java and AppTest.java at some point.) Your IDE will likely be very helpful in moving files and even version histories.

A build now will very likely fail due to Java language level and missing dependencies. Edit "pom.xml" and update the "maven.compiler.source" and "maven.compiler.target" properties appropriately, eg for Java 8 use

<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>

Try a compile just to see what happens!

mvn package

Lots of missing dependencies! Using the information you gathered before add each dependency under the pom.xml file's <dependencies/> element. For example, if the common package is dependent on Apache's HTTP Client 4 then add

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.10</version>
</dependency>

The org.apache.httpcomponents artifacts are located at Central, and this repository is most likely already known to your Maven installation (eg via its $HOME/.m2/settings.xml file). If it is not, or you would rather not depend on the installation's settings (a good practice), then add the <repository/> element under the <repositories/> element. If <repositories/> is missing from pom.xml then add it above the closing </project> tag.

<project>
  ...
  <repositories>
    <repository>
      <id>central</id>
      <name>Central Repository</name>
      <url>https://repo1.maven.org/maven2/</url>
    </repository>
  </repositories>
</project>

Continue adding dependencies and building and testing until you get a clean result.

You now have your common packages in their own jar

target/system-common-1.0-SNAPSHOT.jar

After installing this in your local Maven cache, $HOME/.m2/repository/, you can use it in your Ant build.xml. To install it use

mvn install

And you should see the logged output similar to

[INFO] Installing /home/ajg/src/system-common/target/system-common-1.0-SNAPSHOT.jar to /home/ajg/.m2/repository/com/andrewgilmartin/system-common/1.0-SNAPSHOT/system-common-1.0-SNAPSHOT.jar
[INFO] Installing /home/ajg/src/system-common/pom.xml to /home/ajg/.m2/repository/com/andrewgilmartin/system-common/1.0-SNAPSHOT/system-common-1.0-SNAPSHOT.pom

To use the installed jar in your Ant build you need only to add the new dependency, eg

<mvn-dependencies pathId="runtime-dependencies.classpath" ... >
  ...
  <dependency groupId="com.andrewgilmartin" artifactid="system-common" version="1.0-SNAPSHOT"/>
  ...
</mvn-dependencies>

Remove from the system's source tree all the common packages now compiled into system-common-1.0-SNAPSHOT.jar. Also remove all the <dependency/> elements that were only used by the common packages as Maven already knows the jar's dependencies (from its pom.xml file that was also installed into $HOME/.m2/repository).

If you have other common packages, eg enhancements to Swing or extensions to J2EE, then repeat the above procedure for each of them.

A series of postings about moving an ancient Ant build to Maven

This is a table of contents page for a series of postings about moving an ancient Ant build to Maven. If you are interested in my help with your Ant to Maven transition contact me at andrew@andrewgilmartin.com.

  1. Moving to Maven from an ancient Ant build
  2. Running a Maven repository manager
  3. Running a Maven repository manager using Apache WebDav
  4. Creating a Maven project for the common packages
  5. Creating Maven projects for the core packages and command line tools
  6. Creating a Maven project for your web application

The code for examples is at

https://github.com/andrewgilmartin/maven_examples

Running a Maven repository manager using Apache WebDav

This posting continues the series on moving from an Ant to a Maven build. If you are interested in my help with your Ant to Maven transition contact me at andrew@andrewgilmartin.com.

Implementing a Maven repository manager using Apache's HTTPd with WebDav is straightforward. For this example, create the directory tree

.
├── apache2
│   ├── httpd.conf
│   └── logs
└── www
    └── repository

For the purposes of this example the directory tree is in /var/maven/

The httpd.conf file contains

LoadModule mpm_prefork_module libexec/apache2/mod_mpm_prefork.so
LoadModule unixd_module libexec/apache2/mod_unixd.so 
LoadModule authz_core_module libexec/apache2/mod_authz_core.so  
LoadModule access_compat_module libexec/apache2/mod_access_compat.so   
LoadModule mime_module libexec/apache2/mod_mime.so 

LoadModule dav_module libexec/apache2/mod_dav.so
LoadModule dav_fs_module libexec/apache2/mod_dav_fs.so

ServerRoot ${BASEDIR}/apache2
LogLevel warn
PidFile logs/httpd.pid

Listen 8080
ServerName localhost:8080
ServerAdmin andrew@andrewgilmartin.com

AddType application/octet-stream .sha1
AddType application/octet-stream .md5
AddType text/xml .xml
AddType text/xml .pom
AddType application/octet-stream .jar
AddType application/octet-stream .war

DocumentRoot ${BASEDIR}/www

DavLockDB DavLock
<Directory />
    DAV On
</Directory>

Start the HTTP server (in the foreground) using

BASEDIR=/var/maven/ httpd -f /var/maven/apache2/httpd.conf -X

For Maven to deploy to this repository manager you need to make two changes to your pom.xml. The first is to include the extension in the <build/> section

<extensions>
  <extension>
    <groupId>org.apache.maven.wagon</groupId>
    <artifactId>wagon-webdav-jackrabbit</artifactId>
    <version>3.3.4</version>
  </extension>
</extensions>

The second is to use the "dav:" prefix to your <distributionManagement/> element URLs

<distributionManagement>
  <snapshotRepository>
    <id>neighborhood-snapshots</id>
    <name>Neighborhood Snapshots</name>
    <url>dav:http://localhost:8080/repository/</url>
  </snapshotRepository>
  <repository>
    <id>neighborhood-releases</id>
    <name>Neighborhood Releases</name>
    <url>dav:http://localhost:8080/repository/</url>
  </repository>
</distributionManagement>

To deploy your project use

mvn clean package deploy

This posting is part of the series about moving to Maven from an ancient Ant build.

Running a Maven repository manager

This posting continues the series on moving from an Ant to a Maven build. If you are interested in my help with your Ant to Maven transition contact me at andrew@andrewgilmartin.com.

The availability of most Java libraries in Maven's repositories is the most valuable of its features. It is difficult to say whether Maven's build and deploy tools were the catalyst or whether its appearance simply coincided with a time when it was obvious to all that centralized repositories with a common access protocol were needed. The why does not matter much now as we have Maven and it is well established in the Java ecosystem.

A Maven repository (aka a site) is a well defined file system layout accessible via one of several protocols. If you are using HTTP then the repository server needs to implement the GET and PUT methods for files. For example, if your dependency is (pom.xml)

<dependency>
  <groupId>com.andrewgilmartin</groupId>
  <artifactId>example1</artifactId>
  <version>1.0</version>
</dependency>

and the repository is

<repository>
  <id>neighborhood</id>
  <url>http://localhost:8080/repository/</url>
</repository>

then when you use Maven to build, eg

mvn -U clean package

Maven will initially GET the dependency's files

/repository/com/andrewgilmartin/example1/1.0/maven-metadata.xml
/repository/com/andrewgilmartin/example1/1.0/maven-metadata.xml.sha1

and then, based on the detail in maven-metadata.xml, it will GET the files

/repository/com/andrewgilmartin/example1/1.0/example1-1.0.jar
/repository/com/andrewgilmartin/example1/1.0/example1-1.0.jar.sha1
/repository/com/andrewgilmartin/example1/1.0/example1-1.0.pom
/repository/com/andrewgilmartin/example1/1.0/example1-1.0.pom.sha1

When you deploy, eg

mvn deploy

using the distribution management (pom.xml)

<distributionManagement>
  <snapshotRepository>
    <id>neighborhood-snapshots</id>
    <name>Neighborhood Snapshots</name>
    <url>http://localhost:8080/repository/</url>
  </snapshotRepository>
  <repository>
    <id>neighborhood-release</id>
    <name>Neighborhood Releases</name>
    <url>http://localhost:8080/repository/</url>
  </repository>
</distributionManagement>

Maven will first GET the files

/repository/com/andrewgilmartin/example1/1.0/maven-metadata.xml
/repository/com/andrewgilmartin/example1/1.0/maven-metadata.xml.sha1
/repository/com/andrewgilmartin/example1/maven-metadata.xml
/repository/com/andrewgilmartin/example1/maven-metadata.xml.sha1

and then PUT the files

/repository/com/andrewgilmartin/example1/1.0/maven-metadata.xml
/repository/com/andrewgilmartin/example1/1.0/maven-metadata.xml.md5
/repository/com/andrewgilmartin/example1/1.0/maven-metadata.xml.sha1
/repository/com/andrewgilmartin/example1/1.0/example1-1.0.jar
/repository/com/andrewgilmartin/example1/1.0/example1-1.0.jar.md5
/repository/com/andrewgilmartin/example1/1.0/example1-1.0.jar.sha1
/repository/com/andrewgilmartin/example1/1.0/example1-1.0.pom
/repository/com/andrewgilmartin/example1/1.0/example1-1.0.pom.md5
/repository/com/andrewgilmartin/example1/1.0/example1-1.0.pom.sha1
/repository/com/andrewgilmartin/example1/maven-metadata.xml
/repository/com/andrewgilmartin/example1/maven-metadata.xml.md5
/repository/com/andrewgilmartin/example1/maven-metadata.xml.sha1

The Maven client does all the work to ensure that the repository has all the necessary metadata about the deployed package. That is, the HTTP server needs no Maven intelligence. A dedicated repository manager, like Apache Archiva, Sonatype Nexus, and JFrog Artifactory, does far more than just manage files, but for small collections these features are not needed.

For the next stage in migrating your Ant project to Maven you need only configure an HTTP server for GET and PUT. If the HTTP server uses the file system directly, then it will need to be able to create the intermediate directories for any PUT files. During the Ant to Maven transition, however, you can use my simple HTTP server at https://github.com/andrewgilmartin/filehttpserver/. For production you will likely want to use, for example, Apache2 and WebDav.

Moving to Maven from an ancient Ant build

This posting initiates the series on moving from an Ant to a Maven build. If you are interested in my help with your Ant to Maven transition contact me at andrew@andrewgilmartin.com.

You use Ant to build your system of one or more deliverables. You have one build.xml file. You have toyed with separating out the deliverables into individual build.xml files, but there seemed not to be enough value in doing so. Over time your build.xml has actually had fewer unique things to do, eg RMI no longer needs a compile step, you moved database initialization and configuration into another tool, and the newer React web front end is built elsewhere. Effectively the build.xml just compiles the source, runs the tests, and assembles one or more jars or wars.

The Ant build, while now simpler, has a number of downsides. You still manually manage the large set of external libraries. Perhaps those libraries have not been updated in years as it is too cumbersome to track down their origin and incorporate the most recent, viable version. Your IDE does not integrate well with your bespoke build environment. The IDE's tools for creating test classes, running tests, showing library JavaDoc, navigating into library sources, and completing code are ineffective or unusable.

It would be better all-round if you moved to Maven. This series of postings is an approach to accomplishing that.

If you have not yet installed Maven then do so from

https://maven.apache.org/install.html

The first stage to moving to a Maven build is to replace the external libraries with Maven dependencies. You don't need to replace Ant to do this thanks to the Maven Ant Tasks

https://maven.apache.org/ant-tasks/

Download the most recent jar and place it in your build libraries. Now add the following Ant task definition to your build.xml (near the top of the file)

<taskdef
  name="mvn-dependencies"
  classname="org.apache.maven.artifact.ant.DependenciesTask"
  classpath="${basedir}/buildlib/maven-ant-tasks-2.1.3.jar"
  onerror="report"/>

Update the "classpath" attribute as necessary. Do a clean build to ensure that the "mvn-dependencies" task is accessible.

The Ant build likely has different and/or overlapping external libraries for building, testing, and running. You will be replacing these with equivalent dependencies declared in <mvn-dependencies/> elements. Add the following elements to your build.xml (after the <taskdef/>)

<mvn-dependencies
  pathId="build-dependencies.classpath"
  sourcesFilesetId="build-dependencies-sources.classpath"
  javadocFilesetId="build-dependencies-javadoc.classpath"
  settingsfile="${basedir}/maven-settings.xml" >
  <!--
  <dependency groupId="" artifactid="" version=""/>
  -->
</mvn-dependencies>

<mvn-dependencies
  pathId="test-dependencies.classpath"
  sourcesFilesetId="test-dependencies-sources.classpath"
  javadocFilesetId="test-dependencies-javadoc.classpath"
  settingsfile="${basedir}/maven-settings.xml" >
  <!--
  <dependency groupId="" artifactid="" version=""/>
  -->
</mvn-dependencies>

<mvn-dependencies
  pathId="runtime-dependencies.classpath"
  sourcesFilesetId="runtime-dependencies-sources.classpath"
  javadocFilesetId="runtime-dependencies-javadoc.classpath"
  settingsfile="${basedir}/maven-settings.xml" >
  <!--
  <dependency groupId="" artifactid="" version=""/>
  -->
</mvn-dependencies>

The "pathId" is going to be used in conjunction with your existing <path/> elements for constructing the classpaths used in building, testing, and running. Eg,

<path id="build.classpath">
  <path refid="runtime-dependencies.classpath"/>
  <path refid="build-dependencies.classpath"/>
  <!-- ... -->
</path>

<path id="test.classpath">
  <path refid="runtime-dependencies.classpath"/>
  <path refid="build-dependencies.classpath"/>
  <path refid="test-dependencies.classpath"/>
  <!-- ... -->
</path>

<path id="runtime.classpath">
  <path refid="runtime-dependencies.classpath"/>
  <!-- ... -->
</path>

The <mvn-dependencies/> "settingsfile" attribute references a Maven configuration. This file is used to specify the Maven repository managers, ie where dependences can be found. You should place this file under version control alongside the build.xml. The base file is

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <profiles>
    <profile>
      <id>default</id>
      <repositories>
        <!--
        <repository>
          <id></id>
          <url></url>
        </repository>
        -->
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>default</activeProfile>
  </activeProfiles>
</settings>

You are now ready to use Maven dependencies. For each external library you will need to know its name, version, and, if possible, origin URL. To find these search the Maven repository index at

https://mvnrepository.com/

Eg, if you are using Apache Commons Codec at runtime then search for "apache codec" and navigate to the dependency version page

https://mvnrepository.com/artifact/commons-codec/commons-codec/1.8

You will need to record the group id, artifact id, version, and repository URL. Get the group id, artifact id, and version from the Maven tab, ie

<dependency>
  <groupId>commons-codec</groupId>
  <artifactId>commons-codec</artifactId>
  <version>1.8</version>
</dependency>

Add this to the runtime <mvn-dependencies/> element in build.xml, eg

<dependency groupId="commons-codec" artifactid="commons-codec" version="1.8"/>

Get the repository URL from the Repositories field, ie following the "Central" link reveals that the URL is

https://repo1.maven.org/maven2/

If the repository URL is not already listed then update maven-settings.xml, eg under <repositories/> add

<repository>
  <id>Central</id>
  <url>https://repo1.maven.org/maven2/</url>
</repository>

Now delete the external library's jar. Do a clean build to test that you have correctly replaced the dependency.

Continue replacing external libraries with dependencies until you are done. It has been my experience that this will progress far quicker than you initially worried it would.

You may find a few external libraries that were themselves never built with Maven or were never deposited to a repository. For these you will need to continue to use the manually managed jar. Your best option moving forward is to replace the library with a functional equivalent. That task is for another day, however.

If you have now emptied your directories of external libraries then remove their reference from your <path/> elements (or <javac/>, etc).

A word of warning. It is very tempting to replace the external library with a newer dependency. Don't do it. Your task right now is only to replace external libraries with equivalent dependencies. Simply make note that a newer version exists.

Completing tasks stepwise

After waiting on an improbable response from the HealthSource RI health benefits web app I lamented on Twitter:

"Every time I have to wait for the interactive response of some application I wish the Envoy Framework was common." (December 20, 2019)
So what was so great about the Envoy Framework, a framework that was designed 28 years ago for a pre-Internet environment? You should read the whole article, but in short there were two user interaction aspects that I wanted

1. A user is able to disconnect from their session and return at another time and continue where they left off.

2. A user is notified, via whatever notification mechanism is immediately appropriate for them, when requested work has been completed or can continue.

In the HealthSource RI situation I was signing up for 2020 benefits. There is a short window of time to make these benefit decisions and there are many people making them. These circumstances put a strain on an implementation not designed to scale. Even an implementation designed for the cloud will be handicapped when there are choke points, that is, one or more resources that every request needs to interact with. There are a number of technical solutions to these, but there are user interaction solutions also. The easiest solution is to ask the user to come back later to complete the work.

On the surface interrupting the user seems the wrong action to take. The person is here and now and so the system should use this opportunity to complete the task. The rub comes when the system slows down and the task is taking longer than the person has time to give here and now. It is better to let the user leave and notify them when the system is ready to continue. When the user returns the system reestablishes the user's context, ie it brings them back up to speed, so as to not unduly prolong the time to completion.

If I were designing HealthSource RI I would make interactions interruptible. Long tasks are broken down into short tasks, each of which can be completed with limited (near zero) interaction with the system. Each time the user returns to work further on the long task they are presented with a plain summary of what has been accomplished and what is left to accomplish. This summary can be written or visual. When a long task is interrupted the user will be contacted, repeatedly if necessary, to return and complete the task. For time critical tasks the contact mechanisms may need to escalate; a person might need to actually phone the user and complete the task over the phone.

People are used to completing tasks stepwise in the physical world. Why not make enabling that a norm in the virtual world too?

Good advice on how to resolve the four failings of operations


The first 17 mins are the setup for subsequent good advice on how to resolve the four failings of ops, ie trust, toil, silos, and queues. In fact, I think these four failings are true for most project management.