Installing the stupid filter

I enjoy Seth Godin's blog. Today's entry is "Installing the stupid filter," which is about how humans don't always accept questions or directions as stated. That is, humans ask "Are you sure?" Machines don't.

The problem is that machines are given bad data all the time and most accept it verbatim. Crossref, my employer until Jan 11, handles lots of XML-encoded data. So we need to manage both complicated structures and many types of data -- publication dates, personal names, country names, company names, page numbers, volume numbers, ORCID iDs, ISBNs, ISSNs, PubMed IDs, etc. Some types have a strict syntax and so we can know if the value is valid. What we can't know is whether the value is appropriate. We have to guess.

Is a publication date 2 months from now appropriate? In most cases, the answer is "yes" as the publisher is depositing the metadata for a forthcoming publication. But what about 3 years from now? If the publication is part of a book set that is expected to take 10 years to complete publication, then "yes," too. If it is a journal article then almost certainly the answer is "no." At what point is an article title too long? Is it, as we have experienced, a misplaced abstract? It seems the more data we have, the more questions we have about it.

I don't have any answers for these questions. I just want to make the comment that even in machine-to-machine data exchanges, sanity checks need to be made on the data, and those checks have to be made within the larger context of each datum.
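
As an illustration only, a context-dependent check on a publication date might look like the following Python sketch. The publication types, the limits, and the function name are assumptions made up for this example and are not Crossref's rules. The point is that the same date can be plausible for one type and suspicious for another.

```python
from datetime import date, timedelta

# Hypothetical limits on how far ahead a publication date may be.
# The types and the numbers are illustrative assumptions only.
MAX_DAYS_AHEAD = {
    "journal-article": 365,
    "book-set": 3650,
}


def check_publication_date(pub_date, pub_type, today):
    """Return a list of warnings; an empty list means nothing looked odd."""
    warnings = []
    limit = MAX_DAYS_AHEAD.get(pub_type)
    if limit is None:
        # Without context we cannot judge the value, so say so.
        warnings.append(f"unknown publication type {pub_type!r}")
    elif pub_date > today + timedelta(days=limit):
        warnings.append(
            f"{pub_date} is more than {limit} days ahead for a {pub_type}"
        )
    return warnings
```

A check like this returns warnings rather than rejecting the deposit, which leaves room for a human to answer "Are you sure?"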

Red Chair Studio Clay

If you are still looking for a unique gift, see what Red Chair Studio Clay has on Instagram and at redchairstudio.com.

BitBar shout out

I use the macOS menu bar tool BitBar for a few purposes. I have the time in UTC and the status of a few servers. Behind these presentations are very simple bash or Perl scripts. I was reading today of a small utility to show how many days remained until a set of events. The BitBar equivalent is a short script. I will leave removing past events from the output to the reader.
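
A minimal sketch of such a days-until plugin follows. It is written in Python rather than bash or Perl, and the event names and dates are made up. It relies on BitBar's plugin convention: a plugin prints to standard output, the lines before the first "---" appear in the menu bar, and the lines after it appear in the dropdown. Like the description above, it does not remove past events.

```python
#!/usr/bin/env python3
from datetime import date

# Hypothetical events; replace with your own.
EVENTS = [
    ("New Year", date(2019, 1, 1)),
    ("Studio sale", date(2018, 12, 15)),
]


def render(events, today):
    """Return BitBar output lines: a title, a separator, then one line per event."""
    ordered = sorted(events, key=lambda event: event[1])
    rows = [f"{(when - today).days}d {name}" for name, when in ordered]
    title = rows[0] if rows else "No events"
    return [title, "---"] + rows


if __name__ == "__main__":
    print("\n".join(render(EVENTS, date.today())))
```

Saved in the BitBar plugins folder with a refresh interval in the file name (for example, a name ending in .1h.py) and made executable, a script like this would refresh once an hour.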

Craig Forbes' Tuckbox Generator

I wanted a simple and inexpensive way to store my X Wing miniatures. In the process of looking at how others had solved this problem I found Craig Forbes' Tuckbox Generator. Use this to create custom boxes for each miniature. See microhangers. Having many little boxes is too fiddly for my general use, but I am very glad someone wrote a bit of software to create box templates.

Tevo Tarantula 3D printer is coming together

The Tevo Tarantula 3D printer is coming together. So far I have only mixed up two support beams and stripped the threads on one stepper motor. Luckily, switching the beams around did not involve replacing a belt. And I was able to use longer M3 bolts to reach the stepper motor's bottom threads that were still intact. (Thank you Jerry's Hardware!) Now I need ferrules and fork wire terminals and, as far as I know, there is no store in RI that sells these!

Update, 2018-11-27: Jerry's and West Marine both sell fork terminals.

Make visible residence hall problems and resolutions

Letter to the University of Massachusetts Boston Chancellor:

The Boston Globe article "Falling elevators, raw hamburger, lax security at UMass Boston dorms" is a serious check on the trust, confidence, and enthusiasm I have for my son's continuing attendance at UMB. I do not expect new facilities to be without flaws. Nevertheless, a step to regaining the community's trust in the joint venture between UMB, Capstone, and Sodexo is to make public all the service and repair work orders, actions toward, resolutions of, and all other timeline details. Collecting and presenting this information every 24 hours would allow us all to see the extent of the problems and the pace of resolution. Let us see that Capstone and Sodexo put the health and safety of the students over profits and that UMB is a good steward.

Yours truly,
Andrew Gilmartin

The debugging details you leave behind

One of the worst information losses of leaving a place you have worked for nine years is all the debugging details you leave behind in the bug tracking system.

3D printing comes to Saugatucket Rd

I succumbed. Exchanged cost (under $200) for some frustrating days ahead.

Red Chair Studio ceramics sale Saturday & Sunday

Red Chair Studio is having a studio sale today and tomorrow, Sat, Oct 13 and Sun, Oct 14, between the hours of 9 AM and 4 PM. The location is 574 Saugatucket Rd, South Kingstown, RI 02879 (map).

 

Who buys hundreds of dollars of used fencing sight unseen?

Experienced my first active phishing scam today. I posted to Craigslist the sale of some chainlink fence paneling I no longer need (and, frankly, want out of my yard). Within the hour I received a text message from Steven Anthony at 903-865-2390 saying he would buy it, pay with PayPal, and would arrange shipping. It seemed a little odd that someone in Texas would buy $600 of paneling sight-unseen. Nevertheless, I gave him my PayPal account name and my home address. Then the buyer wrote
"I will have to included the shipper funds with the payment so that you will pay them once you receive the payment." 
Now, I am not selling something that can be shipped USPS Priority Mail for $8. The paneling is a few hundred pounds of metal and 100 or more square feet. The buyer is going to have to contract with a long distance hauler and, most likely, prepay some or all of the price. I passed on the sale. The buyer never texted back with a counter offer or a simple goodbye. Part of me wanted to see what would happen next. My expectation was that I would soon have received an email from "PayPal" having me log in and collect the payment.

I have no way of knowing if Mr Anthony was or was not a legitimate buyer. But there were many signals of fraud in our interaction. If I do discover that Mr Anthony was, indeed, just trying to buy a bunch of cheap fencing quickly, then I will apologize to him and remove this posting.

We are all a seafaring people



Maps might not be the territory, but they often define a world view. This Spilhaus Projection drawn by Clara Dealberto is a stunning example of such a map. We are all a seafaring people.

Incident Response Slack application series

I have finished working on the Incident Response Slack application. It was a hobby project and I feel no need to take it any further, although I would like to switch to RDS from SimpleDB. Here are the postings related to the project:
  1. Incident Response Slack App 
  2. Adding persistence to the Incident Response Slack application 
  3. Who wants to perpetuate a flawed design when a proper one is just around the corner? 
  4. Incident Response and stating requirements
  5. Access AWS Secrets Manager from AWS Elastic Beanstalk

Access AWS Secrets Manager from AWS Elastic Beanstalk

I finally figured out how to connect my Incident Response Java application running in AWS Elastic Beanstalk with its associated secret in AWS Secrets Manager. As always, the solution is obvious once it is known. What is not so obvious to me as I write this is how many of my previous "solutions" were correct but the "system" had not yet become "eventually consistent." Moreover, the current solution is very specific and that seems wrong. Nevertheless, here are the solution's parts.

Assume you have created an AWS Secrets Manager secret called S1 and an AWS Elastic Beanstalk application called A1. What you now need to do is to
  1. Create an IAM Policy called P1 that enables access to S1.
  2. Attach policy P1 to the role aws-elasticbeanstalk-service-role.
  3. Attach policy P1 to the role aws-elasticbeanstalk-ec2-role.
When creating policy P1 give it all Secrets Manager's List and Read actions. I am sure some can be skipped and so you and I should go back and remove unneeded actions. You will also need to limit the actions to only the S1 resource, aka its ARN.

Once this is done, restart your application A1 and it will now have access to the S1 secret and its keys and their values.
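
As a rough sketch of what the policy P1 document could contain, the following Python builds the JSON for a policy limited to a single secret. The ARN is a placeholder, and the four actions are an assumed subset of Secrets Manager's List and Read actions chosen for illustration; they are not necessarily the exact set used above. ListSecrets is left out because it does not apply to a single secret's ARN.

```python
import json


def secret_read_policy(secret_arn):
    """Build an IAM policy document allowing read access to one secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # An assumed subset of the List and Read actions.
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                    "secretsmanager:ListSecretVersionIds",
                    "secretsmanager:GetResourcePolicy",
                ],
                # Limit the actions to the one secret.
                "Resource": secret_arn,
            }
        ],
    }


# Placeholder ARN for S1; the region, account id, and suffix are made up.
S1_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:S1-AbCdEf"
print(json.dumps(secret_read_policy(S1_ARN), indent=2))
```

The printed JSON can be pasted into the IAM console's policy editor when creating P1.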

Since Incident Response needs to store its data in AWS SimpleDB, I extended the P1 policy to include all access to any AWS SimpleDB domain, but this did not work. I had to restrict access to a specific domain's ARN. What bothers me about this is that I have now specified the specific domain in both P1 and as the value of the S1 key "aws.simpledb.domain". Perhaps there is a way I can read the domain from the policy; an exploration for another day.

The Father (Play)


Tonight (Oct 5) is the last performance of The Father, performed on the outside stage at the Contemporary Theater Company. The story is told from the point of view of a dementia patient. It was as revelatory an experience for me about dementia as reading The Curious Incident of the Dog in the Night-Time was about Asperger's. Highly recommended.


DevOps as a category will fade away

I do enjoy seeing all the work going into building out and solving the very real problems associated with running software in distributed systems. DevOps has embraced a lot of this work. I think this is a short-term engagement. At its core, DevOps is about empowering developers to deploy and to watch their own applications in production. In the not-too-distant future, I see distributed systems design returning to its natural home as a part of operating systems design. And DevOps as a category will fade away and we will again focus on application design in a much richer systems environment.

Update, 2018-12-03: "Sorry, Linux. Kubernetes is now the OS that matters," by Matt Asay of InfoWorld.

Incident Response and stating requirements

In my Incident Response hobby project a user creates tasks in workspaces. The new task is given a unique id for the workspace it is in. Ideally, because these numbers are seen and used by people, the ids would be small integers that are strictly increasing. Most databases have some sequence or auto-increment feature, but AWS SimpleDB does not. You must create it yourself or use another mechanism.

Creating a globally consistent, strictly increasing, small integer number mechanism is very hard to do. One reason ZooKeeper, Atomix, etcd, Consul, etc., are so widely used is that they do the hard work of implementing value consensus in distributed systems. For my hobby project these tools are overkill, but I still need the mechanism.

To restate the requirement, it is a mechanism enabling multiple processes (on a distributed set of machines) to add tasks to workspaces. But is that really the requirement? Isn't it a premature optimization to assume the need for multiple processes? Perhaps one process can handle all the task creation requests for a single workspace? [1] If I have 100 workspaces I can choose to enable task creation by distributing any request to any process, but I could also choose to distribute task creation to the workspace-specific process. Choosing the second option vastly reduces the problem to enabling a single process to sequence the task ids. All I have to do now is direct the request to the right process, and this is easily done at Slack with a different URL (option A) during application installation, or at an application layer gateway (option B).

So I am going with one process per workspace [2] with option A.

[1] An assumption is that one process is sufficient to manage the demand of a single workspace.
[2] Note that this does not preclude more than one workspace per process.
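
To make the reduced problem concrete, here is a small Python sketch of a single process sequencing task ids per workspace. The class and function names are invented for this illustration. The counters live only in memory, so a real version would also need to persist the last id it handed out, and the sketch assumes requests are handled one at a time. The routing function shows one way a gateway (option B) might pick the owning process; with option A the choice is instead fixed by the URL given to Slack at installation.

```python
import zlib
from collections import defaultdict


class WorkspaceSequencer:
    """Hands out strictly increasing task ids per workspace in one process."""

    def __init__(self):
        # Last id issued for each workspace; new workspaces start at 0.
        self._last = defaultdict(int)

    def next_task_id(self, workspace_id):
        self._last[workspace_id] += 1
        return self._last[workspace_id]


def process_for(workspace_id, process_count):
    """Map a workspace to the index of the one process that owns it."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(workspace_id.encode("utf-8")) % process_count
```

Because every request for a workspace lands on the same process, no consensus between processes is needed to keep that workspace's ids small and increasing.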

Strong consistency models

"Strong consistency models" is a very clear article on the problems of working with data, it's all from the past, and collaborating with sometimes disconnected processes. The comments are good too.
Network partitions are going to happen. Switches, NICs, host hardware, operating systems, disks, virtualization layers, and language runtimes, not to mention program semantics themselves, all conspire to delay, drop, duplicate, or reorder our messages. In an uncertain world, we want our software to maintain some sense of intuitive correctness. 
Well, obviously we want intuitive correctness. Do The Right Thing™! But what exactly is the right thing? How might we describe it? In this essay, we’ll take a tour of some “strong” consistency models, and see how they fit together.

Workflow automation needs to take a more central role in our distributed systems designs

I am always at a loss to understand why workflow automation is not part of the common operations infrastructure of even the smallest software development shop. Few continuous, autonomous processes can operate without needing to keep humans informed or occasionally ask them for input. These needs are usually solved by the process sending an email to a functional address, e.g. "clean-the-data@firm.com," containing a short description of the issue and a link to an HTTP enabled form for providing the input. The email is forwarded to the right person, he or she completes the form, and the autonomous process moves forward, sometimes changing its path. The sent email and the form's use are rarely centrally logged and so the autonomous process cannot be fully audited. Repeat this ad hoc solution for a dozen more conditions and use it a few times per month and you have created post hoc chaos.

We do see some workflow automation tools regularly used in DevOps groups, but they are specialized. Jenkins, for example, is used to build and distribute applications. Rundeck manages "cron" tasks that need to be executed on all the hosts or services. Even a tool like Spinnaker is, at heart, workflow automation. I suspect that all these could be implemented as workflows on top of, for example, Camunda.

Historically, workflow automation has been entwined with ERP and BPM. I don't recall anyone ever saying their company's ERP migration was a pleasure to participate in. (Which reminds me of Douglas Adams's statement that "It can hardly be a coincidence that no language on Earth has ever produced the expression 'as pretty as an airport'.") To discard workflow automation due to the historical horror stories is truly a lost opportunity for the future.

Workflow automation needs to take a more central role in our distributed systems designs. Camunda seems like a good place to begin -- search for talks by Bernd Rücker of Camunda.


Who wants to perpetuate a flawed design when a proper one is just around the corner?

The SimpleDB (SDB) persistence design in Adding persistence to the Incident Response Slack application is poor. The plan was to persist a block of data containing all of a Slack channel's tasks. This SDB item would be named (SDB's primary key) with some combination of Slack channel attributes, such as channel id and enterprise id. I expected to add a version attribute to the SDB item so that I could use SDB's conditional-put to prevent overwriting someone else's changes to the same task list. This all sounds acceptable, but upon further consideration it is not.

Its flaw is that the design focuses on sets of tasks while the UI -- the slash command -- focuses primarily on individual tasks. This flaw led to a design that adds unnecessary contention to task updating. That is, if tasks A and B were being updated at the same time by users X and Y, respectively, then it is likely that one of the two updates would fail and have to be retried. Add more concurrency and tasks to the mix and the service will feel unresponsive (pushing users away) and burden the server (increasing operating costs).

The other problem comes from SDB's eventual consistency and how conditional-put will affect throughput. If X and Y are changing tasks then how long will each have to wait on the other before persisting their change? Intuitively, a conditional-put on the version attribute would require that the value be consistent everywhere before a next change. This effectively pipelines all changes to a channel's tasks without any of the advantages a purposefully designed pipeline has.
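
The contention can be shown with a small in-memory stand-in for the single item per channel. This Python sketch does not call SimpleDB; the class and its conditional_put method are invented here to mimic a conditional put on a version attribute. Users X and Y change different tasks, yet the second write is rejected because both started from the same version.

```python
class VersionConflict(Exception):
    """Raised when the stored version is not the one the writer expected."""


class ChannelStore:
    """In-memory stand-in for one item holding all of a channel's tasks."""

    def __init__(self):
        self.version = 0
        self.tasks = {}

    def read(self):
        # Return a copy so callers cannot change the stored tasks directly.
        return self.version, dict(self.tasks)

    def conditional_put(self, expected_version, tasks):
        if expected_version != self.version:
            raise VersionConflict(
                f"expected version {expected_version}, found {self.version}"
            )
        self.tasks = dict(tasks)
        self.version += 1


store = ChannelStore()
store.conditional_put(0, {"A": "open", "B": "open"})

# X and Y read the same version, then change different tasks.
version_x, tasks_x = store.read()
version_y, tasks_y = store.read()
tasks_x["A"] = "done"
tasks_y["B"] = "done"

store.conditional_put(version_x, tasks_x)  # X's write succeeds
try:
    store.conditional_put(version_y, tasks_y)  # Y's write is rejected
    conflict = False
except VersionConflict:
    conflict = True
```

Y now has to re-read the whole task list and retry, even though the two changes never touched the same task. Storing one item per task would limit conflicts to writers who really are updating the same task.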

For this hobby project these flaws are irrelevant. Nevertheless, who wants to perpetuate a flawed design when a proper one is just around the corner?

Internal displacement

When I see these charts I want to know how I can use my skills to help people and not machines.


Source http://www.internal-displacement.org/