Costs of helpful data flexibility.

I'm having a discussion currently with a young developer who has only ever worked in Ruby and JavaScript. I noticed that the developer had chained "symbolize_keys" to the end of a method call. In Ruby this converts a hash's keys from strings to symbols, ie { "a" => 123 } becomes { :a => 123 }. Their reason for this was to offer flexibility to the called as to how it returned the result. They thought this provided flexility and robustness. I countered that it did the exact opposite.

When a function can be given parameters and return results in multiple formats then robustness is only had when the function handles all formats equally. To do that the function needs to be tested with all formats. This can be done, but in practice, and I've seen across many organizations, it is not. Not only the function needs to be tested with the multiple formats, but the callers and the called need to be tested too. It's a combinatorial explosion of testing.

The other detriment to this flexibility is that since no function is sure of the format every function converts the data to its preferred format even if the data is already in the preferred format. This conversion adds to the function's code size and has a runtime cost (CPU and memory) on every invocation. The cost of a single use might be small, but our applications work in a world with thousands of concurrent sessions each with deep call chains, and expect microsecond responses. Those single uses add up.

My recommendation to the developer was to require one format as part of its contract and add validation that runs at least during testing. (I'd like to just tell them to use a typed language where this wouldn't even be an issue!)

I mentioned that the developer's experience is in Ruby and JavaScript. I have found that is common for such developers to not expect data to be in a specific format or type. I assume some of this comes from never being trained to always validate and convert data coming from the outside before using it inside. (Eg, directly passing around an INPUT element's value or a database's column value.)  Once inside, you can be assured of its correctness. Instead, data is passed around without any function knowing a priori that it is correct.

I am unsure if I will convince this developer to not use "symbolize_keys". I am rowing against the tide.