I am working with some code that processes CSV files. Each row corresponds to an existing record and the record is updated in response to the column values. This is not an uncommon task. The existing code implements this in an also not uncommon way by intermixing row parsing and record updating. For example, assume we are updating Foos
class Foo
attr id, :location, :valuation
# ...
end
A typical intermixing is
row = [...]
raise "unusable foo id" if row[0].blank?
foo = Foo.find(row[0].to_i)
raise "foo not found with id #{row[0]}" unless foo
raise "unusable town location" if row[1].blank?
location = Location.find_by(town: row[1])
raise "location not found with town #{row[1]}" unless location;
foo.location = location
raise "unusable valuation" unless values[2].to_i < 10_000
foo.valulation = values[2].to_i
While this initially seems like a reasonable approach it quickly breaks down as the number of columns increase, column format is non-trival, and there are column (or row) interdependencies. But the more significant problem is that the parsing and the updating can't be tested individually. This makes the test harder to write, understand, and maintain.
It is always better to first parse the raw data, validate it, and then use it. Eg
class Record
attr :id, :town, :valuation
attr :foo, :location
def initialize(values)
raise "unusable foo id" unless /^\s*(\d+)\s*$/ =~ values[0]
id = $1.to_i
raise "unusable town location" unless /^\s*(.+)\s*$/ =~ values[1]
location = $1
raise "unusable valuation" unless /^\s*(\d+)\s*$/ =~ values[2]
valuation = $1.to_i
end
def validate
foo = Foo.find(id)
raise "foo not found with id #{id}" unless foo
location = Location.find_by(town:)
raise "location not found with town #{town}" unless location
raise "valuation does not match the minimum" unless valuation >= 10_000;
end
end
# read the raw data
rows = [[...], ...]
# parse and validate the data
records = rows.map do |row|
record = Record.new(row)
record.validate
end
# use the data
records.each do |record|
record.foo.location = record.location
record.foo.valuation = record.valuation
record.foo.save
end
This
parse, validate, and use approach is approporate for all cases where you are bringing data from the outside into your application, no matter what the outside source.
ps. These small, helper classes are your friends. Prefer them over your language's hash primitive as they provide great control. Most languages have efficient syntax for creating and using them.