Example Data Pipeline for Web Origami
The purpose of this repository is to explore how Web Origami might work for data transformation pipelines: simple, small-data ETL routines.
Web Origami is primarily (and originally) aimed at creating hierarchical document structures, such as websites, from input sources. At least conceptually, data processing follows the same process. However, we need to explore whether the details of this new domain fit well with the Web Origami workflow, builtins, and tooling.
Dummy dataset: Bob, Alice, James and Sarah share an electricity provider. They all make payments into a shared account and pay their monthly advances to ElecCo from that account. They need to work out who owes what. There are two billing periods: January to April, and May to August.
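To make the shape of the problem concrete, here is a minimal JavaScript sketch of one typical step: totalling each person's payments into the shared account per billing period. The record shape (`{ name, month, amountCents }`), the period labels, and the figures in the usage lines are hypothetical and not taken from the repository's inputs. Amounts are kept in integer cents to sidestep floating-point drift.

```javascript
// Hypothetical record shape: { name, month (1 to 12), amountCents }.
// Amounts are integer cents so that sums stay exact.

// The two billing periods from the dataset description.
const PERIODS = [
  { label: "Jan-Apr", from: 1, to: 4 },
  { label: "May-Aug", from: 5, to: 8 },
];

// Return the label of the billing period a month falls in, or null if none.
function periodOf(month) {
  const period = PERIODS.find((p) => month >= p.from && month <= p.to);
  return period ? period.label : null;
}

// Total each person's payments into the shared account, per billing period.
// Returns { [periodLabel]: { [name]: totalCents } }.
function paymentsByPeriod(payments) {
  const totals = {};
  for (const { name, month, amountCents } of payments) {
    const label = periodOf(month);
    if (label === null) continue; // outside both billing periods: ignored
    if (!totals[label]) totals[label] = {};
    totals[label][name] = (totals[label][name] ?? 0) + amountCents;
  }
  return totals;
}

// Made-up figures, for illustration only.
const payments = [
  { name: "Bob", month: 1, amountCents: 5000 },
  { name: "Alice", month: 2, amountCents: 4000 },
  { name: "Bob", month: 3, amountCents: 2500 },
  { name: "Sarah", month: 6, amountCents: 6000 },
];
const totals = paymentsByPeriod(payments);
// totals["Jan-Apr"] is { Bob: 7500, Alice: 4000 }; totals["May-Aug"] is { Sarah: 6000 }
console.log(totals);
```

Each person's balance for a period would then be their total minus their share of that period's bill. How the bill is shared is left open here.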
The reference pipeline uses sake, which can probably be installed with `pipx install master-sake`. Of the multitude of "make for data" tools out there, sake is the one that comes closest to matching what I find necessary in this sort of pipeline: a) a tolerable interface to scripts in any language, b) dependency management based on the inputs and outputs of steps, and c) visualisation of the dependency tree.¹
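The idea behind point b) can be sketched in a few lines: if every step declares the files it reads and writes, a valid run order falls out of a topological sort. This is a hedged illustration in JavaScript of the general technique, not how sake is implemented; the step and file names are hypothetical.

```javascript
// Each step declares its input and output files. A step depends on
// whichever steps produce its inputs. Step and file names are hypothetical.

// Return step names in an order where every step runs after the steps
// that produce its inputs (depth-first topological sort with cycle check).
function runOrder(steps) {
  // Map each output file to the step that produces it.
  const producerOf = new Map();
  for (const step of steps) {
    for (const file of step.outputs) producerOf.set(file, step.name);
  }
  const byName = new Map(steps.map((s) => [s.name, s]));
  const order = [];
  const state = new Map(); // step name -> "visiting" | "done"

  function visit(name) {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") {
      throw new Error(`Dependency cycle involving step "${name}"`);
    }
    state.set(name, "visiting");
    for (const file of byName.get(name).inputs) {
      const producer = producerOf.get(file);
      // Inputs with no producer are raw source files: nothing to run.
      if (producer !== undefined) visit(producer);
    }
    state.set(name, "done");
    order.push(name);
  }

  for (const step of steps) visit(step.name);
  return order;
}

const steps = [
  { name: "report", inputs: ["balances.csv"], outputs: ["report.csv"] },
  { name: "balances", inputs: ["txns.csv", "bills.csv"], outputs: ["balances.csv"] },
  { name: "clean", inputs: ["raw.csv"], outputs: ["txns.csv"] },
];
console.log(runOrder(steps)); // → [ 'clean', 'balances', 'report' ]
```

The same input/output map is what makes it possible to rerun only the steps downstream of a changed file, and to draw the dependency tree.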
Porting to Web Origami: work in progress
This repository exemplifies a fairly typical level of complexity for this sort of project. I'm not sure yet that I've found the simplest way to do the task at hand, but such is life with random datasets, so I think it's fair to use it as an example :)
As of the initial commit of this repo, one step has been ported to Origami: the indiv-val-txns step. Most of the steps use csvsql to do their processing in in-memory SQLite databases. This step, however, needs to do financial division, which is difficult or impossible in SQLite. The original version uses a JavaScript/Deno script to do the division with more-or-less accurate rounding.
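Why the division is tricky: rounding each share independently can leave the parts a cent short or over (100.00 split three ways rounds to 3 × 33.33 = 99.99). One standard fix is largest-remainder allocation in integer cents. The sketch below is a swapped-in technique for illustration, not the repository's Deno script, and it assumes the split is proportional to non-negative integer weights, such as usage amounts or percentages expressed in basis points.

```javascript
// Split totalCents among parties in proportion to integer weights so that
// every part is a whole number of cents and the parts sum to totalCents.
// Technique: largest-remainder allocation, using only integer arithmetic
// (exact while totalCents * weight stays below Number.MAX_SAFE_INTEGER).
function allocate(totalCents, weights) {
  if (!Number.isInteger(totalCents) || totalCents < 0) {
    throw new Error("totalCents must be a non-negative integer");
  }
  if (!weights.every((w) => Number.isInteger(w) && w >= 0)) {
    throw new Error("weights must be non-negative integers");
  }
  const sum = weights.reduce((a, w) => a + w, 0);
  if (sum === 0) throw new Error("weights must not all be zero");

  // Exact floor of each share, plus the remainder used for tie-breaking.
  const parts = weights.map((w, i) => {
    const product = totalCents * w;
    const rem = product % sum;
    return { i, cents: (product - rem) / sum, rem };
  });

  // The floors fall short by fewer cents than there are parties.
  const leftover = totalCents - parts.reduce((a, p) => a + p.cents, 0);

  // Give one extra cent to each of the largest remainders; ties go to
  // the earlier party so the result is deterministic.
  const byRemainder = [...parts].sort((a, b) => b.rem - a.rem || a.i - b.i);
  for (let k = 0; k < leftover; k++) byRemainder[k].cents += 1;

  return parts.map((p) => p.cents);
}

// 123.45 split by hypothetical percentages 25.5 %, 24.5 %, 30 %, 20 %
// (as basis points): the parts add back up to exactly 12345 cents.
console.log(allocate(12345, [2550, 2450, 3000, 2000])); // → [ 3148, 3025, 3703, 2469 ]
console.log(allocate(10000, [1, 1, 1])); // → [ 3334, 3333, 3333 ]
```

Banker's rounding (round half to even) reduces bias when rounding many individual values, but on its own it does not guarantee that the parts sum to the total. That guarantee comes from distributing the leftover cents.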
As of now, the Origami version of this step is 99% complete: the last stage, where the transactions are exported as CSV, drops records when two arrays are merged, because I can't figure out how to flatten and concatenate two arrays.
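In plain JavaScript, which an Origami project can call from a .js module, concatenating and flattening arrays is a one-liner. A common way to lose records is to merge two arrays as if they were keyed objects: both arrays use the indices 0, 1, 2 and so on as keys, so entries from the second array overwrite those of the first. Whether this is what happens in the Origami version is an assumption, not something verified here. The sketch shows the pitfall, the fix, and a small RFC 4180-style CSV serializer; the record fields and values are hypothetical.

```javascript
// Quote a CSV field if it contains a comma, double quote, or line break;
// embedded double quotes are doubled (RFC 4180 style).
function csvField(value) {
  const s = String(value ?? "");
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize an array of flat records to CSV, taking the header row
// from the keys of the first record.
function toCsv(records) {
  if (records.length === 0) return "";
  const headers = Object.keys(records[0]);
  const lines = [headers.map(csvField).join(",")];
  for (const r of records) lines.push(headers.map((h) => csvField(r[h])).join(","));
  return lines.join("\n") + "\n";
}

// Hypothetical transactions, one array per billing period.
const janToApr = [
  { name: "Bob", amount: "12.50" },
  { name: "Alice", amount: "7.25" },
];
const mayToAug = [{ name: "James", amount: "3.10" }];

// Pitfall: merging the arrays as objects merges them by index, so
// mayToAug[0] overwrites janToApr[0] and Bob's record is lost.
const mergedByIndex = Object.values({ ...janToApr, ...mayToAug }); // 2 records

// Fix: concatenate (or flatten an array of arrays by one level).
const all = [...janToApr, ...mayToAug]; // 3 records, same as janToApr.concat(mayToAug)
const alsoAll = [janToApr, mayToAug].flat(); // the same 3 records

console.log(toCsv(all));
```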
There's still lots to be done, though: not only porting the rest of the steps, but also improving the step that has already been rewritten.