
Hans Fast 8c46b52738 add process files so data.ori can run.		2025-11-11 20:38:02 +01:00
inputs	initial commit	2025-11-10 12:58:12 +01:00
process	add process files so data.ori can run.	2025-11-11 20:38:02 +01:00
.gitignore	add process files so data.ori can run.	2025-11-11 20:38:02 +01:00
README.md	adds README with roject aims and status overview	2025-11-10 13:20:35 +01:00
Sakefile.yaml	add process files so data.ori can run.	2025-11-11 20:38:02 +01:00
addUsageAmounts.js	initial commit	2025-11-10 12:58:12 +01:00
bankersRound.js	initial commit	2025-11-10 12:58:12 +01:00
data.ori	fixes name of bank account in import	2025-11-10 16:03:40 +01:00
dependencies.svg	initial commit	2025-11-10 12:58:12 +01:00
fix_missing_names.sql	initial commit	2025-11-10 12:58:12 +01:00
insert-percents.js	initial commit	2025-11-10 12:58:12 +01:00
outputFormat.js	initial commit	2025-11-10 12:58:12 +01:00
outputFormatBank.js	initial commit	2025-11-10 12:58:12 +01:00
percent-divide.js	initial commit	2025-11-10 12:58:12 +01:00
percent-round.js	initial commit	2025-11-10 12:58:12 +01:00
percentRound.js	initial commit	2025-11-10 12:58:12 +01:00
roundMonths.js	initial commit	2025-11-10 12:58:12 +01:00
roundMonthsArray.js	initial commit	2025-11-10 12:58:12 +01:00
roundUsage.js	initial commit	2025-11-10 12:58:12 +01:00
subtree.ori	initial commit	2025-11-10 12:58:12 +01:00
withBankPercents.js	initial commit	2025-11-10 12:58:12 +01:00
withMonthPercents.js	initial commit	2025-11-10 12:58:12 +01:00
withMonths.js	initial commit	2025-11-10 12:58:12 +01:00
withPercents.js	initial commit	2025-11-10 12:58:12 +01:00

README.md

Example Data Pipeline for Web Origami

The purpose of this repository is to explore how Web Origami might work for data transformation pipelines: simple, small-data ETL routines.

Web Origami is primarily (and originally) aimed at creating hierarchical document structures, such as websites, from input sources. At least at the conceptual level, the process is the same for data processing. However, we need to explore whether the details of this new domain match well with the Web Origami workflow, builtins, and tooling.

Dummy dataset: Bob, Alice, James and Sarah share an electricity provider. They all make payments into a shared account and pay their monthly advances to ElecCo from that account. They need to calculate who owes what. There are two billing periods: January to April, and May to August.
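To make the dataset's shape concrete, here is a minimal sketch of how payments might be grouped into the two billing periods. The `MonthlyPayment` record, its field names and the example amounts are hypothetical, not taken from the files in inputs; amounts are kept as integer cents to avoid floating-point drift.

```typescript
// Hypothetical record shape: the field names are assumptions, not the
// repository's actual input columns.
type Person = "Bob" | "Alice" | "James" | "Sarah";

interface MonthlyPayment {
  person: Person;
  month: number; // 1 = January ... 12 = December
  amountCents: number; // integer cents avoid floating-point drift
}

type Period = "Jan-Apr" | "May-Aug";

// The two billing periods from the README: January to April, May to August.
function billingPeriod(month: number): Period | undefined {
  if (month >= 1 && month <= 4) return "Jan-Apr";
  if (month >= 5 && month <= 8) return "May-Aug";
  return undefined; // outside both billing periods
}

// Total each person's payments per billing period, keyed "period/person".
function totalsByPeriod(payments: MonthlyPayment[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const p of payments) {
    const period = billingPeriod(p.month);
    if (period === undefined) continue; // skip months outside both periods
    const key = `${period}/${p.person}`;
    totals.set(key, (totals.get(key) ?? 0) + p.amountCents);
  }
  return totals;
}

// Made-up amounts for illustration only.
const examplePayments: MonthlyPayment[] = [
  { person: "Bob", month: 1, amountCents: 5000 },
  { person: "Bob", month: 3, amountCents: 5000 },
  { person: "Alice", month: 6, amountCents: 4200 },
];
console.log(totalsByPeriod(examplePayments).get("Jan-Apr/Bob")); // → 10000
```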

The reference pipeline uses sake, which can probably be installed with pipx install master-sake. Of the multitude of make-for-data tools out there, sake is the one that comes closest to matching what I find necessary in this sort of pipeline: a) a tolerable interface to scripts in any language, b) dependency management based on the inputs and outputs of steps, c) visualisation of the dependency tree.¹
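Point b) is the heart of the make-style model: a step depends on whichever steps produce its inputs, and the run order follows from that. The sketch below illustrates the idea with a topological sort (Kahn's algorithm) over hypothetical step and file names. It is not how sake is implemented and does not read a Sakefile.

```typescript
// Hypothetical step description: the files a step reads and writes.
interface Step {
  name: string;
  inputs: string[];
  outputs: string[];
}

// Order steps so each one runs after the steps that produce its inputs,
// using Kahn's topological sort. Throws if the dependencies form a cycle.
function runOrder(steps: Step[]): string[] {
  const producer = new Map<string, string>(); // output file -> step name
  for (const s of steps) for (const o of s.outputs) producer.set(o, s.name);

  const indegree = new Map<string, number>(); // unmet dependencies per step
  const dependents = new Map<string, string[]>(); // step -> steps needing it
  for (const s of steps) {
    indegree.set(s.name, 0);
    dependents.set(s.name, []);
  }
  for (const s of steps) {
    const deps: string[] = [];
    for (const file of s.inputs) {
      const p = producer.get(file);
      // Files that no step produces (raw inputs) create no dependency.
      if (p !== undefined && p !== s.name && deps.indexOf(p) === -1) deps.push(p);
    }
    for (const d of deps) {
      dependents.get(d)!.push(s.name);
      indegree.set(s.name, indegree.get(s.name)! + 1);
    }
  }

  const ready = steps.map((s) => s.name).filter((n) => indegree.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const n = ready.shift()!;
    order.push(n);
    for (const m of dependents.get(n)!) {
      const remaining = indegree.get(m)! - 1;
      indegree.set(m, remaining);
      if (remaining === 0) ready.push(m);
    }
  }
  if (order.length !== steps.length) throw new Error("dependency cycle");
  return order;
}

// Hypothetical three-step pipeline, listed out of order on purpose.
console.log(
  runOrder([
    { name: "report", inputs: ["totals.csv"], outputs: ["report.csv"] },
    { name: "totals", inputs: ["clean.csv"], outputs: ["totals.csv"] },
    { name: "clean", inputs: ["raw.csv"], outputs: ["clean.csv"] },
  ]),
); // → ["clean", "totals", "report"]
```

Deciding which steps are out of date and can be skipped is left out to keep the sketch short.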

Porting to Web Origami: work in progress

This repository exemplifies a fairly normal level of complexity for this sort of project. I'm not sure yet that I've found the simplest way to do the task at hand, but such is life with random datasets so I think it's fair to use this as an example :)

As of the initial commit of this repo, one step has been ported to Origami: the indiv-val-txns step. Most of the steps use csvsql to process data in in-memory SQLite databases. This step needs to do financial division, which is difficult or impossible in SQLite, so the original version uses a JavaScript/Deno script to do the division with more-or-less accurate rounding.
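The crux of financial division is that the rounded shares must still add up to the total. One common technique, shown here as a sketch and not as what the repository's own rounding scripts do, is the largest-remainder method on integer cents: round every exact share down, then hand the leftover cents to the shares with the largest fractional parts. `splitCents` and its `weights` parameter are hypothetical names.

```typescript
// Split an integer number of cents by non-negative weights (e.g. percentages)
// so the parts always sum exactly to the total: largest-remainder method.
function splitCents(totalCents: number, weights: number[]): number[] {
  if (!Number.isInteger(totalCents)) throw new Error("totalCents must be an integer");
  const weightSum = weights.reduce((a, b) => a + b, 0);
  if (weightSum <= 0) throw new Error("weights must sum to a positive number");

  // Exact (fractional) share for each weight, then round every share down.
  const exact = weights.map((w) => (totalCents * w) / weightSum);
  const parts = exact.map((x) => Math.floor(x));

  // Cents lost to rounding down: a small non-negative count, at most one per share.
  let leftover = totalCents - parts.reduce((a, b) => a + b, 0);

  // Give one cent each to the shares with the largest fractional parts;
  // ties go to the earlier entry so the result is deterministic.
  const byRemainder = exact
    .map((x, i) => ({ i, frac: x - Math.floor(x) }))
    .sort((a, b) => b.frac - a.frac || a.i - b.i);
  for (let k = 0; leftover > 0; k++, leftover--) parts[byRemainder[k].i] += 1;

  return parts;
}

console.log(splitCents(10000, [1, 1, 1])); // → [3334, 3333, 3333]
console.log(splitCents(1001, [50, 50])); // → [501, 500]
```

Because all rounding happens in one place on integers, the parts sum to the total by construction, whatever the weights.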

As of now, the Origami version of this step is 99% complete: the final step, where the transactions are exported as CSV, drops records when two arrays are merged, because I can't figure out how to flatten and concatenate two arrays.
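It is not confirmed that this is what happens in the Origami version, but one plausible way for records to go missing is a merge that treats each array as entries keyed by index: records at the same index in the second array overwrite those in the first. The plain TypeScript sketch below contrasts that with concatenation and adds a minimal CSV writer. The `Txn` record shape and all function names are hypothetical, and none of this is Origami's API.

```typescript
// Hypothetical transaction record; not the repository's actual columns.
interface Txn {
  person: string;
  amountCents: number;
}

// Key-based merge: each array is treated as entries keyed by index, so a
// record in `b` replaces the record at the same index in `a`.
function mergeByIndex(a: Txn[], b: Txn[]): Txn[] {
  const merged: Record<number, Txn> = {};
  a.forEach((t, i) => (merged[i] = t));
  b.forEach((t, i) => (merged[i] = t)); // same index: a's record is lost
  return Object.keys(merged).map((k) => merged[Number(k)]);
}

// Flatten a list of arrays into one array, keeping every record in order.
function concatRecords(...lists: Txn[][]): Txn[] {
  return lists.flat();
}

// Minimal CSV writer; quotes fields containing commas, quotes or newlines.
function toCsv(rows: Txn[]): string {
  const esc = (v: string | number): string => {
    const s = String(v);
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = rows.map((r) => [r.person, r.amountCents].map(esc).join(","));
  return ["person,amountCents", ...lines].join("\n");
}

// Made-up records for illustration only.
const first: Txn[] = [
  { person: "Bob", amountCents: 100 },
  { person: "Alice", amountCents: 200 },
  { person: "James", amountCents: 300 },
];
const second: Txn[] = [
  { person: "Sarah", amountCents: 400 },
  { person: "Bob", amountCents: 500 },
];
console.log(mergeByIndex(first, second).length); // → 3 (two records dropped)
console.log(concatRecords(first, second).length); // → 5
console.log(toCsv(concatRecords(first, second)));
```

The CSV writer only knows the two hypothetical columns; a real export would derive its header from the records themselves.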

There's still lots to be done, though: not only porting the rest of the steps but also improving the step that has already been rewritten.

¹ Long ago I also used drake, which I recall having an even better model for my typical needs (geospatial data processing at the time). The repository is still on GitHub but hasn't seen activity for a while, and I'm hesitant to try to get such an old Java project going.