feat(pipeline): refactor into its own project

This commit is contained in:
Ignacio Ballesteros
2026-02-20 17:54:12 +01:00
parent 0ea5808cd2
commit dc348185a7
17 changed files with 490 additions and 358 deletions

107
AGENTS.md
View File

@@ -244,113 +244,6 @@ git branch -d feature/my-feature
**Merge direction:** `upstream → main → org-roam → feature/*`
## Org-Roam Workflow
Notes live in a **separate directory** outside this repo. The export pipeline
converts them to Markdown via ox-hugo, applies post-processing transforms, then
Quartz builds the site.
### Tooling
The dev shell (`nix develop`) provides:
- `nodejs_22` — Quartz build
- `elixir` — runs the export script and pipeline
- `emacs` + `ox-hugo` — performs the org → markdown conversion
### Export and build
```bash
# Export only (wipes content/, exports all .org files, runs pipeline)
NOTES_DIR=/path/to/notes npm run export
# Export then build the site
NOTES_DIR=/path/to/notes npm run build:notes
# Positional arg also works
elixir scripts/export.exs /path/to/notes
```
Optional env vars for the pipeline:
| Var | Default | Purpose |
| --------------- | ------------------------ | ----------------------------------------- |
| `BIBTEX_FILE` | — | Path to `.bib` file for citation fallback |
| `ZOTERO_URL` | `http://localhost:23119` | Zotero Better BibTeX base URL |
| `CITATION_MODE` | `warn` | `silent` / `warn` / `strict` |
### Export pipeline phases
`scripts/export.exs` runs four phases in sequence:
1. **Wipe** `content/` (preserving `.gitkeep`)
2. **Export** each `.org` file via `emacs --batch` + `ox-hugo``content/**/*.md`
3. **Pipeline** — run Elixir transform modules over every `.md` file
4. **Index** — generate a fallback `content/index.md` if none was exported
The export uses TOML frontmatter (`+++`) and per-file mode (not per-subtree).
### Markdown pipeline (`scripts/pipeline/`)
A standalone Mix project that post-processes `content/*.md` after ox-hugo.
It is compiled automatically on first run; subsequent runs use the `_build/`
cache and are fast.
**Architecture:**
```
scripts/pipeline/
├── mix.exs # deps: req, jason
└── lib/
├── pipeline.ex # Generic runner (fold transforms over .md files)
├── pipeline/
│ ├── application.ex # OTP app — starts Finch HTTP pool
│ ├── transform.ex # Behaviour: init/1, apply/3, teardown/1
│ ├── transforms/
│ │ └── citations.ex # Resolves cite:key → [Label](url)
│ └── resolvers/
│ ├── zotero.ex # JSON-RPC to Zotero Better BibTeX
│ ├── bibtex.ex # Parses local .bib file
│ └── doi.ex # Bare-key fallback (always succeeds)
```
**Adding a new transform:**
1. Create `scripts/pipeline/lib/pipeline/transforms/my_transform.ex`
2. Implement the `Pipeline.Transform` behaviour (`init/1`, `apply/3`)
3. Append the module to `transforms` in `scripts/export.exs`
```elixir
transforms = [
Pipeline.Transforms.Citations,
Pipeline.Transforms.MyTransform, # new
]
```
### Citation resolution (`Pipeline.Transforms.Citations`)
Handles org-citar syntax that passes through ox-hugo unchanged:
| Syntax | Example |
| ---------------- | -------------------- |
| org-cite / citar | `[cite:@key]` |
| multiple keys | `[cite:@key1;@key2]` |
| bare (legacy) | `cite:key` |
Resolution chain (first success wins):
1. **Zotero** — JSON-RPC to `localhost:23119/better-bibtex/json-rpc`
- Calls `item.search` to find the item, then `item.attachments` to get
the PDF link (`zotero://open-pdf/library/items/KEY`)
- Falls back to `zotero://select/library/items/KEY` if no PDF attachment
- Probe uses a JSON-RPC call, **not** `/better-bibtex/cayw`
(that endpoint blocks waiting for interactive input)
2. **BibTeX** — parses `BIBTEX_FILE`; extracts authors, year, DOI/URL
3. **DOI fallback** — always succeeds; renders bare key or `https://doi.org/...`
**Zotero JSON-RPC gotcha:** `Req 0.5` does not allow combining `:finch` and
`:connect_options` in the same call. Use `:receive_timeout` only.
## Important Notes
- **Client-side scripts**: Use `.inline.ts` suffix, bundled via esbuild