Books

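An e-book collection, deployed at books.lulununu.com.

Architecture

Content: the oreilly/ directory holds one subdirectory per book title, containing the per-chapter markdown files plus a merged book.md
Tools: a Go CLI under tools/ that drives Chrome through chrome-cli to scrape O'Reilly pages
Deployment: Cloudflare Pages, deployed from the release branch
CI: GitHub Actions with a self-hosted runner on a Mac mini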

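Downloading a new book

1. Confirm the book format

⚠️ Only the legacy O'Reilly reader is supported (the URL contains kindle_split_xxx.html and the page has a #book-content element).

The new reader (the URL contains .xhtml, e.g. the second edition of CSS in Depth, 9781633438354) is not compatible and fails with: Element with id "book-content" not found.

How to check: open any chapter of the book through ezproxy and check whether the page has a #book-content element.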
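
The format check can be approximated offline. The sketch below is a hypothetical helper, not part of the tools/ CLI: isLegacyReaderURL and hasBookContent are assumed names, and the HTML check is a plain regular-expression match on already-fetched page source rather than a real DOM query.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// bookContentID matches an id="book-content" attribute (single or double
// quotes). Requiring leading whitespace avoids matching data-id="...".
var bookContentID = regexp.MustCompile(`(?:^|\s)id\s*=\s*["']book-content["']`)

// isLegacyReaderURL reports whether a URL looks like a legacy-reader page:
// it contains "kindle_split_" and ends in ".html" (so ".xhtml" is rejected).
func isLegacyReaderURL(u string) bool {
	// Drop any query string or fragment before checking the extension.
	if i := strings.IndexAny(u, "?#"); i >= 0 {
		u = u[:i]
	}
	return strings.Contains(u, "kindle_split_") && strings.HasSuffix(u, ".html")
}

// hasBookContent reports whether the page source contains an element
// with id "book-content".
func hasBookContent(pageHTML string) bool {
	return bookContentID.MatchString(pageHTML)
}

func main() {
	u := "https://www.oreilly.com/library/view/css-in-depth/9781617293450/OEBPS/Text/kindle_split_001.html"
	fmt.Println("legacy URL:", isLegacyReaderURL(u))
	fmt.Println("has #book-content:", hasBookContent(`<div id="book-content"><p>text</p></div>`))
}
```

The URL test is only a quick first filter; the presence of #book-content in the rendered page remains the deciding check.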

2. Get the starting URL

Find the URL of any chapter page of the book, usually the first kindle_split page, for example:

https://www.oreilly.com/library/view/css-in-depth/9781617293450/OEBPS/Text/kindle_split_001.html

3. Trigger the batch download

Manually trigger the Batch Convert O'Reilly Book workflow in GitHub Actions:

gh workflow run convert-book.yml --repo aibeta/books \
  --field url="<start-URL>" \
  --field wait="10" \
  --field retry="3"

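The workflow automatically:

Opens the starting page and extracts the chapter links visible in the TOC
Processes pages one by one: download the content → extract newly discovered links → add them to the queue (iterative crawling, which works around chapters hidden in collapsed TOC sections)
Saves each chapter as oreilly/<book-title>/XX-kindle_split_YYY.md
Merges all chapters into book.md (sorted by kindle_split number, skipping the table of contents, index, etc.)
Commits and pushes to main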
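
The iterative crawl in the steps above amounts to a queue plus a visited set. The following is a minimal sketch under stated assumptions, not the workflow's actual code: crawl is a hypothetical name, and fetchLinks stands in for "download the page and return the chapter links visible in its TOC".

```go
package main

import "fmt"

// crawl processes pages breadth-first starting from start. Links found on
// each page are queued only if they have not been seen before, so chapters
// that appear only after a TOC section expands are still reached.
// It returns the pages in the order they were processed.
func crawl(start string, fetchLinks func(url string) []string) []string {
	queue := []string{start}
	seen := map[string]bool{start: true}
	var order []string

	for len(queue) > 0 {
		url := queue[0]
		queue = queue[1:]
		order = append(order, url)

		for _, link := range fetchLinks(url) {
			if !seen[link] {
				seen[link] = true
				queue = append(queue, link)
			}
		}
	}
	return order
}

func main() {
	// Fake site: split_003 only becomes visible from split_002.
	toc := map[string][]string{
		"split_000": {"split_001", "split_002"},
		"split_002": {"split_003", "split_001"},
	}
	pages := crawl("split_000", func(u string) []string { return toc[u] })
	fmt.Println(pages)
}
```

Injecting fetchLinks as a function keeps the queue logic testable without a browser.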

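4. Deploy

A push to main automatically triggers generate-index.yml, which:

Converts book.md to book.html (with custom styling: dark mode, page-turn buttons, reading-position bookmarks, etc.)
If a directory contains book.md, generates only book.html and no standalone html for the individual chapter .md files
Generates the directory index.html
Pushes to the release branch → Cloudflare Pages deploys automatically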
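
The book.md rule above can be expressed as a small selection function. This is only an illustrative sketch: htmlSources is a hypothetical name, and the fallback of rendering every .md file when no book.md exists is an inference from the rule, not something the workflow is confirmed to do.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// htmlSources returns which markdown files in a single book directory would
// be rendered to HTML: only book.md when it exists, otherwise (assumed)
// every .md file in the listing.
func htmlSources(files []string) []string {
	var md []string
	for _, f := range files {
		if !strings.HasSuffix(f, ".md") {
			continue // images and other assets are never rendered
		}
		if path.Base(f) == "book.md" {
			return []string{f}
		}
		md = append(md, f)
	}
	return md
}

func main() {
	merged := []string{"01-kindle_split_001.md", "book.md", "images/001.png"}
	single := []string{"01-kindle_split_001.md", "02-kindle_split_002.md"}
	fmt.Println(htmlSources(merged))
	fmt.Println(htmlSources(single))
}
```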

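Note: pushes made by the github-actions bot do not trigger generate-index.yml (GitHub does not start new workflow runs for events created with the default GITHUB_TOKEN). If the site is not deployed automatically after the batch run finishes, push an empty commit manually: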

cd <repo> && git commit --allow-empty -m "trigger deploy" && git push origin main

5. Verify

Visit https://books.lulununu.com/oreilly/<book-title>/ and confirm that all chapters and book.html are accessible.

Downloading a single chapter

gh workflow run convert-chapter.yml --repo aibeta/books \
  --field url="<chapter-URL>" \
  --field wait="10"

Local merge (manual)

cd tools && go build -o dist/web-html-2-local main.go
./dist/web-html-2-local --merge oreilly/<book-title>/

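Final artifacts (per book)

oreilly/<book-title>/
├── book.md          # merged full markdown (includes TOC; images use relative paths)
├── book.html        # generated HTML (with custom styling)
├── images/          # extracted image files
│   ├── 001.png
│   ├── 002.png
│   └── ...
├── 01-kindle_split_001.md   # source files (kept, but no html is generated for them)
├── 02-kindle_split_002.md
└── ...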

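Merge logic

Sort by kindle_split number (not by the filename prefix)
Skip: Brief/Detailed TOC, Index, List of Figures/Tables/Listings, empty files
Keep: title page, copyright, preface, Part headings, body chapters, appendices
Separate chapters with ---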
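
The merge rules above can be sketched as follows. This is a simplified illustration, not the code in tools/merge: the chapter struct, the title-based skip list, and the function names are assumptions, and the real tool may detect TOC and index pages differently.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// chapter is a hypothetical in-memory view of one downloaded file.
type chapter struct {
	File  string // e.g. "02-kindle_split_010.md"
	Title string // chapter title, assumed to be known already
	Body  string // markdown content
}

var splitNum = regexp.MustCompile(`kindle_split_(\d+)`)

// splitIndex extracts the kindle_split number, or -1 if there is none.
func splitIndex(file string) int {
	m := splitNum.FindStringSubmatch(file)
	if m == nil {
		return -1
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return -1
	}
	return n
}

// skipTitles is an assumed list of titles that are left out of book.md.
var skipTitles = map[string]bool{
	"brief table of contents": true, "table of contents": true, "index": true,
	"list of figures": true, "list of tables": true, "list of listings": true,
}

// shouldSkip reports whether a chapter is empty or a TOC/index-style page.
func shouldSkip(c chapter) bool {
	if strings.TrimSpace(c.Body) == "" {
		return true
	}
	return skipTitles[strings.ToLower(strings.TrimSpace(c.Title))]
}

// merge filters, sorts by kindle_split number, and joins bodies with "---".
func merge(chapters []chapter) string {
	kept := make([]chapter, 0, len(chapters))
	for _, c := range chapters {
		if !shouldSkip(c) {
			kept = append(kept, c)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return splitIndex(kept[i].File) < splitIndex(kept[j].File)
	})
	parts := make([]string, len(kept))
	for i, c := range kept {
		parts[i] = strings.TrimSpace(c.Body)
	}
	return strings.Join(parts, "\n\n---\n\n")
}

func main() {
	fmt.Println(merge([]chapter{
		{File: "01-kindle_split_010.md", Title: "Chapter 1", Body: "# Chapter 1"},
		{File: "02-kindle_split_002.md", Title: "Preface", Body: "# Preface"},
		{File: "03-kindle_split_030.md", Title: "Index", Body: "# Index"},
	}))
}
```

In the example the filename prefixes (01, 02) disagree with the kindle_split numbers (010, 002); sorting on the latter puts the preface first.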

File limits (Cloudflare Pages)

Maximum size of a single file: 25 MiB
Free plan: at most 20,000 files
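
A pre-deploy check against these two limits could look like the sketch below. It is a hypothetical helper that is not part of the repository; checkLimits and report are assumed names, and the directory to scan is taken from the first command-line argument.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

const (
	maxFileBytes = 25 * 1024 * 1024 // 25 MiB per file
	maxFiles     = 20000            // Free plan file count
)

// report summarizes one directory tree.
type report struct {
	Files    int      // number of regular entries found
	TooLarge []string // paths whose size exceeds maxFileBytes
}

// ok reports whether the tree fits within both limits.
func (r report) ok() bool {
	return r.Files <= maxFiles && len(r.TooLarge) == 0
}

// checkLimits walks root and counts files and oversized files.
func checkLimits(root string) (report, error) {
	var r report
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		r.Files++
		if info.Size() > maxFileBytes {
			r.TooLarge = append(r.TooLarge, p)
		}
		return nil
	})
	return r, err
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	r, err := checkLimits(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("files=%d oversized=%d ok=%v\n", r.Files, len(r.TooLarge), r.ok())
}
```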

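Development workflow (TDD)

All features and bug fixes follow the TDD workflow:

Feature development

Write a test that describes the expected behavior and confirm that it fails (red)
Write the implementation and confirm that the test passes (green)
Commit

Bug fixes

Analyze the root cause
Write a test that reproduces the bug and confirm that it fails (red)
Fix the code and confirm that the test passes (green)
Commit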

Running tests

cd tools && go test ./merge/ -v

Never skip the tests and go straight to the implementation.