Compilation Pipeline

The compilation pipeline is the multi-stage process that transforms raw, unstructured information into clean, organized wiki pages with proper interlinking. It serves as the core mechanism for knowledge compilation, systematically processing sources to produce structured reference material. ^[knowledge-compilation.md]

Pipeline Stages

The pipeline processes and organizes information in six stages:

Ingestion is the first stage: raw sources, including URLs, files, and documents, are collected into a sources directory for processing. ^[knowledge-compilation.md]
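A minimal sketch of ingestion for local files. It assumes a flat `sources` directory and uses the hypothetical helper name `ingest`. Fetching URLs is left out because the page does not say how that is done.

```python
import shutil
from pathlib import Path


def ingest(paths: list[str], sources_dir: str = "sources") -> list[Path]:
    """Copy local source files into the sources directory (hypothetical layout)."""
    target = Path(sources_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the directory on first use
    collected = []
    for p in paths:
        src = Path(p)
        dest = target / src.name  # keep the original file name
        shutil.copy2(src, dest)  # copy contents plus timestamps/permissions
        collected.append(dest)
    return collected
```

In this sketch, two inputs with the same file name overwrite each other. A real ingester would need a naming scheme that avoids that.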

Change Detection uses SHA-256 hashes to identify which sources have been modified since the last compilation run, enabling efficient incremental processing. ^[knowledge-compilation.md]
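The comparison can be sketched as follows. `sha256_hex` and `detect_changes` are hypothetical names, and the previous run's hashes are assumed to be a plain mapping from source name to hex digest. Reporting deleted sources separately is also an assumption, because the page does not say how removals are handled.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a source's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def detect_changes(current: dict[str, bytes],
                   previous: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare current sources against the hashes recorded by the last run.

    Returns (changed, removed): sources that are new or modified, and sources
    recorded last time that no longer exist (removal handling is an assumption).
    """
    changed = sorted(name for name, data in current.items()
                     if previous.get(name) != sha256_hex(data))
    removed = sorted(name for name in previous if name not in current)
    return changed, removed


previous = {"a.md": sha256_hex(b"alpha"), "b.md": sha256_hex(b"beta"),
            "old.md": sha256_hex(b"gone")}
print(detect_changes({"a.md": b"alpha", "b.md": b"beta v2", "c.md": b"new"}, previous))
# (['b.md', 'c.md'], ['old.md'])
```

A source with no recorded hash counts as changed, so brand-new sources are picked up without a special case.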

Concept Extraction has an LLM analyze each changed source and extract the key concepts it contains. ^[knowledge-compilation.md]
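A sketch of the extraction call. It assumes the LLM is wrapped in a hypothetical callable `llm(prompt) -> str` and is asked to reply with a JSON array of objects that carry `title` and `summary` fields. The prompt wording and field names are illustrative and are not the pipeline's actual ones.

```python
import json
from typing import Callable

# Hypothetical prompt; the real pipeline's wording is not documented here.
PROMPT = (
    "List the key concepts in the following source as a JSON array of objects "
    'with "title" and "summary" fields.\n\n{source}'
)


def extract_concepts(source_text: str, llm: Callable[[str], str]) -> list[dict]:
    """Ask an LLM (any prompt -> text callable) for concepts and parse its JSON reply."""
    reply = llm(PROMPT.format(source=source_text))
    concepts = json.loads(reply)
    # Keep only well-formed entries that have a non-empty title.
    return [c for c in concepts if isinstance(c, dict) and c.get("title")]
```

Passing the model in as a callable keeps the sketch independent of any particular LLM client and makes it easy to test with a stub.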

Page Generation creates a wiki page for each extracted concept, with consistent structure and formatting across the output. ^[knowledge-compilation.md]

Interlink Resolution processes concept mentions across all pages, wrapping them in wikilinks to create connections between related topics. ^[knowledge-compilation.md]
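One way to do the wrapping is a single regular-expression pass. The sketch makes these assumptions: titles are matched case-insensitively on word boundaries, longer titles take precedence over shorter ones they contain, existing `[[...]]` links are left alone, a page does not link to itself, and a mention whose casing differs from the title uses Obsidian's `[[Target|shown text]]` alias form. `add_wikilinks` is a hypothetical name.

```python
import re
from typing import Optional


def add_wikilinks(text: str, titles: list[str], self_title: Optional[str] = None) -> str:
    """Wrap bare mentions of known concept titles in Obsidian-style wikilinks."""
    # Longest titles first, so "Incremental Compilation" wins over "Compilation".
    candidates = sorted((t for t in titles if t != self_title), key=len, reverse=True)
    if not candidates:
        return text
    alternation = "|".join(re.escape(t) for t in candidates)
    # Group 1 matches an existing [[...]] link (kept as-is); group 2 a bare mention.
    pattern = re.compile(r"(\[\[[^\]]*\]\])|\b(" + alternation + r")\b", re.IGNORECASE)
    canonical = {t.lower(): t for t in candidates}

    def replace(m: re.Match) -> str:
        if m.group(1):
            return m.group(1)
        mention = m.group(2)
        target = canonical[mention.lower()]
        # Use the [[Target|shown text]] alias form when the casing differs.
        return f"[[{target}]]" if mention == target else f"[[{target}|{mention}]]"

    return pattern.sub(replace, text)
```

For example, with the titles `Incremental Compilation` and `Change Detection`, the text "Incremental Compilation relies on change detection." becomes "[[Incremental Compilation]] relies on [[Change Detection|change detection]]." This version links every mention. Linking only the first mention on each page is a common variant.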

Index Generation builds a comprehensive table of contents from all concept pages, providing navigation structure for the compiled knowledge base. ^[knowledge-compilation.md]
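A minimal index builder. It assumes each page is a dict with a `title` and an optional `summary`. The alphabetical ordering and the `build_index` name are also assumptions.

```python
def build_index(pages: list[dict]) -> str:
    """Render an alphabetical table of contents linking every concept page."""
    lines = ["# Index", ""]
    # Case-insensitive sort so "change detection" files next to "Change Detection".
    for page in sorted(pages, key=lambda p: p["title"].lower()):
        entry = f"- [[{page['title']}]]"
        if page.get("summary"):
            entry += f": {page['summary']}"  # one-line summary after the link
        lines.append(entry)
    return "\n".join(lines) + "\n"
```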

Incremental Processing

The pipeline implements Incremental Compilation, much as code compilers do: only changed sources are reprocessed. It records each source's hash in a state file and skips any source whose hash is unchanged, which significantly reduces both processing time and API costs. ^[knowledge-compilation.md]
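The state-file loop might look like the following sketch. It assumes the state file is JSON that maps source file names to SHA-256 hex digests. `incremental_compile` and the `compile_source` callback are hypothetical names; the callback stands in for the extraction and page-generation stages.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def incremental_compile(sources_dir: Path, state_file: Path,
                        compile_source: Callable[[Path], None]) -> list[str]:
    """Recompile only sources whose SHA-256 hash changed since the last run."""
    # Load the hashes recorded by the previous run (empty on the first run).
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    compiled = []
    for path in sorted(sources_dir.iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if state.get(path.name) == digest:
            continue  # unchanged: skip it, so no LLM call is made
        compile_source(path)  # hypothetical hook for the later stages
        state[path.name] = digest
        compiled.append(path.name)
    # Persist the updated hashes so the next run can skip these sources.
    state_file.write_text(json.dumps(state, indent=2, sort_keys=True))
    return compiled
```

A second run over the same directory returns an empty list. After one file is edited, only that file is passed to `compile_source`.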

Cross-Source Integration

When multiple sources discuss the same concept, the pipeline detects the overlap through semantic dependency tracking. When one of those sources changes, the pipeline automatically recompiles the shared concepts using content from all contributing sources, which keeps the pages consistent and complete. ^[knowledge-compilation.md]
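The bookkeeping behind this can be sketched with a map from concepts to their sources. Matching concepts by exact title is a simplification that stands in for whatever semantic matching the pipeline actually uses. `build_dependency_map` and `concepts_to_recompile` are hypothetical names.

```python
from collections import defaultdict


def build_dependency_map(extracted: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert source -> extracted concept titles into concept -> contributing sources."""
    deps: dict[str, set[str]] = defaultdict(set)
    for source, concepts in extracted.items():
        for concept in concepts:
            # Exact title match: a stand-in for semantic matching (assumption).
            deps[concept].add(source)
    return dict(deps)


def concepts_to_recompile(deps: dict[str, set[str]],
                          changed: set[str]) -> dict[str, set[str]]:
    """For every concept touched by a changed source, return ALL of its sources."""
    return {concept: set(sources) for concept, sources in deps.items()
            if sources & changed}
```

Returning every contributing source, and not just the changed one, is what lets a recompiled page keep content from sources that did not change.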

Output Compatibility

The pipeline writes its output with YAML frontmatter and wikilinks, which makes it directly compatible with Obsidian and similar knowledge management tools. Each generated concept page carries metadata including its title, summary, source attribution, and timestamps. ^[knowledge-compilation.md]
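A sketch of how a page with this metadata could be rendered as Markdown with YAML frontmatter. The key names `title`, `summary`, `sources`, `created`, and `updated` are assumptions, as is the hypothetical `render_page` helper. Using `json.dumps` to quote strings is also a design choice; it works because JSON strings are valid YAML double-quoted scalars.

```python
import json
from datetime import datetime, timezone


def render_page(title: str, summary: str, sources: list[str], body: str,
                created: datetime, updated: datetime) -> str:
    """Render a concept page as Markdown with YAML frontmatter (hypothetical keys)."""
    lines = [
        "---",
        # json.dumps escapes quotes and backslashes, giving valid YAML strings.
        f"title: {json.dumps(title)}",
        f"summary: {json.dumps(summary)}",
        "sources:",
        *[f"  - {json.dumps(s)}" for s in sources],  # source attribution
        f"created: {created.isoformat()}",
        f"updated: {updated.isoformat()}",
        "---",
        "",
        body,
    ]
    return "\n".join(lines) + "\n"


stamp = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(render_page("Change Detection", "Hash-based change tracking.",
                  ["knowledge-compilation.md"], "Body text.", stamp, stamp))
```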

Sources