---
title: Concept Extraction
summary: The pipeline stage where an LLM analyzes changed source documents to identify and extract key concepts for wiki page generation
sources:
- knowledge-compilation.md
createdAt: "2026-04-16T07:41:25.455Z"
updatedAt: "2026-04-16T07:41:25.455Z"
tags:
- knowledge-management
- llm
- pipeline
aliases:
- concept-extraction
---
# Concept Extraction
Concept extraction is the third stage of the [[Compilation Pipeline]], in which a large language model (LLM) reads each changed source document and identifies the key concepts contained within it. The output of this stage feeds directly into [[Page Generation]], where individual wiki pages are created for each discovered concept. ^[knowledge-compilation.md]
## Role in the Pipeline
Concept extraction sits between [[Change Detection]] and [[Page Generation]] in the [[Compilation Pipeline]]. Only sources flagged as changed by the SHA-256 hash comparison in the [[Change Detection]] stage are passed to the LLM for analysis, keeping the process efficient. Once concepts are extracted, the pipeline proceeds to generate a dedicated wiki page for each one. ^[knowledge-compilation.md]
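The change-detection gate described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation: it assumes sources are markdown files in a directory and that prior hashes are stored in a JSON manifest (the `changed_sources` function and manifest layout are hypothetical).

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_sources(source_dir: Path, manifest_path: Path) -> list[Path]:
    """Compare current hashes against a stored manifest and return only
    the sources whose content changed since the last compilation run.
    The manifest is updated in place so the next run sees fresh hashes."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    changed = []
    for src in sorted(source_dir.glob("*.md")):
        digest = file_sha256(src)
        if manifest.get(src.name) != digest:
            changed.append(src)
            manifest[src.name] = digest
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return changed
```

Only the paths returned here would be handed to the LLM stage; unchanged sources never reach it.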
## How It Works
An LLM reads each changed source and extracts the key concepts present in the material. This analysis is what transforms raw, unstructured information — scattered documents, articles, and notes — into a structured set of named concepts that can each be represented as a standalone reference page. ^[knowledge-compilation.md]
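One plausible shape for this step is shown below. The prompt wording, the JSON output contract, and the `llm_complete` callable are all assumptions for illustration; the source does not specify which model, API, or prompt the pipeline uses.

```python
import json

# Hypothetical prompt: asks the model for a machine-readable concept list.
EXTRACTION_PROMPT = """Identify the key concepts in the document below.
Return a JSON array of objects with "name" and "summary" fields.

Document:
{document}"""

def extract_concepts(document: str, llm_complete) -> list[dict]:
    """Ask an LLM to extract named concepts from one changed source.
    `llm_complete` is any callable that sends a prompt string to a
    model and returns the model's text response."""
    response = llm_complete(EXTRACTION_PROMPT.format(document=document))
    return json.loads(response)
```

Keeping the model behind a plain callable makes the stage easy to test with a stub and easy to repoint at a different provider.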
## [[Cross-Source Concepts]]
A concept does not always originate from a single source. When multiple sources discuss the same concept, the pipeline detects this overlap through semantic dependency tracking. In these cases, a change to any one contributing source triggers recompilation of the shared concept, drawing content from all sources that reference it. This ensures the resulting page reflects the full picture rather than a partial view. ^[knowledge-compilation.md]
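The overlap-and-recompile logic above can be sketched with a simple inverted index. This is an assumed data layout (the source describes the behavior, not the data structures): per-source concept lists are inverted into a concept-to-sources map, and a concept is regenerated whenever any contributing source is in the changed set.

```python
from collections import defaultdict

def build_concept_index(extractions: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert per-source concept lists into a concept -> sources map,
    so each concept knows every source that mentions it."""
    index: dict[str, set[str]] = defaultdict(set)
    for source, concepts in extractions.items():
        for concept in concepts:
            index[concept].add(source)
    return dict(index)

def concepts_to_recompile(index: dict[str, set[str]], changed: set[str]) -> set[str]:
    """A concept must be regenerated if any of its contributing sources
    changed; its page is then rebuilt from all of them, not just the
    source that triggered the recompile."""
    return {c for c, sources in index.items() if sources & changed}
```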
## Relationship to [[Incremental Compilation]]
Concept extraction is tightly coupled to the [[Compilation Pipeline]]'s incremental processing strategy. Because only changed sources are analyzed, the LLM is not called unnecessarily on documents that have not been modified. This reduces both processing time and API costs, mirroring the efficiency model of a traditional code compiler. ^[knowledge-compilation.md]
## Sources
- knowledge-compilation.md