RAG & Knowledge Fabric

Ingestion Pipeline

  • Source acquisition (PDF, HTML, Markdown).

  • Chunking with structural metadata.

  • Embedding and hybrid indexing.

  • Storage of raw artifacts for replay.

Retrieval Strategy

  • Hybrid retrieval: BM25 + vector similarity.

  • Re-ranking: bge-reranker or ColBERT.

  • Grounded output: citations required by policy.

Stores

  • Vector DB: Qdrant or Weaviate.

  • Graph DB: Neo4j for concept relationships.

  • Object Store: MinIO for raw documents.