Building a document processing pipeline (Part 1): Preface
I am pleased to have a side project running reliably in production that integrates local Large Language Models (LLMs): a document processing web app that my girlfriend uses for her work.
Part of her work (as far as I understand it) is receiving many letters and memos, then categorizing, routing, logging, and monitoring them. The letters arrive on physical paper, and most of her colleagues who handle these tasks do so with pen and paper. My clever girlfriend, on the other hand, took the initiative to keep digital records in a spreadsheet. But that still takes real effort: typing out and summarizing each document. And there I found a solid domain-specific use case for the LLMs that are all the rage these days.
So I will be writing a series of posts about how I built the document processing pipeline. The objective of the pipeline is to extract specific information from each document: the recipient, the subject, the body, the sender, and a summary.
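To make the target concrete, the five extracted fields can be modeled as a simple record. This is only a sketch with made-up sample values; the pipeline's actual data model and field names will be covered in the later posts.

```python
from dataclasses import dataclass


@dataclass
class ExtractedDocument:
    """Hypothetical shape of one processed document."""
    recipient: str
    subject: str
    body: str
    sender: str
    summary: str


# Example with invented values, just to show the intended output shape.
doc = ExtractedDocument(
    recipient="Records Office",
    subject="Annual filing reminder",
    body="Please submit the annual report by the end of the month.",
    sender="Administration",
    summary="Reminder to submit the annual report this month.",
)
print(doc.subject)
```

Each incoming letter would ultimately be reduced to one such record, which maps naturally onto a spreadsheet row.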
I will keep updating this preface with links to future posts as they are published.