Problem: I needed to file my taxes.
I had about 330 email receipts in .eml format. I needed to convert them to .pdf format before sending them to my accountant as a link to a cloud folder.
Issues:
- My email provider (FastMail) only allows me to download the emails in .eml format.
- The receipts follow a non-standard format (so writing a suitable regular expression may be difficult).
My simple repo does the following:
- Converts all email receipts (.eml files) to PDFs (the PDFs are largely plain text).
- Extracts key financial data from those PDFs into a CSV file (which I can import neatly into Google Sheets).
Example Output
The PDF output looks like this (I manually redacted the actual data):
The CSV output looks like this:
File Name,Total Amount ($),Currency,Transaction Date,Descriptive Details
Receipt-2566-5568.pdf,47.42,USD,2050-06-01,"Render - Servers, PostgresDB, Redis usage for May 2050"
Receipt-2952-5288.pdf,9.52,EUR,2050-03-03,Twitter International ULC - Twitter Blue subscription
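For reference, a CSV with this layout can be produced with Python's built-in `csv` module. The sketch below is an assumption about how one might write it, not the repo's code: the `ReceiptRow` dataclass and `rows_to_csv` function are hypothetical names, and only the column headers come from the output above. Note that `csv.writer` quotes the description automatically when it contains a comma, which is why the first row's last field is wrapped in double quotes.

```python
import csv
import io
from dataclasses import astuple, dataclass

# Column headers copied from the example output.
HEADER = [
    "File Name",
    "Total Amount ($)",
    "Currency",
    "Transaction Date",
    "Descriptive Details",
]


@dataclass
class ReceiptRow:
    """One extracted receipt (hypothetical structure for illustration)."""
    file_name: str
    total_amount: str
    currency: str
    transaction_date: str
    details: str


def rows_to_csv(rows: list[ReceiptRow]) -> str:
    """Serialize receipts to CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    for row in rows:
        # Fields containing commas are quoted by the writer (QUOTE_MINIMAL).
        writer.writerow(astuple(row))
    return buf.getvalue()
```

Keeping the amount as a string avoids floating-point rounding when the value is written back out; Google Sheets parses it as a number on import.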
And the imported Google Sheet looks like this:
Use Cases
- Generally converting .eml files to PDFs
- Tax preparation
- Expense tracking
- Accounting reconciliation
- Digital receipt organization
- Audit preparation
Full repo here