We built an open‑source Python tool that shrinks 30–50 MB PDFs and images to under 2 MB while keeping the output readable. It ships with a desktop GUI, a CLI, Docker support, and a one‑click .exe build. This post explains how it works under the hood.

The Problem Nobody Talks About

Every organization has the same quiet pain point: oversized files.

A 42 MB scanned contract that won’t attach to an email
A folder of 35 MB site photos that choke a web upload form
A SharePoint library ballooning because nobody compresses before uploading

Online compressors cap uploads, destroy quality, or charge monthly fees.
Manual Ghostscript commands? Great — if you memorize flags for a living.

We needed something that just works:
target a size, hit compress, done.

What This Utility Does

Capability	Details
Image compression	JPEG, PNG, TIFF, BMP, WebP → optimized JPEG
PDF compression	Ghostscript (primary), raster pipeline, pikepdf fallback
Precision targeting	Set an exact target size (e.g. 1.8 MB)
Scanned PDF intelligence	Auto‑detects text vs scanned pages
Batch processing	Parallel directory compression
Desktop GUI	Tkinter-based Upload → Compress → Download workflow
CLI	Click-powered command-line interface
Docker	Preconfigured image with system dependencies
Standalone .exe	PyInstaller build for zero-install distribution

Real‑World Compression Results

File Type	Input Size	Output Size	Reduction
Images	30–50 MB	~500 KB	~98%
PDFs	30–50 MB	~1.8 MB	~96%

A 42 MB scanned PDF becomes a crisp 1.7 MB file, a reduction of about 96%, and its text remains readable.

Architecture Overview


src/
├── compressor.py
├── image_compressor.py
├── pdf_compressor.py
├── config.py
└── utils.py

main.py
compression_service.py
ui_app.py
ui_actions.py
ui_styles.py
state.py

The key design decision:
one compression core.
Both the CLI and GUI call compress_core().
No duplicated logic. No divergence.
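
To make the single-core idea concrete, here is a minimal sketch of two front ends delegating to one function. Only the name compress_core() comes from the project. Its signature, the CompressionResult type, and the cli_entry and gui_on_compress_clicked helpers are assumptions made for illustration, and the body only routes by file type instead of compressing anything.

```python
from dataclasses import dataclass
from pathlib import Path

# Input formats listed in the capability table above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".webp"}


@dataclass
class CompressionResult:
    """Hypothetical result type; the real project may return something else."""
    source: Path
    kind: str          # "pdf" or "image"
    target_bytes: int


def compress_core(source: str, target_mb: float) -> CompressionResult:
    """Single entry point: decide which pipeline applies to the file.

    This sketch stops at routing. A real core would call the image or
    PDF pipeline here and report the achieved size.
    """
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        kind = "pdf"
    elif suffix in IMAGE_EXTENSIONS:
        kind = "image"
    else:
        raise ValueError(f"Unsupported file type: {suffix or path.name}")
    return CompressionResult(path, kind, int(target_mb * 1024 * 1024))


def cli_entry(argv: list) -> CompressionResult:
    """Hypothetical CLI handler: parse arguments, then delegate."""
    source, target_mb = argv
    return compress_core(source, float(target_mb))


def gui_on_compress_clicked(selected_file: str, target_mb: float) -> CompressionResult:
    """Hypothetical GUI callback: read widget values, then delegate."""
    return compress_core(selected_file, target_mb)
```

Because both handlers are thin wrappers, a fix or a new format added inside compress_core() reaches the CLI and the GUI at the same time.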

How the Image Pipeline Works

Load & Analyze — dimensions, mode, alpha channel
Pre‑process — RGB conversion, alpha compositing
Resize — proportional downscale using LANCZOS
Optimize — dual‑axis quality + dimension tuning
Validate — verify size and compression ratio

This two‑axis approach (quality + dimensions) enables aggressive targets
like 500 KB from a 40 MB image without turning it into mush.
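
The two-axis idea can be sketched as a nested search: lower the JPEG quality first, and shrink the dimensions only when quality alone cannot reach the target. This is one plausible way to implement it, not necessarily the project's actual search order. The function names fit_to_target and pillow_encoder, the quality and scale ladders, and the white background used for alpha compositing are all assumptions. Pillow is imported lazily so the search logic itself depends only on the standard library.

```python
import io
from typing import Callable, Optional, Sequence, Tuple

Encoder = Callable[[int, float], bytes]


def fit_to_target(
    encode: Encoder,
    target_bytes: int,
    qualities: Sequence[int] = (85, 75, 65, 55, 45, 35),
    scales: Sequence[float] = (1.0, 0.85, 0.7, 0.55, 0.4, 0.3),
) -> Optional[Tuple[bytes, int, float]]:
    """Return the first (data, quality, scale) that fits, or None.

    Scales are tried from largest to smallest, and for each scale the
    qualities are tried from highest to lowest, so the result keeps as
    many pixels and as much quality as the ladders allow.
    """
    for scale in scales:
        for quality in qualities:
            data = encode(quality, scale)
            if len(data) <= target_bytes:
                return data, quality, scale
    return None


def pillow_encoder(path: str) -> Encoder:
    """Build an encoder for one image file (requires Pillow)."""
    from PIL import Image  # third-party: pip install Pillow

    img = Image.open(path)
    if img.mode in ("RGBA", "LA") or (img.mode == "P" and "transparency" in img.info):
        # JPEG has no alpha channel: composite onto a white background.
        rgba = img.convert("RGBA")
        background = Image.new("RGB", rgba.size, (255, 255, 255))
        background.paste(rgba, mask=rgba.split()[3])
        img = background
    else:
        img = img.convert("RGB")

    def encode(quality: int, scale: float) -> bytes:
        width, height = img.size
        size = (max(1, round(width * scale)), max(1, round(height * scale)))
        resized = img if scale == 1.0 else img.resize(size, Image.LANCZOS)
        buffer = io.BytesIO()
        resized.save(buffer, format="JPEG", quality=quality, optimize=True)
        return buffer.getvalue()

    return encode
```

A call such as fit_to_target(pillow_encoder("site_photo.png"), 500 * 1024) would then return the encoded JPEG bytes along with the quality and scale that met the 500 KB target, or None if even the smallest combination is too large. The file name here is a placeholder.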

How the PDF Pipeline Works

Strategy 1: Ghostscript


gs -sDEVICE=pdfwrite \
   -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/ebook \
   -dColorImageResolution=120 \
   -dDownsampleColorImages=true \
   -dNOPAUSE -dBATCH \
   -sOutputFile=output.pdf \
   input.pdf
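
From Python, the same Ghostscript invocation can be assembled as an argument list and run with subprocess. The sketch below is an assumption about how such a wrapper might look, not the project's pdf_compressor.py. The function names and the dpi parameter are hypothetical, and -dNOPAUSE, -dBATCH, and -dQUIET are added here so Ghostscript exits without prompting.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(src, dst, dpi=120, preset="/ebook", gs_binary="gs"):
    """Return the Ghostscript argument list for one PDF.

    The resolution flag comes after -dPDFSETTINGS so that it overrides
    the preset's default instead of being overridden by it.
    """
    return [
        gs_binary,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",
        "-dDownsampleColorImages=true",
        f"-dColorImageResolution={dpi}",
        "-dNOPAUSE",   # do not pause between pages
        "-dBATCH",     # exit when the input has been processed
        "-dQUIET",     # suppress the startup banner
        f"-sOutputFile={dst}",
        str(src),
    ]


def compress_with_ghostscript(src, dst, dpi=120):
    """Run Ghostscript and return the output size in bytes."""
    # The console binary is "gs" on Linux/macOS and "gswin64c" on 64-bit Windows.
    gs = shutil.which("gs") or shutil.which("gswin64c")
    if gs is None:
        raise RuntimeError("Ghostscript executable not found on PATH")
    subprocess.run(build_gs_command(src, dst, dpi, gs_binary=gs), check=True)
    return Path(dst).stat().st_size
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces. Calling compress_with_ghostscript again with a lower dpi is one simple way to step toward a size target.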

Building a Python File Compressor with Ghostscript and Pillow

Building a Production‑Grade File Compression Utility in Python

The Problem Nobody Talks About

What This Utility Does

Real‑World Compression Results

Architecture Overview

How the Image Pipeline Works

How the PDF Pipeline Works

Strategy 1: Ghostscript

Strategy 2: Raster Pipeline

Strategy 3: Auto‑Detection

Strategy 4: pikepdf Fallback

The Desktop GUI

The CLI

Using It as a Python Library

Lessons Learned

Getting Started
