OTHER | Compaboom


OCR: PDF to HTML

Created an OCR AI LLM Model to Convert PDF to HTML (six meaningless buzzwords in one sentence - new record!)
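The project's own code isn't reproduced here. Below is a minimal sketch of how a PDF-to-HTML OCR pipeline could be wired together. It assumes the third-party packages pdf2image (which needs Poppler) and pytesseract (which needs the Tesseract binary), and the helper names text_to_html and pdf_to_html are hypothetical. Each page is rasterized, OCR'd to plain text, and the text is wrapped in escaped <p> tags, with blank lines marking paragraph breaks.

```python
import html
import re


def text_to_html(text: str, title: str = "Converted PDF") -> str:
    """Wrap plain OCR text in a minimal HTML page (hypothetical helper).

    Blank lines are treated as paragraph breaks; single line breaks inside
    a paragraph become spaces. All text is HTML-escaped.
    """
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = [
        " ".join(line.strip() for line in block.splitlines() if line.strip())
        for block in blocks
    ]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs if p)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n{body}\n</body></html>"
    )


def pdf_to_html(pdf_path: str) -> str:
    """OCR every page of a PDF and return one HTML document (hypothetical helper).

    Assumes pdf2image and pytesseract are installed; they are imported
    lazily so text_to_html can be used without them.
    """
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path)  # one PIL image per page
    text = "\n\n".join(pytesseract.image_to_string(page) for page in pages)
    return text_to_html(text)
```

Splitting OCR from HTML generation keeps the formatting step testable without Tesseract installed; calling pdf_to_html("issue.pdf") would produce the full page.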

Code:

Output (Life is Good #):

How this benefits the Lampoon:

- Takes forever to manually enter text into HTML code.

- This program takes less than forever to do the same task.

- Forever > Less Than Forever

- This leaves more time for Lampoon members to do other things, like assist in traumatizing business compers.

Not Fun Fact:

- What took me the longest on this project was configuring PyTesseract (another buzzword? I'm a VC investor's wet dream) correctly. I hadn't upgraded Python and had been calling an outdated "pip" every time I tried to install it. I ended up building it in Colab because, apparently, I'm not smart enough to figure out how to set paths on my own computer. This isn't a fun fact. This was traumatizing.
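For anyone hitting the same stale-pip problem, one common workaround is to invoke pip through the running interpreter ("python -m pip") rather than whatever "pip" is first on PATH. This is a minimal sketch under that assumption; the helper names pip_install_command and find_tesseract are hypothetical, and pytesseract.pytesseract.tesseract_cmd is the setting pytesseract reads when the binary isn't on PATH.

```python
from __future__ import annotations

import shutil
import sys


def pip_install_command(package: str) -> list[str]:
    """Build a pip command bound to the current interpreter (hypothetical helper).

    Using sys.executable with "-m pip" installs into the same Python that
    will later import the package, sidestepping an outdated "pip" on PATH.
    """
    return [sys.executable, "-m", "pip", "install", package]


def find_tesseract() -> str | None:
    """Return the path of the tesseract binary if it is on PATH, else None."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    # Print the command instead of running it; pass it to subprocess.run to install.
    print("Install with:", " ".join(pip_install_command("pytesseract")))
    path = find_tesseract()
    if path is None:
        print("tesseract not on PATH; set pytesseract.pytesseract.tesseract_cmd manually")
    else:
        print("tesseract found at", path)
```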

Even Less Fun Fact:

- I ate Buldak Ramen while programming this. This ramen was banned in Denmark for being too spicy. I didn't know this at the time. It was a rude awakening.

Note: Does it still count as a rude "Awakening" if you were pulling an all-nighter when the stomach cramps hit?