How to remove 'anti-piracy' footers from complex PDFs?

Thorned_Rose@sh.itjust.works · edit-2 9 hours ago


Auster@thebrainbin.org · 8 hours ago

IIRC, I tested this quite a few years ago. I had to use a program that could both decompile and recompile the PDF, and while it was decompiled I removed the repeating pattern I didn’t want with something like Notepad++. The recompiled file came out a bit over 50% bigger IIRC, maybe because of different compression methods, but the pages themselves didn’t seem affected.

Sadly I can’t remember the name of the program I used for decompiling and recompiling, only that it did both and that I found it while looking for how to remove watermarks from PDFs. Also, the program was certainly offline.

Auster@thebrainbin.org · 7 hours ago

Found a few candidate tools, though I can’t test any of them right now: mutool (part of the MuPDF tools), PDFtk, qpdf, and pdf2txt (the name sounds familiar, though it might be my memory playing tricks).

If any of those could be found as a single portable exe around 2020, chances are it is the tool I used for it.
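
For what it’s worth, the decompile, edit, recompile round trip described above resembles what qpdf calls QDF mode, though nothing here confirms that was the tool used back then. A rough Python sketch, assuming `qpdf` and its companion `fix-qdf` are on the PATH; the file names and the `edit` callback are placeholders, not anything from the thread:

```python
import subprocess

def expand_cmd(src, dst):
    """qpdf call that rewrites src as an uncompressed, text-editable QDF file."""
    return ["qpdf", "--qdf", "--object-streams=disable", src, dst]

def fix_cmd(edited):
    """fix-qdf repairs stream lengths and the xref table after hand edits.
    It writes the repaired file to stdout."""
    return ["fix-qdf", edited]

def recompress_cmd(src, dst):
    """Plain qpdf run; by default it compresses uncompressed streams on output."""
    return ["qpdf", src, dst]

def round_trip(src, out, edit, work="work.qdf.pdf", fixed="fixed.pdf"):
    """Expand, let `edit` (a hypothetical callback) change the work file, recompress.
    Note: qpdf exits with status 3 on warnings, which check=True treats as failure."""
    subprocess.run(expand_cmd(src, work), check=True)
    edit(work)  # e.g. open in an editor that preserves binary data, or patch bytes
    repaired = subprocess.run(fix_cmd(work), check=True, capture_output=True).stdout
    with open(fixed, "wb") as fh:
        fh.write(repaired)
    subprocess.run(recompress_cmd(fixed, out), check=True)
```

A different output size after the round trip would not be surprising, since the streams are recompressed by a different tool than the one that produced the original.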

queerlilhayseed@piefed.blahaj.zone · 8 hours ago

You might be able to do a find and replace with https://github.com/pymupdf/PyMuPDF . I’m not an expert on PDFs, so I’m not sure if it can be done in a way that preserves all the important formatting, but if you feel comfortable DMing me the PDF (or one of similar complexity) I could try to write a script that replaces all instances of the target text in a way that preserves the rest of the document.
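
A minimal sketch of what such a script might look like, using PyMuPDF’s redaction calls (`search_for`, `add_redact_annot`, `apply_redactions`). The function names, the `band` parameter, and the idea of only touching matches near the bottom of the page are assumptions for illustration. Whether layers and exact layout survive the save is untested here, so compare the output against the original before trusting it:

```python
def is_in_footer(y_top, page_height, band=0.12):
    """True when a match starts inside the bottom `band` fraction of the page.
    PyMuPDF page coordinates grow downward, so footer text has large y values."""
    return y_top >= page_height * (1.0 - band)

def redact_footer(src, dst, needle, band=0.12):
    """Remove occurrences of `needle` found in the footer band of every page.
    Returns the number of matches removed."""
    import fitz  # PyMuPDF; imported lazily so is_in_footer works without it

    doc = fitz.open(src)
    removed = 0
    for page in doc:
        hits = [
            rect for rect in page.search_for(needle)
            if is_in_footer(rect.y0, page.rect.height, band)
        ]
        for rect in hits:
            # By default the redacted area is filled with white.
            page.add_redact_annot(rect)
        if hits:
            # Deletes the text under the redaction rectangles; by default it
            # also blanks image pixels that overlap them.
            page.apply_redactions()
            removed += len(hits)
    doc.save(dst, garbage=3, deflate=True)
    doc.close()
    return removed
```

Restricting the matches to a footer band is a guard against deleting the same words if they happen to appear in the pattern instructions themselves.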

Tolookah@discuss.tchncs.de · 9 hours ago

The MaM IRC or forums might be able to help with that if you’re a member; they deal with PDFs and such all the time.

Thorned_Rose@sh.itjust.works · 4 hours ago

I keep meaning to sign up, so now is as good a time as any!

Tolookah@discuss.tchncs.de · 3 hours ago

Two replies there came to my attention while I’m unable to get back to sleep at 5am. An old one mentions https://github.com/kanzure/pdfparanoia, which seems to be an old tool that removes watermarks. It hasn’t been updated in 5 years, but neither has the PDF spec?

The other is this paste of text:

if it helps anyone, here’s what I do to prepare a pattern for uploading to make sure it is ‘clean’:

Check over the PDF files for any reference to my name/email address (usually this is in a footer on each page, and not every pattern company does this)
If my personal details are present, I unlock the files using a site like ILovePDF - There are other sites but this one has no daily limits
Open the unlocked PDF in Adobe Professional or another PDF editor of your choice and delete the footer box. You can just delete the box on the first page it appears, or the first page it is a standalone box, then save the pdf, close and reopen it - usually it will now be gone from all pages.
Repeat for any other PDF files (obviously)
Run PDF and jpeg files through an exif cleaner
Double check and upload.
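
The first and last steps of that checklist (checking for your name or email, then double checking) can be partly automated. Below is a stdlib-only Python sketch that searches the raw file and any Flate-compressed streams it can inflate. The function names are made up for illustration, and it assumes the PDF is already unlocked. A clean result is not proof: text drawn with subset fonts or custom encodings will not match a plain byte search.

```python
import re
import zlib

# Raw bytes between "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def byte_forms(text):
    """Byte spellings a PDF commonly uses for a string: Latin-1 and UTF-16BE."""
    return [text.encode("latin-1", errors="ignore"), text.encode("utf-16-be")]

def inflated_streams(pdf_bytes):
    """Yield the contents of every stream that zlib can decompress."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the trailing newline after the data
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (image codec, encrypted, etc.)

def find_leaks(pdf_bytes, needles):
    """Return the needles that appear anywhere in the file or its streams."""
    chunks = [pdf_bytes, *inflated_streams(pdf_bytes)]
    found = set()
    for needle in needles:
        for form in byte_forms(needle):
            if form and any(form in chunk for chunk in chunks):
                found.add(needle)
    return sorted(found)
```

Typical use would be `find_leaks(open("pattern.pdf", "rb").read(), ["Jane Doe", "jane@example.com"])`, with your own details in place of the placeholder strings.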

istdaslol@feddit.org · 6 hours ago

Adobe Acrobat can censor (redact) PDFs. AFAIK you can choose between black and white, so maybe this could be a manual route.

Thorned_Rose@sh.itjust.works · 4 hours ago

Last time I used Acrobat via WINE, it was more work than it was worth to get stable and running without issues :/

kylian0087@lemmy.dbzer0.com · 2 hours ago

Then use a VM? Better with cracked software anyway.

Hackerpunk1@lemmy.dbzer0.com · 9 hours ago

You need to remove the edit password. I had success using Passware Kit to remove the password. From there, convert the PDF to Word and remove what you need.

Thorned_Rose@sh.itjust.works · 4 hours ago

I was already able to remove the edit password with qpdf --decrypt. Most of the PDF editors I used changed the PDF too much (e.g. added margins/padding), which ruined the very specific layout needed for the patterns to work. There must be no changes to the PDFs apart from removing the ‘footer’ text :/
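
One way to catch the added margins/padding problem early is to compare the page boxes before and after an edit. This is a rough, stdlib-only Python sketch with invented function names. It assumes both files were first expanded to uncompressed form (for example with qpdf’s `--qdf --object-streams=disable`), so the page dictionaries are visible as plain text, and it only sees boxes written as inline arrays:

```python
import re

# Matches e.g. "/MediaBox [ 0 0 612 792 ]" or "/CropBox [0 0 595.28 841.89]".
BOX_RE = re.compile(rb"/(MediaBox|CropBox)\s*\[\s*([-\d.\s]+?)\s*\]")

def page_boxes(pdf_bytes):
    """Return every MediaBox/CropBox found, as (name, (x0, y0, x1, y1)) pairs."""
    boxes = []
    for match in BOX_RE.finditer(pdf_bytes):
        numbers = tuple(float(n) for n in match.group(2).split())
        boxes.append((match.group(1).decode("ascii"), numbers))
    return boxes

def same_geometry(before, after):
    """True when both files declare the same set of page boxes.
    Sorted, because an editor may write objects in a different order."""
    return sorted(page_boxes(before)) == sorted(page_boxes(after))
```

Matching boxes do not prove the content is untouched, but a mismatch is a quick sign that an editor resized or padded the pages.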

frongt@lemmy.zip · edit-2 8 hours ago

Can you just drop a white box over it?

Edit: if you’re sharing the PDF I suppose not

Onomatopoeia@lemmy.cafe · 8 hours ago

You could do this with a PDF editor, then print to PDF so it’s a new file.

Or use a PDF to Word converter (or similar), which would enable removing such things. Though that can be tricky

Thorned_Rose@sh.itjust.works · 4 hours ago

I tried doing print to PDF but it flattened the resultant PDF so the layers were lost. Almost all of the software I used to try converting altered the PDF layout in some way and patterns must not change at all, otherwise they get messed up :/
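
Lost layers can be detected mechanically too. In the PDF format, layers are stored as optional content groups: dictionaries with `/Type /OCG`, listed under `/OCProperties` in the document catalog. A small Python sketch with hypothetical function names, again assuming both files have been expanded to uncompressed form so these dictionaries are readable as text:

```python
import re

# "/Type /OCG" marks one layer; the lookahead avoids matching longer names.
OCG_TYPE_RE = re.compile(rb"/Type\s*/OCG(?![A-Za-z])")

def layer_count(pdf_bytes):
    """Number of optional content group (layer) dictionaries in the file."""
    return len(OCG_TYPE_RE.findall(pdf_bytes))

def layers_survived(before, after):
    """True when the edited file still declares the same number of layers
    and still has (or still lacks) the /OCProperties catalog entry."""
    same_count = layer_count(before) == layer_count(after)
    same_catalog = (b"/OCProperties" in before) == (b"/OCProperties" in after)
    return same_count and same_catalog
```

A print-to-PDF result that flattened everything would show a layer count of zero here.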

Sirence@feddit.org · edit-2 4 hours ago

Maybe you could just search and replace the text with an equal number of whitespace characters? Edit: oh sorry, I only now understood that the issue is finding a tool that replaces the text and keeps the layers. I have never worked with layers in a PDF, so I have no idea on that, sorry.
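
The equal-length idea has one practical upside: if the replacement has exactly as many bytes as the original text, stream lengths and the byte offsets in the cross-reference table stay valid, so nothing else in the file needs to change. A minimal Python sketch, assuming the file has been decompressed first and the footer is stored as a literal string in a page’s content stream (often not the case with subset fonts or kerned text); the function name is made up:

```python
def blank_in_place(data, needle):
    """Overwrite every occurrence of `needle` with the same number of spaces.
    Returns (new_bytes, number_of_replacements); the length never changes."""
    count = data.count(needle)
    return data.replace(needle, b" " * len(needle)), count
```

For example, `blank_in_place(b"BT (Licensed to Jane Doe) Tj ET", b"Licensed to Jane Doe")` leaves a string of 20 spaces inside the parentheses, which still draws but shows nothing.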

hurtn@lemmy.dbzer0.com · 8 hours ago

Unlock the locked PDF with https://www.ilovepdf.com/unlock_pdf or any other program.

Then edit the PDF with the full version of Adobe Acrobat.

Grumpy@sh.itjust.works · edit-2 8 hours ago

My best answer at this point would be that you need to write your own program to find and remove by content, because no manual PDF editor would reasonably have such a feature; it’s so niche.

Also vibe coding with AI tends to be very good for singular tasks like this.

I wouldn’t recommend converting the PDF to anything else, since that would remove the layer info, unless it’s to a more complicated format like EPS, Illustrator, etc.