/media/bill/PROJECTS/System_maintenance/pdf edits/0_pdf edits notes.txt
www.BillHowell.ca 04Oct2018 initial
see also "/media/bill/SWAPPER/bin/pdf edits/0_pdf edits notes.txt"

04Oct2018 I downloaded these with the LMDE2 Software Manager :

pdftk
	Tool for manipulating pdf documents
	If PDF is electronic paper, then PDFtk is an electronic stapler-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. PDFtk is a simple tool for doing everyday things with PDF documents. Keep one in the top drawer of your desktop and use it to:
	- Merge PDF documents
	- Split PDF pages into a new document
	- Decrypt input as necessary (password required)
	- Encrypt output as desired
	- Fill PDF Forms with FDF Data and/or Flatten Forms
	- Apply a Background Watermark
	- Report on PDF metrics, including metadata and bookmarks
	- Update PDF Metadata
	- Attach Files to PDF Pages or the PDF Document
	- Unpack PDF Attachments
	- Burst a PDF document into single pages
	- Uncompress and re-compress page streams
	- Repair corrupted PDF (where possible)

diffpdf
	Compare two pdf files textually or visually
	DiffPDF is used to compare two PDF files. By default the comparison is of the text on each pair of pages, but comparing the appearance of pages is also supported (for example, if a diagram is changed or a paragraph reformatted). It is also possible to compare particular pages or page ranges. For example, if there are two versions of a PDF file, one with pages 1-12 and the other with pages 1-13 because of an extra page having been added as page 4, they can be compared by specifying two page ranges, 1-12 for the first and 1-3, 5-13 for the second. This will make DiffPDF compare pages in the pairs (1, 1), (2, 2), (3, 3), (4, 5), (5, 6), and so on, to (12, 13).

FPDI
	libfpdi-php - PHP library for importing existing pdf documents into FPDF
	FPDI is a collection of PHP classes facilitating developers to read pages from existing PDF documents and use them as templates in FPDF.
	This allows for dynamic generation of PDF files.

pdfedit
	http://pdfedit.cz/en/index.html

pdfgrep
	https://unix.stackexchange.com/questions/6704/how-can-i-grep-in-pdf-files
	Install the package pdfgrep, then use the command:
		find /path -iname '*.pdf' -exec pdfgrep pattern {} +
	Simplest way is :
		pdfgrep 'pattern' *.pdf

pdfsed
	this was an image PDF thing, no use

see "/media/bill/PROJECTS/System_maintenance/pdf edits/fdf filling pdf forms, 30Nov2015.txt"
fdf files
	Forms Data Format files holding the values for a PDF's form fields; pdftk (fill_form) merges them into the PDF. They can be generated from php.

********************************
****************
04Apr2019 editing pdf files

For IJCNN2019 copyrights - I tried to remove old stuff
$ bash "/media/bill/SWAPPER/bin/pdftk replace text.sh"
>> not so good, but what else can we do? Many papers will have to be re-submitted!!!

In uncompressed "N-19503 sed changed.txt" :
BT
/F24 8.9664 Tf
37.636 18.077 Td
[(978-1-7281-1985-4/19/31.00)-342(\050c\0512019)-343(Europ)-28(ean)-343(Union)]TJ
/F15 10.9091 Tf
265.637 -29.965 Td
[(1)]TJ
ET

[(978-1-7281-1985-4/19/31.00)-342(\050c\0512019)-343(Europ)-28(ean)-343(Union)]
>> \050 and \051 are octal escapes for "(" and ")", and the numbers between the strings are kerning adjustments (in thousandths of a text-space unit), so "European" is stored as (Europ)-28(ean) : a plain sed search for "European Union" cannot match it.

+-----+
Encoding of PDF text string
https://stackoverflow.com/questions/29467539/encoding-of-pdf-text-string#29468049
>> Apparently, decoding is a nightmare - but pdftotext does it! (in a VERY primitive way)

+-----+
search "search and replace text in a pdf file"

+---+
pdftk : Sheesh - not obvious from man pages!!!!
	https://stackoverflow.com/questions/9871585/how-to-find-and-replace-text-in-a-existing-pdf-file-with-pdftk-or-other-command#9872494
	You can try to modify the content of your PDF as follows
	1. Uncompress the text streams of the PDF
		pdftk file.pdf output uncompressed.pdf uncompress
	2. Use sed to replace your text with another
		sed -e "s/ORIGINALSTRING/NEWSTRING/g" <uncompressed.pdf >modified.pdf
	3. If this attempt was successful, re-compress the PDF with pdftk
		pdftk modified.pdf output recompressed.pdf compress
	Note: This way is not successful every time, mainly due to font subsetting (a subsetted font only embeds the glyphs the original text used, so new characters may not display).
	(answer by Dingo, Mar 2012; edited by thirdender, Apr 2013)
	>> try it for fun! see "/media/bill/SWAPPER/bin/pdftk replace text.sh"
	>> ORIGINALSTRING is a sed regex : characters such as "." "/" and "\" must be escaped, and it must match the text exactly as stored in the uncompressed stream.

+---+
pdfedit : http://pdfedit.cz/en/index.html
	Look in LMDE2 Software Manager >> nyet

+---+
pdfgrep
	https://unix.stackexchange.com/questions/6704/how-can-i-grep-in-pdf-files
	Install the package pdfgrep, then use the command:
		find /path -iname '*.pdf' -exec pdfgrep pattern {} +
	Simplest way is
		pdfgrep 'pattern' *.pdf
		pdfgrep 'pattern' file.pdf
	(answer by enzotib, Dec 2011)
	>> this could be really handy!

+---+
pdfsed - this was an image PDF thing, no use

+---+
Online tools :
	https://search-replace-pdf-text.pdffiller.com/?msclkid=aa3c0b977be810ac274057312b725949&utm_source=bing&utm_medium=cpc&utm_campaign=Search&utm_term=search%20replace%20text%20in%20pdf%20file&utm_content=Search%20%26%20Replace
	https://www.techwalla.com/articles/how-to-find-replace-text-in-a-pdf
		Use Adobe Acrobat software to edit!

+-----+
How about creating a tex file from the pdf file, then editing the tex file?
>> search shows that this doesn't work : too much information is lost in the pdf file!

+-----+
Create pdf from tex
	NAME
		pdftex, pdfinitex, pdfvirtex - PDF output from TeX
	SYNOPSIS
		pdftex [options] [&format] [file|\commands]
	DESCRIPTION
		Run the pdfTeX typesetter on file, usually creating file.pdf.
		If the file argument has no extension, ".tex" will be appended to it. Instead of a filename, a set of pdfTeX commands can be given, the first of which must start with a backslash. With a &format argument pdfTeX uses a different set of precompiled commands, contained in format.fmt; it is usually better to use the -fmt format option instead.

		pdfTeX is a version of TeX, with the e-TeX extensions, that can create PDF files as well as DVI files. In DVI mode, pdfTeX can be used as a complete replacement for the TeX engine.

		The typical use of pdfTeX is with pregenerated formats for which PDF output has been enabled. The pdftex command uses the equivalent of the plain TeX format, and the pdflatex command uses the equivalent of the LaTeX format. To generate formats, use the -ini switch.

		The pdfinitex and pdfvirtex commands are pdfTeX's analogues to the initex and virtex commands. In this installation, if the links exist, they are symbolic links to the pdftex executable.

		In PDF mode, pdfTeX can natively handle the PDF, JPG, JBIG2, and PNG graphics formats. pdfTeX cannot include PostScript or Encapsulated PostScript (EPS) graphics files; first convert them to PDF using epstopdf(1). pdfTeX's handling of its command-line arguments is similar to that of the other TeX programs in the web2c implementation.

************************
04Oct2018 LibreOffice Draw
https://itsfoss.com/edit-pdf-files-ubuntu-linux/
How to edit PDF Files in Linux
	LibreOffice will take some time to load the PDF file. The file will be opened in Draw, part of the suite that manages graphics. Once loaded, you can immediately see that the file is in editable mode.
	You can also see that it recognizes the table of contents very well. Of course it depends on whether the original PDF file had a table of contents or not.
	...
	Once you are done with the edits, instead of saving the file (Ctrl+S), click on the Export to PDF button. It will export the file as PDF again.
	Note that even after exporting the changed file as PDF, it will still ask you to save the file when you try to close LibreOffice Draw. You don’t need to save it anymore: if you try to save it instead of exporting it, it will only give you the option to save it in the OpenDocument graphics (.odg) format, which I presume is not what you want.
	I also noticed that the edited PDF was smaller than the original one: a 1.6 MB file became a 1.4 MB file. You can edit the exported PDF again as many times as you want.

	Limitations of editing PDF files with LibreOffice
	I tried to edit files in a few other formats, such as ePub, but unfortunately it did not work the same way.
	Also, this PDF editing won’t work on scanned documents. Files that were originally created as text and saved as PDF can be edited very easily, but scanned pages are actually images and would need tools that apply optical character recognition (OCR). You won’t get that with LibreOffice.
>> Awesome!!

I didn't look closely at :
	https://itsfoss.com/pdf-editors-linux/
	http://www.linuxandubuntu.com/home/5-best-linux-pdf-editors
		5. PDF Chain
		This list would not be complete without mentioning the pdftk tool and the best Linux graphical frontend for it, PDF Chain.
		This is a simple but powerful application, but it is not a full-blown graphical editor like the other applications listed here - its usefulness lies elsewhere. It can split a PDF into smaller documents, or merge two into one. It can add backgrounds and stamps, edit the PDF info, or dump the form data in a PDF, among the many things it can do.
		All in all, a really great application if you don’t want to edit the text or images in a PDF. I highly recommend it.
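PDF Chain is only a graphical front end, so the same jobs (merge, split, backgrounds, PDF info) can be scripted with pdftk itself. Below is a minimal bash sketch, assuming pdftk is installed; the wrapper names (pdf_merge, pdf_pages, ...) and the DRY_RUN switch are my own hypothetical helpers, only the pdftk argument order comes from the pdftk man page. With DRY_RUN=1 each wrapper just prints the pdftk command line it would run.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around standard pdftk operations.
# The function names are illustrative; the pdftk argument order follows its man page.
# Set DRY_RUN=1 to print each command instead of running it.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"      # dry run: show the command line
  else
    "$@"                    # real run
  fi
}

# Merge: pdf_merge OUT.pdf IN1.pdf IN2.pdf ...
pdf_merge() { local out=$1; shift; run pdftk "$@" cat output "$out"; }

# Split out pages: pdf_pages IN.pdf RANGE OUT.pdf   (RANGE like 1-3 or 5-end)
pdf_pages() { run pdftk "$1" cat "$2" output "$3"; }

# Burst into single pages: pdf_burst IN.pdf [PATTERN]
pdf_burst() { run pdftk "$1" burst output "${2:-pg_%04d.pdf}"; }

# Background watermark: first page of WM.pdf goes under every page of IN.pdf
pdf_watermark() { run pdftk "$1" background "$2" output "$3"; }

# Metadata report, and write an edited report back into a copy of the PDF
pdf_info()        { run pdftk "$1" dump_data output "$2"; }
pdf_update_info() { run pdftk "$1" update_info "$2" output "$3"; }

# Uncompress / re-compress page streams
pdf_uncompress() { run pdftk "$1" output "$2" uncompress; }
pdf_compress()   { run pdftk "$1" output "$2" compress; }

# Rewriting a file through pdftk can repair a corrupted xref table (if possible)
pdf_repair() { run pdftk "$1" output "$2"; }
```

For example, `DRY_RUN=1 pdf_merge both.pdf a.pdf b.pdf` prints `pdftk a.pdf b.pdf cat output both.pdf`; drop DRY_RUN to actually run it. Keeping the wrappers this thin means the man page stays the real reference.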
+-----+
pdftk - looks good, downloaded with LMDE2 Software Manager
	Pdftk : Tool for manipulating pdf documents
	(same description and feature list as in the 04Oct2018 entry at the top of this file)

Try :
$ pdftk "/media/bill/ramdisk/366_0612.pdf" dump_data
+-----+
InfoBegin
InfoKey: ModDate
InfoValue: D:20170419143344-05'00'
InfoBegin
InfoKey: PTEX.Fullbanner
InfoValue: This is pdfTeX, Version 3.14159265-2.6-1.40.15 (TeX Live 2014) kpathsea version 6.2.0
InfoBegin
InfoKey: CreationDate
InfoValue: D:20170227160819Z
InfoBegin
InfoKey: Author
InfoValue: Fouad Slimane, Andrea Mazzei, Orlin Topalov, Greta Verzi and Frédéric Kaplan
InfoBegin
InfoKey: Title
InfoValue: A Web-Based Tool for Segmentation and Automatic Transcription of Historical Documents
InfoBegin
InfoKey: Creator
InfoValue: 'Certified by IEEE PDFeXpress at 02/27/2017 8:20:21 AM'
InfoBegin
InfoKey: Application
InfoValue: 'Certified by IEEE PDFeXpress at 02/27/2017 8:20:21 AM'
InfoBegin
InfoKey: Producer
InfoValue: ilovepdf.com
PdfID0: f646a3460a7d205af1bfdedd156bfc69
PdfID1: 3f751eefcc302544b814ce1284aa8732
NumberOfPages: 8
BookmarkBegin
BookmarkTitle: MAIN MENU
BookmarkLevel: 1
BookmarkPageNumber: 0
BookmarkBegin
BookmarkTitle: Help
BookmarkLevel: 1
BookmarkPageNumber: 0
BookmarkBegin
BookmarkTitle: Search
BookmarkLevel: 1
BookmarkPageNumber: 0
BookmarkBegin
BookmarkTitle: Print
BookmarkLevel: 1
BookmarkPageNumber: 0
BookmarkBegin
BookmarkTitle: Author Index
BookmarkLevel: 1
BookmarkPageNumber: 0
BookmarkBegin
BookmarkTitle: Table of Contents
BookmarkLevel: 1
BookmarkPageNumber: 0
PageMediaBegin
PageMediaNumber: 1
PageMediaRotation: 0
PageMediaRect: 0 -7.20001 612 792
PageMediaDimensions: 612 799.2
PageMediaCropRect: 0 -7.20001 612 784.8
PageMediaBegin
PageMediaNumber: 2
PageMediaRotation: 0
PageMediaRect: 0 -3.60001 612 792
PageMediaDimensions: 612 795.6
PageMediaCropRect: 0 -3.60001 612 788.4
PageMediaBegin
PageMediaNumber: 3
PageMediaRotation: 0
PageMediaRect: 0 -3.60001 612 792
PageMediaDimensions: 612 795.6
PageMediaCropRect: 0 -3.60001 612 788.4
PageMediaBegin
PageMediaNumber: 4
PageMediaRotation: 0
PageMediaRect: 0 -3.60001 612 792
PageMediaDimensions: 612 795.6
PageMediaCropRect: 0 -3.60001 612 788.4
PageMediaBegin
PageMediaNumber: 5
PageMediaRotation: 0
PageMediaRect: 0 -3.60001 612 792
PageMediaDimensions: 612 795.6
PageMediaCropRect: 0 -3.60001 612 788.4
PageMediaBegin
PageMediaNumber: 6
PageMediaRotation: 0
PageMediaRect: 0 -3.60001 612 792
PageMediaDimensions: 612 795.6
PageMediaCropRect: 0 -3.60001 612 788.4
PageMediaBegin
PageMediaNumber: 7
PageMediaRotation: 0
PageMediaRect: 0 -3.60001 612 792
PageMediaDimensions: 612 795.6
PageMediaCropRect: 0 -3.60001 612 788.4
PageMediaBegin
PageMediaNumber: 8
PageMediaRotation: 0
PageMediaRect: 0 -3.60001 612 792
PageMediaDimensions: 612 795.6
PageMediaCropRect: 0 -3.60001 612 788.4
+-----+
>> doesn't show footers : dump_data only reports metadata, bookmarks and page sizes, not the page text

I also downloaded with the LMDE2 Software Manager :
	diffpdf
	FPDI (PHP classes to import pages of existing PDFs as templates in FPDF; see the top of this file)

see "/media/bill/PROJECTS/System_maintenance/pdf edits/fdf filling pdf forms, 30Nov2015.txt"
	https://www.sitepoint.com/filling-pdf-forms-pdftk-php/
	www.BillHowell.ca 04Oct2018 Awesome!!!
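The dump_data report above is plain "Key: Value" text, so it is easy to post-process. A minimal bash/awk sketch follows; dump_info and set_info are hypothetical helper names of my own. set_info rewrites one InfoValue so the edited report can be fed back with pdftk's update_info operation, which reads the same syntax that dump_data prints. Page text such as footers still cannot be reached this way.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the text printed by `pdftk FILE dump_data`.

# Print each InfoKey/InfoValue pair as "Key = Value", plus the page count.
dump_info() {
  awk '
    /^InfoKey: /       { key = substr($0, 10) }                 # after "InfoKey: "
    /^InfoValue: /     { print key " = " substr($0, 12) }       # after "InfoValue: "
    /^NumberOfPages: / { print "Pages = " substr($0, 16) }      # after "NumberOfPages: "
  '
}

# set_info KEY VALUE : copy stdin to stdout, replacing the InfoValue that
# follows "InfoKey: KEY". (awk -v interprets backslash escapes in VALUE.)
set_info() {
  awk -v k="$1" -v v="$2" '
    /^InfoKey: /          { hit = (substr($0, 10) == k) }
    /^InfoValue: / && hit { print "InfoValue: " v; hit = 0; next }
                          { print }
  '
}
```

Usage, e.g. `pdftk "/media/bill/ramdisk/366_0612.pdf" dump_data | dump_info`, or to change the Producer field: `pdftk in.pdf dump_data | set_info Producer pdftk > info.txt` then `pdftk in.pdf update_info info.txt output out.pdf`.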
$ pdftk "/media/bill/ramdisk/366_0612.pdf" dump_data_fields
>> no output
$ pdftk "/media/bill/ramdisk/366_0612.pdf" generate_fdf
# enddoc