
I am running the script at the bottom as a one-liner, but I have expanded it here for ease of reading.

Assume the directory contains the following file: File (11).pdf.

When the script executes, it simply prints the file name three times, i.e.:

File (11).pdf
File (11).pdf
File (11).pdf

I suspect a poorly crafted sed expression, but I am too much of a novice to find the problem.

If I write the following "simplified" scripts, they execute as expected.

IFS=$'\n'
    for i in `ls *.pdf`
        do
            base=`printf $i | sed -E 's/^(.*)( \([0-9]{1,4}\))(\.pdf)$/\1/g'`
            printf $base"\n"
        done

RESULT

File

IFS=$'\n'
    for i in `ls *.pdf`
        do
            count=`printf $i | sed -E 's/^(.*)( \([0-9]{1,4}\))(\.pdf)$/\2/g'`
            printf $count"\n"
        done

RESULT

 (11)

Problem Script

But when I make the script slightly more "complex" it produces unexpected results.

IFS=$'\n'
    for i in `ls *.pdf`
        do
            base=`printf $i | sed -E 's/^(.*)( \([0-9]{1,4}\)\|)(\.pdf)$/\1/g'`
            count=`printf $i | sed -E 's/^(.*)( \([0-9]{1,4}\)\|)(\.pdf)$/\2/g'`
            pp=`qpdf --show-npages $i`
            printf $i"\n"
            printf $base"\n"
            printf $count"\n\n"
        done

RESULTS

File (11).pdf
File (11).pdf
File (11).pdf

Where am I going wrong? Thanks!


Actual one-liner:

IFS=$'\n'; for i in `ls *.pdf`; do base=`printf $i | sed -E 's/^(.*)( \([0-9]{1,4}\)\|)(\.pdf)$/\1/g'`; count=`printf $i | sed -E 's/^(.*)( \([0-9]{1,4}\)\|)(\.pdf)$/\2/g'`; pp=`qpdf --show-npages $i`; printf $i"\n"; printf $base"\n"; printf $count"\n\n"; done
  • (1) Bash pitfall number one. Use for i in *.pdf; …. (2) Quote right. (3) Do not use data as a format in printf; the format gets interpreted. Use printf '%s\n' "$i" if you really need to. In Bash it's easier to use a here string. Commented Jul 6 at 23:32
  • The sed commands in the simplified snippets are different than the sed commands in the "complex" script (additional \|). Commented Jul 6 at 23:58
  • Thanks @Kamil. I overlooked the difference in \|; when I add it to the simplified scripts, they fail too. Thanks also for the better coding using printf '%s\n' "$i". I have made changes to the solution. For those viewing this in the future, I found no solution using sed. Instead, I found a solution (below) using Perl.
    – Brian
    Commented Jul 7 at 17:44
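
The second comment's point can be checked directly. The sketch below assumes GNU sed (BSD sed may reject the empty alternative in `|)`), and `name`, `escaped` and `alternation` are just sample variables. It runs both variants of the question's pattern on the sample file name: with `\|` nothing matches, which is exactly the "file name three times" symptom, and with `|` the match succeeds but the greedy `(.*)` leaves group 2 empty.

```shell
#!/usr/bin/env bash
# Compare "\|" and "|" inside the second group (assumes GNU sed -E).
name='File (11).pdf'

# Under -E, "\|" is a literal pipe character. The name contains no "|",
# so the pattern never matches and sed prints the input unchanged.
escaped=$(printf '%s\n' "$name" | sed -E 's/^(.*)( \([0-9]{1,4}\)\|)(\.pdf)$/\1/')

# A bare "|" is alternation: group 2 is " (NN)" or nothing. The pattern
# now matches, but the greedy (.*) keeps " (11)" and group 2 is empty.
alternation=$(printf '%s\n' "$name" | sed -E 's/^(.*)( \([0-9]{1,4}\)|)(\.pdf)$/\1/')

printf 'with \\|: %s\n' "$escaped"       # with \|: File (11).pdf
printf 'with |:  %s\n' "$alternation"    # with |:  File (11)
```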

1 Answer


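I was unable to find a solution using sed; however, I found one using Perl. I identified two problems. The first was in my regex: the second group never captured anything because the first group, (.*), was too greedy and kept " (11)" for itself. I corrected this by making it non-greedy ((.*?)). The second problem was that sed's regular expressions (POSIX basic and extended) have no non-greedy quantifier, so sed cannot express (.*?). Accordingly, I switched to Perl. I also replaced \| with a plain |: under sed -E, \| matches a literal pipe character, so the pattern in the question never matched and sed printed each name unchanged.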

for i in *.pdf   ## a glob instead of `ls`, so IFS no longer needs changing
    do
        ## printf '%s\n' passes the name as data, never as a format string
        base=$(printf '%s\n' "$i" | perl -pe 's/^(.*?)( \([0-9]{1,4}\)|)(\.pdf)$/$1/')
        count=$(printf '%s\n' "$i" | perl -pe 's/^(.*?)( \([0-9]{1,4}\)|)(\.pdf)$/$2/')
        pp=$(qpdf --show-npages "$i")
        printf '%s\n' "$i"
        printf '%s\n' "$base"
        printf '%s\n\n' "$count"
    done
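
The `printf '%s\n'` calls at the end of the loop follow point (3) of the first comment. A throwaway directory shows why it matters: this sketch creates two empty files, one with a hypothetical name containing `%`, loops over them with a glob as in point (1), and compares passing the name as data with using it as the format string. In the second case bash reads `%d` as a conversion with no argument and substitutes 0.

```shell
#!/usr/bin/env bash
# Throwaway directory with two hypothetical file names, one containing "%".
tmp=$(mktemp -d) || exit 1
touch "$tmp/File (11).pdf" "$tmp/100%done.pdf"

safe=() unsafe=()
for i in "$tmp"/*.pdf; do              # a glob, not `ls`, so spaces are harmless
    name=${i##*/}                      # strip the directory part
    safe+=("$(printf '%s' "$name")")   # name passed as an argument (data)
    unsafe+=("$(printf "$name")")      # name used as the format string
done
rm -rf -- "$tmp"

printf 'safe:   %s\n' "${safe[@]}"     # includes: safe:   100%done.pdf
printf 'unsafe: %s\n' "${unsafe[@]}"   # includes: unsafe: 1000one.pdf
```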

As a one-liner, it looks like this and executes as expected.

for i in *.pdf; do base=$(printf '%s\n' "$i" | perl -pe 's/^(.*?)( \([0-9]{1,4}\)|)(\.pdf)$/$1/'); count=$(printf '%s\n' "$i" | perl -pe 's/^(.*?)( \([0-9]{1,4}\)|)(\.pdf)$/$2/'); pp=$(qpdf --show-npages "$i"); printf '%s\n' "$i"; printf '%s\n' "$base"; printf '%s\n\n' "$count"; done

The intent of the original script was to add the number of pages in each PDF to its file name. Here that is as a one-liner:

for i in *.pdf; do base=$(printf '%s\n' "$i" | perl -pe 's/^(.*?)( \([0-9]{1,4}\)|)(\.pdf)$/$1/'); count=$(printf '%s\n' "$i" | perl -pe 's/^(.*?)( \([0-9]{1,4}\)|)(\.pdf)$/$2/'); pp=$(qpdf --show-npages "$i"); new="${base}_pp_${pp}${count}.pdf"; printf '%s\n' "$i"; printf '%s\n\n' "$new"; mv -- "$i" "$new"; done
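
For the sample file, the new name is just the pieces glued together. The values below are hypothetical stand-ins: in the real loop `pp` comes from `qpdf --show-npages` and `base`/`count` from the Perl substitutions. Note that `count` keeps its leading space.

```shell
#!/usr/bin/env bash
# Hypothetical values for "File (11).pdf"; 12 pages is made up.
base='File'
count=' (11)'
pp=12

# "$base_pp_" would be read as a variable named base_pp_. Either braces
# (as here) or quoting the literal parts ($base"_pp_") keeps the variable
# name from running into the surrounding text.
new="${base}_pp_${pp}${count}.pdf"
printf '%s\n' "$new"    # File_pp_12 (11).pdf
```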

Flexible Solution

The script below renames each PDF so that its name includes its page count, either prepended or appended, and lets you choose how deep in the directory tree to search.

IFS=$'\n'
    for file in $(find . -maxdepth 1 -type f -iname "*.pdf")  ## Change or remove "-maxdepth 1" to search desired file path depth
        do
            path=$(printf '%s\n' "$file" | perl -pe 's/^(.*\/)([^\/]*)$/$1/')
            base=$(printf '%s\n' "$file" | perl -pe 's/^(.*\/)([^\/]*?)( \([0-9]{1,4}\)|)(\.[Pp][Dd][Ff])$/$2/')
            count=$(printf '%s\n' "$file" | perl -pe 's/^(.*\/)([^\/]*?)( \([0-9]{1,4}\)|)(\.[Pp][Dd][Ff])$/$3/')
            pp=$(qpdf --show-npages "$file")
            pages=$(printf '%04d' "$pp")   ## zero-pad the page count to four digits, e.g. 7 -> 0007
            prepend="${path}pp_${pages}_${base}${count}.pdf"
            append="${path}${base}_pp_${pages}${count}.pdf"
            printf '%s\n' "$file"
            printf '%s\n' "$prepend"
            printf '%s\n\n' "$append"
            mv -- "$file" "$prepend"   ## Change "$prepend" to "$append" to meet objective
        done
