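If you have a PDF and a DOCX of the same document and want to check whether their text differs, convert both to plain text and compare the results with git diff. Ghostscript extracts the text from the PDF, pandoc converts the DOCX, and because we’re using --word-diff, it doesn’t matter that the two files use wildly different line wrapping.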
gs -q -sDEVICE=txtwrite -o- file1.pdf > file1.txt
pandoc -t plain file2.docx > file2.txt
git diff --no-index --word-diff file1.txt file2.txt
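To see why --word-diff copes with re-wrapped text, here is a small self-contained check that needs only git. The two sample files are made up: they hold the same sentence wrapped differently, with one word changed.

```shell
#!/bin/sh
# Made-up sample files: same words, different line wrapping, one word changed.
dir=$(mktemp -d)
printf 'The quick brown fox\njumps over the lazy dog.\n' > "$dir/a.txt"
printf 'The quick brown\nfox jumps over the\nsleepy dog.\n' > "$dir/b.txt"

# --no-index compares two plain paths; no repository is needed.
# git exits with status 1 when the files differ, so ignore the status here.
word_diff=$(git diff --no-index --word-diff "$dir/a.txt" "$dir/b.txt" || true)
printf '%s\n' "$word_diff"

rm -rf "$dir"
```

In the default plain mode, deleted words appear as [-...-] and added words as {+...+}. Here the only marked change should be [-lazy-]{+sleepy+}; the words that merely moved to a different line are printed as unchanged context.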
To save typing, define shell aliases for the two conversion steps:
alias pdfcat='gs -q -sDEVICE=txtwrite -o-'
alias doccat='pandoc -t plain'
pdfcat file1.pdf > file1.txt
doccat file2.docx > file2.txt
git diff --no-index --word-diff file1.txt file2.txt
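Aliases still leave three commands to type. A shell function can chain them into one. This is a sketch: pdfdocdiff is a hypothetical name, the function simply runs the same gs, pandoc and git commands through temporary files, and it uses local, which bash, zsh and dash support but strict POSIX sh does not.

```shell
# pdfdocdiff is a hypothetical helper name.
# Usage: pdfdocdiff file1.pdf file2.docx
pdfdocdiff() {
  if [ "$#" -ne 2 ]; then
    echo "usage: pdfdocdiff FILE.pdf FILE.docx" >&2
    return 2
  fi

  local pdf_txt doc_txt status
  # Temporary files for the extracted text, removed before returning.
  pdf_txt=$(mktemp) || return 1
  doc_txt=$(mktemp) || { rm -f "$pdf_txt"; return 1; }

  # Same conversion commands as above, written to the temp files.
  gs -q -sDEVICE=txtwrite -o- "$1" > "$pdf_txt"
  pandoc -t plain "$2" > "$doc_txt"

  # Word-level comparison; git exits 1 when the texts differ, 0 when not.
  git diff --no-index --word-diff "$pdf_txt" "$doc_txt"
  status=$?

  rm -f "$pdf_txt" "$doc_txt"
  return "$status"
}
```

One trade-off of this design: the diff headers show the temporary file names instead of file1.pdf and file2.docx, but no .txt files are left behind.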