Reverse detail from Kakelbont MS 1, a fifteenth-century French Psalter. This image is in the public domain. Daniel Paul O'Donnell

Using find and rsync to extract files from a directory and move them

Posted: Jun 04, 2016 13:06;
Last Modified: Jun 04, 2016 13:06

---

The context

I use hypothes.is for annotating PDFs (and websites). This works best, however, if the PDFs are online somewhere.

I use Zotero and Paperpile for citation management. Zotero, in particular, stores all the PDFs that I collect via my bibliography locally in a very fragmented directory structure: each entry in the bibliography manager is its own directory, meaning that in my case the PDFs are spread over 7000 sub-directories.

The problem

So what I want to do is the following:

  1. find and extract all downloaded PDFs in my Zotero folders
  2. upload them to a (private) bibliographic server, where I can use hypothes.is to annotate them

The solution

There are a couple of sites showing you how to use find (to find and extract the files) and rsync (to sync them with the remote directory). E.g. here and here

The trouble is that I kept getting errors from them, with files not being found. What I did find, however, was a posting that showed how to integrate the ls (i.e. list directory contents) utility into find using the -exec option, which runs a command once for each file that find matches and substitutes that file's path for {}. With some minor modification, it then allowed me to use rsync without any problem.

find /source/directory/ -name "*.pdf" -type f -exec rsync -avvz -e ssh {} user@example.com:/home/user/target/ \;

This seemed to do the trick.
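
One thing worth checking with this approach: every PDF lands in the same target directory, so two PDFs that share a file name in different Zotero sub-directories would overwrite each other on the server. The following is a minimal sketch of a check to run before syncing, not something from the postings mentioned above. The function name duplicate_pdf_names is made up for illustration, and -iname (case-insensitive matching) is an extension found in GNU and BSD find rather than in POSIX.

```shell
#!/bin/sh
# Hypothetical helper: print every PDF file name that occurs more than once
# anywhere under the directory given as the first argument.
# Limitation: file names that contain newlines are not handled.
duplicate_pdf_names() {
    src=$1
    # -iname also matches .PDF; sed strips the directory part of each path;
    # uniq -d prints one copy of each name that sort has grouped as repeated.
    find "$src" -type f -iname '*.pdf' | sed 's|.*/||' | sort | uniq -d
}
```

Calling duplicate_pdf_names /source/directory/ prints nothing if every name is unique; any names it does print would need to be renamed or synced into separate target directories.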

Further improvements

One problem is that Zotero's PDF renaming rules are a little opaque. For example, the article

Fruin, Christine, and Fred Rascoe. “Funding Open Access Journal Publishing Article Processing Charges.” College & Research Libraries News 75, no. 5 (May 1, 2014): 240–43.

has a PDF named the following by Zotero:

Fruin and Rascoe - 2014 - Funding open access journal publishing Article pro.pdf

This is very difficult to predict: capitalisation varies, there are spaces, and it uses a truncated title.

I have found a potential solution: Zotfile, a Zotero plugin that contains some renaming utilities. My only concern is that when I did some testing, it looked like it might have trouble with my author-less files. I might also have to keep re-running it to rename new files, which would cause additional trouble.
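
A different route, independent of Zotfile, would be to normalise the names myself before uploading. The sketch below is only an illustration under my own assumptions: predictable_name is a made-up helper, it deals with capitalisation and spaces only, and it cannot recover the part of the title that Zotero has already truncated.

```shell
#!/bin/sh
# Hypothetical helper: turn a Zotero-style file name into a more predictable
# one by lower-casing it and collapsing every run of characters other than
# a-z, 0-9 and "." into a single underscore.
# Limitation: accented or other non-ASCII characters also become underscores.
predictable_name() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |
        sed -e 's/[^a-z0-9.]\{1,\}/_/g' -e 's/^_//' -e 's/_\.pdf$/.pdf/'
}
```

For the example above, predictable_name "Fruin and Rascoe - 2014 - Funding open access journal publishing Article pro.pdf" gives fruin_and_rascoe_2014_funding_open_access_journal_publishing_article_pro.pdf.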
