Miscellaneous Mac-appropriate locate(1) script of the day — June 9, 2017

If you’re on a Unix variant of most any sort, this command will find you every file whose name contains foo anywhere on your machine:

find / -name '*foo*'

It will also run impossibly slowly, because it’s scanning all of the (maybe millions of) files on your machine. A better option is to periodically generate a database of all files, then do something akin to

select distinct filename
from my_filesystem
where filename like '%foo%'
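To make that idea concrete, here is a toy version of the scheme in bash: walk the tree once and save every path to a plain-text file, then grep that file whenever you want to search. The names build_db, search_db and TOY_LOCATE_DB (and the default ~/.toy_locate.db path) are hypothetical, made up for this sketch; real locate(1) implementations use a purpose-built, usually compressed, database rather than a flat list.

```bash
#!/bin/bash
# Toy locate: a slow indexing pass, then fast searches of the saved index.
# TOY_LOCATE_DB and the default path below are hypothetical names for this sketch.
db="${TOY_LOCATE_DB:-$HOME/.toy_locate.db}"

# The slow part, run periodically: record every path under a root, one per line.
build_db() {
    local root="${1:-/}"
    # Discard "Permission denied" noise from directories we can't read.
    find "$root" -print 2>/dev/null > "$db"
}

# The fast part: case-insensitive, fixed-string match against the saved paths.
search_db() {
    grep -iF -- "$1" "$db"
}
```

You would run build_db / from cron (or by hand) and search_db foo afterwards. Like locate(1), this matches against the whole path rather than just the file name, and it knows nothing about files created since the last build_db.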

As luck would have it, most Linux distributions that I’m familiar with come with a tool called locate(1) that does this for you, which allows you to write

locate foo

and get what you want. The catch is that locate(1) only knows what's in its database, which a tool called updatedb(8) has to refresh periodically. A quick check of a Linux box I have on hand says that updatedb(8) runs once a day. That's probably fine, but it works less well if you're creating or renaming a lot of files: until the next run, locate(1) will be behind the times. Which is sad.
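If you want a rough idea of how far behind locate(1) might be, the age of its database file is a reasonable proxy. Here is a minimal sketch; the function name locate_db_is_stale is hypothetical, and the default path /var/lib/mlocate/mlocate.db is an assumption that fits mlocate on many distributions but not plocate or GNU findutils, so pass your own path if yours lives elsewhere.

```bash
#!/bin/bash
# Succeed if a locate(1) database looks stale: missing, or last modified
# more than MAX_MINUTES ago (default 1440 = one day).
# Hypothetical helper; the default path is an assumption (mlocate's usual
# location); plocate and GNU findutils keep their databases elsewhere.
locate_db_is_stale() {
    local db="${1:-/var/lib/mlocate/mlocate.db}"
    local max_minutes="${2:-1440}"
    # A database that doesn't exist is as stale as it gets.
    [ -e "$db" ] || return 0
    # find prints the path only if its mtime is older than the cutoff.
    [ -n "$(find "$db" -maxdepth 0 -mmin +"$max_minutes")" ]
}
```

Something like locate_db_is_stale && echo 'try: sudo updatedb' >&2 before a search would remind you to refresh. The -mmin test is supported by both GNU and BSD find, so the check isn't tied to one platform.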

Macs have it quite a lot better. They are constantly running a tool called Spotlight, which indexes everything all the time. As I understand it, the Spotlight DB is updated whenever you modify anything about a file, including its contents or its name. There’s a vast Spotlight architecture lurking underneath everything you do on a Mac. At the command line, the mdfind(1) tool lets you search for files whose Spotlight metadata matches any number of criteria, and mdls(1) lists the metadata for a specified file. I’m a Spotlight-metadata novice at this point, but here are some good resources:

  1. This is a good set of tips for accessing the full variety of metadata available to you through the md* tools.
  2. The documentation on Uniform Type Identifiers (which are one part of the Spotlight metadata).

I wrote a tiny little script whose job is to invoke mdfind(1) when you’re on a Mac, and invoke locate(1) on any other Unix. This is handy to me, since I copy my ~/bin directory to every Unix machine I use, and would very much like to use the same locate command on every machine. (You should see my ~/.bashrc, he said sexily.) Here it is:

(15:32 -0400) slaniel@laptop:~$ cat ~/bin/locate 
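#!/bin/bash
# Search for files by name: Spotlight on a Mac, locate(1) everywhere else.
pattern=$1
if [[ "$(uname)" == "Darwin" ]]; then
    # Match the pattern anywhere in the file name; the trailing "c"
    # makes the comparison case-insensitive, like locate -i.
    mdfind "kMDItemFSName=\"*${pattern}*\"c"
else
    # Call the system locate by full path: a bare "locate" could resolve
    # to this script again if ~/bin comes first in $PATH.
    /usr/bin/locate -i "$@"
fi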
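One quirk of the script: on a Mac only the first argument reaches mdfind(1), while on Linux every argument goes to locate(1), and mlocate's locate prints entries matching any of its patterns (other implementations may differ). Here is a sketch of how the Darwin branch could build an equivalent OR query. The helper spotlight_name_query is hypothetical, and it assumes the patterns contain no double quotes or backslashes, which would need escaping inside a Spotlight string.

```bash
#!/bin/bash
# Build an mdfind(1) query matching file names that contain ANY of the
# given patterns, case-insensitively (the trailing c), much like
# "locate -i pat1 pat2" with mlocate. spotlight_name_query is hypothetical.
# Assumption: patterns contain no double quotes or backslashes.
spotlight_name_query() {
    local query="" pattern
    for pattern in "$@"; do
        # Join successive terms with Spotlight's || operator.
        query="${query:+$query || }kMDItemFSName == \"*${pattern}*\"c"
    done
    printf '%s\n' "$query"
}
```

On a Mac, mdfind "$(spotlight_name_query report invoice)" would then list files whose names contain either word, ignoring case.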

That mdfind line initially said

mdfind "kMDItemContentType=\"public*\" && kMDItemFSName=\"*${pattern}*\"c"

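until I realized that not every file I cared about has a public.* content type. That prefix covers only a set of standard, system-defined types (e.g., public.plain-text, public.jpeg, public.png), and the query tests only a file's own kMDItemContentType, not the broader types listed in its kMDItemContentTypeTree. For instance, here's what mdls(1) says about a .docx file: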

kMDItemContentType             = "org.openxmlformats.wordprocessingml.document"
kMDItemContentTypeTree         = (
    "org.openxmlformats.wordprocessingml.document",
    "org.openxmlformats.openxml",
    "public.zip-archive",
    "com.pkware.zip-archive",
    "public.data",
    "public.item",
    "public.archive",
    "public.composite-content",
    "public.content"
)
[…]
kMDItemKind                    = "Microsoft Word document (.docx)"

That last item (kMDItemKind) is interesting. It leads to a quick bit of command-line hackery that finds every Microsoft-format file on your machine and tallies them by kind:

(15:53 -0400) slaniel@AKAMAI_laptop:~$ mdfind "kMDItemKind == '*Microsoft*'" \
    | tr '\n' '\0' | xargs -0 mdls \
    | grep kMDItemKind | grep -o '[^"]\+"$' | sed 's#"$##' \
    | sort | uniq -c | sort -nr
    542 Microsoft Word 97 - 2004 document (.doc)
    534 Microsoft Excel Workbook (.xlsx)
    307 Microsoft Word document (.docx)
    145 Microsoft Word 97 - 2004 document
    145 Microsoft Excel 97-2004 Workbook (.xls)
     33 Microsoft Excel template
     29 Microsoft PowerPoint presentation
     28 Microsoft Excel workbook
     28 Microsoft Excel Macro-Enabled Workbook (.xlsm)
     28 Microsoft Excel 97-2004 workbook
     27 Microsoft Word document
     20 Microsoft PowerPoint 97-2004 presentation
     15 Microsoft personal dictionary
      9 Microsoft Word 97 - 2004 template
      8 Microsoft Outlook document
      8 Microsoft Graph preferences
      3 Microsoft Word Macro-Enabled template (.dotm)
      1 Microsoft Word MHTML document (.mht)
      1 Microsoft Word 97 - 2004 template (.dot)
      1 Microsoft PowerPoint toolbar
      1 Microsoft PowerPoint 97 - 2004 Template
      1 Microsoft Outlook signatures
      1 Microsoft Office Theme
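If you run that kind of tally often, the grep/sed stage can be folded into a single sed(1) expression and wrapped in a function. The name tally_kinds is hypothetical, and the sketch assumes mdls(1) prints each kind on a line of the form kMDItemKind = "...", as in the output shown earlier.

```bash
#!/bin/bash
# Count files per Spotlight kind, reading mdls(1) output (for any number
# of files) on stdin. tally_kinds is a hypothetical helper name.
tally_kinds() {
    # Keep only the quoted value from lines like:  kMDItemKind   = "Foo"
    sed -n 's/^kMDItemKind *= "\(.*\)"$/\1/p' |
        sort | uniq -c | sort -nr
}
```

Usage would look like mdfind "kMDItemKind == '*Microsoft*'" | tr '\n' '\0' | xargs -0 mdls | tally_kinds, which should produce the same kind of count-per-kind listing as the one-liner.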

There's a whole world of Spotlight metadata here, of which I'm only skimming the surface. Suffice it to say that it's an extraordinarily powerful tool to have in your kit.