Google Photos Migration


I recently had to migrate ~1TB of data out of Google Photos into my self-hosted NAS (Synology) and thought I’d document the many hurdles I went through in case it benefits someone else.

There are a few ways to go about exporting photos from Google Photos, but not many that work at the terabyte scale. The most reliable one is using Google Takeout to export everything. It will generate a set of files (50GB max each) that you can download.

Getting the export

At my internet speed, downloading each 50GB file takes a few minutes, and there are quite a few files to download. When you create the Google Takeout job, you can choose to get a set of links via email, but those links can only be used a limited number of times. This means you can only retry a download so many times before you are out of luck. If that happens, is it safe to do another export and retry just that Nth failed file? Will it contain all the same files?

Instead I chose to export to my Google Drive, but I had to temporarily get onto Google’s additional storage plan. My NAS already has a cronjob for backing up my Google Drive (there are plenty of open-source tools for this), so the exports would eventually land on my NAS. The only odd thing is that Takeout hosts the exports on Drive but does not assign them a path, so my cronjob would not pick them up. That’s easily fixable by searching for the export files, selecting all, and moving them into a directory.
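
For reference, that kind of sync can be as simple as a nightly rclone cronjob. The sketch below is only one way to do it; the “gdrive” remote name and the paths are placeholders, and your NAS may already ship a sync app of its own.

# Sync the Drive remote (as set up with `rclone config`) into the NAS every night at 02:00
0 2 * * * rclone sync gdrive: /volume1/backup/google-drive --log-file /var/log/rclone-gdrive.log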

Takeout stores metadata in JSON files

All files will be created with the date of the export and without geolocation or any other metadata; Takeout instead stores that metadata in JSON sidecar files. This is well known, and there are many tools out there that try to merge the JSON back into each file’s metadata. I went with google-photos-migrate because it adds dates and GPS information, seemed well maintained, and its code base was easy enough that I could inspect it.
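
To give an idea of what that merge involves, here’s a rough ExifTool sketch of the kind of command such tools run under the hood. This is not google-photos-migrate’s actual invocation, and the flattened sidecar tag names (PhotoTakenTimeTimestamp, GeoDataLatitude, GeoDataLongitude) are an assumption that may need adjusting:

# For each JPEG, copy the date and GPS coordinates from its matching IMG_xxxx.jpg.json sidecar
exiftool -r -d %s -tagsfromfile '%d%f.%e.json' \
  '-DateTimeOriginal<PhotoTakenTimeTimestamp' \
  '-GPSLatitude<GeoDataLatitude' \
  '-GPSLongitude<GeoDataLongitude' \
  -ext jpg -overwrite_original .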

It uses ExifTool to edit file metadata. I had problems with files that had an incorrect MIME type or were too large. The nice thing about google-photos-migrate is that it moves the problematic files to an error directory, keeping the same structure. That means by the end of your first run you’ll have a small set of files you can just retry. Adding ExifTool’s -m and -api largefilesupport=1 options got a few more files processed.
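
Roughly, those options look like this in a standalone ExifTool call (the date and file name below are placeholders):

# -m ignores minor errors such as a mismatched MIME type;
# -api largefilesupport=1 lets ExifTool handle files larger than 4GB
exiftool -m -api largefilesupport=1 -overwrite_original \
  '-CreateDate=2020:01:01 12:00:00' error/VID_1234.mp4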

Corrupted files

I still had a few files landing in the error directory and tracked it down to the files being corrupted. Those were mostly videos (maybe this happens when your phone dies while recording?). I used ffmpeg to “copy” the files, letting it handle the corruption (which it did in all cases), and passed the clean files to ExifTool, which got me through the remaining ones.

mkdir ffmpeged
# Stream-copy (remux) each video into a clean container, letting ffmpeg deal with the corruption
find . -maxdepth 1 -name '*.mp4' -exec ffmpeg -y -i {} -c copy ffmpeged/{} \;
mv ffmpeged/* .
rm -rf ffmpeged

Working with tens of thousands of photos

Google Takeout organizes files into albums, which map to directories, plus a Photos directory with all your files. It’s impractical to work with that many files in one place, so I broke them down by year.

find . -maxdepth 1 -type f | while IFS= read -r file; do
  d=$(date -r "$file" +%Y)   # year taken from the file's modification time
  mkdir -p "$d" && mv "$file" "$d"
done

There will be directories for duplicates

Each directory can have duplicates-N sub-directories (duplicates-1, duplicates-2, and so on). That’s because Photos albums don’t map perfectly onto filesystem directories, so files with conflicting names get moved into a duplicates directory. This can happen for multiple reasons:

  1. Different files were uploaded with the same name: in which case I wanted to keep both, renaming the duplicate and moving it into the main album/directory.
  2. The duplicate is actually the original of an edited file: in which case I wanted to keep only the edited version.
  3. The duplicate is the video part of a Live Photo: in which case I wanted to keep only the photo.

There’s no easy way to tell these cases apart, so I used a mix of gruelling manual review and a script that simply creates new, non-conflicting names (thanks ChatGPT for doing 99% of the work!).

#!/bin/bash

move_with_rename() {
    local file="$1"
    local destination="$2"
    local file_name="${file##*/}"
    local file_extension="${file_name##*.}"
    local file_base="${file_name%.*}"

    # This always renames the file by appending "-0", even when there's no conflict. Change this if you don't want that.
    local conflict_counter=0
    while [ -e "$destination/$file_base-$conflict_counter.$file_extension" ]; do
        ((conflict_counter++))
    done
    new_file="$destination/$file_base-$conflict_counter.$file_extension"

    mv "$file" "$new_file"
    echo "Moving $file -> $new_file"
}

# `local` only works inside functions, so plain assignments here
source_dir="$1"
destination_dir="$2"

# Loop through each file in the source directory and move it, renaming on conflict
find "$source_dir" -type f | while IFS= read -r file; do
  move_with_rename "$file" "$destination_dir"
done
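
Save it as something like merge-duplicates.sh (the name and the paths below are just examples) and run it once per duplicates directory, with the main album/year directory as the destination:

bash merge-duplicates.sh 2019/duplicates-1 2019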

Wrapping up

I’m fairly happy with the results. Storage is so cheap now that my wife and I only shoot 4K 60fps videos. It’s funny how we get used to the low quality of videos/photos processed through some space-saving algorithm on our favorite cloud service. Those services usually allow saving in original quality, but the price gets prohibitive very quickly. This will only get worse as device resolutions increase. The only thing I miss is the AI/ML-powered search, which was very convenient. I’ll start looking into that next.

I fully expect some files got corrupted by the tools and my manual steps, so I’m keeping the Takeout exports around in case I find any.

Of course, I paid for an extra month before I remembered to delete all the files from Photos and the Takeout exports on Drive, and to cancel the Google One plan!