Importing a pretty large D6 site into Pantheon (detailed instructions, with a little help from my friends)

More and more of my clients are using Pantheon to host their Drupal-based web applications. This is not an ad, it's just a fact. More and more of my development work involves cloning Pantheon-based workflow instances and coding and site building within that workflow, and I've seen how it has improved greatly over the years. Recently I had to import a fairly large Drupal 6 site for a client who wanted a trouble-free, Drupal-oriented hosting experience while we got on with the site renovation project. While the process was straightforward, and the necessary documentation is there (see References), I thought I'd share my experience as warm fuzzies for others having to do the same:

Regular import

From your regular Pantheon dashboard (initial login page after registering for an account) you simply click on the Add a site link and provide a name and click on the Create Site link. In a little while you are offered the choice of Start from scratch and Import manually radio buttons. Starting from scratch offers Drupal 6, Drupal 7 or a host of Distribution choices that allow you to start up an off-the-shelf solution via installation profile.

Selecting the latter offers a variety of alternatives for manual import. In the old days of Pantheon, one would just upload a tarball with database.sql in the Drupal document root. But things are much more organized now. The manual upload is divided into Code, Database and Files archives, each of which should be tarred/gzipped or zipped into its own separate file. Also, for each there are URL (default) and File upload options. It says “Archives must be in tar/gz or zip format. Uploads are limited to 100MB in size. Imports via url are limited to 500MB.”

Now, the URL method, rather than uploading from your laptop, is much better because it's a server-to-server file transfer, with no dependency on the browser window connection, which may time out, etc. So how do I provide that? Very simple: just create your three archives (code, database and files), tarred or zipped, and stick them into the default document root of your VPS or even shared hosting (a secure http over ssl (“https”) URL would provide the best security). Once your site is created on Pantheon, you can quickly delete or move these files from your VPS or shared hosting.

I created my three archives to import one site that did not exceed these limits in the following manner (following Reference 1):

Creating the code archive (after changing directory on the command line into the Drupal document root, and taking care to exclude .git and the files directory – note the ending dot signifying the current directory):

mysite@myserver:~/mysite7-legacy$ tar czvf /var/www/4pantheon/mysite_code.tgz --exclude='sites/default/files*' --exclude='.git*' .
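
Before publishing the code archive, it may be worth listing its contents to confirm the exclusions really took effect. Here is a minimal sketch, not something from the Pantheon docs. The helper name archive_has_path is made up for illustration, and it assumes the archive was created with the trailing dot as above, so member names start with "./":

```shell
# Hypothetical helper: succeeds if any member of the gzipped tarball starts
# with the given path (member names look like ./sites/default/files/...).
archive_has_path() {
  archive=$1
  prefix=$2
  # index() does a literal prefix match, so dots in the path are not
  # treated as regular-expression wildcards.
  tar tzf "$archive" |
    awk -v p="./$prefix" 'index($0, p) == 1 { found = 1 } END { exit !found }'
}

# Example use (archive path taken from the command above):
#   archive_has_path /var/www/4pantheon/mysite_code.tgz sites/default/files \
#     && echo "the files directory leaked into the code archive"
```

The same check with .git as the second argument would flag a repository that slipped in.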

Creating the database archive (from the Drupal document root and using drush although you can use mysqldump of course):

mysite@myserver:~/mysite7-legacy$ drush sql-dump | gzip > /var/www/4pantheon/mysite.sql.gz

Creating the files archive (from the files directory itself – note the ending dot):

mysite@myserver:~/mysite7-legacy$ tar czvf /var/www/4pantheon/mysite_files.tgz .

So I ended up with the three files exposed in a web document root as URLs:

  • http://example.com/4pantheon/mysite_code.tgz
  • http://example.com/4pantheon/mysite.sql.gz
  • http://example.com/4pantheon/mysite_files.tgz
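
Since the import form quotes hard limits (100MB for uploads, 500MB for URL imports), a quick size check on each archive can save a failed import. This is a rough sketch: the helper name within_limit is hypothetical, and it assumes 1 MB means 1024*1024 bytes, which the form does not actually spell out:

```shell
# Hypothetical helper: succeeds if the file is no larger than max_mb megabytes.
# Assumption: 1 MB = 1024 * 1024 bytes.
within_limit() {
  file=$1
  max_mb=$2
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$file") ))
  [ "$size" -le $((max_mb * 1024 * 1024)) ]
}

# Example use with the 500 MB URL-import limit:
#   for f in /var/www/4pantheon/*gz; do
#     within_limit "$f" 500 || echo "$f is too large for URL import"
#   done
```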

I then entered these URLs into the Import manually form fields with the URL option selected (default), and hit the red Import Site button.

If the database is close to the 500 MB limit, that means it is actually several GB in size untarred or unzipped. So it could be quite a few minutes of one server talking to the other and then Pantheon unzipping and stuffing the sql into the database.

Now, you can pack quite a few GB of database into a zip or gzip file, and clearing cache (or even truncating cache tables) prior to creating the file will significantly reduce its size also. Not so much for GB of files folder assets, however. Anyway, the good news is that you can create the site with just the codebase and then once you obtain its ssh credentials, you can use alternative methods for database and files tarball uploads of unlimited size.

I'm going to repeat that:

The good news is that you can create the site with just the codebase and then once you obtain its ssh credentials, you can use alternative methods for database and files tarballs of unlimited size.

Here's how it's done.

Highly irregular import

Now for the fun part. What if my database file is bigger than 500 MB, even zipped or gzipped? What if my files folder is several GB in size, and of course zipping doesn't really help anyway? Let's see about a fun way to take care of the files folder first.

Files

Turns out you can just omit the files folder by leaving the Import manually Files archive field blank altogether. Then, once the site has been created, we can SFTP or rsync the files in directly.

rsync is really cool. It's one of those really flexible command-line Linux utilities that just works, and saves an enormous amount of time and bandwidth too.

Based on Reference 2, the well documented support doc rsync and SFTP, here's what I did to upload almost 3GB of user files to my new Pantheon site with rsync:

  • Added my public key from my Pantheon dashboard

    • Click Add key button

    • Paste in public key

    • Click Add Key button

  • I then went to my site dashboard by clicking on my site home page image and clicked on Connection info and obtained the following info:

Git

SSH clone URL:

ssh://codeserver.dev.n1nn1111-1n1n-n11n-1n11n1n11111@codeserver.dev.n1nn1111-1n1n-n11n-1n11n1n11111.drush.in:2222/~/repository.git xfrmlegacy

Database

Command Line

mysql -u pantheon -pverylongpantheonpassword -h dbserver.dev.n1nn1111-1n1n-n11n-1n11n1n11111.drush.in -P 12801 pantheon
Host: 
dbserver.dev.n1nn1111-1n1n-n11n-1n11n1n11111.drush.in
Username:
pantheon
Password:
verylongpantheonpassword
Port:
12801
DB Name:
pantheon

SFTP

Command Line
sftp -o Port=2222 dev.n1nn1111-1n1n-n11n-1n11n1n11111@appserver.dev.n1nn1111-1n1n-n11n-1n11n1n11111.drush.in
Host:
appserver.dev.n1nn1111-1n1n-n11n-1n11n1n11111.drush.in
Username:
dev.n1nn1111-1n1n-n11n-1n11n1n11111
Password:
Use your dashboard password
Port:
2222

So, I grabbed this info and “kept it in my records”.

Then I simply changed directories into the parent directory containing my files directory on the copy of the Drupal site running on my server and shunted my files over to my new Pantheon site via rsync with the following commands:

mysite@myserver:~/mysite7-legacy$ export ENV=dev
mysite@myserver:~/mysite7-legacy$ export SITE=n1nn1111-1n1n-n11n-1n11n1n11111
mysite@myserver:~/mysite7-legacy$ rsync -rlvz --size-only --ipv4 --progress -e 'ssh -p 2222' files/*  $ENV.$SITE@appserver.$ENV.$SITE.drush.in:files/
The authenticity of host '[appserver.dev.n1nn1111-1n1n-n11n-1n11n1n11111.drush.in]:2222 ([166.78.242.215]:2222)' can't be established.
RSA key fingerprint is b5:ea:23:eb:7b:7b:0d:17:c7:13:47:92:ea:70:c1:b5.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[appserver.dev.n1nn1111-1n1n-n11n-1n11n1n11111.drush.in]:2222,[166.78.242.215]:2222' (RSA) to the list of known hosts.
sending incremental file list
... 

A while later, all my files (several GB!) were in place in ./sites/default/files on Pantheon! Cool. Yes.
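
For anyone nervous about pushing several GB in one go, a dry run first might be reassuring. The sketch below only shows how the destination is put together from the Connection info. The helper name pantheon_files_target is made up, and --dry-run is rsync's standard "show what would be copied" flag:

```shell
# Hypothetical helper: builds the rsync destination from the environment name
# and the site ID shown under Connection info.
pantheon_files_target() {
  environment=$1
  site=$2
  printf '%s.%s@appserver.%s.%s.drush.in:files/\n' \
    "$environment" "$site" "$environment" "$site"
}

# Dry run: same flags as the real transfer plus --dry-run, so nothing is copied.
#   rsync -rlvz --size-only --ipv4 --dry-run -e 'ssh -p 2222' files/* \
#     "$(pantheon_files_target dev n1nn1111-1n1n-n11n-1n11n1n11111)"
```

Because of --size-only, re-running the real command after an interruption should skip files that already arrived with the same size.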

Database

Turns out you can just leave the Import manually Database archive field blank also. Then, once the site is created, you can use best-practice remote database tools to deploy a database of any size. In my case I just shaved the database down by truncating cache tables, etc., so that the gzip'd file fit under the 500 MB limit.
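
For the cache-truncating step, something along these lines could generate the SQL. It is only a sketch: the helper name cache_truncate_sql is hypothetical, it assumes the site uses no table prefix, and the drush commands in the comments assume a drush version where sql-cli reads SQL from standard input:

```shell
# Hypothetical helper: reads table names (one per line) on stdin and prints a
# TRUNCATE statement for "cache" and for every table starting with "cache_".
# Assumption: no table prefix is configured for the site.
cache_truncate_sql() {
  while IFS= read -r table; do
    case $table in
      cache|cache_*) printf 'TRUNCATE TABLE `%s`;\n' "$table" ;;
    esac
  done
}

# Example use from the Drupal document root (review the file before running it):
#   drush sql-query "SHOW TABLES" | cache_truncate_sql > truncate_cache.sql
#   drush sql-cli < truncate_cache.sql
```

Writing the statements to a file first leaves a chance to eyeball the list before anything is emptied.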

See Reference 3.
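
I did not need it myself, but for a dump that will not fit under the limit, one option might be to stream it straight into the MySQL endpoint from the Connection info. A minimal sketch, assuming the placeholder credentials shown earlier and a mysql client on the source server (the helper name import_dump is made up):

```shell
# Placeholder values copied from the Connection info shown earlier.
DB_HOST=dbserver.dev.n1nn1111-1n1n-n11n-1n11n1n11111.drush.in
DB_PORT=12801
DB_USER=pantheon
DB_PASS=verylongpantheonpassword
DB_NAME=pantheon

# Hypothetical helper: decompress a gzipped dump and pipe it into mysql,
# using the same options as the dashboard's command line.
import_dump() {
  gunzip -c "$1" |
    mysql -u "$DB_USER" -p"$DB_PASS" -h "$DB_HOST" -P "$DB_PORT" "$DB_NAME"
}

# Example use:
#   import_dump /var/www/4pantheon/mysite.sql.gz
```

Nothing is written to disk on the way, so the uncompressed size of the dump only matters to the database server.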

A little help from my friends

Whenever you hit Support and raise a ticket on Pantheon, you get a response really quickly, like in a few minutes. Just sayin'. So I did all this with more than a little help from my friends.

One example was that the legacy Drupal 6 site had its files directory, not in ./sites/default/files, but in a ./files directory just off the Drupal document root. Support clued me in, in just a few minutes (see Reference 4):

“If you are importing a site which has files in another location (e.g. "/files") you will need to move the files into the standard location, and add, commit and push a symlink from that location to the new location via git:

$ ln -s ./sites/default/files ./files
$ git add files
$ git commit files -m "adding legacy files location symlink"
$ git push origin master

Your legacy file paths should now work, and your files will be stored in our cloud files location!”

I was told to make sure it is a relative symlink, like the one in the example, and not an absolute system path.
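
A quick way to double-check that before committing might look like this. It is a sketch only: the helper name is_relative_symlink is hypothetical, and it relies on plain readlink printing the link target exactly as stored:

```shell
# Hypothetical helper: succeeds if the path is a symlink whose stored target
# does not start with "/", i.e. a relative link.
is_relative_symlink() {
  [ -L "$1" ] || return 1
  case $(readlink "$1") in
    /*) return 1 ;;
    *)  return 0 ;;
  esac
}

# Example use from the Drupal document root:
#   is_relative_symlink files || echo "files is not a relative symlink"
```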

References

  1. Importing an existing Drupal site to Pantheon

      http://helpdesk.getpantheon.com/customer/portal/articles/361251

  2. rsync and SFTP

      http://helpdesk.getpantheon.com/customer/portal/articles/373311

  3. Accessing MySQL databases

      http://helpdesk.getpantheon.com/customer/portal/articles/373319-accessing-mysql

  4. Non-standard files locations

      http://helpdesk.getpantheon.com/customer/portal/articles/384917-non-stan...

  5. Get Pantheon

  6. Hire us to do this and other stuff for you

  7. Even better, hire us to mentor you on how to do it and other stuff yourself