Exporting/migrating a Joomla website to Drupal 7, with code

Submitted by Danny on Thu, 05/19/2016 - 14:16

I was at DrupalCon 2016 in New Orleans last week and somebody encouraged me to share this info in a blog post, so here goes.

After building the new website for Kansas Public Radio using Drupal 7, I needed a way to export all of the old articles from our Joomla website to the new website. I could have experimented with the Feeds module for Drupal or another Joomla to Drupal solution, but our installation of Joomla was sooooo old and outdated that I couldn't trust anything. I decided to write my own bit of code.

Inspecting the database architectures of Drupal and Joomla showed that they were very, very different. While Joomla had all the article data in one single table, Drupal spread it out across different tables for different fields. Therefore, translating the data directly from one database to the other would have been very, very difficult. But if I saved the one Joomla table with all the article data to a CSV file, then I could use PHP and built-in Drupal functions to create the new Drupal nodes from the CSV file.
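To give a feel for the CSV side of this, here is a minimal sketch of reading exported rows with PHP's fgetcsv(). The two sample rows and the three-column layout are made up for illustration (the real jos_content export has many more columns), and the data is read from an in-memory stream so the snippet runs on its own.

```php
<?php
// Made-up two-row sample standing in for jos_content.csv.
// Assumed columns for this sketch: id, title, alias.
$csv = "1,\"Hello, world\",hello-world\n2,\"Second story\",second-story\n";

// Use an in-memory stream so the example needs no file on disk.
$file = fopen('php://memory', 'r+');
fwrite($file, $csv);
rewind($file);

$rows = array();
// fgetcsv() returns FALSE at end of file, which ends the loop.
while (($vars = fgetcsv($file, 0, ',', '"', '\\')) !== FALSE) {
    if ($vars === array(NULL)) {
        continue; // a blank line comes back as array(NULL)
    }
    $rows[] = array(
        'id'    => (int) $vars[0],
        'title' => $vars[1],
        'alias' => $vars[2],
    );
}
fclose($file);

print_r($rows);
```

Using fgetcsv() rather than splitting each line on commas matters here because article titles and body text routinely contain commas and quotes of their own.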

I was successful, but it took a fair amount of effort. For example, "sections" and "categories" of content (news article, trivia article, general info article, etc.) in Joomla were each designated with a number that had to be mapped to its human-readable name. Each content type also had to be handled differently, so I wrote a separate script for each one. Every script parsed the same CSV file, and for each row, if the section ID matched the content type I was looking for, for example (j_sectionID2string($j_sectionID) == 'news'), it created a new Drupal node from the rest of the data in that row.
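The same ID-to-name mapping and row filter can be sketched with lookup arrays instead of switch statements. This is only an illustration, not the code from the script further down: the helper name rowMatches and the sample rows are hypothetical, and only a handful of the section and category IDs are listed.

```php
<?php
// Joomla section and category IDs mapped to readable names.
// Only a few of the IDs from the full script are listed here.
$sections = array(1 => 'news', 8 => 'music', 14 => 'health');
$categories = array(22 => 'kpr news', 34 => 'statehouse news', 43 => 'trivia');

// Hypothetical helper: TRUE when a CSV row is a published article
// in the wanted section and category.
function rowMatches(array $row, array $sections, array $categories, $section, $category) {
    $state = isset($row[6]) ? (int) $row[6] : 0;
    $sectionId = isset($row[7]) ? (int) $row[7] : 0;
    $catId = isset($row[9]) ? (int) $row[9] : 0;
    $sectionName = isset($sections[$sectionId]) ? $sections[$sectionId] : 'who knows';
    $categoryName = isset($categories[$catId]) ? $categories[$catId] : '';
    return $state === 1 && $sectionName === $section && $categoryName === $category;
}

// Made-up sample rows using the same column positions as the CSV export:
// index 6 = state, 7 = section ID, 9 = category ID.
$published = array('101', 'Title', 'title', '', '', '', '1', '1', '0', '22');
$trivia = array('102', 'Quiz', 'quiz', '', '', '', '1', '8', '0', '43');

var_dump(rowMatches($published, $sections, $categories, 'news', 'kpr news'));
var_dump(rowMatches($trivia, $sections, $categories, 'news', 'kpr news'));
```

The first row is a published "kpr news" article in the news section and matches; the second belongs to the music section and is skipped.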

I also output each pair of old and new URLs and saved them in a file. Later, I used this file to create a 404 page that shows a message for a few seconds asking visitors to update their links and bookmarks, then automatically forwards them to the location of the new article.
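As a rough sketch of how such a 404 page could look up the new location, the snippet below parses lines in the "URL: old -> new" format that the import script logs. The function names parseRedirects and lookupRedirect, the sample lines, and the 5-second delay are assumptions for illustration, not the page I actually deployed.

```php
<?php
// Build an old-path => new-path map from log lines of the form
// "URL: /news/123-some-alias -> /node/456". Other lines are ignored.
function parseRedirects(array $lines) {
    $map = array();
    foreach ($lines as $line) {
        if (preg_match('#^URL: (\S+) -> (\S+)#', trim($line), $m)) {
            $map[$m[1]] = $m[2];
        }
    }
    return $map;
}

// Return the new location for a requested path, or NULL if unknown.
// Any query string on the request is ignored.
function lookupRedirect(array $map, $requestPath) {
    $path = parse_url($requestPath, PHP_URL_PATH);
    return (is_string($path) && isset($map[$path])) ? $map[$path] : NULL;
}

// Made-up sample lines; in practice they would come from the log file.
$lines = array(
    '==================================',
    'importing kpr news: Example story',
    'URL: /news/123-example-story -> /node/456',
);
$map = parseRedirects($lines);
$target = lookupRedirect($map, '/news/123-example-story?page=1');

if ($target !== NULL) {
    // A delayed refresh gives the visitor time to read the notice.
    echo '<meta http-equiv="refresh" content="5; url=' . htmlspecialchars($target) . '">';
}
```

A delayed meta refresh, rather than an immediate redirect header, fits the goal of letting visitors see the notice before they are forwarded.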

Here is my script for migrating only our News articles:

<?php
$csvFile = './jos_content.csv';
date_default_timezone_set('America/Chicago');
 
// helper functions
function toLogFile($msg) {
        $log_file = 'output_log';
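        // append directly to the log; passing $msg through a shell breaks on quotes in titles
        file_put_contents($log_file, $msg . PHP_EOL, FILE_APPEND);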
        print $msg . '<br /> ';
}
 
// load Drupal
define('DRUPAL_ROOT', '/home3/kpr/public_html');
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
 
// get the csv data
$file = fopen($csvFile, "r");
$i = 0;
while (($vars = fgetcsv($file)) !== FALSE) {
 
        // fgetcsv() returns FALSE at end of file and array(NULL) for a blank line
        if ($vars === array(NULL)) {continue;}
 
        $j_id = $vars[0];
        $j_title = $vars[1];
        $j_alias = $vars[2];
        $j_titleAlias = $vars[3];
        $j_introText = $vars[4];
        $j_fullText = $vars[5];
        $j_state = $vars[6];
        $j_sectionID = $vars[7];
        $j_mask = $vars[8];
        $j_catID = $vars[9];
        $j_created = $vars[10];
        $j_createdBy = $vars[11];
        $j_createdByAlias = $vars[12];
        $j_metadata = $vars[25];
        $j_mp3_1 = $vars[32];
        $j_mp3_2 = $vars[33];
 
        if ((j_sectionID2string($j_sectionID) == 'news') && ($j_state == 1))
                {
                if (j_categoryID2string($j_catID) == 'kpr news')
                        {
                        $i++;
                        if ($i < 2501) {continue;}
                        //if ($i > 2500) {fclose($file); exit();}
 
                        // new news article
                        toLogFile("==================================");
                        toLogFile(date('d F Y h:i:s A'));
                        toLogFile("importing ".j_categoryID2string($j_catID).": " . $j_title);
                        //toLogFile($i);
 
                        // calculate some drupal things
 
                        //$d_termID = taxonomy_get_term_by_name("featured", '')->tid;
                        $d_termID = 19; // local
 
                        $d_body = $j_introText . $j_fullText;
                        // fix the stupid images path
                        $d_body = str_replace('src="images/', 'src="/images/', $d_body);
                        $d_body = str_replace('href="images/', 'href="/images/', $d_body);
                        $d_body = str_replace('float: left', '', $d_body);
                        // text is really jacked up...
                        $d_body = str_replace('"=""', '', $d_body);
                        $d_body = str_replace('&lt;', '<', $d_body);
                        $d_body = str_replace('&gt;', '>', $d_body);
                        $d_body = str_replace('<hr />', '', $d_body);
 
                        if (($j_metadata != "" ) && ($j_metadata != NULL)) {
                                $d_summary = strip_tags($j_metadata);
                        } else {
                                $d_summary = str_replace('<img', '<span', $d_body);
                        }
                        //toLogFile($d_summary);
 
                        $d_mp3_1 = "http://129.237.213.244:8000/mp3/" . $j_mp3_1;
                        $d_mp3_2 = "http://129.237.213.244:8000/mp3/" . $j_mp3_2;
 
                        $d_userID = 1; // kpr
 
 
                        $node = new stdClass();
                        $node->title = $j_title;
                        $node->type = "article";
                        node_object_prepare($node); // Sets some defaults. Invokes hook_prepare() and hook_node_prepare().
                        $node->language = LANGUAGE_NONE; // Or e.g. 'en' if locale is enabled
                        $node->uid = $d_userID;
                        $node->status = 1; //(1 or 0): published or not
                        $node->promote = 0; //(1 or 0): promoted to front page
                        $node->comment = 0; // 0 for off
 
                        $node->body[LANGUAGE_NONE][0]['value'] = $d_body;
                        $node->body[LANGUAGE_NONE][0]['summary'] = $d_summary;
                        $node->body[LANGUAGE_NONE][0]['format'] = 'full_html';
 
                        // Term reference (taxonomy) field
                        $node->field_news_section[LANGUAGE_NONE][0]['tid'] = $d_termID;
 
                        // audio
                        $node->npr_audio[LANGUAGE_NONE][0]['mp3'] = $d_mp3_1;
                        $node->npr_audio[LANGUAGE_NONE][1]['mp3'] = $d_mp3_2;
 
                        // 'node' is default,
                        // Other possible values are "user" and  "taxonomy_term"
                        $node = node_submit($node); // Prepare node for saving
                        $node->created = strtotime($j_created);
                        node_save($node);
 
                        toLogFile("URL: /news/".$j_id."-".$j_alias." -> /node/".$node->nid);
 
                }
 
        }
}
 
fclose($file);
 
function j_sectionID2string($i)
        {
        switch ($i)
                {
        case 1:
                return "news";
                break;
 
        case 2:
                return "kpr administration";
                break;
 
        case 3:
                return "kpr general info";
                break;
 
        case 6:
                return "programs";
                break;
 
        case 8:
                return "music";
                break;
 
        case 10:
                return "live studio";
                break;
 
        case 11:
                return "kpr webpages";
                break;
 
        case 12:
                return "support";
                break;
 
        case 13:
                return "kpr sidecar";
                break;
 
        case 14:
                return "health";
                break;
 
        case 15:
                return "alerts";
                break;
 
        case 16:
                return "latest";
                break;
 
        default:
                return "who knows";
                break;
        }
}
 
function j_categoryID2string($i)
        {
        switch ($i)
                {
        case 43:
                return "trivia";
                break;
 
        case 3:
                return "photo of the week";
                break;
 
        case 59:
                return "slideshow";
                break;
 
        case 41:
                return "classical live";
                break;
 
        case 49:
                return "jazz live";
                break;
 
        case 51:
                return "kpr live";
                break;
 
        case 54:
                return "rch shows";
                break;
 
        case 50:
                return "trail mix live";
                break;
 
        case 25:
                return "classical";
                break;
 
        case 26:
                return "jazz";
                break;
 
        case 27:
                return "retro cocktail hour";
                break;
 
        case 28:
                return "trail mix";
                break;
 
        case 39:
                return "kpr live studio";
                break;
 
        case 66:
                return "music notes";
                break;
 
        case 47:
                return "health series";
                break;
 
        case 22:
                return "kpr news";
                break;
 
        case 23:
                return "kpr presents";
                break;
 
        case 24:
                return "commentaries";
                break;
 
        case 34:
                return "statehouse news";
                break;
 
        case 76:
                return "health news";
                break;
        }
}
 
?>

 

Your mileage will vary, a lot, but I hope this helps somebody out there!!

Attachment Size
import.zip 9.23 MB