Data migrations are the most fun part of any project. They look easier when you migrate from Salesforce to Salesforce with roughly the same structure, but it is still a lot of fun. Especially when you want to migrate just a subset of the data.
Data Loader, DataLoader.io, Talend, Pentaho, JitterBit – all the ETL tools, you name them. You might also put some backup solutions into the equation, as they might be able to restore data into another instance.
I had this client who decided to split their instance into two: all data stays in instance A, some of it moves to instance B. Including files, just for extra fun.
Texei plug-in
I might call myself a heavy Salesforce DX user, even though I don’t really use it much compared to some developers. But I’m really happy with the command-line interface and the ability to rerun any command later on.
Luckily, just in time, I heard Fabien speaking about their new DX plugin, which handles extracting and loading data. Exactly what I needed, because I don’t have Talend/Pentaho/JitterBit, and doing export/import with Data Loader is a bit of a pain, even when you use External Keys.
Plugin installation was super easy (sfdx plugins:install texei-sfdx-plugin), and so was putting together the data plan (sfdx texei:data:plan:generate --objects Account,Contact,MyCustomObject__c --outputdir ./data). For each object I just wrote a WHERE clause to select the right records (we created an extra checkbox on some objects to mark the records we wanted to move, and we moved everything related to them as well). I started excluding fields we don’t have in the target instance, but found out it is easier to create a new profile and remove access to those fields from my user. Being able to set up field mapping in the plugin would be awesome, but at the same time an extra complication.
sfdx texei:data:export --dataplan ./data/dataplan.json --outputdir ./data and a few minutes of computer time later I had the whole export done, including all the relations. sfdx texei:data:import --inputdir ./data and the records were in the target instance. Nice, very nice.
Files
From the beginning I had a feeling that file migration would be a pain. The data structure for files is pretty complex (ContentVersion, ContentDocumentLink, ContentDocument), plus, as I found out in Doug’s article:
As an admin with “Modify All Data” you can export any classic Note record simply using Data Loader. However, with ContentNote and ContentVersion you can only export the records that are explicitly shared with you or that you own.
This confused me from the beginning – why I somehow didn’t see enough data in the export and why it was so hard to download the files.
I did a few things:
- create a new custom field (guid__c) on the ContentVersion object to store the id of the record the file is linked to (I have only 1:1 links in this instance);
- run a piece of code which puts the related record id into this field, so I don’t have to worry about all that mapping later and can simplify the post-processing (see below);
- repeat for each object that I need to export and that has files linked to it;
- export data from Salesforce.
// get all accounts linked to exported contracts
List<Account> accs = [SELECT Id FROM Account
                      WHERE Id IN (SELECT AccountId FROM Contract WHERE Export__c = TRUE)];
Map<Id, Account> accMap = new Map<Id, Account>(accs);

// get all links between those accounts and their files
List<ContentDocumentLink> links = [SELECT Id, LinkedEntityId, ContentDocumentId
                                   FROM ContentDocumentLink
                                   WHERE LinkedEntityId IN :accMap.keySet()];
// map ContentDocumentId -> the account the file is linked to
Map<Id, Id> mapLinks = new Map<Id, Id>();
for (ContentDocumentLink l : links) {
    mapLinks.put(l.ContentDocumentId, l.LinkedEntityId);
}

// get the latest version of each file I need
List<ContentVersion> doc = [SELECT Id, Title, ContentDocumentId
                            FROM ContentVersion
                            WHERE ContentDocumentId IN :mapLinks.keySet() AND IsLatest = TRUE];
List<ContentVersion> docs = new List<ContentVersion>();
for (ContentVersion d : doc) {
    // stamp the version's guid__c field with the linked account's id
    d.guid__c = mapLinks.get(d.ContentDocumentId);
    docs.add(d);
}
update docs;
Data export is easy: you go into Setup, Data Export, check all the right checkboxes and wait.
Guess what – when I did the export for the first time, I forgot to check the Include Salesforce Files and Salesforce CRM Content document versions checkbox, because I assumed the latest version was exported as part of the ContentVersion object anyway. I was wrong and had to wait 48 hours before Salesforce allowed me to run the export one more time.
A few hours later I got an email that the export was successful and I could download all the data – about 30GB split into 512MB chunks. I had to download and unzip all of them, even though in the end I needed only about 2GB, because there is no way to export just a subset (which reminded me of another client from a few months ago, who had about 500GB of files in their instance, wanted to migrate about 60GB of them and wasn’t willing to provide us with the whole set).
Importing the data was pretty easy after that. In the ContentVersion file I filtered only the rows with guid__c filled in and mapped the old record ids to the new ones (I had saved the old ids into an extra field during the migration). I used R for this operation, as it is probably quicker than doing VLOOKUPs in Excel. Plus I had to add a column with the full path to the original files on my hard drive.
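Just for illustration, the join itself is nothing exotic. A minimal anonymous-Apex sketch of the same idea, run against the target instance (Legacy_Id__c is a hypothetical name for the extra field that holds the old id – I really did this step offline in R):

// build a map of old (source instance) id -> new (target instance) id;
// Legacy_Id__c is a placeholder name for the field that stores the old id
Map<String, Id> newIdByOldId = new Map<String, Id>();
for (Account a : [SELECT Id, Legacy_Id__c FROM Account WHERE Legacy_Id__c != null]) {
    newIdByOldId.put(a.Legacy_Id__c, a.Id);
}
// dump the pairs so they can be joined against guid__c in the exported ContentVersion file
for (String oldId : newIdByOldId.keySet()) {
    System.debug(oldId + ',' + newIdByOldId.get(oldId));
}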
In the end I got a CSV file with the columns requested in the related knowledge article – Title, Description, VersionData (the full path to the file on your drive), PathOnClient (don’t ask, it is also a full path), FirstPublishLocationId (the id of the record to which I want to link the file). I set the Data Loader batch size to 1 and uploaded everything into the ContentVersion object.
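For a single file, the same thing can be done straight in Apex – a minimal sketch with made-up values, just to show what those columns mean (Data Loader reads the binary from the path in VersionData; in Apex you pass it as a Blob):

// upload one file and link it straight to a record; all values below are made up
ContentVersion cv = new ContentVersion();
cv.Title = 'Contract scan';
cv.PathOnClient = 'contract-scan.pdf';               // required; the file name as seen on the client
cv.VersionData = Blob.valueOf('dummy file content'); // the file body itself
cv.FirstPublishLocationId = '001000000000001';       // placeholder id of the record to link the file to
insert cv;
// Salesforce creates the ContentDocument and the ContentDocumentLink to FirstPublishLocationId for you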
It failed about halfway through – not enough memory. So I found out how to launch Data Loader with extra memory (update the paths as needed):
javaw.exe -Xms1024m -Xmx1256m -Dappdata.dir="C:\Users\mhumpolec002\AppData\Roaming" -jar "C:\Users\mhumpolec002\AppData\Local\salesforce.com\Data Loader\dataloader-40.0.0-uber.jar"
I crossed my fingers and a few hours later all the data was uploaded. Phew, it took a bit longer than originally expected, but I learnt a few things. The most important is the texei plugin, which I can use to upload data to scratch orgs at scale – way better than the standard force:data:tree commands.