Convert Rich text field images to Data URI format in VisualForce
A few weeks ago I had a task to convert Visualforce page into Google Document. I used Javascript to integrate Salesforce with Google Drive and the basic scenario was to render some Visualforce page. This page was a mix of some parent object with several rich text fields and many other fields with many child records linked to parent. These child records also has a few rich text fields and this page was totally ugly from UI perspective but extremely useful from business perspective.
When the page was completed and I tested that I can send a some part of HTML markup of VF page to Google Drive and Google transformed it to correct Google Document I was pretty excited. But. I faced an issue of sending salesforce’s rich text field content into Google drive document. When the plain text and formatting is sent with no issue, sending an images of rich text fields was not working at all.
You know that a rich text fields in Salesforce is just a HTML markup. HTML markup usually contains a references to resources outside of the page context and receives this content via network. When you add an image into rich text field from Salesforce UI you have 2 options:
- Upload file into Salesforce from your own computer
- Give SF a link to some web resource
Externally stored images are transferred fine into Google Drive. But images which stored inside Salesforce can’t be exported via link because it’s not publicly available. Actually rich text field images refer to https://*.content.force.com/servlet/rtaImage?. So, if you want to send these images via network you have to transform it to Data URI format. My initial desire was to use JS on VisualForce as it described here: convert image to data uri with javascript. That’s a really simple and clean solution. But real pain point here is the fact that VF page and salesforce images are on different domains and you can’t use it without enabled CORS on image server side. It’s obvious that we have no access to manage headers which salesforce’s image server will return. So, JS solution was not applied here what was really upset.
The another option for such case is an apex server side processing. That approach is not so cool but at least pretty straightforward. The scenario is follow: get rich text field, parse it and find all references to images which stored in salesforce, convert it to Data URI and rebuild a rich text field with these images.
public static String convertRichTextImageLinkToBinary(String richText) {
String imgTagPattern = '<img alt="User-added image" src="https://__sf_org_name__(.+?)content.force.com(.+?)'’;
List<String> dataUriImgs = new List<String>();
Matcher imgMatcher = Pattern.compile(imgTagPattern).matcher(richText);
while (imgMatcher.find()) {
String imageTag = imgMatcher.group();
String imageURL = imageTag.substringBetween(' src="', '"');
String decodedURL = imageURL.unescapeHtml4();
PageReference page = new PageReference(decodedURL);
dataUriImgs.add('<img alt="User-added image" src="data:image/png;base64, ' + EncodingUtil.base64Encode(page.getContent()) + '">');
}
// can cause Regex too complex Exception on large rich text fields
// for (Integer i = 0; i < dataUriImgs.size(); i++) {
// richText = richText.replaceFirst(imgTagPattern, dataUriImgs[i]);
// }
List<String> rTxs = richText.split('<img alt="User-added image" src="https://targetprocess(.+?)content.force.com(.+?)>"');
String result = '';
for (Integer i = 0; i < dataUriImgs.size(); i++) {
result += (rTxs[i] + dataUriImgs[i]);
}
result += rTxs[rTxs.size() – 1];
return result;
}
This code transforms passed rich text field images to Data URI format. You can use it from Visualforce controller for converting your rich text and exposing it on a page.
Note the following:
If you use getContent in a test method, the test method fails. getContent is treated as a callout in API version 34.0 and later.
That’s quote from official documentation.