Resolve Microsoft Word List Copy-Paste Issues in AEM RTE
Our client reported that lists are not converted properly when text is copied from the MS Word desktop app and pasted into the RTE. When I looked into it, I found that the RTE component in AEM has two main paste modes: wordhtml and plaintext. As its name suggests, plaintext mode strips all markup, while wordhtml keeps the markup and works well for most tags. However, when authors copy a list (ordered or unordered) from a Microsoft Word document (desktop application) and paste it directly into the RTE, the list does not come through correctly.
While going through the OOTB implementation, I found a comment in EditToolsPlugin.js stating that ol and ul tags are not supported at the moment (as of AEM 6.5.23 and AEM as a Cloud Service). As a fallback, each list item becomes an individual <p> tag containing a dot (.) and 6 span tags, so the pasted list is never converted to <ul> or <ol>.

EditToolsPlugin
This leads to inconsistent markup, and to frustration when authors try to fix it by hand. The image below shows the extra spans with dots, instead of ul/li tags, after copy-pasting from MS Word into the RTE.

copy-paste to RTE
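To make the problem concrete, here is a simplified, illustrative sample of the kind of markup Word puts on the clipboard for a single bullet item, together with a small check for it. The sample string is trimmed down from real Word output, and containsWordList is a hypothetical helper name, not something that exists in the OOTB plugin.

```javascript
// Simplified, illustrative sample of Word clipboard HTML for one bullet item.
// Real Word output carries more inline styles and nested spans.
const sampleWordHtml =
  '<p class="MsoListParagraphCxSpFirst" style="text-indent:-.25in;mso-list:l0 level1 lfo1">' +
  '<!--[if !supportLists]--><span style="font-family:Symbol">\u00B7</span><!--[endif]-->' +
  'First item<o:p></o:p></p>';

// Hypothetical helper: returns true when the HTML contains at least one
// Word list paragraph (a class starting with or containing MsoListParagraph).
function containsWordList(html) {
  return /class=["']?[^"'>]*MsoListParagraph/i.test(html || '');
}

console.log(containsWordList(sampleWordHtml));
console.log(containsWordList('<p class="MsoNormal">Plain paragraph</p>'));
```

The class name, the level number inside the style attribute, the conditional comments and the <o:p> tag are the pieces of this markup that the convertWordListsToUL function shown later in this post relies on.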
So let’s delve into a solution which mitigates the discussed issue.
Solution
As part of the solution, we will customise the JavaScript code of the OOTB plugin: we intercept the paste operation, clean up the MS Word formatting, and convert it to HTML <ul>/<ol> tags while preserving any other tags already applied, such as <b>, <i>, and <u>. A paste can happen either directly in the RTE text area or via the paste plugin icon, so both scenarios are handled below.
Step 1: Clientlibs Creation
Create a clientlib to limit the scope of this custom plugin to the RTE only, and add its category to the extraClientlibs property of the RTE dialog, as shown below.

extra clientlibs
Step 2: Custom solution
Now override the OOTB edit plugin by copying /libs/clientlibs/granite/richtext/core/js/plugins/EditToolsPlugin.js into the clientlib JS file (edit-tools-plugin-workaround-for-paste.js) created in Step 1.
Search for the afterPaste method and add the line below, as displayed in the image.
clipNode.innerHTML = convertWordListsToUL(clipNode.innerHTML);

afterPaste
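Because this line now runs on every paste, it may be worth making sure that a failure in the conversion never blocks the paste itself. The sketch below is one possible way to do that; safeConvert is a hypothetical wrapper introduced here for illustration and is not part of the OOTB plugin.

```javascript
// Hypothetical wrapper (not part of the OOTB plugin): run a converter on the
// pasted HTML and fall back to the untouched markup if anything goes wrong.
function safeConvert(html, convert) {
  // Nothing to convert for empty or non-string input.
  if (typeof html !== 'string' || html === '') {
    return html;
  }
  try {
    const result = convert(html);
    // Only accept a string result; otherwise keep the original markup.
    return typeof result === 'string' ? result : html;
  } catch (err) {
    console.warn('List conversion skipped:', err);
    return html;
  }
}
```

Under that assumption, the line in afterPaste would become clipNode.innerHTML = safeConvert(clipNode.innerHTML, convertWordListsToUL); so that authors still get the default paste result if the conversion throws.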
Add the complete convertWordListsToUL function from this blog at the end of edit-tools-plugin-workaround-for-paste.js. It transforms lists pasted from MS Word into <ul> / <ol> tags.

ConvertWordListToUL
function convertWordListsToUL(inputHtml) {
  const tempDiv = document.createElement('div');
  tempDiv.innerHTML = inputHtml;

  // Remove Word's comments and <o:p> tags from a node
  function cleanNode(node) {
    const comments = [];
    const walker = document.createTreeWalker(node, NodeFilter.SHOW_COMMENT, null);
    let comment;
    while ((comment = walker.nextNode())) {
      comments.push(comment);
    }
    comments.forEach(c => c.parentNode.removeChild(c));
    // Remove o:p tags
    node.querySelectorAll('o\\:p').forEach(tag => tag.parentNode.removeChild(tag));
    return node;
  }

  // Read the nesting depth from the style attribute (defaults to 1)
  function getListLevel(style) {
    if (!style) return 1;
    const levelMatch = style.match(/level(\d+)/i);
    return levelMatch ? parseInt(levelMatch[1], 10) : 1;
  }

  // Decide between <ol> and <ul> from the bullet text
  function getListType(bulletText) {
    if (!bulletText) return 'ul';
    const orderedPatterns = [
      /^\d+\.$/,      // 1., 2., 3.
      /^[a-z]\.$/i,   // a., b., c.
      /^[ivx]+\.$/i,  // i., ii., iii.
      /^\(\d+\)$/,    // (1), (2)
      /^\[\d+\]$/,    // [1], [2]
      /^[a-z]\)$/i,   // a), b)
      /^[ivx]+\)$/i,  // i), ii)
      /^\d+\)$/       // 1), 2)
    ];
    return orderedPatterns.some(pattern => pattern.test(bulletText)) ? 'ol' : 'ul';
  }

  const root = document.createElement('div');
  const stack = [{ level: 0, element: root }];

  Array.from(tempDiv.children).forEach(node => {
    if (node.className.includes('MsoListParagraph')) {
      const style = node.getAttribute('style') || '';
      const currentLevel = getListLevel(style);
      const cleanedNode = cleanNode(node.cloneNode(true));

      // The first child element holds the bullet or number; read it, then drop it
      let bulletText = '';
      if (cleanedNode.firstElementChild) {
        bulletText = cleanedNode.firstElementChild.textContent.trim();
        cleanedNode.removeChild(cleanedNode.firstElementChild);
      }
      const listType = getListType(bulletText);

      // Pop stack until we find a parent level lower than current
      while (stack.length > 1 && stack[stack.length - 1].level >= currentLevel) {
        stack.pop();
      }
      const parent = stack[stack.length - 1].element;
      const lastChild = parent.lastElementChild;

      // Reuse the parent's trailing list if it has the same type; otherwise start a new list
      let listContainer;
      if (lastChild && lastChild.tagName === listType.toUpperCase()) {
        listContainer = lastChild;
      } else {
        listContainer = document.createElement(listType);
        parent.appendChild(listContainer);
      }

      const li = document.createElement('li');
      li.innerHTML = cleanedNode.innerHTML;
      listContainer.appendChild(li);
      stack.push({ level: currentLevel, element: li });
    } else {
      // Reset stack to root level for non-list items
      while (stack.length > 1) stack.pop();
      const cleanedNode = cleanNode(node.cloneNode(true));
      if (cleanedNode.textContent.trim() !== '') {
        root.appendChild(cleanedNode);
      }
    }
  });

  return root.innerHTML;
}
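The nesting logic in convertWordListsToUL is easier to follow without the DOM around it. The sketch below is a simplified, DOM-free model of the same stack idea, working on plain objects instead of elements. The names buildNestedList and render, and the { level, type, text } item shape, are illustrative assumptions and not part of the plugin.

```javascript
// Render one list object ({ type, items }) to an HTML string, recursing into
// any lists nested under its items.
function render(list) {
  const inner = list.items
    .map(li => '<li>' + li.text + li.children.map(render).join('') + '</li>')
    .join('');
  return '<' + list.type + '>' + inner + '</' + list.type + '>';
}

// Build nested lists from a flat sequence of items, each carrying its level.
function buildNestedList(items) {
  const root = { children: [] };
  // The stack holds the chain of open list items, from outermost to innermost.
  const stack = [{ level: 0, node: root }];

  for (const item of items) {
    // Close every open item that is at the same depth or deeper.
    while (stack.length > 1 && stack[stack.length - 1].level >= item.level) {
      stack.pop();
    }
    const parent = stack[stack.length - 1].node;

    // Reuse the parent's trailing list when the type matches, else open a new one.
    let list = parent.children[parent.children.length - 1];
    if (!list || list.type !== item.type) {
      list = { type: item.type, items: [] };
      parent.children.push(list);
    }

    const li = { text: item.text, children: [] };
    list.items.push(li);
    stack.push({ level: item.level, node: li });
  }
  return root.children.map(render).join('');
}

const html = buildNestedList([
  { level: 1, type: 'ul', text: 'A' },
  { level: 2, type: 'ol', text: 'A1' },
  { level: 2, type: 'ol', text: 'A2' },
  { level: 1, type: 'ul', text: 'B' }
]);
console.log(html);
// → <ul><li>A<ol><li>A1</li><li>A2</li></ol></li><li>B</li></ul>
```

Popping the stack before attaching each item is what lets a level 1 item that follows a level 2 item land back in the outer list instead of starting a new one.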
Save all the changes so that they take effect.
Note: The attached toolsplugin.js and edit-tools-plugin-workaround-for-paste.js files are for reference only. Copy the plugin source from the path specified above in your own AEM instance.
Step 3: Configure “wordhtml” as default paste mode
On the rtePlugins/edit node of the RTE configuration, set the defaultPasteMode property to wordhtml, as shown below.

defaultPasteMode
Step 4: Optionally, enable the “paste as wordhtml” plugin in the toolbar
If required, the plugin can also be exposed explicitly in the toolbar. Navigate to the inline node of the text component and enable the paste plugin by adding edit#paste-wordhtml to the toolbar.

PastePlugin
To transform lists into ul/ol tags automatically in the paste dialog as well, add the code below to the showPasteDialog function in the same file:
const iframe = document.querySelector('iframe[name^="rte-paste-html"]');
if (iframe && iframe.contentWindow) {
  const iframeDocument = iframe.contentWindow.document;
  // Listen for the 'paste' event in the iframe's document
  iframeDocument.addEventListener('paste', function() {
    // Wait briefly so the pasted content is in the body before converting it
    setTimeout(() => {
      iframeDocument.body.innerHTML = convertWordListsToUL(iframeDocument.body.innerHTML);
    }, 100);
  });
}

pasteDialog
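If showPasteDialog can run more than once against the same iframe document, the snippet above would register the paste listener again each time. Whether that actually happens depends on how the dialog is created, which this post does not confirm, so treat the following as a defensive sketch. attachPasteConverter, the __wordListPasteBound flag and the injectable schedule parameter are all hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: attach the paste listener at most once per document.
// `convert` is the HTML converter (for example convertWordListsToUL) and
// `schedule` is an optional function used to defer the conversion.
function attachPasteConverter(doc, convert, schedule) {
  // Skip missing documents and documents that already have the listener.
  if (!doc || !doc.body || doc.__wordListPasteBound) {
    return false;
  }
  // Default: wait 100 ms, as in the snippet above, so the pasted content
  // is already in the body when the conversion runs.
  const defer = schedule || (fn => setTimeout(fn, 100));

  doc.addEventListener('paste', function () {
    defer(function () {
      doc.body.innerHTML = convert(doc.body.innerHTML);
    });
  });
  doc.__wordListPasteBound = true;
  return true;
}
```

Inside showPasteDialog this could be called as attachPasteConverter(iframe.contentWindow.document, convertWordListsToUL); after the same iframe lookup and null check. Passing the scheduler in as a parameter also makes the helper easy to exercise without a browser.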
Results
Congratulations! It’s time to verify our effort by copy-pasting some text from the MS Word app into the RTE.
As the image below shows, the list is converted to <ol> / <li> tags, and any bold, italic, underline, and anchor tags that were applied are kept.

WordListToUL
Conclusion
This is a quick fix that gives authors more flexibility. However, if you opt for this approach, keep an eye on the release notes when upgrading to a new service pack (or on automatic upgrades in the case of AEM as a Cloud Service), and remove this file once Adobe has fixed the issue.
I hope this blog helps you sail through such issues. Feel free to customize and use the code as needed. Thanks!