(Solved) – Why does examining the XML encoding of Excel cells damage the file?

(solved)-–-why-does-examining-the-xml-encoding-of-excel-cells-damage-the-file?

I am repeatedly calling the sub below in VBA to find numbered lines of text in Excel cells. The first step is to examine the XML encoding in order to remove sections of text that have strikethrough font formatting. This appears to work well, but I suspect it is also irreparably damaging the spreadsheet file. I say this because after all the code I’ve written finishes amassing results in an adjacent worksheet, I get some strange behaviour:

A. Excel (2016, BTW) quits unexpectedly when I try to move or copy the results to another file.
B. If I save the file and try to re-open it, Excel says the file needs to be repaired and then removes the contents of all cells in all sheets (typical error message below).


-
error070480_01.xml
Errors were detected in file 'U:Test.xlsm'
-
Removed Part: /xl/sharedStrings.xml part with XML error. (Strings) Unexpected end of input. Line 1147, column 75807.

-
Removed Records: Cell information from /xl/worksheets/sheet1.xml part
Removed Records: Cell information from /xl/worksheets/sheet2.xml part
Removed Records: Cell information from /xl/worksheets/sheet3.xml part
Removed Records: Cell information from /xl/worksheets/sheet4.xml part
Removed Records: Cell information from /xl/worksheets/sheet5.xml part
Removed Records: Cell information from /xl/worksheets/sheet6.xml part
Removed Records: Cell information from /xl/worksheets/sheet7.xml part
Removed Records: Cell information from /xl/worksheets/sheet8.xml part
Removed Records: Cell information from /xl/worksheets/sheet9.xml part
Removed Records: Cell information from /xl/worksheets/sheet10.xml part


I suspect this is the result of memory mismanagement in my code, but I cannot see anything dubious. I’m releasing all the object memory after use. Can anyone else?

This is very similar to another post I made in which the basic process of using XML was first suggested to me, but was fixing a different error message about deleting child nodes.

I apologise for the formatting of code in this post. The colouration of keywords and comments is incorrect. I can’t figure out how to fix this.

Public Sub ParseCell4Items(TargetCell As Excel.Range, ItemsInCell() As String, ItemCount As Integer)

'This sub parses a cell for numbered lines of text, removing text with strikethrough first.
'Results are stored in the array ItemsInCell() and ItemCount.

Dim XMLDocObj As MSXML2.DOMDocument60 'Reference to an object to hold the XML content of TargetCell.
Dim DataNode As MSXML2.IXMLDOMNode 'Reference for the node in XMLDocObj containing all the cell (string) data.
Dim TempNode As MSXML2.IXMLDOMNode 'Temporary reference for any node with strikethrough formatting.

On Error GoTo ErrorHandler

'Free any memory already used by the dynamic array ItemsInCell (from previous calls to this sub).
Erase ItemsInCell
ItemCount = 0

If Not IsEmpty(TargetCell.Value) Then

    'Working with a copy of the partially cleaned cell text in memory, now delete all remaining text with strikethrough formatting.

    Set XMLDocObj = New MSXML2.DOMDocument60

    'Add some namespaces.
    XMLDocObj.SetProperty "SelectionNamespaces", "xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet' " & _
        "xmlns:ht='http://www.w3.org/TR/REC-html40'"

    'Load the cell data as XML into XMLDocObj.
    If XMLDocObj.LoadXML(TargetCell.Value(xlRangeValueXMLSpreadsheet)) Then

        Set DataNode = XMLDocObj.SelectSingleNode("//ss:Data") 'Cell content.

        If Not DataNode Is Nothing Then

            Set TempNode = DataNode.SelectSingleNode("//ht:S") 'Struck through cell content.

            'Prune nodes with strikethrough content from the XML tree.
            Do While Not TempNode Is Nothing
                TempNode.ParentNode.RemoveChild TempNode
                Set TempNode = DataNode.SelectSingleNode("//ht:S")
            Loop

            'Now find the item numbers in the fully cleaned text using a regular expression search.

            CleanedCellText = XMLDocObj.Text

            'Not shown - parse CleanedCellText for numbered items.
            '...


        End If

        Set DataNode = Nothing
        Set TempNode = Nothing

    End If

    Set XMLDocObj = Nothing

End If

Exit Sub

ErrorHandler:

    Call RaiseError(Err.Number, Err.Source, "PHR_ItemExtractionRoutines.ParseCellForItems()", Err.Description, Erl)

End Sub

Leave a Reply

Your email address will not be published. Required fields are marked *