How to speed up the creation of large fields in XML files?

I'm trying to create XML files that contain large fields e.g. of the form:
<Geometry>
<Nodes>
<node id="1">-1.0,-0.5, 0.0</node>
<node id="2">-1.0,-0.5, 1.0</node>
<node id="3">-1.0, 0.5, 0.0</node>
<node id="4">-1.0, 0.5, 1.0</node>
<node id="5"> 1.0,-0.5, 0.0</node>
<node id="6"> 1.0,-0.5, 1.0</node>
<node id="7"> 1.0, 0.5, 0.0</node>
<node id="8"> 1.0, 0.5, 1.0</node>
</Nodes>
<Elements>
<hex8 id="1" mat="1">1,5,7,3,2,6,8,4</hex8>
</Elements>
</Geometry>
The real files have far more nodes and elements. I've found the standard XML tools in MATLAB to perform rather poorly, so I wrote something like the following, which loops through the nodes and elements and creates the entries one by one:
parent_node = docNode.createElement('Nodes');
parent_node = GEONode.appendChild(parent_node);
n_steps=size(FEB_struct.Geometry.Nodes,1);
for q_n=1:1:n_steps
node_node = docNode.createElement('node'); %create node entry
node_node = parent_node.appendChild(node_node); %add node entry
attr = docNode.createAttribute('id'); %Create id attribute
attr.setNodeValue(num2str(q_n)); %Set id text
node_node.setAttributeNode(attr); %Add id attribute
node_node.appendChild(docNode.createTextNode(sprintf('%6.7e, %6.7e, %6.7e',FEB_struct.Geometry.Nodes(q_n,:)))); %append data text child
end
This code generates finite element analysis input files, but in some cases the file export takes much longer than the finite element computation itself. The models often have more than 30000 nodes and more than 200000 elements.
Any help in making this more efficient is appreciated. Is it possible to avoid the for loop and "vectorize" this somehow?
Kevin
  2 Comments
Cedric on 14 Jun 2014
When I am dealing with a priori known XML content, I often write my own parser and, most often, my own code for exports (as mentioned by J below). By "a priori known", I mean that the structure is well defined and regular, and that I know there are no special or tricky cases, no special characters, etc. That is often the case when documents come out of another computation engine of some sort (and never the case with web pages!).
Now it's worth reading this thread on Yair's blog and in particular some comments/answers at the bottom, e.g.
"@James – DOM is normally used for small XML models; SAX is usually better for large models that can be processed sequentially. There are numerous SAX parsers available online that you can use in Matlab. Perhaps the most widely used open-source XML parser, which includes support for both SAX and DOM, is Xerces, which is already pre-bundled in Matlab (take a look at the %matlabroot%/java/jarext/ folder), so you can use it in Matlab out-of-the-box. Other well-known XML support packages, namely Xalan and Saxon, are also pre-bundled."
Kevin Moerman on 15 Jun 2014
Thanks Cedric for your comment. Sounds interesting. It leaves me with the same question one of the people on that blog asked, which is how to implement that SAX parsing approach. Unfortunately the reply to date on that blog was "I don’t have an immediate answer for you, it requires some investigation". I'll have to look around a bit more.
Thanks,
Kevin


Accepted Answer

J on 14 Jun 2014
I don't know whether using xml-files is the most efficient way of storing FEA data. But since you use them as input files there might be no other option for you. First of all I don't know of a way to vectorize the creation of xml-elements. You could optimize your current code a bit I think (currently typing on a computer without Matlab so I can't test), for instance replacing num2str by sprintf.
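As a rough illustration of the num2str-to-sprintf swap suggested above, the loop body could be trimmed as sketched here. This has not been timed against the original data. The variable names follow the question, while the toy node matrix, the document created with `com.mathworks.xml.XMLUtils.createDocument`, and the use of `setAttribute` in place of `createAttribute`/`setAttributeNode` are assumptions made for the example.

```matlab
% Toy stand-ins for the variables in the question (assumed for this sketch)
docNode = com.mathworks.xml.XMLUtils.createDocument('Geometry');
GEONode = docNode.getDocumentElement;
FEB_struct.Geometry.Nodes = [-1 -0.5 0; -1 -0.5 1; -1 0.5 0];

parent_node = GEONode.appendChild(docNode.createElement('Nodes'));
n_steps = size(FEB_struct.Geometry.Nodes,1);
for q_n = 1:n_steps
    node_node = parent_node.appendChild(docNode.createElement('node'));
    % sprintf('%u',...) replaces num2str; setAttribute creates and
    % attaches the id attribute in a single Java call
    node_node.setAttribute('id', sprintf('%u', q_n));
    node_node.appendChild(docNode.createTextNode( ...
        sprintf('%6.7e, %6.7e, %6.7e', FEB_struct.Geometry.Nodes(q_n,:))));
end
```

This still makes several Java calls per node, so any gain is likely to be modest compared with writing the text directly.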
A better approach would be to write a text file directly instead of creating all the xml-elements and using xmlwrite to make the file. Your code could look something like this (again, no Matlab on this pc, so no possibility to check if it works).
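% 'W' opens the file for writing without automatic flushing, 't' selects text mode
fid = fopen('FEA.xml', 'Wt');
% Print the first line of the xml-file; check your existing files to see what it looks like
fprintf(fid, '<Geometry>\n');
fprintf(fid, '<Nodes>\n');
n_steps = size(FEB_struct.Geometry.Nodes,1);
enumeration = (1:1:n_steps).';
% Transpose so that each column holds one node: id, x, y, z
fprintf(fid, '<node id="%u">%6.7e, %6.7e, %6.7e</node>\n', [enumeration FEB_struct.Geometry.Nodes].');
fprintf(fid, '</Nodes>\n');
fprintf(fid, '</Geometry>\n');
fclose(fid);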
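The same single-call pattern may extend to the Elements block shown in the question. The sketch below assumes a connectivity matrix with one hex8 element per row and a vector of material indices. The names `E`, `mat_id`, and the output file name are hypothetical and are not fields confirmed by the original post.

```matlab
% Hypothetical inputs: one hex8 element per row, one material index per element
E      = [1 5 7 3 2 6 8 4; 1 5 7 3 2 6 8 4];
mat_id = [1; 2];

n_el    = size(E,1);
elem_id = (1:n_el).';

% Format for one element line: id, mat, then comma-separated node indices
fmt = ['<hex8 id="%u" mat="%u">' repmat('%u,',1,size(E,2)-1) '%u</hex8>\n'];

% sprintf recycles the format for each column of the transposed matrix,
% so every element (id, mat, connectivity) fills exactly one line
txt = sprintf(fmt, [elem_id mat_id E].');

fid = fopen(fullfile(tempdir,'FEA_elements.xml'), 'w');   % hypothetical file name
fprintf(fid, '<Elements>\n');
fprintf(fid, '%s', txt);
fprintf(fid, '</Elements>\n');
fclose(fid);
```

Building the format string with `repmat` keeps the sketch usable for other element types with a different number of nodes per element.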
  1 Comment
Kevin Moerman on 15 Jun 2014
Hey J,
Thanks for your answer. I had tried that in the past and it sometimes works faster. It's just a bit more cumbersome for the smaller fields that also occur in my application, where I have a lot of special entries and attributes to set depending on the type of analysis. So for those, the DOM XML approach is easier than the text file writing approach. But I think I might create a hybrid approach whereby I create most of the XML file structure using XML methods but leave the larger fields out. Then I'll convert this basic XML to a cell containing string entries and add the larger fields using vectorised (or looped) text operations. Then I'll export the entire cell using text file methods with the extension .xml, and that should do the job.
Kevin
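One way the hybrid approach described above might be sketched: build the small skeleton with DOM calls, serialise it with `xmlwrite`, then splice in the large field as pre-formatted text. It uses a single character vector rather than a cell of strings. The placeholder token `@@NODES@@`, the toy node matrix `V`, and the output file name are assumptions made for illustration.

```matlab
% Small skeleton built with DOM methods (toy example)
docNode = com.mathworks.xml.XMLUtils.createDocument('Geometry');
nodesEl = docNode.createElement('Nodes');
docNode.getDocumentElement.appendChild(nodesEl);
% Placeholder text marks where the large field will be spliced in
nodesEl.appendChild(docNode.createTextNode('@@NODES@@'));

% Serialise the skeleton to a character vector
skeleton = xmlwrite(docNode);

% Format the large field in one vectorised sprintf call
V   = [-1 -0.5 0; -1 -0.5 1; -1 0.5 0];        % toy node coordinates
ids = (1:size(V,1)).';
nodeTxt = sprintf('<node id="%u">%6.7e, %6.7e, %6.7e</node>\n', [ids V].');

% Replace the placeholder and write the result with plain text file I/O
xmlTxt = strrep(skeleton, '@@NODES@@', sprintf('\n%s', nodeTxt));
fid = fopen(fullfile(tempdir,'FEA_hybrid.xml'), 'w');   % hypothetical file name
fprintf(fid, '%s', xmlTxt);
fclose(fid);
```

Because the large field is inserted after serialisation, its markup is not escaped by the DOM writer. The placeholder must be a string that cannot occur elsewhere in the document.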
