Many months ago I was working on some code that needed to handle small bits of XML. For the sake of expediency I went with using Rails’ Hash.from_xml method. This worked great for quite a while.

Eventually, the XML I was handling got quite a bit bigger. Now I’m dealing with several megabytes of XML in each file. With this much data, Hash.from_xml is painfully slow: it maxes out the CPU for several minutes.

So, I spent a few minutes on Friday afternoon and hacked together a class that acts like the data structure that Hash.from_xml returns. However, instead of building the entire data structure up front, it implements a handful of Hash and Array methods that call Nokogiri on demand. That said, even parsing the entire document is much faster, roughly 30 times faster in the benchmark below (times in seconds).

               user        system    total       real
Hash.from_xml  663.820000  3.220000  667.040000  (675.076590)
NokoHash        22.410000  0.190000   22.600000  ( 22.771999)

I also threw in some #method_missing magic to allow for method-call style access of the structure instead of just hash/key access.
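
For anyone who hasn't seen the trick, this is roughly what method-call style access looks like. Again, this is a sketch rather than the gist's code: the MethodAccess class and the sample data are hypothetical, and it wraps a plain Hash with string keys so that it runs without any gems.

```ruby
# Hypothetical sketch: method-call access layered over string-keyed data.
class MethodAccess
  def initialize(data)
    @data = data
  end

  # Plain hash/key access still works.
  def [](key)
    wrap(@data[key.to_s])
  end

  # Treat an unknown, argument-less method call as a key lookup.
  def method_missing(name, *args, &block)
    key = name.to_s
    return super unless args.empty? && block.nil? && @data.key?(key)

    self[key]
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s) || super
  end

  private

  # Nested hashes get the same treatment; arrays are wrapped element-wise.
  def wrap(value)
    case value
    when Hash then self.class.new(value)
    when Array then value.map { |item| wrap(item) }
    else value
    end
  end
end

record = MethodAccess.new("user" => { "name" => "Ada", "tags" => ["a", "b"] })
record.user.name      # same value as record["user"]["name"]
record["user"].tags   # the two styles can be mixed
```

Keys that collide with methods every object already has (class, hash, and so on) never reach method_missing, so those still need hash/key access.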

Warning

This code is only tested as far as my specific needs for this specific system. I have precisely zero confidence that this will work smoothly for you. I just found this to be an interesting bit of code, and I hope you do too.

http://gist.github.com/151049

1 Response to “Quick Hack: Nokogiri backed Hash-ish class”

  1. Ted Says:

    late loading is a beautiful thing when it comes to performance and memory constraints. nice job.
