Changing data structure representation at runtime: looking for other examples
Which programs/algorithms change the representation of their data structure at runtime in order to obtain better performance?

Context: Data structures "define" how real-world concepts are structured and represented in computer memory. Different kinds of computations call for different data structures to achieve acceptable performance (e.g., a linked-list versus an array implementation).

Self-adaptive (cf. self-updating) data structures are data structures that change their internal state according to a concrete usage pattern (e.g., self-balancing trees). These changes are internal, i.e., they depend on the data, and they are anticipated by design.

Other algorithms can benefit from an external change of representation. In matrix multiplication, for instance, it is a well-known performance trick to transpose "the second matrix" (so that caches are used more efficiently). This effectively changes the matrix representation from row-major to column-major order. Because "A" is not the same as "Transposed(A)", the second matrix is transposed back after the multiplication to keep the program semantically correct.
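To make the transposition trick concrete, here is a minimal Python sketch (the function names `matmul_naive` and `matmul_transposed` are my own). It only illustrates the representation change; in pure Python the cache effect is mostly hidden by interpreter overhead, so this is a sketch of the idea, not a benchmark:

```python
# Naive triple-loop multiply: the inner loop walks B column-wise,
# which strides across rows of B (cache-unfriendly in row-major layout).
def matmul_naive(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

# Same product, but B is first transposed so that both operands are
# traversed row-by-row, i.e., in contiguous row-major order.
def matmul_transposed(A, B):
    BT = [list(col) for col in zip(*B)]  # row-major view of B's columns
    n, p = len(A), len(BT)
    return [[sum(a * b for a, b in zip(A[i], BT[j])) for j in range(p)]
            for i in range(n)]
```

Both functions compute the same product; only the in-memory representation of the second operand differs during the computation.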

A second example is using a linked list at program start-up to populate "the data structure", then switching to an array-based implementation once the contents of the list become "stable".
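A minimal sketch of that second example, assuming a container that is append-friendly while being populated (linked list) and is then "frozen" into an array for O(1) random access (the class and method names here are hypothetical, chosen for illustration):

```python
class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value, self.next = value, next

class FreezableList:
    def __init__(self):
        self._head = self._tail = None
        self._array = None  # set once frozen

    def append(self, value):
        # O(1) append while in the linked-list representation.
        assert self._array is None, "already frozen"
        node = Node(value)
        if self._tail is None:
            self._head = self._tail = node
        else:
            self._tail.next = node
            self._tail = node

    def freeze(self):
        # Explicit change of representation: linked list -> array.
        out, node = [], self._head
        while node is not None:
            out.append(node.value)
            node = node.next
        self._array = out
        self._head = self._tail = None

    def __getitem__(self, i):
        # O(1) indexed access, valid only after freeze().
        return self._array[i]
```

The switch is an explicit part of the program: the caller decides when the contents are stable and triggers `freeze()`.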

I am looking for programmers who have had similar experiences with other programs in which an external change of representation is performed to obtain better performance; that is, where the representation (the chosen implementation) of a data structure is changed at runtime as an explicit part of the program.

Tien answered 3/7, 2014 at 12:40 Comment(6)
I think this is an interesting question, but sadly interesting questions are frowned on at SO these days. Anyway, you might be interested in Judy trees, which have been carefully engineered with this kind of thing in mind.Architectonic
@Architectonic It is a fact that there are guidelines about asking questions, and it is another fact that this question does not follow them. You can agree or disagree with the guidelines, of course - that would be an opinion.Smilax
"[Before] I go in more detail about what I am actually looking for...", you'll have to go into more detail so that you are actually asking a question rather than simply outlining who you would like to answer the question you have yet to ask. Otherwise, this will probably get closed or put on hold until you do.Unconscious
@Unconscious Can you please help me to improve my question?Tien
Do you mean add-on, plug-in like behavior when you refer external change of representation?Caftan
@Peng Zhang I mean with "external" that the change is part of the program, instead of part of the data structure. E.g., self-adjusting data structures change representation because that is what they do. The transposition of the second matrix in matrix-matrix multiplication is part of the program, i.e., it makes the program run faster.Tien

The pattern of transforming the input representation in order to enable a more efficient algorithm comes up in many situations. I would go as far as to say this is an important way to think about designing efficient algorithms in general. Some examples that come to mind:

  • HeapSort. It works by transforming your original input list into a binary heap (typically a min-heap), and then repeatedly calling the remove-min function to get the list elements in sorted order. Asymptotically it runs in O(n log n) time, which is optimal for comparison-based sorting.

  • Finding duplicates in a list. Without changing the input list, this takes O(n^2) time. But if you can sort the list, or store the elements in a hash table or Bloom filter, you can find all the duplicates in O(n log n) time, or in expected O(n) time with hashing.

  • Solving a linear program. A linear program (LP) is a certain kind of optimization problem with many applications in economics and elsewhere. One of the most important techniques in solving LPs is duality, which means converting your original LP into what is called the "dual", and then solving the dual. Depending on your situation, solving the dual problem may be much easier than solving the original ("primal") LP. This book chapter starts with a nice example of primal/dual LPs.

  • Multiplying very large integers or polynomials. The fastest known method is using the FFT; see here or here for some nice descriptions. The gist of the idea is to convert from the usual representation of your polynomial (a list of coefficients) to an evaluation basis (a list of evaluations of that polynomial at certain carefully-chosen points). The evaluation basis makes multiplication trivial - you can just multiply each pair of evaluations. Now you have the product polynomial in an evaluation basis, and you interpolate (opposite of evaluation) to get back the coefficients, like you wanted. The Fast Fourier Transform (FFT) is a very efficient way of doing the evaluation and interpolation steps, and the whole thing can be much faster than working with the coefficients directly.

  • Longest common substring. If you want to find the longest substring that appears in every one of a bunch of text documents, one of the fastest ways is to build a generalized suffix tree over all of them and find the deepest node whose subtree contains suffixes from every document.

  • Linear algebra. Various matrix computations are performed most efficiently by converting your original matrix into a canonical form such as Hermite normal form or computing a QR factorization. These alternate representations of the matrix make standard things such as finding the inverse, determinant, or eigenvalues much faster to compute.

There are certainly many examples besides these, but I was trying to come up with some variety.
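The HeapSort entry above can be sketched in a few lines of Python using the standard `heapq` module (the function name `heap_sort` is my own):

```python
import heapq

def heap_sort(xs):
    # Change of representation: plain list -> binary min-heap.
    heap = list(xs)
    heapq.heapify(heap)  # O(n)
    # Repeated remove-min yields the elements in sorted order: n * O(log n).
    return [heapq.heappop(heap) for _ in range(len(heap))]
```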
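Likewise, the duplicate-finding entry becomes a one-pass scan once the elements are mirrored into a hash-based set (expected O(n) time; `find_duplicates` is an illustrative name):

```python
def find_duplicates(xs):
    # Change of representation: the elements seen so far are kept in a
    # hash set, turning each membership test into expected O(1).
    seen, dups = set(), set()
    for x in xs:
        if x in seen:
            dups.add(x)
        seen.add(x)
    return dups
```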
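The polynomial-multiplication entry can also be sketched without an actual FFT: the sketch below shows only the representation change (coefficients -> point evaluations -> pointwise products -> coefficients), using naive O(n^2) evaluation and Lagrange interpolation where a real implementation would use the FFT for both steps. All function names are my own, and exact `Fraction` arithmetic is used to keep the interpolation exact:

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    # Horner's rule; coeffs[k] is the coefficient of x**k.
    acc = Fraction(0)
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def lagrange_interpolate(points):
    # Coefficients of the unique degree < n polynomial through n points.
    n = len(points)
    coeffs = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        basis = [Fraction(1)]  # running product of (x - xj), j != i
        denom = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            basis = [Fraction(0)] + basis      # multiply by x
            for k in range(len(basis) - 1):
                basis[k] -= basis[k + 1] * xj  # subtract xj * old basis
            denom *= (xi - xj)
        for k in range(n):
            coeffs[k] += yi * basis[k] / denom
    return coeffs

def poly_multiply(p, q):
    # Representation change: coefficients -> evaluations at n points.
    n = len(p) + len(q) - 1
    xs = [Fraction(k) for k in range(n)]
    # In the evaluation basis, multiplication is pointwise.
    evals = [(x, poly_eval(p, x) * poly_eval(q, x)) for x in xs]
    # Representation change back: evaluations -> coefficients.
    return lagrange_interpolate(evals)
```

Swapping the evaluation and interpolation steps for the FFT (evaluating at roots of unity instead of the integers 0..n-1) is exactly what makes the fast algorithm fast.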

Seabrooke answered 9/7, 2014 at 22:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.