Re-sort a vector after a small number of elements have been modified

If we have a vector of size N that was previously sorted, and we replace up to M elements with arbitrary values (where M is much smaller than N), is there an easy way to re-sort it at lower cost than a full sort (i.e. to generate a sorting network of reduced depth)?

For example if N=10 and M=2 the input might be

10 20 30 40 999 60 70 80 90 -1

Note: the indices of the modified elements are not known (until we compare them with the surrounding elements.)


Here is an example where I know the solution because the input size is small and I was able to find it with a brute-force search:

if N = 5 and M = 1, these would be valid inputs:

0 0 0 0 0     0 0 1 0 0     0 1 0 0 0     0 1 1 1 0     1 0 0 1 1     1 1 1 1 0

0 0 0 0 1     0 0 1 0 1     0 1 0 0 1     0 1 1 1 1     1 0 1 1 1     1 1 1 1 1

0 0 0 1 0     0 0 1 1 0     0 1 0 1 1     1 0 0 0 0     1 1 0 1 1

0 0 0 1 1     0 0 1 1 1     0 1 1 0 1     1 0 0 0 1     1 1 1 0 1

For example the input may be 0 1 1 0 1 if the previously sorted vector was 0 1 1 1 1 and the 4th element was modified, but there is no way to form 0 1 0 1 0 as a valid input, because it differs in at least 2 elements from any sorted vector.

This would be a valid sorting network for re-sorting these inputs:

>--*---*-----*-------->
   |   |     | 
>--*---|-----|-*---*-->
       |     | |   |
>--*---|-*---*-|---*-->
   |   | |     |
>--*---*-|-----*---*-->
         |         |
>--------*---------*-->

We do not care that this network fails to sort some invalid inputs (e.g. 0 1 0 1 0.)

And this network has depth 4, a saving of 1 compared with the general case (a depth of 5 is generally necessary to sort a 5-element vector.)
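
For reference, the validity test behind the brute-force search can be sketched in C as follows. The comparator list is transcribed from the diagram above, stage by stage; the helper names are only illustrative:

    #include <stdbool.h>
    #include <stdio.h>

    #define N 5
    #define M 1

    typedef struct { int lo, hi; } Comparator; /* min ends up on wire lo */

    /* the depth-4 network drawn above, stage by stage */
    static const Comparator net[] = {
        {0,1},{2,3},   /* stage 1 */
        {0,3},{2,4},   /* stage 2 */
        {0,2},{1,3},   /* stage 3 */
        {1,2},{3,4},   /* stage 4 */
    };
    enum { NET_LEN = sizeof net / sizeof net[0] };

    /* run the network, then check the result is non-decreasing */
    static bool sorts(int v[N]) {
        for (int c = 0; c < NET_LEN; ++c)
            if (v[net[c].lo] > v[net[c].hi]) {
                int t = v[net[c].lo];
                v[net[c].lo] = v[net[c].hi];
                v[net[c].hi] = t;
            }
        for (int i = 1; i < N; ++i)
            if (v[i - 1] > v[i]) return false;
        return true;
    }

    /* valid = within Hamming distance M of some sorted 0/1 vector,
       i.e. of some vector of the form 0^i 1^(N-i) */
    static bool valid(const int v[N]) {
        for (int i = 0; i <= N; ++i) {
            int diff = 0;
            for (int j = 0; j < N; ++j)
                diff += (v[j] != (j >= i));
            if (diff <= M) return true;
        }
        return false;
    }

    int main(void) {
        for (int bits = 0; bits < (1 << N); ++bits) {
            int v[N];
            for (int j = 0; j < N; ++j)
                v[j] = (bits >> j) & 1;
            if (!valid(v)) continue;   /* we do not care about invalid inputs */
            if (!sorts(v)) {
                printf("network fails on a valid input\n");
                return 1;
            }
        }
        printf("network sorts all valid 0/1 inputs\n");
        return 0;
    }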

Unfortunately the brute-force approach is not feasible for larger input sizes.

Is there a known method for constructing a network to re-sort a larger vector?

My N values will be on the order of a few hundred, with M not much more than √N.

Linehan answered 15/9, 2014 at 19:31 Comment(9)
I think some of the sorting methods separate the data into portions and perform the sort only on those portions. Maybe you can adapt one of them and reduce its cost by knowing that each bucket can't have more than some number of unsorted elements. For small M values, you can also find the elements that are out of order, sort them separately and then merge them back into the full list. But that approach is not parallelizable.Rutland
@Rutland I think you are on the right track with the first suggestion. I am experimenting with a Shell Sort.Linehan
Some additional info would be nice: - Will N really be in the range of a few hundred? - What kind of performance do you expect from the sort in general? If your problem size is really not that big, a complicated network may not be worth the trouble. - What kind of sorts have you experimented with so far? - With your question I assume you pretty much mean this: "I only change some elements; is there a way for a sort to be much faster than usual if this information is known, or one that simply performs much faster if most of the elements are already sorted?" Am I right about this?Rutter
Also: - How deep is your understanding of sorting and sorting networks? I'm asking so I won't waste your time, because my knowledge is rather limited.Rutter
If you are looking to sort nearly sorted data, insertion sort is actually a good choice. Another is a natural merge sort. I would rank shell sort after those two, for nearly sorted data.Pork
@Rutter yes N really is small, but the same sorting network will be used on billions of input vectors.Linehan
I have a few questions about the problem. Are your elements all ints? Do you know the exact value of M? I assume you don't know the positions of the new/out-of-order elements -- is this true?Kelula
@Kelula 1. They are floats. 2. Currently always odd and exactly √N, ranging from 7 to 29, though this may change. 3. Correct.Linehan
I really want to focus on sorting networks, so I have removed the opencl tag.Linehan

Ok, I'm posting this as an answer since the comment restriction on length drives me nuts :)

You should try this out:

  • implement a simple sequential sort working on local memory (insertion sort or something similar; see the kernel sketch after this list). If you don't know how, I can help with that.
  • have only a single work-item perform the sorting on the chunk of N elements
  • calculate the maximum size of local memory per work-group (call clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE) and derive the maximum number of work-items per work-group, because with this approach your number of work-items will most likely be limited by the amount of local memory.
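
A minimal OpenCL C sketch of the first two bullets, assuming each work-item sorts one vector in its own slice of local memory (VEC_LEN, the kernel name and the argument layout are my own illustrative choices, not part of the answer):

    #define VEC_LEN 256   /* N, assumed known at compile time */

    __kernel void resort_vectors(__global float *data,    /* all vectors, back to back */
                                 __local  float *scratch) /* VEC_LEN floats per work-item */
    {
        const size_t gid = get_global_id(0);
        const size_t lid = get_local_id(0);
        __global float *g = data + gid * VEC_LEN;
        __local  float *v = scratch + lid * VEC_LEN; /* this work-item's exclusive slice */

        /* copy the vector into low-latency local memory */
        for (int i = 0; i < VEC_LEN; ++i)
            v[i] = g[i];

        /* plain insertion sort: cheap when most elements are already in place */
        for (int i = 1; i < VEC_LEN; ++i) {
            float key = v[i];
            int j = i - 1;
            while (j >= 0 && v[j] > key) {
                v[j + 1] = v[j];
                --j;
            }
            v[j + 1] = key;
        }

        /* write the sorted vector back */
        for (int i = 0; i < VEC_LEN; ++i)
            g[i] = v[i];
    }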

This will probably work rather well, I suspect, because:

  • a simple sort may be perfectly fine, especially since the array is already sorted to a large degree
  • parallelizing for such a small number of items is not worth the trouble (using local memory, however, is!)
  • since you're processing billions of such small arrays, you will achieve great occupancy even if each array is processed by only a single work-item

Let me know if you have problems with my ideas.

EDIT 1:

I just realized I used a technique that may be confusing to others: my proposal to use local memory is not about synchronization or about using multiple work-items on a single input vector/array. I simply use it to get low read/write memory latency. Since we use rather large chunks of memory, I fear that using private memory could spill to slow global memory without us realizing it. This also means you have to allocate local memory for each work-item: each work-item will access its own chunk of local memory and use it for sorting (exclusively). I'm not sure how good this idea is, but I've read that using too much private memory may cause spilling to global memory, and that the only way to notice is by looking at the performance (not sure if I'm right about this).
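
On the host side, the sizing step from the third bullet above might look roughly like this (hypothetical variable names; VEC_LEN as in the kernel sketch, with kernel argument 1 being the local scratch buffer):

    cl_ulong local_mem;   /* total local memory per work-group, in bytes */
    clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
                    sizeof(local_mem), &local_mem, NULL);

    /* each work-item needs VEC_LEN floats of local memory to itself;
       real code should also clamp against CL_DEVICE_MAX_WORK_GROUP_SIZE */
    size_t items_per_group = local_mem / (VEC_LEN * sizeof(cl_float));

    /* size the per-work-group scratch buffer (arg 1 of the kernel above) */
    clSetKernelArg(kernel, 1, items_per_group * VEC_LEN * sizeof(cl_float), NULL);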

Rutter answered 10/11, 2014 at 12:31 Comment(1)
This is the right answer. I wish I had submitted it first. :) Insertion sort is optimal for small nearly-sorted arrays. Exploit concurrency by sorting multiple arrays at the same time. Win.Achieve

Here is an algorithm which should yield very good sorting networks. Probably not the absolute best network for all input sizes, but hopefully good enough for practical purposes.

  1. store (or have available) pre-computed optimal networks for n < 16
  2. sort the largest 2^k elements (the largest power of 2 less than or equal to n) with an optimal network, e.g. a bitonic sort
  3. for the remaining elements, repeat #2 until m < 16, where m is the number of still-unsorted elements
  4. use a known optimal network from #1 to sort any remaining elements
  5. merge the smallest and second-smallest sub-lists using a merge sorting network
  6. repeat #5 until only one sorted list remains

All of these steps can be carried out symbolically: instead of acting on the data, the comparisons are recorded into a single master network.
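
As a sketch of step 2 under that recording scheme, here is C code that emits the comparator list of a bitonic sorting network for n = 2^k inputs; comparator() stands in for appending a wire pair to the master network, and the same trick applies to the merging networks of steps 5-6:

    #include <stdio.h>

    /* placeholder: a real generator would append (lo, hi) to the master
       network, meaning "compare wires lo and hi, minimum ends up on lo" */
    static void comparator(int lo, int hi) {
        printf("compare-exchange wires %d and %d\n", lo, hi);
    }

    static void emit_bitonic(int n) {  /* n must be a power of two */
        for (int k = 2; k <= n; k <<= 1)          /* bitonic block size */
            for (int j = k >> 1; j > 0; j >>= 1)  /* merge sub-stage */
                for (int i = 0; i < n; ++i) {
                    int partner = i ^ j;
                    if (partner > i) {
                        if ((i & k) == 0)
                            comparator(i, partner);  /* ascending half */
                        else
                            comparator(partner, i);  /* descending half */
                    }
                }
    }

    int main(void) {
        emit_bitonic(8);
        return 0;
    }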

It is worth pointing out that the (bitonic) networks from #2 can be run in parallel, and the smaller ones will finish first. This is good, because as they finish, the networks from #5-6 can begin to execute.

Kelula answered 11/11, 2014 at 16:30 Comment(4)
Please clarify what "largest 2^k" means.Linehan
Also it looks like what you are describing is a full sort. Where do you take advantage of the fact that only M elements may be changed, given that you do not know their positions in the input?Linehan
I don't think there is a way to optimize a sorting network for a known number of out-of-place elements without actually knowing their positions in advance. You won't even know whether those M elements are in order relative to one another. The only guaranteed solution is to efficiently re-sort the list. At least you know that most comparisons will not result in a swap.Kelula
By 'largest 2^k' I meant 'the largest power of 2 less than n'. My lowercase n is not equal to N except for the first iteration of step 2.Kelula
