Algorithm to determine if array contains n...n+m?

Asked 7/10, 2008 at 3:19 Answered 7/10, 2008 at 3:19

I saw this question on Reddit, and there were no positive solutions presented, and I thought it would be a perfect question to ask here. This was in a thread about interview questions:

Write a method that takes an int array of size m, and returns (True/False) if the array consists of the numbers n...n+m-1, all numbers in that range and only numbers in that range. The array is not guaranteed to be sorted. (For instance, {2,3,4} would return true. {1,3,1} would return false, {1,2,4} would return false.

The problem I had with this one is that my interviewer kept asking me to optimize (faster O(n), less memory, etc), to the point where he claimed you could do it in one pass of the array using a constant amount of memory. Never figured that one out.

Along with your solutions please indicate if they assume that the array contains unique items. Also indicate if your solution assumes the sequence starts at 1. (I've modified the question slightly to allow cases where it goes 2, 3, 4...)

edit: I am now of the opinion that there does not exist a linear in time and constant in space algorithm that handles duplicates. Can anyone verify this?

The duplicate problem boils down to testing to see if the array contains duplicates in O(n) time, O(1) space. If this can be done you can simply test first and if there are no duplicates run the algorithms posted. So can you test for dupes in O(n) time O(1) space?

Ajani answered 7/10, 2008 at 3:19 Comment(14)

Did you really mean an array of size m (not n)? Seems like it from your example. – Weller 7/10, 2008 at 3:33

heres a problem array for the challengers: [1,1,4,4,5]. should = false. summation thinks its fine. – Brutal 7/10, 2008 at 3:56

For the given problem, you could make a case that it could be done in O(1) space, since int array was specified. I have submitted a possible solution in that case. However, for an unbounded input, I don't believe O(1) space is possible. (Though I do think we could do better than O(n) space) – Pyrexia 7/10, 2008 at 8:39

A O(m) (single pass) and O(1) in space solution has no counter-examples -- uniqueSet #177618 – Chara 9/10, 2008 at 16:10

Um, you say that {1,3,1} should return false, but m here is 3, n = 1, all the numbers in the array are in the range 1..3, so I argue that this should return true according to the description of the problem. – Cartouche 10/10, 2008 at 0:42

@austirg: The problem states that true is returned only if the vector contains all of the numbers in the range. In the {1, 3, 1} example the 2 is missing. – Anderson 10/10, 2008 at 3:14

J.F. Sebastian, as you pointed out below, the "uniqueSet" solution is not O(1) space. It's O(m), because it requires m extra bits of storage. – Cindacindee 10/10, 2008 at 5:22

@Derek: It requires additional storage only in Ruby version, where It always works, but in C version it doesn't require any additional storage, but It could fail on some sequences, but I've not seen a counter example yet. – Chara 10/10, 2008 at 13:24

I've added counter-example for "uniqueSet" C version #177618 – Chara 10/10, 2008 at 13:40

This is easy if you assume that factorial is available with the desired performance characteristic. If you want linear time, then you're golden. – Slippy 11/10, 2008 at 23:56

@Marcin: factorial counter-example: [1, 2, 4, 4, 4, 5, 7, 9, 9]. Product (9! = 362880) and sum (45) are the same with [1, 2, 3, 4, 5, 6, 7, 8, 9]. – Chara 19/10, 2008 at 19:22

I've found O(m) time, O(1) space solution (for int m) stackoverflow.com/questions/177118/#311497 – Chara 22/11, 2008 at 16:44

It is not clear from your statement of the problem whether n is the input parameter of of the problem. – Fashoda 6/11, 2009 at 19:27

@AndreyT: no, it's not, it's just used as a way to describe the sequence – Ajani 7/11, 2009 at 1:30

Under the assumption numbers less than one are not allowed and there are no duplicates, there is a simple summation identity for this - the sum of numbers from 1 to m in increments of 1 is (m * (m + 1)) / 2. You can then sum the array and use this identity.

You can find out if there is a dupe under the above guarantees, plus the guarantee no number is above m or less than n (which can be checked in O(N))

The idea in pseudo-code:
0) Start at N = 0
1) Take the N-th element in the list.
2) If it is not in the right place if the list had been sorted, check where it should be.
3) If the place where it should be already has the same number, you have a dupe - RETURN TRUE
4) Otherwise, swap the numbers (to put the first number in the right place).
5) With the number you just swapped with, is it in the right place?
6) If no, go back to step two.
7) Otherwise, start at step one with N = N + 1. If this would be past the end of the list, you have no dupes.

And, yes, that runs in O(N) although it may look like O(N ^ 2)

Note to everyone (stuff collected from comments)

This solution works under the assumption you can modify the array, then uses in-place Radix sort (which achieves O(N) speed).

Other mathy-solutions have been put forth, but I'm not sure any of them have been proved. There are a bunch of sums that might be useful, but most of them run into a blowup in the number of bits required to represent the sum, which will violate the constant extra space guarantee. I also don't know if any of them are capable of producing a distinct number for a given set of numbers. I think a sum of squares might work, which has a known formula to compute it (see Wolfram's)

New insight (well, more of musings that don't help solve it but are interesting and I'm going to bed):

So, it has been mentioned to maybe use sum + sum of squares. No one knew if this worked or not, and I realized that it only becomes an issue when (x + y) = (n + m), such as the fact 2 + 2 = 1 + 3. Squares also have this issue thanks to Pythagorean triples (so 3^2 + 4^2 + 25^2 == 5^2 + 7^2 + 24^2, and the sum of squares doesn't work). If we use Fermat's last theorem, we know this can't happen for n^3. But we also don't know if there is no x + y + z = n for this (unless we do and I don't know it). So no guarantee this, too, doesn't break - and if we continue down this path we quickly run out of bits.

In my glee, however, I forgot to note that you can break the sum of squares, but in doing so you create a normal sum that isn't valid. I don't think you can do both, but, as has been noted, we don't have a proof either way.

I must say, finding counterexamples is sometimes a lot easier than proving things! Consider the following sequences, all of which have a sum of 28 and a sum of squares of 140:

[1, 2, 3, 4, 5, 6, 7]
[1, 1, 4, 5, 5, 6, 6] 
[2, 2, 3, 3, 4, 7, 7]

I could not find any such examples of length 6 or less. If you want an example that has the proper min and max values too, try this one of length 8:

[1, 3, 3, 4, 4, 5, 8, 8]

Simpler approach (modifying hazzen's idea):

An integer array of length m contains all the numbers from n to n+m-1 exactly once iff

every array element is between n and n+m-1
there are no duplicates

(Reason: there are only m values in the given integer range, so if the array contains m unique values in this range, it must contain every one of them once)

If you are allowed to modify the array, you can check both in one pass through the list with a modified version of hazzen's algorithm idea (there is no need to do any summation):

For all array indexes i from 0 to m-1 do
1. If array[i] < n or array[i] >= n+m => RETURN FALSE ("value out of range found")
2. Calculate j = array[i] - n (this is the 0-based position of array[i] in a sorted array with values from n to n+m-1)
3. While j is not equal to i
  1. If list[i] is equal to list[j] => RETURN FALSE ("duplicate found")
  2. Swap list[i] with list[j]
  3. Recalculate j = array[i] - n
RETURN TRUE

I'm not sure if the modification of the original array counts against the maximum allowed additional space of O(1), but if it doesn't this should be the solution the original poster wanted.

Donley answered 7/10, 2008 at 3:19 Comment(19)

"I'll leave that as an exercise to the reader unless you really want to know." - I really want to know - seems to me like that's the challenging part about this problem – Ietta 7/10, 2008 at 3:36

As would I. I thought I had a solution at one point, but it didn't hold up. – Ajani 7/10, 2008 at 3:39

You can solve the similar-sounding problem of finding the one non-duplicate in an array of duplicates using XOR... perhaps that's what hazzen was thinking of? – Zoie 7/10, 2008 at 3:41

XOR doesn't work; consider 1..6 which is 21, and the XOR of all digits is 7. But 6 + 6 + 5 + 1 + 1 + 2 also sums to 21 and also has a XOR of 7. – Donley 7/10, 2008 at 4:24

Maybe I'm misunderstanding your algorithm, but (5,3,3,3,1) seems to swap 5 and 1 in steps 1-6, then get 6 for step 7, calculating that there are no duplicates. – Entrap 7/10, 2008 at 4:24

You have to keep track of where you started last time, and start one after that. So you would start at the second number. I'll make the description more clear. – Donley 7/10, 2008 at 4:25

If that's the case, then your algorithm is sorting the array if there are no duplicates, so I have a hard time believing it can do it in O(N) time. – Entrap 7/10, 2008 at 4:26

It is sorting using the ideas behind radix sort, which under these conditions is O(N) time and space, and we already have the O(N) space if we do it in place. – Donley 7/10, 2008 at 4:29

Yeah, after tracing it on a few examples, I think you're right. Under the constraints imposed by the problem, you're always going to know the correct location for each element, so you're going to have to check each element exactly once if there's no dupes, and you'll stop early if there are. – Entrap 7/10, 2008 at 4:33

There's definitely a contradiction somewhere. The problem says "use O(1) space", and you are trickily using O(n) space. I mean, if we are allowed to use O(n) space, there are much simpler ways to check for duplicates. – Waistband 7/10, 2008 at 4:39

I agree, my solution only works if you can modify the array, then I use a trick with radix sort to get O(N) sorting. If you can't modify the array, there may be some math tricks we are missing. – Donley 7/10, 2008 at 4:42

Or it could be that the problem is unsolvable. I make no guarantees about that. :) Your solution doesn't use any extra storage so it could be argued to be O(1) space, but it's not O(1) in the truest sense. However, if you gave this answer in an interview, the interviewer would be a fool not to hire. – Ajani 7/10, 2008 at 4:48

On your first point about the sum of the array. Change your equation to "m * (m + 2*n - 1) / 2" and that works for negative numbers as well. – Seaddon 7/10, 2008 at 8:16

I posted answer. Use XOR with even & odd numbers separately. XOR has accumulation property and also you do not need to think about overflow or underflow of doing summation. – Deraign 7/10, 2008 at 9:1

Summation is needless if we can check for duplicates. In this case n==min(array), (n+m-1)==max(array) will suffice. In other words inplace-bucket-sort + min + max == solution. – Chara 8/10, 2008 at 15:19

I think that there is a problem with sorting the array in one pass, without knowing the n in advance. I have posted a workaround that in my answer. – Schilling 9/10, 2008 at 9:35

In-place radix sort only works if NO KEY IS ALREADY IN THE CORRECT PLACE. It would seem from the example, {2,3,4} that you CANNOT use an in-place radix sort here. – Cartouche 10/10, 2008 at 1:10

@austrig: The in-place sort works. See #177618 – Chara 10/10, 2008 at 14:9

This answer is a total mess. There are a number of different ideas that should be split into separate answers. I don't care if they're good answers, I'm downvoting this until it's tidied up! – Rutty 17/1, 2012 at 3:4

By working with a[i] % a.length instead of a[i] you reduce the problem to needing to determine that you've got the numbers 0 to a.length - 1.

We take this observation for granted and try to check if the array contains [0,m).

Find the first node that's not in its correct position, e.g.

0 1 2 3 7 5 6 8 4 ;     the original dataset (after the renaming we discussed)
        ^
        `---this is position 4 and the 7 shouldn't be here

Swap that number into where it should be. i.e. swap the 7 with the 8:

0 1 2 3 8 5 6 7 4 ; 
        |     `--------- 7 is in the right place.
        `--------------- this is now the 'current' position

Now we repeat this. Looking again at our current position we ask:

"is this the correct number for here?"

If not, we swap it into its correct place.
If it is in the right place, we move right and do this again.

Following this rule again, we get:

0 1 2 3 4 5 6 7 8 ;     4 and 8 were just swapped

This will gradually build up the list correctly from left to right, and each number will be moved at most once, and hence this is O(n).

If there are dupes, we'll notice it as soon is there is an attempt to swap a number backwards in the list.

Rutty answered 7/10, 2008 at 3:19 Comment(2)

In other words the problem [n, n+m) is equivalent to [0, m). – Chara 8/10, 2008 at 16:48

How do you figure out a without passing over the array once? – Electrochemistry 19/8, 2014 at 7:20

Any one-pass algorithm requires Omega(n) bits of storage.

Suppose to the contrary that there exists a one-pass algorithm that uses o(n) bits. Because it makes only one pass, it must summarize the first n/2 values in o(n) space. Since there are C(n,n/2) = 2^Theta(n) possible sets of n/2 values drawn from S = {1,...,n}, there exist two distinct sets A and B of n/2 values such that the state of memory is the same after both. If A' = S \ A is the "correct" set of values to complement A, then the algorithm cannot possibly answer correctly for the inputs

A A' - yes

B A' - no

since it cannot distinguish the first case from the second.

Q.E.D.

Lissner answered 7/10, 2008 at 3:19 Comment(0)

Why do the other solutions use a summation of every value? I think this is risky, because when you add together O(n) items into one number, you're technically using more than O(1) space.

Simpler method:

Step 1, figure out if there are any duplicates. I'm not sure if this is possible in O(1) space. Anyway, return false if there are duplicates.

Step 2, iterate through the list, keep track of the lowest and highest items.

Step 3, Does (highest - lowest) equal m ? If so, return true.

Waistband answered 7/10, 2008 at 3:19 Comment(8)

Your solution reminds me: "THEN A MIRACLE OCCURS" "I think you should be more explicit here in step two" (step 1 in your example) cartoon. :) sciencecartoonsplus.com/gallery/math/math07.gif – Chara 8/10, 2008 at 15:38

Step one either requires > O(1) space or takes O(n) time to compute, if keeping track of a sum is technically using > O(1) space, then so is keeping track of a highest and lowest item... – Velum 10/10, 2008 at 2:3

A sum doesn't take more than O(1) space. As problem size grows (in this case the size of the input array) the sum still occupies the same space and is constant. – Instigation 10/10, 2008 at 2:22

A sum does take space. Addition of two n-digit numbers results in (n+1)-digit number. Otherwise we could perform computations with infinite precision using fixed-width number representation. en.wikipedia.org/wiki/… – Chara 10/10, 2008 at 3:38

J.F. Sebastian, two n-digit numbers can still result in an n-digit number. 2+3=5. Nonetheless, dealing with overflow is really a different issue. No one asking a question like this in an interview is going to be worried about overflow, at least not until the basic algorithm is conquered. – Cindacindee 10/10, 2008 at 4:57

@Derek: 2+3 can overflow. It depends on representation. We could represent numbers less then 4 in two bits, but we can't represent all numbers < 6 in two bits, it requires 3-bits. Therefore 2+3=1 if we use 2-bits width unsigned numbers. Substitute 2-bits by 32-bits and you'll get common case. – Chara 10/10, 2008 at 13:59

Yes, it depends on representation. This isn't what the question is looking for, though. Overflow is an issue that affects many algorithms. It is not, however, something that is generally taken into account when measuring runtime complexity. – Cindacindee 10/10, 2008 at 16:27

Regarding sum, I've noticed that adding two numbers with x digits, creates a number with no more than x+1 digits, thus sum increases with log(numbers_to_be_summed), for example 8 numbers, 4 bits each, form 4 groups of couples of 4 bits numbers, each group is summed to 5 bit number, the 4 numbers of of 5 bits, create two groups, which summed to two 6 bits numbers, which are summed to one 7 bit number that is (2^4 * 2^3) thus for practical uses, it can be considered as O(1) (adding 2^64 numbers each of 64 bits will require 128 bits) – Smackdab 18/6, 2009 at 19:4

boolean determineContinuousArray(int *arr, int len)
{
    // Suppose the array is like below:
    //int arr[10] = {7,11,14,9,8,100,12,5,13,6};
    //int len = sizeof(arr)/sizeof(int);

    int n = arr[0];

    int *result = new int[len];
    for(int i=0; i< len; i++)
            result[i] = -1;
    for (int i=0; i < len; i++)
    {
            int cur = arr[i];
            int hold ;
            if ( arr[i] < n){
                    n = arr[i];
            }
            while(true){
                    if ( cur - n >= len){
                            cout << "array index out of range: meaning this is not a valid array" << endl;
                            return false;
                    }
                    else if ( result[cur - n] != cur){
                            hold = result[cur - n];
                            result[cur - n] = cur;
                            if (hold == -1) break;
                            cur = hold;

                    }else{
                            cout << "found duplicate number " << cur << endl;
                            return false;
                    }

            }
    }
    cout << "this is a valid array" << endl;
    for(int j=0 ; j< len; j++)
            cout << result[j] << "," ;
    cout << endl;
    return true;
}

Deictic answered 7/10, 2008 at 3:19 Comment(0)

Hazzen's algorithm implementation in C

#include<stdio.h>

#define swapxor(a,i,j) a[i]^=a[j];a[j]^=a[i];a[i]^=a[j];

int check_ntom(int a[], int n, int m) {
    int i = 0, j = 0;
    for(i = 0; i < m; i++) {
        if(a[i] < n || a[i] >= n+m) return 0;   //invalid entry
        j = a[i] - n;
        while(j != i) {
            if(a[i]==a[j]) return -1;           //bucket already occupied. Dupe.
            swapxor(a, i, j);                   //faster bitwise swap
            j = a[i] - n;
            if(a[i]>=n+m) return 0;             //[NEW] invalid entry
        }
    }
    return 200;                                 //OK
}

int main() {
    int n=5, m=5;
    int a[] = {6, 5, 7, 9, 8};
    int r = check_ntom(a, n, m);
    printf("%d", r);
    return 0;
}

Edit: change made to the code to eliminate illegal memory access.

Housewarming answered 7/10, 2008 at 3:19 Comment(1)

The above code fails for a[] = {6, 5, 7, 9, 10}; With an array out of bounds after it encounters a swap of '9' and '10'. Possibly a problem with the original algorithm too? – Housewarming 26/7, 2010 at 16:19

Assuming you know only the length of the array and you are allowed to modify the array it can be done in O(1) space and O(n) time.

The process has two straightforward steps. 1. "modulo sort" the array. [5,3,2,4] => [4,5,2,3] (O(2n)) 2. Check that each value's neighbor is one higher than itself (modulo) (O(n))

All told you need at most 3 passes through the array.

The modulo sort is the 'tricky' part, but the objective is simple. Take each value in the array and store it at its own address (modulo length). This requires one pass through the array, looping over each location 'evicting' its value by swapping it to its correct location and moving in the value at its destination. If you ever move in a value which is congruent to the value you just evicted, you have a duplicate and can exit early. Worst case, it's O(2n).

The check is a single pass through the array examining each value with it's next highest neighbor. Always O(n).

Combined algorithm is O(n)+O(2n) = O(3n) = O(n)

Pseudocode from my solution:

foreach(values[]) 
  while(values[i] not congruent to i)
    to-be-evicted = values[i]
    evict(values[i])   // swap to its 'proper' location
    if(values[i]%length == to-be-evicted%length)
      return false;  // a 'duplicate' arrived when we evicted that number
  end while
end foreach
foreach(values[])
  if((values[i]+1)%length != values[i+1]%length)
    return false
end foreach

I've included the java code proof of concept below, it's not pretty, but it passes all the unit tests I made for it. I call these a 'StraightArray' because they correspond to the poker hand of a straight (contiguous sequence ignoring suit).

public class StraightArray {    
    static int evict(int[] a, int i) {
        int t = a[i];
        a[i] = a[t%a.length];
        a[t%a.length] = t;
        return t;
    }
    static boolean isStraight(int[] values) {
        for(int i = 0; i < values.length; i++) {
            while(values[i]%values.length != i) {
                int evicted = evict(values, i);
                if(evicted%values.length == values[i]%values.length) {
                    return false;
                }
            }
        }
        for(int i = 0; i < values.length-1; i++) {
            int n = (values[i]%values.length)+1;
            int m = values[(i+1)]%values.length;
            if(n != m) {
                return false;
            }
        }
        return true;
    }
}

Dais answered 7/10, 2008 at 3:19 Comment(2)

What are the advantages compared to in-place bucket sort? See #177618 – Chara 10/10, 2008 at 3:12

With 3 passes it is unnecessary to use the 'modulo' trick. Compute minval and maxval in the first pass, then place each integer k at position (k-minval) in the second pass, checking for clashes as in your original solution. If the condition is met, you'll end with a sorted array. – Schilling 10/10, 2008 at 12:29

Here's a working solution in O(n)

This is using the pseudocode suggested by Hazzen plus some of my own ideas. It works for negative numbers as well and doesn't require any sum-of-the-squares stuff.

function testArray($nums, $n, $m) {
    // check the sum. PHP offers this array_sum() method, but it's
    // trivial to write your own. O(n) here.
    if (array_sum($nums) != ($m * ($m + 2 * $n - 1) / 2)) {
        return false;    // checksum failed.
    }
    for ($i = 0; $i < $m; ++$i) {
        // check if the number is in the proper range
        if ($nums[$i] < $n || $nums[$i] >= $n + $m) {
            return false;  // value out of range.
        }

        while (($shouldBe = $nums[$i] - $n) != $i) {
            if ($nums[$shouldBe] == $nums[$i]) {
                return false;    // duplicate
            }
            $temp = $nums[$i];
            $nums[$i] = $nums[$shouldBe];
            $nums[$shouldBe] = $temp;
        }
    }
    return true;    // huzzah!
}

var_dump(testArray(array(1, 2, 3, 4, 5), 1, 5));  // true
var_dump(testArray(array(5, 4, 3, 2, 1), 1, 5));  // true
var_dump(testArray(array(6, 4, 3, 2, 0), 1, 5));  // false - out of range
var_dump(testArray(array(5, 5, 3, 2, 1), 1, 5));  // false - checksum fail
var_dump(testArray(array(5, 4, 3, 2, 5), 1, 5));  // false - dupe
var_dump(testArray(array(-2, -1, 0, 1, 2), -2, 5)); // true

Seaddon answered 7/10, 2008 at 3:19 Comment(11)

Summation could either overflow or use additional memory. An array could be read-only. – Chara 8/10, 2008 at 16:6

Actually it is O(n+m) in time and O(m) in space. – Chara 8/10, 2008 at 16:37

There's no such thing as O(n+m). The "n" referred to here is not the same "n" in the solution. O(n) means that the amount of time/resources required to solve it is linear to the size of the set. en.wikipedia.org/wiki/Big_o_notation#Orders_of_common_functions – Seaddon 8/10, 2008 at 22:45

...though I will admit that doing the sum at the start isn't actually necessary. I just added it because it'd be a very quick way to tell if the array fails. – Seaddon 8/10, 2008 at 22:49

I've used general estimation of complexity. Consider case:

n' records have keys in range [0, m). In your case n == m, therefore notation O(n+n) does not make sense indeed, but in general case

n` could differ from m, then notation O(n+m) does make sense. – Chara 8/10, 2008 at 23:56

When using Big O notation, you merely describe how the resource usage (time/memory) goes up as your set size increases. Because this solution increases at a linear scale (that is: the length of time required to complete this function is t = k * n, where t is time, k is a constant and n is ... cont.. – Seaddon 9/10, 2008 at 4:43

...and n is the size of the set. If you made the set size 5, an O(n) function might take 2 seconds to complete. If you doubled the set size to 10, it'll take 4 seconds. You don't go into the specifics with Big-O, just describe the relationship. This is O(n). – Seaddon 9/10, 2008 at 4:46

I've always thought that O(n) is defined via limit[n->+inf]( O(n)/n ) <= C, where C is a constant. – Chara 9/10, 2008 at 18:36

Example: "division of two n-bit numbers takes time O(n(m + 1)), where m is the length of the quotient." en.wikipedia.org/wiki/Euclidean_algorithm#Running_time It shows that we can use more then one parameter while using Big O notation. – Chara 10/10, 2008 at 3:25

okay, well in your initial comment, that this is O(n + m), what's n and m? The only variable which has any impact on the running time of this equation is the size of the range. It doesn't matter whether it's 0-100 or 100-200, it'll still run in the same time. – Seaddon 10/10, 2008 at 3:49

@nickf: in your example n==m (size of the array is equal to size of all possible keys space), therefore O(n + m) -> O(n+n) -> O(2*n) -> O(n). – Chara 10/10, 2008 at 14:8

Vote me down if I'm wrong, but I think we can determine if there are duplicates or not using variance. Because we know the mean beforehand (n + (m-1)/2 or something like that) we can just sum up the numbers and square of difference to mean to see if the sum matches the equation (mn + m(m-1)/2) and the variance is (0 + 1 + 4 + ... + (m-1)^2)/m. If the variance doesn't match, it's likely we have a duplicate.

EDIT: variance is supposed to be (0 + 1 + 4 + ... + [(m-1)/2]^2)*2/m, because half of the elements are less than the mean and the other half is greater than the mean.

If there is a duplicate, a term on the above equation will differ from the correct sequence, even if another duplicate completely cancels out the change in mean. So the function returns true only if both sum and variance matches the desrired values, which we can compute beforehand.

Behlau answered 7/10, 2008 at 3:19 Comment(4)

This is effectively that is going on here [#177618. It's that 'likely' in your comment that is bugging me... – Pinnatipartite 7/10, 2008 at 5:44

See my counterexamples above. – Bolshevist 7/10, 2008 at 6:45

See below for explanation (300 chars not enough!) – Wersh 7/10, 2008 at 9:33

See the counterexample under @Skizz's post – Hygienist 15/5, 2012 at 14:44

Awhile back I heard about a very clever sorting algorithm from someone who worked for the phone company. They had to sort a massive number of phone numbers. After going through a bunch of different sort strategies, they finally hit on a very elegant solution: they just created a bit array and treated the offset into the bit array as the phone number. They then swept through their database with a single pass, changing the bit for each number to 1. After that, they swept through the bit array once, spitting out the phone numbers for entries that had the bit set high.

Along those lines, I believe that you can use the data in the array itself as a meta data structure to look for duplicates. Worst case, you could have a separate array, but I'm pretty sure you can use the input array if you don't mind a bit of swapping.

I'm going to leave out the n parameter for time being, b/c that just confuses things - adding in an index offset is pretty easy to do.

Consider:

for i = 0 to m
  if (a[a[i]]==a[i]) return false; // we have a duplicate
  while (a[a[i]] > a[i]) swapArrayIndexes(a[i], i)
  sum = sum + a[i]
next

if sum = (n+m-1)*m return true else return false

This isn't O(n) - probably closer to O(n Log n) - but it does provide for constant space and may provide a different vector of attack for the problem.

If we want O(n), then using an array of bytes and some bit operations will provide the duplication check with an extra n/32 bytes of memory used (assuming 32 bit ints, of course).

EDIT: The above algorithm could be improved further by adding the sum check to the inside of the loop, and check for:

if sum > (n+m-1)*m return false

that way it will fail fast.

Pinnatipartite answered 7/10, 2008 at 3:19 Comment(3)

Phone company had used just a simple bucket sort. – Chara 8/10, 2008 at 16:34

Yup - and my algorithm above does a radix sort (which Hewgill says is O(n)) - so I'd say the above is close to optimal (unless someone can come up with a proof for one of the statistical approaches)... – Pinnatipartite 9/10, 2008 at 4:37

Sum could overflow. After you've done sorting it is sufficient to check max-min == m-1 – Chara 10/10, 2008 at 3:47

There is a simple solution doing it in one pass with O(1) space if the numbers are positive:

int max = arr[0];
int min = arr[0];
for (int i = 0; i < n; i++) {
    int x = abs(arr[i]);
    int y = x % n;
    if (arr[y] < 0)
        return false;
    arr[y] = -arr[y];
    if (x > max)
        max = x;
    else if (x < min)
        min = x;
}
return max - min == n - 1;

Verily answered 7/10, 2008 at 3:19 Comment(0)

Here is a solution in O(N) time and O(1) extra space for finding duplicates :-

public static boolean check_range(int arr[],int n,int m) {

        for(int i=0;i<m;i++) {
            arr[i] = arr[i] - n;
            if(arr[i]>=m)
                return(false);
        }

        System.out.println("In range");

        int j=0;
        while(j<m) {
            System.out.println(j);
            if(arr[j]<m) {

                if(arr[arr[j]]<m) {

                    int t = arr[arr[j]];
                    arr[arr[j]] = arr[j] + m;
                    arr[j] = t;
                    if(j==arr[j]) {

                        arr[j] = arr[j] + m;
                        j++;
                    }

                }

                else return(false);

            }

            else j++;

        }

Explanation:-

Bring number to range (0,m-1) by arr[i] = arr[i] - n if out of range return false.

for each i check if arr[arr[i]] is unoccupied that is it has value less than m

if so swap(arr[i],arr[arr[i]]) and arr[arr[i]] = arr[arr[i]] + m to signal that it is occupied

if arr[j] = j and simply add m and increment j

if arr[arr[j]] >=m means it is occupied hence current value is duplicate hence return false.

if arr[j] >= m then skip

Schlesien answered 7/10, 2008 at 3:19 Comment(0)

So there is an algorithm that takes O(n^2) that does not require modifying the input array and takes constant space.

First, assume that you know n and m. This is a linear operation, so it does not add any additional complexity. Next, assume there exists one element equal to n and one element equal to n+m-1 and all the rest are in [n, n+m). Given that, we can reduce the problem to having an array with elements in [0, m).

Now, since we know that the elements are bounded by the size of the array, we can treat each element as a node with a single link to another element; in other words, the array describes a directed graph. In this directed graph, if there are no duplicate elements, every node belongs to a cycle, that is, a node is reachable from itself in m or less steps. If there is a duplicate element, then there exists one node that is not reachable from itself at all.

So, to detect this, you walk the entire array from start to finish and determine if each element returns to itself in <=m steps. If any element is not reachable in <=m steps, then you have a duplicate and can return false. Otherwise, when you finish visiting all elements, you can return true:

for (int start_index= 0; start_index<m; ++start_index)
{
    int steps= 1;
    int current_element_index= arr[start_index];
    while (steps<m+1 && current_element_index!=start_index)
    {
        current_element_index= arr[current_element_index];
        ++steps;
    }

    if (steps>m)
    {
        return false;
    }
}

return true;

You can optimize this by storing additional information:

Record sum of the length of the cycle from each element, unless the cycle visits an element before that element, call it sum_of_steps.
For every element, only step m-sum_of_steps nodes out. If you don't return to the starting element and you don't visit an element before the starting element, you have found a loop containing duplicate elements and can return false.

This is still O(n^2), e.g. {1, 2, 3, 0, 5, 6, 7, 4}, but it's a little bit faster.

Barmecidal answered 7/10, 2008 at 3:19 Comment(0)

Oops! I got caught up in a duplicate question and did not see the already identical solutions here. And I thought I'd finally done something original! Here is a historical archive of when I was slightly more pleased:

Well, I have no certainty if this algorithm satisfies all conditions. In fact, I haven't even validated that it works beyond a couple test cases I have tried. Even if my algorithm does have problems, hopefully my approach sparks some solutions.

This algorithm, to my knowledge, works in constant memory and scans the array three times. Perhaps an added bonus is that it works for the full range of integers, if that wasn't part of the original problem.

I am not much of a pseudo-code person, and I really think the code might simply make more sense than words. Here is an implementation I wrote in PHP. Take heed of the comments.

function is_permutation($ints) {

  /* Gather some meta-data. These scans can
     be done simultaneously */
  $lowest = min($ints);
  $length = count($ints);

  $max_index = $length - 1;

  $sort_run_count = 0;

  /* I do not have any proof that running this sort twice
     will always completely sort the array (of course only
     intentionally happening if the array is a permutation) */

  while ($sort_run_count < 2) {

    for ($i = 0; $i < $length; ++$i) {

      $dest_index = $ints[$i] - $lowest;

      if ($i == $dest_index) {
        continue;
      }

      if ($dest_index > $max_index) {
        return false;
      }

      if ($ints[$i] == $ints[$dest_index]) {
        return false;
      }

      $temp = $ints[$dest_index];
      $ints[$dest_index] = $ints[$i];
      $ints[$i] = $temp;

    }

    ++$sort_run_count;

  }

  return true;

}

Clientele answered 7/10, 2008 at 3:19 Comment(0)

I like Greg Hewgill's idea of Radix sorting. To find duplicates, you can sort in O(N) time given the constraints on the values in this array.

For an in-place O(1) space O(N) time that restores the original ordering of the list, you don't have to do an actual swap on that number; you can just mark it with a flag:

//Java: assumes all numbers in arr > 1
boolean checkArrayConsecutiveRange(int[] arr) {

// find min/max
int min = arr[0]; int max = arr[0]
for (int i=1; i<arr.length; i++) {
    min = (arr[i] < min ? arr[i] : min);
    max = (arr[i] > max ? arr[i] : max);
}
if (max-min != arr.length) return false;

// flag and check
boolean ret = true;
for (int i=0; i<arr.length; i++) {
    int targetI = Math.abs(arr[i])-min;
    if (arr[targetI] < 0) {
        ret = false; 
        break;
    }
    arr[targetI] = -arr[targetI];
}
for (int i=0; i<arr.length; i++) {
    arr[i] = Math.abs(arr[i]);
}

return ret;
}

Storing the flags inside the given array is kind of cheating, and doesn't play well with parallelization. I'm still trying to think of a way to do it without touching the array in O(N) time and O(log N) space. Checking against the sum and against the sum of least squares (arr[i] - arr.length/2.0)^2 feels like it might work. The one defining characteristic we know about a 0...m array with no duplicates is that it's uniformly distributed; we should just check that.

Now if only I could prove it.

I'd like to note that the solution above involving factorial takes O(N) space to store the factorial itself. N! > 2^N, which takes N bytes to store.

Agree answered 7/10, 2008 at 3:19 Comment(0)

Any contiguous array [ n, n+1, ..., n+m-1 ] can be mapped on to a 'base' interval [ 0, 1, ..., m ] using the modulo operator. For each i in the interval, there is exactly one i%m in the base interval and vice versa.

Any contiguous array also has a 'span' m (maximum - minimum + 1) equal to it's size.

Using these facts, you can create an "encountered" boolean array of same size containing all falses initially, and while visiting the input array, put their related "encountered" elements to true.

This algorithm is O(n) in space, O(n) in time, and checks for duplicates.

def contiguous( values )
    #initialization
    encountered = Array.new( values.size, false )
    min, max = nil, nil
    visited = 0

    values.each do |v|

        index = v % encountered.size

        if( encountered[ index ] )
            return "duplicates"; 
        end

        encountered[ index ] = true
        min = v if min == nil or v < min
        max = v if max == nil or v > max 
        visited += 1
    end

    if ( max - min + 1 != values.size ) or visited != values.size
        return "hole"
    else
        return "contiguous"
    end

end

tests = [ 
[ false, [ 2,4,5,6 ] ], 
[ false, [ 10,11,13,14 ] ] , 
[ true , [ 20,21,22,23 ] ] , 
[ true , [ 19,20,21,22,23 ] ] ,
[ true , [ 20,21,22,23,24 ] ] ,
[ false, [ 20,21,22,23,24+5 ] ] ,
[ false, [ 2,2,3,4,5 ] ]
]

tests.each do |t|
    result = contiguous( t[1] )
    if( t[0] != ( result == "contiguous" ) )
        puts "Failed Test : " + t[1].to_s + " returned " + result
    end
end

Bicarb answered 7/10, 2008 at 3:19 Comment(0)

I propose the following:

Choose a finite set of prime numbers P_1,P_2,...,P_K, and compute the occurrences of the elements in the input sequence (minus the minimum) modulo each P_i. The pattern of a valid sequence is known.

For example for a sequence of 17 elements, modulo 2 we must have the profile: [9 8], modulo 3: [6 6 5], modulo 5: [4 4 3 3 3], etc.

Combining the test using several bases we obtain a more and more precise probabilistic test. Since the entries are bounded by the integer size, there exists a finite base providing an exact test. This is similar to probabilistic pseudo primality tests.

S_i is an int array of size P_i, initially filled with 0, i=1..K
M is the length of the input sequence
Mn = INT_MAX
Mx = INT_MIN

for x in the input sequence:
  for i in 1..K: S_i[x % P_i]++  // count occurrences mod Pi
  Mn = min(Mn,x)  // update min
  Mx = max(Mx,x)  // and max

if Mx-Mn != M-1: return False  // Check bounds

for i in 1..K:
  // Check profile mod P_i
  Q = M / P_i
  R = M % P_i
  Check S_i[(Mn+j) % P_i] is Q+1 for j=0..R-1 and Q for j=R..P_i-1
  if this test fails, return False

return True

Singletary answered 7/10, 2008 at 3:19 Comment(0)

Product of m consecutive numbers is divisible by m! [ m factorial ]

so in one pass you can compute the product of the m numbers, also compute m! and see if the product modulo m ! is zero at the end of the pass

I might be missing something but this is what comes to my mind ...

something like this in python

my_list1 = [9,5,8,7,6]

my_list2 = [3,5,4,7]

def consecutive(my_list):

count = 0
prod = fact = 1
for num in my_list:
    prod *= num
    count +=1 
    fact *= count
if not prod % fact: 
    return 1   
else:   
    return 0

print consecutive(my_list1)

print consecutive(my_list2)

HotPotato ~$ python m_consecutive.py

Peroxidize answered 7/10, 2008 at 3:19 Comment(1)

The title states that if you have m consecutive numbers, then the product is divisible by m!. This does not allow you to conclude that you have consecutive numbers if the product is divisible by m!. [9,5,8,7,6] results in 1, in your example. So does [18,5,8,7,6], but the numbers are not consecutive. – Mongol 15/5, 2009 at 11:6

An array contains N numbers, and you want to determine whether two of the numbers sum to a given number K. For instance, if the input is 8,4, 1,6 and K is 10, the answer is yes (4 and 6). A number may be used twice. Do the following. a. Give an O(N2) algorithm to solve this problem. b. Give an O(N log N) algorithm to solve this problem. (Hint: Sort the items first. After doing so, you can solve the problem in linear time.) c. Code both solutions and compare the running times of your algorithms. 4.

Subdivide answered 7/10, 2008 at 3:19 Comment(0)

The Answer from "nickf" dows not work if the array is unsorted var_dump(testArray(array(5, 3, 1, 2, 4), 1, 5)); //gives "duplicates" !!!!

Also your formula to compute sum([n...n+m-1]) looks incorrect.... the correct formula is (m(m+1)/2 - n(n-1)/2)

Douglasdouglashome answered 7/10, 2008 at 3:19 Comment(1)

I've implemented nickf's method in C. It works fine. I don't know whether his implementation is correct, but at least the method is. – Chara 22/11, 2008 at 17:29

In Python:

def ispermutation(iterable, m, n):
    """Whether iterable and the range [n, n+m) have the same elements.

       pre-condition: there are no duplicates in the iterable
    """ 
    for i, elem in enumerate(iterable):
        if not n <= elem < n+m:
            return False

    return i == m-1

print(ispermutation([1, 42], 2, 1)    == False)
print(ispermutation(range(10), 10, 0) == True)
print(ispermutation((2, 1, 3), 3, 1)  == True)
print(ispermutation((2, 1, 3), 3, 0)  == False)
print(ispermutation((2, 1, 3), 4, 1)  == False)
print(ispermutation((2, 1, 3), 2, 1)  == False)

It is O(m) in time and O(1) in space. It does not take into account duplicates.

Alternate solution:

def ispermutation(iterable, m, n): 
    """Same as above.

    pre-condition: assert(len(list(iterable)) == m)
    """
    return all(n <= elem < n+m for elem in iterable)

Giles answered 7/10, 2008 at 3:19 Comment(0)

A C version of b3's pseudo-code

(to avoid misinterpretation of the pseudo-code)

Counter example: {1, 1, 2, 4, 6, 7, 7}.

int pow_minus_one(int power)
{
  return (power % 2 == 0) ? 1 : -1;
}

int ceil_half(int n)
{
  return n / 2 + (n % 2);
}

bool isperm_b3_3(int m; int a[m], int m, int n)
{
  /**
     O(m) in time (single pass), O(1) in space,
     doesn't use n
     possible overflow in sum
     a[] may be readonly
   */
  int altsum = 0;
  int mina = INT_MAX;
  int maxa = INT_MIN;

  for (int i = 0; i < m; ++i)
    {
      const int v = a[i] - n + 1; // [n, n+m-1] -> [1, m] to deal with n=0
      if (mina > v)
        mina = v;
      if (maxa < v)
        maxa = v;

      altsum += pow_minus_one(v) * v;
    }
  return ((maxa-mina == m-1)
          and ((pow_minus_one(mina + m-1) * ceil_half(mina + m-1)
                - pow_minus_one(mina-1) * ceil_half(mina-1)) == altsum));
}

Giles answered 7/10, 2008 at 3:19 Comment(8)

You need to change (mina + m) to (mina + m - 1). – Anderson 9/10, 2008 at 19:59

I've changed (mina+m) -> (mina+m-1). Now It breaks on {1, 1, 2, 2, } – Chara 9/10, 2008 at 20:49

This line is incorrect and not stated in my algorithm. We can't make the assumption that [n, n+m) -> [1,m]: const int v = a[i] - n + 1; // [n, n+m) -> [1, m] – Anderson 9/10, 2008 at 21:36

@b3: it is not an assumption, e.g. {10, 12, 11} (where n=10, m=3) -> {1, 3, 2} (where n=1, m=3). Any algorithm that meets the task requirements must return the same answers for both ranges. – Chara 9/10, 2008 at 21:50

The task requirements don't state that n is given. The only inputs are the length of the data vector and the data vector itself. The algorithm I presented doesn't allow for n as an input (which, I believe, is a more flexible solution than having to know n beforehand). – Anderson 9/10, 2008 at 21:58

btw, the a[i]-n+1 is noop for {1,1,2,2} i.e., even without v=a[i]-n+1 line the algorithm fails. – Chara 9/10, 2008 at 22:3

You're right. I had removed an additional check of the upper and lowere bounds after the loop that I thought was unnecessary but in fact it's required. See the updated pseudocode: #177618 – Anderson 9/10, 2008 at 22:26

Updated C version to deal with n=0, new counter example {1, 1, 2, 4, 6, 7, 7}. Now the C version is out of sync with pseudo-code – Chara 10/10, 2008 at 0:33

A C version of Kent Fredric's Ruby solution

(to facilitate testing)

Counter-example (for C version): {8, 33, 27, 30, 9, 2, 35, 7, 26, 32, 2, 23, 0, 13, 1, 6, 31, 3, 28, 4, 5, 18, 12, 2, 9, 14, 17, 21, 19, 22, 15, 20, 24, 11, 10, 16, 25}. Here n=0, m=35. This sequence misses 34 and has two 2.

It is an O(m) in time and O(1) in space solution.

Out-of-range values are easily detected in O(n) in time and O(1) in space, therefore tests are concentrated on in-range (means all values are in the valid range [n, n+m)) sequences. Otherwise {1, 34} is a counter example (for C version, sizeof(int)==4, standard binary representation of numbers).

The main difference between C and Ruby version: << operator will rotate values in C due to a finite sizeof(int), but in Ruby numbers will grow to accomodate the result e.g.,

Ruby: 1 << 100 # -> 1267650600228229401496703205376

C: int n = 100; 1 << n // -> 16

In Ruby: check_index ^= 1 << i; is equivalent to check_index.setbit(i). The same effect could be implemented in C++: vector<bool> v(m); v[i] = true;

bool isperm_fredric(int m; int a[m], int m, int n)
{
  /**
     O(m) in time (single pass), O(1) in space,
     no restriction on n,
     ?overflow?
     a[] may be readonly
   */
  int check_index = 0;
  int check_value = 0;

  int min = a[0];
  for (int i = 0; i < m; ++i) {

    check_index ^= 1 << i;
    check_value ^= 1 << (a[i] - n); //

    if (a[i] < min)
      min = a[i];
  }
  check_index <<= min - n; // min and n may differ e.g., 
                           //  {1, 1}: min=1, but n may be 0.
  return check_index == check_value;
}

Values of the above function were tested against the following code:

bool *seen_isperm_trusted  = NULL;
bool isperm_trusted(int m; int a[m], int m, int n)
{
  /** O(m) in time, O(m) in space */

  for (int i = 0; i < m; ++i) // could be memset(s_i_t, 0, m*sizeof(*s_i_t));
    seen_isperm_trusted[i] = false;

  for (int i = 0; i < m; ++i) {

    if (a[i] < n or a[i] >= n + m)
      return false; // out of range

    if (seen_isperm_trusted[a[i]-n])
      return false; // duplicates
    else
      seen_isperm_trusted[a[i]-n] = true;
  }

  return true; // a[] is a permutation of the range: [n, n+m)
}

Input arrays are generated with:

void backtrack(int m; int a[m], int m, int nitems)
{
  /** generate all permutations with repetition for the range [0, m) */
  if (nitems == m) {
    (void)test_array(a, nitems, 0); // {0, 0}, {0, 1}, {1, 0}, {1, 1}
  }
  else for (int i = 0; i < m; ++i) {
      a[nitems] = i;
      backtrack(a, m, nitems + 1);
    }
}

Giles answered 7/10, 2008 at 3:19 Comment(3)

You can't call this a working solution when it breaks as soon a you have sequences longer than 32. Artificially limiting the solution doesn't make it O(1). – Cindacindee 10/10, 2008 at 5:24

@Derek: It works on sequences longer than 32. But for some of them It could return wrong answer, but I've not seen counter-examples yet. For example, it works on {8, 33, 27, 30, 9, 7, 26, 32, 2, 23, 0, 13, 1, 6, 31, 3, 28, 4, 5, 18, 12, 29, 14, 17, 21, 19, 22, 15, 20, 24, 11, 10, 33, 25}. – Chara 10/10, 2008 at 13:21

I've added counter-example for C version. – Chara 10/10, 2008 at 13:37

Counter-example for XOR algorithm.

(can't post it as a comment)

@popopome

For a = {0, 2, 7, 5,} it return true (means that a is a permutation of the range [0, 4) ), but it must return false in this case (a is obviously is not a permutaton of [0, 4) ).

Another counter example: {0, 0, 1, 3, 5, 6, 6} -- all values are in range but there are duplicates.

I could incorrectly implement popopome's idea (or tests), therefore here is the code:

bool isperm_popopome(int m; int a[m], int m, int  n)
{
  /** O(m) in time (single pass), O(1) in space,
      no restrictions on n,
      no overflow,
      a[] may be readonly
  */
  int even_xor = 0;
  int odd_xor  = 0;

  for (int i = 0; i < m; ++i)
    {
      if (a[i] % 2 == 0) // is even
        even_xor ^= a[i];
      else
        odd_xor ^= a[i];

      const int b = i + n;
      if (b % 2 == 0)    // is even
        even_xor ^= b;
      else
        odd_xor ^= b;
    }

  return (even_xor == 0) && (odd_xor == 0);
}

Giles answered 7/10, 2008 at 3:19 Comment(0)

ciphwn has it right. It is all to do with statistics. What the question is asking is, in statistical terms, is whether or not the sequence of numbers form a discrete uniform distribution. A discrete uniform distribution is where all values of a finite set of possible values are equally probable. Fortunately there are some useful formulas to determine if a discrete set is uniform. Firstly, to determine the mean of the set (a..b) is (a+b)/2 and the variance is (n.n-1)/12. Next, determine the variance of the given set:

variance = sum [i=1..n] (f(i)-mean).(f(i)-mean)/n

and then compare with the expected variance. This will require two passes over the data, once to determine the mean and again to calculate the variance.

References:

Wersh answered 7/10, 2008 at 3:19 Comment(3)

This algorithm fails for [1,3,3,4,4,5,8,8] – Hinge 9/10, 2008 at 10:30

No, for that set, the expected (i.e. the set [1,2,3,4,5,6,7,8]) and actual mean are both 4.5 but the variance is 4.67 and 5.25 respectively. – Wersh 21/10, 2008 at 12:24

Oops, my bad there. I note the forumla is wrong in the main post, the variance for a uniform distribution is (n-1)(n+1)/12 and the first comment does indeed bugger it up. – Wersh 21/10, 2008 at 12:35

Given this -

Write a method that takes an int array of size m ...

I suppose it is fair to conclude there is an upper limit for m, equal to the value of the largest int (2^32 being typical). In other words, even though m is not specified as an int, the fact that the array can't have duplicates implies there can't be more than the number of values you can form out of 32 bits, which in turn implies m is limited to be an int also.

If such a conclusion is acceptable, then I propose to use a fixed space of (2^33 + 2) * 4 bytes = 34,359,738,376 bytes = 34.4GB to handle all possible cases. (Not counting the space required by the input array and its loop).

Of course, for optimization, I would first take m into account, and allocate only the actual amount needed, (2m+2) * 4 bytes.

If this is acceptable for the O(1) space constraint - for the stated problem - then let me proceed to an algorithmic proposal... :)

Assumptions: array of m ints, positive or negative, none greater than what 4 bytes can hold. Duplicates are handled. First value can be any valid int. Restrict m as above.

First, create an int array of length 2m-1, ary, and provide three int variables: left, diff, and right. Notice that makes 2m+2...

Second, take the first value from the input array and copy it to position m-1 in the new array. Initialize the three variables.

set ary[m-1] - nthVal // n=0
set left = diff = right = 0

Third, loop through the remaining values in the input array and do the following for each iteration:

set diff = nthVal - ary[m-1]
if (diff > m-1 + right || diff < 1-m + left) return false // out of bounds
if (ary[m-1+diff] != null) return false // duplicate
set ary[m-1+diff] = nthVal
if (diff>left) left = diff // constrains left bound further right
if (diff<right) right = diff // constrains right bound further left

I decided to put this in code, and it worked.

Here is a working sample using C#:

public class Program
{
    static bool puzzle(int[] inAry)
    {
        var m = inAry.Count();
        var outAry = new int?[2 * m - 1];
        int diff = 0;
        int left = 0;
        int right = 0;
        outAry[m - 1] = inAry[0];
        for (var i = 1; i < m; i += 1)
        {
            diff = inAry[i] - inAry[0];
            if (diff > m - 1 + right || diff < 1 - m + left) return false;
            if (outAry[m - 1 + diff] != null) return false;
            outAry[m - 1 + diff] = inAry[i];
            if (diff > left) left = diff;
            if (diff < right) right = diff;
        }
        return true;
    }

    static void Main(string[] args)
    {
        var inAry = new int[3]{ 2, 3, 4 };
        Console.WriteLine(puzzle(inAry));
        inAry = new int[13] { -3, 5, -1, -2, 9, 8, 2, 3, 0, 6, 4, 7, 1 };
        Console.WriteLine(puzzle(inAry));
        inAry = new int[3] { 21, 31, 41 };
        Console.WriteLine(puzzle(inAry));
        Console.ReadLine();
    }

}

Pyrexia answered 7/10, 2008 at 3:19 Comment(2)

Can someone explain why this post might have been voted as not helpful? I posted my assumptions along with my own original algorithm with working code as proof. – Pyrexia 10/10, 2008 at 4:56

Just a side-note: under your assumptions m is not int, but unsigned int. – Chara 22/11, 2008 at 17:8

If you want to know the sum of the numbers [n ... n + m - 1] just use this equation.

var sum = m * (m + 2 * n - 1) / 2;

That works for any number, positive or negative, even if n is a decimal.

Seaddon answered 7/10, 2008 at 3:19 Comment(0)

Why do the other solutions use a summation of every value? I think this is risky, because when you add together O(n) items into one number, you're technically using more than O(1) space.

O(1) indicates constant space which does not change by the number of n. It does not matter if it is 1 or 2 variables as long as it is a constant number. Why are you saying it is more than O(1) space? If you are calculating the sum of n numbers by accumulating it in a temporary variable, you would be using exactly 1 variable anyway.

Commenting in an answer because the system does not allow me to write comments yet.

Update (in reply to comments): in this answer i meant O(1) space wherever "space" or "time" was omitted. The quoted text is a part of an earlier answer to which this is a reply to.

Vines answered 7/10, 2008 at 3:19 Comment(4)

Big-O notation doesn't describe the space required to store information. Summing an array is an O(n) algorithm because you have to iterate over all the elements once each, therefore the amount of time required to do that calculation increases linearly with the number of elements.. – Seaddon 7/10, 2008 at 4:58

Big-O notation can be used for both time and space complexity. True, when not specified it means time, but both are still valid. – Weller 7/10, 2008 at 5:13

If we add two n-digit numbers we can get (n+1)-digit sum. This additional digit requires additional space. The more numbers we add the more space could be required. – Chara 10/10, 2008 at 4:5

In most programming languages a number would take fixed amount of bytes regardless of the number of digits. Even if it were to increase (e.g., as a result of switching from int to longit), it would increase so rarely that it would not matter. – Vines 16/10, 2008 at 3:25

note: this comment is based on the original text of the question (it has been corrected since)

If the question is posed exactly as written above (and it is not just a typo) and for array of size n the function should return (True/False) if the array consists of the numbers 1...n+1,

... then the answer will always be false because the array with all the numbers 1...n+1 will be of size n+1 and not n. hence the question can be answered in O(1). :)

Vines answered 7/10, 2008 at 3:19 Comment(1)

I've made more visible that the answer is irrelevant for the current vesion of the question. – Chara 8/10, 2008 at 15:18

MY CURRENT BEST OPTION

def uniqueSet( array )
  check_index = 0; 
  check_value = 0; 
  min = array[0];
  array.each_with_index{ |value,index|
         check_index = check_index ^ ( 1 << index );
         check_value = check_value ^ ( 1 << value );
         min = value if value < min
  } 
  check_index =  check_index  << min;
  return check_index == check_value; 
end

O(n) and Space O(1)

I wrote a script to brute force combinations that could fail that and it didn't find any. If you have an array which contravenes this function do tell. :)

@J.F. Sebastian

Its not a true hashing algorithm. Technically, its a highly efficient packed boolean array of "seen" values.

ci = 0, cv = 0
[5,4,3]{ 
  i = 0 
  v = 5 
  1 << 0 == 000001
  1 << 5 == 100000
  0 ^ 000001  = 000001
  0 ^ 100000  = 100000

  i = 1
  v = 4 
  1 << 1 == 000010
  1 << 4 == 010000
  000001 ^ 000010  = 000011
  100000 ^ 010000  = 110000 

  i = 2
  v = 3 
  1 << 2 == 000100
  1 << 3 == 001000
  000011 ^ 000100  = 000111
  110000 ^ 001000  = 111000 
}
min = 3 
000111 << 3 == 111000
111000 === 111000

The point of this being mostly that in order to "fake" most the problem cases one uses duplicates to do so. In this system, XOR penalises you for using the same value twice and assumes you instead did it 0 times.

The caveats here being of course:

both input array length and maximum array value is limited by the maximum value for $x in ( 1 << $x > 0 )
ultimate effectiveness depends on how your underlying system implements the abilities to:
1. shift 1 bit n places right.
2. xor 2 registers. ( where 'registers' may, depending on implementation, span several registers )
edit Noted, above statements seem confusing. Assuming a perfect machine, where an "integer" is a register with Infinite precision, which can still perform a ^ b in O(1) time.

But failing these assumptions, one has to start asking the algorithmic complexity of simple math.

How complex is 1 == 1 ?, surely that should be O(1) every time right?.
What about 2^32 == 2^32 .
O(1)? 2^33 == 2^33? Now you've got a question of register size and the underlying implementation.
Fortunately XOR and == can be done in parallel, so if one assumes infinite precision and a machine designed to cope with infinite precision, it is safe to assume XOR and == take constant time regardless of their value ( because its infinite width, it will have infinite 0 padding. Obviously this doesn't exist. But also, changing 000000 to 000100 is not increasing memory usage.
Yet on some machines , ( 1 << 32 ) << 1 will consume more memory, but how much is uncertain.

Brutal answered 7/10, 2008 at 3:19 Comment(6)

As I understand it, you solution calculates some kind of 32-bit hash. But as we know hash functions are prone to collisions e.g., md5sum. Can you prove that there are no collisions possible under the constraints of the question? – Chara 8/10, 2008 at 15:25

Your solution requires additional m bits, therefore it is not true O(1) in space. [m is the size (number of elements) of an array] – Chara 9/10, 2008 at 17:41

I've posted C version #177618 – Chara 9/10, 2008 at 19:45

I've noticed that my 1st and 2nd comments seems like contradicting each other. To clarify: 1st comment refers to implementation based on finite numbers (like int in C). 2nd comments refers to integer with infinite precision (like integers in Ruby). BTW, as it implemented above it takes O(m+n) bits. – Chara 10/10, 2008 at 0:21

I've constructed a counter-example. See above link to C version. – Chara 10/10, 2008 at 14:14

About complexity: Operations you described will be O(1) for fixed-width representations e.g., 1 << 100 in C, but it is not O(1) for calculation with infinite precision e.g. 1 << 100 in Ruby. – Chara 10/10, 2008 at 14:18

def test(a, n, m):
    seen = [False] * m
    for x in a:
        if x < n or x >= n+m:
            return False
        if seen[x-n]:
            return False
        seen[x-n] = True
    return False not in seen

print test([2, 3, 1], 1, 3)
print test([1, 3, 1], 1, 3)
print test([1, 2, 4], 1, 3)

Note that this only makes one pass through the first array, not considering the linear search involved in not in. :)

I also could have used a python set, but I opted for the straightforward solution where the performance characteristics of set need not be considered.

Update: Smashery pointed out that I had misparsed "constant amount of memory" and this solution doesn't actually solve the problem.

Bolshevist answered 7/10, 2008 at 3:19 Comment(1)

But that's O(n) storage - the question asks for O(1) storage. – Ietta 7/10, 2008 at 3:29

-1

I suppose the question boils down to ensuring that

(maximum - minimum + 1) == array_size

and this can obviously be done in O(N) time and O(1) space as follows :

int check_range(int input[], int N){
    int max = -INFINITY, min = INFINITY, i;
    for(i=0; i<N; i++){
        if(input[i] < min) min=input[i];
        if(input[i] > max) max=input[i];
    }
    return (max - min + 1) == N;
}

Note that this approach takes care of possibility of duplicates. Please report any discrepancy in the solution.

Advisory answered 7/10, 2008 at 3:19 Comment(0)

-1

Linear in time, constant in space solution for `int m`

Constant space is achieved by exploiting a sign bit. It can be done for any mutable int range if m is less than INT_MAX i.e., when the input range [n, n+m) can be shifted to [1, m+1) range if n is not positive. In practice, the precondition is almost always true if input is mutable.

/** gcc -std=c99 ... */
#include <assert.h>
#include <iso646.h>  // and, or
#include <limits.h>  // INT_MIN
#include <stdbool.h> // bool
#include <stdlib.h>  // abs()

bool inrange(int m; int a[m], int m, int n)
{
  /** Whether min(a[]) == n and max(a[]) == n+(m-1)
  */
  if (m == 0) return true; // empty range is a valid range

  // check out-of-range values
  int max = INT_MIN, min = INT_MAX;
  for (int i = 0; i < m; ++i) {
    if (min > a[i]) min = a[i];
    if (max < a[i]) max = a[i];
  }
  return (min == n and max == n+(m-1));
}

bool isperm_minus2(int m; int a[m], int m, int n)
{
  /** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')

      Whether a[] is a permutation of the range [n, n+m).

      feature: It marks visited items using a sign bit.
  */
  if (not inrange(a, m, n))
    return false; // out of range

  assert((INT_MIN - (INT_MIN - 1)) == 1); // check n == INT_MIN
  for (int *p = a; p != &a[m]; ++p) {
    *p -= (n - 1); // [n, n+m) -> [1, m+1)
    assert(*p > 0);
  }

  // determine: are there duplicates
  bool has_duplicates = false;
  for (int i = 0; i < m; ++i) {
    const int j = abs(a[i]) - 1;
    assert(j >= 0);
    assert(j < m);
    if (a[j] > 0)
      a[j] *= -1; // mark
    else { // already seen
      has_duplicates = true;
      break;
    }
  }

  // restore the array
  for (int *p = a; p != &a[m]; ++p) {
    if (*p < 0) 
      *p *= -1; // unmark
    // [1, m+1) -> [n, n+m)
    *p += (n - 1);        
  }

  return not has_duplicates; // no duplicates? (+ inrange)
}

Giles answered 7/10, 2008 at 3:19 Comment(0)

-1

How about using XOR with even and odd numbers separately. Think about bit level not integer value itself.

bool is_same(const int* a, const int* b, int len)
{
    int even_xor = 0; 
    int odd_xor = 0;

    for(int i=0;i<len;++i)
    {
        if(a[i] & 0x01) odd_xor ^= a[i];
        else even_xor ^= a[i];

        if(b[i] & 0x01) odd_xor ^= b[i];
        else even_xor ^= b[i];
    }

    return (even_xor == 0) && (odd_xor == 0);
}

Deraign answered 7/10, 2008 at 3:19 Comment(3)

Counter example: {0, 2, 7, 5,}. See my answer. – Chara 9/10, 2008 at 0:47

Another counter example, where all values are in range: {0, 0, 1, 3, 5, 6, 6} – Chara 9/10, 2008 at 13:39

I've posted adapted (for the OP question) version #177618 – Chara 9/10, 2008 at 19:47

-1

I don't think I explained myself in my original post well (below the solid line). For an input of [1 2 3 4 5], for example, the algorithm computes the sum:

-1 + 2 - 3 + 4 - 5

which should be equal to

-1^5 * ceil(5/2)

The pseudo-code below shows how vectors that do not begin at 1 are checked. The algorithm handles cases where the input vector is not sorted and/or it contains duplicates.

The following algorithm solves the problem by calculating the alternating sums of the vector elements:

-1 + 2 - 3 + 4 - 5 + .... + m = (-1)^m * ceil(m/2)

where ceil rounds up to the closest integer. In other words, odd numbers are subtracted from the running total and even numbers are added to it.

function test(data, m)
    altSum = 0
    n = Inf
    mCheck = -Inf
    for ii = 1:m
    {
        if data(ii) < n
            n = data(ii)
        if data(ii) > mCheck
            mCheck = data(ii)
        altSum = altSum + (-1)^data(ii) * data(ii)
    }
    if ((mCheck-n+1!=m) || (-1)^(n+m-1) * ceil((n+m-1)/2) - ((-1)^(n-1) * ceil((n-1)/2)) != altSum
        return false
    else
        return true

Anderson answered 7/10, 2008 at 3:19 Comment(12)

Counter example: {1, 1, 1, 2, 2} – Chara 9/10, 2008 at 16:8

Thanks for the counter example. I had implemented the algorithm incorrectly. It's now fixed. – Anderson 9/10, 2008 at 18:11

ceil((n+m+1/2)) is it: (n+m+1)//2 + ((n+m+1) % 2), where // integer division and % is a modulo division? – Chara 9/10, 2008 at 19:4

This version doesn't pass {1,2,3, 4, 5, 6, 7}. I will post version in C to avoid misinterpretation of pseudo-code. – Chara 9/10, 2008 at 19:9

Posted #177618 – Chara 9/10, 2008 at 19:41

Ahhh, this is what happens when I don't have my morning coffee. It should have read "n+m-1" not "n+m+1". Your interpretation of the ceil function is correct. – Anderson 9/10, 2008 at 19:42

Counter example: {1, 1, 2, 2, } – Chara 9/10, 2008 at 20:51

{1,1,2,2} is not a counter example. The algorithm produces the expected response for this input. – Anderson 9/10, 2008 at 21:39

Updated algorithm produce valid result for {1,1,2,2} but fails on {0, 1, 2}. New counter example {0,1,2}. BTW, it might be impossible (from the mathematical point of view) to detect duplicates in O(N) time and O(1) in space. – Chara 9/10, 2008 at 23:14

I've updated the C version to coincide with the pseudo-code. #177618 – Chara 9/10, 2008 at 23:21

As you've noticed, this algorithm is only valid for inputs >= 1. I'm thinking about how to deal with < 1 without knowing n beforehand. The trick to staying in O(1) space is to deal with differences between numbers instead of sums of numbers... – Anderson 10/10, 2008 at 0:6

I've updated C version to deal with {0, ...}. Now it fails on {1, 1, 2, 4, 6, 7, 7}, btw popopome's XOR version breaks on it too. – Chara 10/10, 2008 at 0:35

-1

Fail := False;
Sum1 := 0;
Sum2 := 0;
TSum1 := 0;
TSum2 := 0;

For i := 1 to m do
  Begin
    TSum1 := TSum1 + i;
    TSum2 := TSum2 + i * i;
    Item := Array[i] - n;
    If (Item < 0) or (Item >= m) then 
      Fail := True
    Else 
      Begin
        Sum1 := Sum1 + Item;
        Sum2 := Sum2 + Item * Item;
      End;
  End;
Fail := Fail Or (Sum1 <> TSum1) or (Sum2 <> TSum2);

Tired and no compiler but I think this gives O(m) runtime and can't be fooled.

Stroke answered 7/10, 2008 at 3:19 Comment(2)

This is quite elegant - but is it mathematically provable? The implication is that for a given m, there is only one series of length m that will result in a given combination of sum and sum-of-squares of that series. I googled, but could find no proof or corollary reference... – Pinnatipartite 7/10, 2008 at 5:42

See my counterexamples above. – Bolshevist 7/10, 2008 at 6:43

-1

If that was a typo and the question is about all numbers being in range 1...n instead, then:

def try_arr(arr):
    n = len(arr)
    return (not any(x<1 or x>n for x in arr)) and sum(arr)==n*(n+1)/2

$ print try_arr([1,2,3])
True

$ print try_arr([1,3,1])
False

$ print try_arr([1,2,4])
False

Notes:

I am using the definition from the original version that numbers start from 1. Sure code can be modified to account for starting from another number.
If the size of the array (n) was known, you could modify this to stream data from e.g., input file, and use almost no memory (1 temp variable inside sum() and 1 variable for the current item taken from the stream)
any() is new in python 2.5 (but you have alternative ways to express the same thing in earlier versions of python)
it uses O(n) time O(1) space. (update: i wrote it does account for duplicates, but apparently that is not true as demonstrated by a comment to another answer here).

Vines answered 7/10, 2008 at 3:19 Comment(0)

-1

It seems like we could check for duplicates by multiplying all the numbers n...n+m together, and then comparing that value to the expected product of a sequence with no duplicates m!/(n-1)! (note that this assumes it is impossible for a sequence to pass both the expected sum test and the expected product test).

So adding to hazzen's pseudo-code, we have:

is_range(int[] nums, int n, int m) {
  sum_to_m := (m * (m + 1)) / 2
  expected_sum := sum_to_m - (n * (n - 1)) / 2
  real_sum := sum(nums)
  expected_product := m! / (n - 1)!
  real_product := product(nums)
  return ((real_sum == expected_sum) && (expected_product == real_product))

EDIT: Here's my solution in Java using the Sum of Squares to check for duplicates. It also handles any range (including negative numbers) by shifting the sequence to start at 1.

// low must be less than high
public boolean isSequence(int[] nums, int low, int high) {
    int shift = 1 - low;
    low += shift;
    high += shift;

    int sum = 0;
    int sumSquares = 0;
    for (int i = 0; i < nums.length; i++) {
        int num = nums[i] + shift;

        if (num < low)
            return false;
        else if (num > high)
            return false;

        sum += num;
        sumSquares += num * num;
    }

    int expectedSum = (high * (high + 1)) / 2;

    if (sum != expectedSum)
        return false;

    int expectedSumSquares = high * (high + 1) * (2 * high + 1) / 6;

    if (sumSquares != expectedSumSquares)
        return false;

    return true;
}

Football answered 7/10, 2008 at 3:19 Comment(7)

[2,2,4,6,6] => ? ( assuming you know m and not n ) – Brutal 7/10, 2008 at 4:24

You could do an initial pass to determine the n and subtract every element by that. The question is whether or not a sequence can be constructed to pass both the sum and product tests, which I don't know. But this is the most promising lead we have. – Ajani 7/10, 2008 at 4:31

The expected product does not scale very well unless you allow infinite precision numbers; if you do, you are essentially using more than O(N) space to represent those numbers. That does give me the idea of trying something like summing n^2 to m^2 (which has a known formula). Not sure of props on it – Donley 7/10, 2008 at 4:32

You're right, hazzen, this doesn't scale very well. Maybe instead of factorial, if we use the sum of squares of the first n natural numbers n(n + 1)(2n + 1)/6, that would work better. – Football 7/10, 2008 at 4:45

Oh well, so much for using the sum of squares. :p I see a number of counterexamples have been provided above. Does anyone have any other mathematical tricks we haven't tried yet? – Football 7/10, 2008 at 7:7

When one of the numbers is zero, your expected result would be zero. – Connolly 8/10, 2008 at 14:29

Counter-example for both the sum and product tests: {1, 2, 4, 4, 4, 5, 7, 9, 9} (sum=45, product=362880) – Chara 11/10, 2008 at 23:56

-2

I don't think you need to use sums at all. Just check the minimum and maximum and check for dupes. Checking for dupes is the harder part, since you don't know n in advance, so you cannot sort in one pass. To work around that, relax the condition on the (edit:destination) array. Instead of requiring it to be sorted, go for a cyclical shift of a sorted sequence, so that the array goes [k,k+1,..., n+m-2, n+m-1,n,n+1, ... ,k-2,k-1] for some k.

With the condition above, you can assume that a[0] is already in the right position, then the right position for an element d is (d-a[0]) mod m, assuming zero-based array indexing. For example with [4,?,?,?] you can expect [4,5,6,7] or [4,1,2,3] or [4,5,6,3] or [4,5,2,3].

Then just scan the array once, putting each element in its calculated position, updating the min and max and checking for clashes. If there are no clashes and max-min=m, then the condition is met, otherwise it is false.

Ballerina answered 7/10, 2008 at 3:19 Comment(5)

What about [4, 2, 1, 3]? It meets the OP requirements, but It is not a cyclical shift of a sorted sequence. – Chara 9/10, 2008 at 14:5

In addition your solution doesn't work for read-only sequences (it is not O(1) in space for such cases). – Chara 9/10, 2008 at 14:8

[4,2,1,3] will be 'sorted' to [4,1,2,3] by the algorithm. I don't need the original array to be sorted this way. For read-only sequences it is probably impossible to check for dupes in O(m) time and constant memory. – Schilling 9/10, 2008 at 14:20

What are the advantages of this version compared to in-place bucket sort e.g., #177618 – Chara 9/10, 2008 at 20:0

The bucketsort assumes you know the range of integers in advance. This version works in one pass without this assumption. I re-read the question, it doesn't actually specify whether the range is a part of the input data or not. – Schilling 10/10, 2008 at 9:38

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Note to everyone (stuff collected from comments)

New insight (well, more of musings that don't help solve it but are interesting and I'm going to bed):

Simpler approach (modifying hazzen's idea):

Hazzen's algorithm implementation in C

Here's a working solution in O(n)

Product of m consecutive numbers is divisible by m! [ m factorial ]

A C version of b3's pseudo-code

A C version of Kent Fredric's Ruby solution

Counter-example for XOR algorithm.

Linear in time, constant in space solution for int m

Recommended topics

Hot tags

Linear in time, constant in space solution for `int m`