# fast removal of duplicate IppiPoints

## fast removal of duplicate IppiPoints

Hi, I want to remove the duplicate IppiPoints from an array as fast as possible.

`IppiPoint* P = new IppiPoint[N];`

The array size N is 1520 but approximately first 400 index of it is modified at each iteration. The total number of iterations is image width x image height.. Thus I need a fast solution.. Namely, any duplicate of point1 which satisfies

`(point1.x = point2.x) && (point1.y = point2.y)`

should be removed..
The trivial solution is to check for each point if it matches with the next one..but the running time complexity would then be O(N^2).. Any idea is welcome..

6 帖子 / 0 new

Hello,

Does anyone has a solution for this problem?

In general  the  question would be how to remove duplicates in an array, in a multi-threaded manner.

Thank you very much.

>>...The trivial solution is to check for each point if it matches with the next one..but the running time complexity would
>>then be O(N^2).. Any idea is welcome...

In the solution you're considering a sorting step ( by x and y ) needs to be added. Also, if you really want to learn how to do it properly please take a look at how Relational Database Management Systems ( DBMSs ) do a GROUP BY of a data set.

Note: Even if it is Not related to IPP please take a look how simple it could be with an SQL query:

SELECT count(*), ptX, ptY FROM SomeTable GROUP BY ptX, ptY HAVING count(*)=1 ORDER BY ptX, ptY ASC

and it will create a data set with All Duplicates removed Ordered by X and Y values in Ascending order.

Do you use IPP library on a Linux or on Windows platform?

>>...In general the question would be how to remove duplicates in an array...

If the array is simple, that is a set of un-ordered values ( not pairs of un-ordered values like x and y ) then an algorithm based on a 1st part of the Pigeonhole sorting algorithm will remove all duplicates. A 2nd part ( reconstruction ) of the Pigeonhole sorting algorithm doesn't need to be used and it needs to be replaced with a compression. However, that algorithm is very sensitive to a range of values and ideally the range needs to be from 0 to 65536. Another limitation is that the algorithm can not be used with floating-point data types.

So, take a look at the Pigeonhole sorting algorithm.

>>...The trivial solution is to check for each point if it matches with the next one..but the running time complexity would
>>then be O(N^2).. Any idea is welcome...

STL's / Boost's map or MFC's CMapXxx classes could be used to detect and delete all duplicates in a data set.

Here is an example with MFC's class CMapStringToString ( however, it is Not the fastest one because it is a string based ):
...
CMapStringToString mS2S;
...
if( mS2S.Lookup( csKey, csKeyValue ) == FALSE ) // Not in a Result Set
{
...
mS2S.SetAt( csKey, csKeyValue );
...
}
else // Duplicate detected / Already in a Result Set
{
...
}