Determine an index of a closest number in a database, from a user's input

Question

Hello people!

I need to build a class to hold different float values. I will use it's instance to provide 3 integers closest to the argument a user supplied to it.

The first output number (center one) is considered to have a closest value to the input argument. The other 2 ones are the "2nd closest" to the input value. Like this: 2 <---- 1 ----> 2 . Then, I will return an object, which will store 3 variables as closest (1), first neighbor (2) and second neighbor(3).

Now, for example, the user supplied 2.2f .

The data held is as follows (was pre-filtered to be from lowest to greatest):

0.1f;

2.05f;

3.0f;

4.6f

4.65f;

How can I arrange my structure so that I can see that the number supplied is closest to 2.05f, return this 2.05f, plus 0.1f and 3.0f as its 2 neighbors?

The thing is, I don't want to iterate through the list or whatever it could be, until I will find the closest value. If possible, I would want to immediately figure out 2.05 is sitting on the 2nd space, 0.1 is on the first one and 3.0 is the third one. This somewhat reminds me of Dictionary when I think of it.

Any ideas on how to return a cell index which contains the closest number to the input one? How should I build my class to make it possible?

Thank you

Answer 1

Best Answer

Answer by semiessessi · Jan 12, 2014 at 08:15 PM

What you want here is almost certainly to binary search.

There are two common approaches which are essentially identical - use an ordered array or a (binary) tree data structure (an ordered array can be considered an encoding for a binary tree - which is itself one possible encoding of an nary-tree - and vice versa for both of those relations).

The ordered array approach is probably most suitable here, since once you know the index you can use -1 and +1 to get the neighbours.

I will not bore you with the detail of the binary search but instead refer you to Array.BinarySearch and Array.Sort which will enable you to implement this without knowing the implementation details yourself.

As an aside you may be reminded of Dictionary because of its fast lookups - a simple but completely generic way to implement a dictionary is with a binary tree where the key used is some kind of orderable hash of the dictionary key type (e.g. the GetHashCode function implemented on all .NET objects).

Add comment · Show 9 · Share

IgorAherne · Jan 12, 2014 at 08:22 PM 0

Share

Great, thank you :)

HappyMoo · Jan 12, 2014 at 09:53 PM 0

Share

Yes, BinarySearch is the way to go here...

But what you say about Dictionaries is wrong. No Dictionary is implemented with Binarysearch and they all have real O(1) access unless they're overfilled (but they should resize in that case) or the inserted objects have a faulty hashcode implementation.

semiessessi · Jan 13, 2014 at 12:17 AM 0

Share

can you point me at a counter example implementation? i'd be interested to see it. i'm quite happy to be wrong if it means i learn something. :)

the only production hash map implementations i've seen either defer to the C++ standard library or use a binary tree (in one case encoded in an ordered array). these are all in performance critical contexts (proper, down to the metal, console/PC games). i have not dug around much in the stl implementations - although i strongly suspect the latest $$anonymous$$S one will have its performance do$$anonymous$$ated by needless thread safety for small n.

the usual 'hash bucket listy thing' implementation is also usually terrible in practice despite the O( 1 ) complexity - a neat binary tree implementation will suffer a single cache miss per lookup, and depending on how you are doing lookups or if you need to walk the tree its quite easy to avoid subsequent cache misses on reasonably sized maps. even good implementations of lists of buckets tend to result in two cache misses per lookup, where only one of them is 'negotiable' in this way, and thats before we even consider the expense of the hash function implementation. assu$$anonymous$$g these cache misses do not exist in either case the binary tree 'hash function' is essentially O( log n ) integer comparisons and memory lookups - unless n is enormous this seriously constrains required performance characteristics of the hash function for a list of buckets implementation...

As a real world example of a standard library which isn't the .NET framework, I have measured the performance of Objective-C's NSDictionary against naive binary trees or the C++ std::unordered_map - there is really no serious competition there at all. A little investigation makes it easy to reveal why - the overhead of message passing in that environment dwarfs the performance characteristics of any implementation for reasonable n. I'd guess it might outperform them for impractically large n though where the do$$anonymous$$ant cost becomes the data structure operation itself...

HappyMoo · Jan 13, 2014 at 12:56 AM 0

Share

Every implementation... HashTables are called that way, because that's how they are implemented. They are not BinaryTrees of some kind.

http://en.wikipedia.org/wiki/Hash_table

BinaryTrees and Dictionaries have both their right to exist and I don't know what you tested and how you tested it, but if I know I need a Hashtable, I know the constraints and it will always be faster than a BinTree.

semiessessi · Jan 13, 2014 at 01:23 AM 0

Share

thought i'd look at someone else's real world implementation which is public - and it does use the normal 'hash table' implementation but geared towards performance for small n: https://github.com/id-Software/DOO$$anonymous$$-3-BFG/blob/master/neo/idlib/containers/HashTable.h

this has O( 1 ) lookup and writes for small n, but degrades for large n towards O( n ) complexity for either... i'd be curious to know how much by measuring but you can inspect this code for yourself and reach your own conclusions. it uses a tweakable parameter though so that there is some control over how fast it degrades vs. memory waste.

now there are optimisations possible to avoid this - so that tells me that it was not significant for the development of Doom 3 (BFG) and that if anything mattered here it was the small n performance which this implementation is best suited for.

regardless, i would be very confident in having to optimise this data structure, despite its origin from perhaps the most prestigious gamedev in existence - it looks largely like unoptimised code - there is nothing especially weird here except what i believe are the placement news using a macro i can't find the definition for (i'm guessing some pre-compiled header?): TAG_IDLIB_HASH. A placement new seems sensible here otherwise we can forget the rest - allocation would easily do$$anonymous$$ate the cost of using this data structure, almost certainly... i'd still want to measure it to say for sure. i'd be inclined to think they override new anyway because system allocators are generally not very well optimised for the kinds of allocations done in games which are usually very predictable and things like linear allocators, fixed size bucket allocators and free list pools are generally better suited...

edit: on further inspection i guess that TAG_IDLIB_HASH is an identifier for a memory pool and new has been overriden so that placement new is not placement new at all... (or takes an int parameter or something which this is) - also there is a further optimised 'HashIndex' data structure too... https://github.com/id-Software/DOO$$anonymous$$-3-BFG/blob/master/neo/idlib/containers/HashIndex.h

edit: my previous comment has disappeared. the key point from that is that i tested it by measuring ins$$anonymous$$d of assu$$anonymous$$g things or using knowledge. and that the two AAA gamedevs I previously worked for did not use hash table as the implementation for dictionary or hashmap - one used a tree and the other an ordered array (both can be seen as encodings of each other anyway)

Show more comments

Determine an index of a closest number in a database, from a user's input

1 Reply

Your answer

Follow this Question

Related Questions