Character Index Keys
Feb 11, 2010
How big would the performance penalty be when using short ASCII character index keys instead of integers?
The idea is to make individual records, and keys, a little more human readable, in case something goes wrong or needs to be touched up. Would using, say, 9 ASCII characters instead of an INT cost much in performance?
Note that these index keys would only occur as primary keys in very small (~100 records) tables that are virtually constant and are therefore likely candidates for replication across nodes.
RE: Character Index Keys
Feb 11, 2010
The first step is to identify where the dominant cost for your application is likely to be. For most CRUD stuff the stored procedure execution time is dominated by the overhead of invoking the stored procedure. If every stored procedure invocation looks up a single value from this table then there will be no difference. If each procedure does (pulling a number out of my hat) 50 lookups or if the index will be used in a join involving a largish number of tuples then it might show up. A tree index benefits from integer keys more than a hash index, which is the default index, because a tree index has to perform more key comparisons overall.
The advantage of the integer keys is that they can be packed together into a series of 8-byte ints and the comparison and hashing functions are templated on the number of ints involved so there is no branching or schema examination at runtime. The loops can be unrolled and it all compiles out to a series of integer comparisons. A string is punted to a generic key that is schema aware and has to do all sorts of conditional work to determine how to compare each column in the key by examining the schema.
Another question to ask is how much performance do you need? Does it matter if a single node does 95k transactions instead of 100k when 3 nodes are necessary to get the desired performance anyway? There is a lot you can do to make your app faster, but one of the goals of Volt is that you shouldn't have to. My advice is that you start with the string key and change it later if you want to get more per-node performance.
Hope this helps,
No AUTO INCREMENT
Apr 18, 2010
The whole discussion gains significance because there are no trivial incrementing ids to be had.
Although Tim provides an easy solution for that: http://forum.voltdb.com/showthread.php?355-Auto-Increment&highlight=auto+increment
I am still wondering whether we might use only random numbers (or strings).
Comparing the performance loss of a) creating unique numbers as Tim proposes and b) using 30 digit random hex numbers (performance loss from longer index keys) ... which is preferable?
Apr 18, 2010
From all I've seen, changing it later will not happen. I know it may be premature optimization at work, in a way, but this kind of deep-in-the-guts change is very unlikely to ever take place.
So if we start out with String IDs, we'll be stuck with them for a long time.
Re-reading your answer, I was wondering whether it would make sense for us to encode strings into integers on our side. We would then have a key that is readable through a simple integer-to-ASCII conversion, while VoltDB could use the keys as if they were LONGs, never knowing or caring that they form a brief word when interpreted as ASCII.
Would that make any sense?
I fully appreciate that "with VoltDB you should not have to think about the performance" in this way. And I think I am starting to see where this is very true. But given the complexity that you describe for string key lookups, it seems to be a good idea to use integers as keys. Am I overreacting?
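To make the idea concrete, here is a minimal Java sketch of such an encoding. It is only an illustration under stated assumptions: the class name AsciiKey and its methods are hypothetical, nothing in it is a VoltDB API, and it assumes keys are 1 to 9 characters of 7-bit ASCII with no NUL character. Nine characters at 7 bits each use 63 bits, so the result always fits into a non-negative 64-bit value.

```java
// Hypothetical helper, not part of VoltDB: packs a short ASCII word into a long.
final class AsciiKey {
    // 9 characters * 7 bits = 63 bits, so the sign bit of the long stays clear.
    static final int MAX_CHARS = 9;

    // Pack the characters left to right, 7 bits per character.
    static long encode(String word) {
        if (word.isEmpty() || word.length() > MAX_CHARS) {
            throw new IllegalArgumentException("key must be 1.." + MAX_CHARS + " characters");
        }
        long key = 0L;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            // NUL is rejected so that decode() can tell where the word starts.
            if (c == 0 || c > 0x7F) {
                throw new IllegalArgumentException("not a non-NUL 7-bit ASCII character at index " + i);
            }
            key = (key << 7) | c;
        }
        return key;
    }

    // Unpack 7 bits at a time from the low end, then reverse to restore the order.
    static String decode(long key) {
        if (key <= 0) {
            throw new IllegalArgumentException("not an encoded key: " + key);
        }
        StringBuilder sb = new StringBuilder();
        while (key != 0) {
            sb.append((char) (key & 0x7F));
            key >>>= 7;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        long key = encode("USD");
        System.out.println(key + " <-> " + decode(key));
    }
}
```

One consequence of this particular packing: numeric order matches string order only among keys of the same length, and shorter keys always sort before longer ones. That would matter if a tree index on the key were used for range scans.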
Changing Later/No auto-increment
Apr 19, 2010
"Changing it Later will not happen for all I've seen. I know it may be premature optimization at work, in a way, but this kind of deep in the guts change is very unlikely to ever take place.
So if we start out with String IDs, we'll be stuck with them for a long time...."
Hi Henning,
Is the key you are attempting to generate going to be the partition key? If it is, you can generate a random value and check whether it is already in use. If it isn't, then great, you have a unique value. If it is, you can roll back and try again. I think that the long will be measurably faster in the long run. You only have to generate the value once, but it will be used in tree index comparisons for a long time. A 30 (31 with length prefix) byte value can straddle two cache lines. A string that short would be inlined into the key tuple so there is no memory indirection penalty.
If it is not the partition key then you have to wonder what will happen if partitions are split or merged and how that will affect uniqueness.
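The generate, check, retry loop described above can be sketched in plain Java as follows. This is a rough illustration, not VoltDB code: the class RandomKeyAllocator is hypothetical, the in-memory set is an assumed stand-in for the table's primary key index, and the attempt limit is an added safeguard that the post does not mention. In a real stored procedure the collision branch is where the transaction would be rolled back and retried with a fresh value.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.LongSupplier;

// Hypothetical sketch: allocate unique random long keys by retrying on collision.
final class RandomKeyAllocator {
    // Stand-in for the primary key index of the real table (assumption).
    private final Set<Long> usedKeys = new HashSet<>();
    private final LongSupplier source;
    private final int maxAttempts;

    RandomKeyAllocator(LongSupplier source, int maxAttempts) {
        this.source = source;
        this.maxAttempts = maxAttempts;
    }

    long allocate() {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long candidate = source.getAsLong();
            // Set.add returns false when the value is already present,
            // which corresponds to "already in use, roll back and try again".
            if (usedKeys.add(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no free key found after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        RandomKeyAllocator allocator =
            new RandomKeyAllocator(() -> ThreadLocalRandom.current().nextLong(), 10);
        System.out.println("allocated key: " + allocator.allocate());
    }
}
```

With 64-bit random values and tables of modest size, collisions should be rare, so the retry path is expected to run seldom; the random source is passed in as a parameter mainly so that the collision path can be exercised deterministically in a test.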