DB Design id/name Debate

I have wanted to blog about a simple DB Design approach that can make or break facilitation of coding, readability, maintenance, and ultimately warehousing approaches.

It is the "id"/"name" debate. For instance you have a "Status" table. In this table you have a list of status names and their respective surrogate keys in the id field driven by a sequence or identity field (Oracle/Sql Server).

Best practice in this situation is to label your id field status_id or statusID (Oracle/SQL Server). Due to upper casing in oracle an underscore is used for readability and has developed into a standard for Oracle DB naming, SQL Server allows mixed case which parlays nicely into your object model where the C-based syntax world has standardized on camel case for class properties and doesn't require aliasing in SQL code.

This very issue while on a project with a Microsoft Centric Dev working for the first time with an Oracle DB was a huge bone of contention. He had to alias all his sql due to the underscores as this made it easier for us when copy/pasting sql return assignments. Made good sense and I felt for him being a Oracle newb. This is one thing SQL Server has over Oracle when it comes to readability.

But back to the topic... The reason behind prefixing the id field with table name is for readability when joining. Imagine if all tables had id/name fields. This would be confusing rather quickly now you'd have to prefix at the very least in the table where the foreign key was assigned and I've seen worse. A child (foreign key) table that has a label that doesn't necessarily relate to the parent. For instance stat or statpkey... Acckkk! If the parent only had statusID and in turn the child table also had statusID, the developer, sys arch., dba will readily understand the relationship.

When it comes to the name field it is my opinion that this field should be prefixed too, because you run into name being everywhere and pending on whether you had a good SQL coder or not you are going to have to alias that field every time you use it which is a pain when many IDEs out there generate sql code for you.

Lastly when it comes to the name field you have to think about whether or not data mining will be used in the future against this data. Hopefully in a data warehouse. When it comes time to building the warehouse from the transaction db solution, generating the SQL and DDL to do so is much more straightforward and will facilitate flattening of the tables as the table in the originating transaction db will be prevalent in both the id and name fields. I look forward to your comments.

Comments (Comment Moderation is enabled. Your comment will not appear until approved.)
Another common practice of mine is any time I might have more than one foreign key to the same table I name the columns after the foreign table's primary key and suffix it with the label. For instance: I have a user table whose primary key is userID. My order table would have userIDCustomer, userIDBillingContact, userIDShippingContact etc.
# Posted By Brad Wood | 3/1/09 2:22 AM
The issue for me with using aliases is when you use a lot of stored procedures. If the DBA isn't consistent with their alias naming through every sp then it can get confusing the for web app developer. Whatever convention you use it's critical to get it in written down to ensure everyone keeps their naming formats consistent through every table, sp, query, and app vars. BTW, I'm a camel case person - statusID, userID, etc for MS SQL projects and status_id for Oracle projects. Although if Oracle column names are like STATUSID then you can still camel case them once the query comes back to CF.
# Posted By Gary Fenton | 3/1/09 8:12 AM
I approach this from a class/OO perspective, so i have a major bias :-)

I am a reluctant SQL guy and try to keep it as simple as possible and stuffed into a hidden layer where we can relegate to the simplicity of persistence.

My distaste for having to use prefixes probably stems from spending my early days in C -- I learned to abhore the use of hungarian notation for variable names. That is, I abhor making the human the compiler checker :-(

I also hated the prefixing I would see in many code patterns where class attributes were given things like "m_blah" as a convention. As the reader, I had to strip off the prefix, as it was completely useless.

Prefixing the id makes the column standalone and unique. So yes, you can write sql without worrying about table aliasing the column name -- but only sometimes. Many times, you still have to alias the prefixed ID column name.

You can contrast the different ways and make your own trade-offs...

Explicit IDs/NAMEs:
select q.question_name, t.tag_name from p2_question_tags qt, p2_question q, p2_section s, p2_tag t
where s.questionnaire_id=3 and q.section_id=s.section_id and q.question_id=qt.question_id and qt.tag_id=t.tag_id

Or you can make "id" simply "id", name the foreign keys to represent the table name:
select q.name, t.name from p2_question_tags qt, p2_question q, p2_section s, p2_tag t
where s.questionnaire=3 and q.section=s.id and q.id=qt.question and qt.tag=t.id
# Posted By jon | 3/1/09 10:35 PM
Re: Brad Woods techniques:

"...My order table would have userIDCustomer, userIDBillingContact, userIDShippingContact"

I would prefer
Customer
BillingContact
ShippingContact
But if you wanted it to be more obvious (without having to look at the FK relationship diagram), you could make it:
CustomerUser
BillingContactUser
ShippingContactUser

Having a prefix like "userID" is too much like Hungarian notation for my tastes.

But then, this is all just a matter of opinion and aesthetics. Whatever the team chooses is the right answer -- for them.
# Posted By jon | 3/1/09 10:40 PM
@Brad

Thanks and with regard to your multi-foreign key approach. I understand your approach. My only concern is that if you look at the relationship regarding your Users, some are Customers, others are Billing Contacts, and lastly Shipping Contacts.

From an OO and normalization approach each has its own class extending a User. I'm guessing here that each may have properties or fields that don't exactly apply to a standard User meaning billing contact may have a SKU or something identifying them in the system and a shipping contact wouldn't necessarily have a SKU but multiple contact fields. What concerns me is if they are all lumped into the Users table we are going to have null fields for the differing attributes and this is a design faux pas when it comes to normal forming approaches.

So the Orders table would have a foreign key coming from the user type tables... Customers, BillingContacts, and ShippingContacts who all just happen to be Users of your system with a userID foreign key.

@Jon,

I believe my comment reiterates what you stated. FYI, that second query in your example would fail on the dual name fields unless they were aliased.

Thanks all for your insight.
# Posted By Strikefish | 3/2/09 8:46 AM
@Jeff: You are correct that my example leaves some to be desired. I had just thrown out two random tables I could think of that might relate to each other for the example.

I would probably implement my users and order differently, but I do think I would still differ from you a bit on the resolution though. First of all, I probably should have been more generic with my example and simply said "person" instead of "user".
To relate multiple people to an order I would probably use an orderPersonXref table of sorts containing orderID, PersonID, type, and itemSort columns. That way you can have a primary shipping contact as well as a secondary shipping contact who also doubles as the billing contact. And you aren't storing anything multiple times in your database since that example would represent three contacts with only two person records, and three xrefs.

There are problems that come into play while trying to keep a normalized data structure that also resembles your business objects. We are able to aggregate properties among a series of extended objects, while tables can't "extend" each other. I am still a firm believer of storing all instances of the same thing in the same place in my database. Therefore, shipping, billing, and customer contacts are all people with a first, middle, and last name, gender, salutation etc so I am going to store them in the same table. In addition to centralizing where I store stuff, it allows the same record to have several roles. All communications go in a communication table, and all addresses go in an address table. Users can have as many of each as they want via personCommoXref and personAddressXref. As far as properties only applying to one type of person-- I don't think it is that common. I'm not sure what you mean by SKU, but I usually allow for a person's record to contain an external ID used to identify themselves in their own system. And as far as the multiple contact fields, I believe that would have been handled by the commo and address xrefs.

If I did have properties that really only seemed to apply to one type, I would probably create special table designed to have a 1 to 1 relations with my person table that would only exist for persons of the type needing those extra fields. I would still link to the person id, not the id of the special table. I realize that does seem a little backwards, but if I assume that most of my people types won't require any extra information, then on average, I will be able to add a new type of person to my system (foo contact) by only adding an item into a utility group somewhere.

If my orderPersonXref simply has a type column it seems that is less work than a separate table per type.
# Posted By Brad Wood | 3/2/09 8:36 PM
@Brad

Great approaches here and I like the orderPersonXref with PersonType validation table approach most if we are truly talking apples to apples with Shipping, Billing Contact, etc. Thanks for taking the time to reply and clarify; greatly appreciated.
# Posted By Strikefish | 3/2/09 9:47 PM

Copyright Strikefish, Inc., 2005. All rights reserved.