Hacker News new | past | comments | ask | show | jobs | submit login

> and use CHAR if you are storing strings of a fixed length, because semantics are a good thing

The CHAR type is not what most people think it is. It is a blank padded string, not a fixed length one. Inserting 'a' into a CHAR(2) results in 'a ' being stored and retrieved from the database. You should always used VARCHAR or TEXT in PostgreSQL and never CHAR (at least I cannot think of a case when you would want it). CHAR is there for SQL standard compliance. Instead use one of these:

  field VARCHAR(2) CHECK (length(field) = 2)
  field VARCHAR CHECK (length(field) = 2)
  field TEXT CHECK (length(field) = 2)
The rules for working with blank padded strings is in my personal experience a hell with non-obvious results when casting, sorting, comparing and other operations.

EDIT: I can leave you with this little example.

  $ SELECT length('ab  '::char(4) || 'cd  '::char(4));
   length 
  --------
        4
  (1 row)



We're storing currency codes, and they're always 3 chars (EUR, USD and so on) so it would just be stupid to actually use VARCHAR and not CHAR for that


Well not stupid per-se, since the result's the same. But it would provide less information to the reader.


I do not see how either VARCHAR or CHAR would provide more or less information since both are variable size strings. One is space padded and one is not.

If you care about semantics you should create a domain based on VARCHAR.


the fact that one sees "CHAR" in the schema definition is self-documenting that this is a fixed length field (and CHAR is "fixed length", the size of the string you get back will always be N), that's what I mean by "semantics". E.g., "what does it mean that the developer used CHAR here and not a VARCHAR?"

The CHECK constraint you illustrated earlier can just as well be placed on a CHAR (using trim() as well to adjust for padding). Then there's no chance of any blank padding issues either.


But the semantics of CHAR are not what most people expect and almost never what you actually want. If you want an actual fixed length of non-blank data you need an additional check constraint to enforce the min-length.

CHAR semantically represents fixed length text fields from old data file formats not this data always has n (non-blank) characters.

If you do have different length then a VARCHAR is more appropriate. Also a lot of application frameworks that interact with the database only deal with VARCHAR so then as soon as you use a CHAR you have to start worrying about trimming your text data because one of the biggest database text type gotchas is accidentally trying to compare a VARCHAR and a CHAR improperly.

While I can see good reasons to include length checks there is never a good reason to use a CHAR unless you're trying to interoperate with COBOL programs written in the 80's


> But the semantics of CHAR are not what most people expect

how do you know that? did you take a survey? I've been working with DBAs for years, in my world everyone knows that's how CHAR works. The padding behavior is nothing new, and is intuitive - the value must be N characters in length, so if you stored less, you get back a right-padded string. This is exactly what I'd expect.

> CHAR semantically represents fixed length text fields from old data file formats

and two or three letter character codes like country codes, state codes, etc. are what we use CHAR for, these are fixed length text fields. They are still in modern use today. Plus lots of us still have to write apps that actually read old files too - CHAR is appropriate for these as well, assuming you are storing fields that aren't right-padded in the source datafile (such as social security numbers, etc.).

Your app will of course work with a VARCHAR instead, but the point of CHAR is that it's self-documenting as to the type of data to be stored in the field - fixed length, as opposed to variable length.

> If you do have different length then a VARCHAR is more appropriate.

if you are storing variable length, then you should absolutely use VARCHAR. That's why it's called "VAR", it means, "variable".

> Also a lot of application frameworks that interact with the database only deal with VARCHAR so then as soon as you use a CHAR you have to start worrying about trimming your text data

If you are using CHAR correctly, you don't have to trim anything, because you are storing a string that is exactly the length of the CHAR type. I'm not familiar with how an application framework would only know how to deal with VARCHAR and not CHAR, database adapters return strings for both types. And if there were such a framework, I'd not be using it.

> one of the biggest database text type gotchas is accidentally trying to compare a VARCHAR and a CHAR improperly.

all of which stems from the same, singular mistake - don't store variable length data in a CHAR - plus if you are comparing VARCHAR to CHAR, that is also usually doing it wrong, as an adequately normalized database wouldn't be repurposing some kind of fixed length datatype out into a VARCHAR of some kind elsewhere.

The aforementioned CHECK constraint is a good way to enforce that if the developers/frameworks in question tend to be error-prone about this kind of thing (it's not an error I've had much issue with, since I know how CHAR behaves).

> While I can see good reasons to include length checks there is never a good reason to use a CHAR unless you're trying to interoperate with COBOL programs written in the 80's

as it turns out a vast portion of the world economy is supported by mainframes and old software that often spits out fixed length datafiles, there is even Python code in my current work project just written in the past six months which is tasked with parsing such files (they are actually quite simple to parse, since you just pull out each field based on an exact position). Not to mention that boring things like state codes, country codes, and the like are often fixed length fields.


I say it's not what people expect because everyone emphasizes the "fixed length" rather than "blank padded" nature of CHAR. CHAR is only actually a fixed length if you actually ensure that it is so yourself. That's possible but then you're just using a CHAR as a placeholder for those semantics not as something that naturally enforces them.

If you actually really really have fixed length fields then yes CHARs could be appropriate. But for many things even though you intend for them to be static length codes things change when you start having to interoperate with systems designed with different constraints. (For example after mergers or aquiring a competitor.) And I know that mainframes still exist but they aren't the use case in mind when many say "USE CHAR".

Also the database adapter that handles CHAR poorly is none other than JDBC on oracle http://stackoverflow.com/questions/5332845/oracle-jdbc-and-o... (Yes that is hilariously bad.) But the mysql way of always ignoring trailing whitespace is not standard in all databases.


  CREATE DOMAIN currency_code ...


...and you apparently didn't read the comment you were replying to.

(which, fwiw, was written by someone who likely knows more than is healthy about RDBMS implementations of SQL)


Yes, I did read it and what I disagreed about is CHAR being semantically correct. Is is not. The explanation was provided for the benefit of other readers than myself and the parent poster.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: