Off-topic: Shapefile question

Topics: General Topics
Sep 25, 2007 at 8:26 PM
We don't normally work outside of North America, but now someone from Europe wants a shapefile that we created. Is the decimal point of a numeric value supposed to stay as a ".", without regard to regional settings?

Sep 25, 2007 at 9:29 PM
I'm presuming that you are talking about the attribute data. If you are, and you are not dealing with type "double" then, yes, it is always a "." character.
Sep 25, 2007 at 11:35 PM
Yes, you presume correctly. Thank you.
Sep 26, 2007 at 3:19 AM
Numeric values don't have the concept of a decimal point character.
It's not until you convert one into a string representation (or try to parse a string into a number) that you need to be careful.
Bottom line: if your attribute column is numeric, you have nothing to worry about, and if it's string (which is kinda stupid but it happens...) I would still not worry about it. It's up to the app that parses the strings to take care of these kinda things.

I think most Europeans are used to this by now, so I wouldn't worry too much. Why did we even come up with this mess anyway? :-)
Sep 26, 2007 at 4:12 AM
Odegaard's right - you shouldn't need to worry about it if you are using SharpMap - only if you are doing the writing yourself using some kind of stream. Numbers will go back and forth from SharpMap to and from the dbf (through the ShapeFileProvider) correctly and strings will automatically get handled by the CurrentCulture CultureInfo class.

@Odegaard - what mess? You mean to say that the period isn't the way God intended to separate whole numbers from fractions? ;)
Sep 26, 2007 at 7:04 AM
Laughing. First, Odegaard, the data in the .dbf part of a shapefile is text, even for "numeric" columns. Codekaizen correctly understood that I was asking about converting the binary representation of a double to text for insertion into the .dbf file :) What a wonderful format, huh?

And, yes... I was pretty sure that the SharpMap provider would handle it correctly. But we haven't ported to SharpMap yet, and all other shapefile providers that we've tried are way too slow when writing tens of thousands of records, like we have to write. So I'm using old C code that I wrote back in the very earliest days of the shapefile "specification" :( I sooooo look forward to porting completely to .Net, with a solution where we can replace the slow parts with our own code, but not have to replace the entire solution!

Codekaizen... your statement that the shape provider would use the current culture to write data in the correct text format is not right, is it? My problem is that I'm using regional settings in my legacy code, and it was converting the periods to commas on a Russian machine. The shapefile provider (as an example) would really want to use the invariant culture explicitly in order to avoid using the comma as a decimal separator, right? That's what we have had to do with our .Net code that reads and writes numeric data as text on European machines when we want the format to always use the period as the decimal separator.
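For the legacy C case described above, here is a minimal sketch of one way to keep the period regardless of regional settings. It assumes the writer formats values with the printf family, which follows the LC_NUMERIC locale. The helper name dbf_format_numeric and its width/decimals parameters are hypothetical, not part of any shapefile library.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format `value` as a right-justified numeric text
 * field of `width` characters with `decimals` digits after the point,
 * always using '.' as the separator. `out` needs room for width + 1 bytes.
 * Returns 0 on success, -1 if the arguments are invalid or the value
 * does not fit in the field. */
int dbf_format_numeric(double value, int width, int decimals, char *out)
{
    char tmp[512];
    const char *sep = localeconv()->decimal_point; /* "," on many European locales */
    size_t sep_len = strlen(sep);
    size_t len;
    char *p;
    int n;

    if (width <= 0 || decimals < 0 || decimals > 30)
        return -1;

    /* snprintf honours the current locale, so a comma may appear here. */
    n = snprintf(tmp, sizeof tmp, "%.*f", decimals, value);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* Swap whatever separator the locale produced for a plain '.'. */
    p = (sep_len > 0) ? strstr(tmp, sep) : NULL;
    if (p != NULL) {
        *p = '.';
        memmove(p + 1, p + sep_len, strlen(p + sep_len) + 1);
    }

    len = strlen(tmp);
    if (len > (size_t)width)
        return -1;

    /* Right-justify with leading spaces. */
    memset(out, ' ', (size_t)width - len);
    memcpy(out + ((size_t)width - len), tmp, len + 1);
    return 0;
}
```

Patching the formatted string avoids calling setlocale() around every write, which would change the locale for the whole process. Whether that matters depends on how the rest of the legacy code uses regional settings.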

As I said... the question was off topic for SharpMap... but I knew this forum would respond quickly.


Sep 26, 2007 at 7:24 AM
Magnum... duh! You're right - kinda. DBF has several kinds of numeric values. One is a normal 8-byte double, but I think it's only in newer versions of DBF. There is also a number type that, as you correctly state, uses an ASCII string. Nevertheless, as I recall the spec specifies the separator to always be '.', and SharpMap hopefully uses the invariant culture to parse the number-type fields.
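The reading side has the same trap in C: strtod() follows the current locale, so on a comma-locale machine it stops at the '.' in "3.50". Below is a rough sketch of a locale-independent parse for such a text field. The helper name dbf_parse_numeric is hypothetical, and it assumes the field holds only spaces, an optional sign, digits and at most one '.' (no exponent).

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a fixed-width numeric text field that uses
 * '.' as its separator, whatever the current locale is.
 * Returns 0 and stores the value in *out on success, -1 on a blank or
 * malformed field. */
int dbf_parse_numeric(const char *field, size_t width, double *out)
{
    char tmp[64];
    const char *sep = localeconv()->decimal_point;
    size_t sep_len = strlen(sep);
    size_t i, j = 0;
    char *end;

    for (i = 0; i < width; i++) {
        char c = field[i];
        if (c == '.') {
            /* Substitute the separator that strtod expects in this locale. */
            if (j + sep_len >= sizeof tmp)
                return -1;
            memcpy(tmp + j, sep, sep_len);
            j += sep_len;
        } else if ((c >= '0' && c <= '9') || c == '-' || c == '+' || c == ' ') {
            if (j + 1 >= sizeof tmp)
                return -1;
            tmp[j++] = c;
        } else {
            return -1; /* rejects ',' and anything else unexpected */
        }
    }
    tmp[j] = '\0';

    *out = strtod(tmp, &end);
    if (end == tmp)
        return -1; /* nothing parsed: blank field or lone sign */
    while (*end == ' ')
        end++;     /* allow trailing padding */
    return (*end == '\0') ? 0 : -1;
}
```

Rejecting a comma outright, rather than guessing what it means, makes a file written with the wrong regional settings fail loudly instead of being read as a different number.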

Codekaizen: No, I'm bitching about us Europeans who came up with this lame idea to switch commas and decimal points (I mean... it is called a decimal point for a reason, isn't it?). Way too often have I had to change the regional setting of my system to English to avoid system crashes because some ignorant American forgot about us Europeans and our stupid ways ;-)
Sep 26, 2007 at 7:55 AM
Actually, I think Odegaard understood it better than I led you to believe he did... if you just give the ShapeFileProvider the number (System.Double, it sounds like), it will write it out correctly into the dbf as a number string (DBF type 'N'), without you needing to specify any formatting: you just call ShapeFileProvider.Insert or .Update on the FeatureDataRow which contains a column of type System.Double. It will also write it out as a double (type 'O') if that's what the column type is. If you're creating the shapefile from scratch however, the provider currently uses type 'O' - which makes value and type fidelity higher, but I'm wary of tool compatibility. Let me know if this decision will be an issue for you, since I basically made it to be more compatible with the .Net type system.

As for speed, I'm somewhat (over)eager to implement a bit of an abstraction under the shapefile provider, so I can use a memory-mapped file stream on Win32. This should speed things up even more, even though .Net is pretty fast already. I'd like to get around to doing some speed tests if only to confirm things. I should probably just get Beta 1 done, though. ;)

As regards the number formats, it is correct that the invariant culture would be used for number formatting in numeric columns, but this would not be so for text columns. Odegaard was hypothesizing that you could format a number on your own, say by calling ToString() and Parse() or TryParse(), and store the result in a column of type 'text'. When you do this, number formatting would use the current culture, as per the MSDN docs (example from Double.ToString()):
  • This version of the ToString method implicitly uses the general numeric format specifier ("G") and the NumberFormatInfo for the current culture.