Recursion in QuadTree

Topics: Algorithms, Data Access, SharpMap Project, SharpMap v0.9 / v1.x, SharpMap v2.0
Sep 30, 2011 at 11:32 AM

Hi, I have a problem with a huge shape file (1.01 GB). This file has no ".sidx" associated, so sharpmap needs to load the spatial indexes with the QuadTree function. After a while I get the OutOfMemory Exception, as obvious, because the file reaches the child-depth of 22. I'm studying a way to transform the recursive function into a linear function... Any suggestion?

Coordinator
Oct 1, 2011 at 9:24 PM

You may want to have a look at the tree MapServer is using (it is derived from ShapeLib):

http://svn.osgeo.org/mapserver/trunk/mapserver/maptree.c

The license is pretty permissive, too, so you should have no problems porting it.

Hth FObermaier

Oct 3, 2011 at 9:29 AM

Thank you for the answer. I tried to change the CreateSpatialIndex function like this:

Heuristic heur;           

heur.maxdepth = 1; //(int) Math.Ceiling(Math.Log(GetFeatureCount(), 2));

 heur.minerror = 10;

heur.tartricnt = 5;

heur.mintricnt = 2;

return new QuadTree(objList, 0, heur);

 

...I tried this, before get mad with the algorythm conversion... With forcing maxdepth to 1, the entire map takes like 30 sec to load, and like 2-3 seconds to draw the desired region. I can view the map correctly, altough the maxdepth was 22. Why this? I mean... Do I really need to load the entire node structure into _child0 and _child1 to view correctly my shapefile?

Coordinator
Oct 4, 2011 at 10:35 AM

You might get away with clearing the objList right before creating the child nodes in the QuadTree constructor.

I've created a linear version of the QuadTree creation. It still needs some testing, but seems to be faster than the current version

Hth FObermaier

Oct 5, 2011 at 4:02 PM

Great! keep me up to date :) 

anyway, in the previous post I tried to set the QuadTree function just to load the first 2 parent nodes, and then exit. After the .ZoomToExtents() and .ToMap() function, my map was perfectly visualized. Why this?

Coordinator
Oct 6, 2011 at 8:51 AM

I don't know how many features there are in your ShapeFile but a computed maximum treedepth of 22 suggests there are a lot.

Now, the constructor of a quadtree node takes a collection of BoxObjects, which -if the criteria is met- is split up into two collections of BoxObjects. The inital collection is not cleared. This leads to a state during the construction of the QuadTree, where the parent node has a complete collection of the childrens collections. If you set the maximum depth to 1, you only end up having three collections, a complete one in the constructor of the root node, and two collections for the child nodes, each containing approximately half of the items.

I hope that makes sense

FObermaier

Oct 6, 2011 at 9:25 AM

My file has something like 3,000,000 features in it. It's a complete cadastral mapping of a state.

There's two things I don't understand:

1) My coords are already stored in the parent node, why is needed to have a complete collection of the children collections?

2) I noted the recursive algorythm fills for first the deepest children nodes. So these nodes have the entire file in them. Isn't it? Now, the second-last children nodes will be filled with the coordinates again, and have the entire file, again, in them... So the n-last children.

Coordinator
Oct 6, 2011 at 9:39 AM

It's a bug. Prior to creating the child nodes, the original collection should be cleared.

Oct 7, 2011 at 2:28 PM

So, do you know a way to fix this?

Coordinator
Oct 7, 2011 at 3:22 PM

I commited a patch regarding the issue.

Please try both

ShapeFile.SpatialIndexCreationOption = SpatialIndexCreation.Linear

and

ShapeFile.SpatialIndexCreationOption = SpatialIndexCreation.Recursive

Hth FObermaier

Oct 7, 2011 at 3:28 PM

Thank you very much, I let you know as soon as I can

Oct 10, 2011 at 9:16 AM

I have a problem with compiling the new release (93334)

 

row 814 col 6 - Error 248: The name '_CoordinateSystem' does not exists in the current context - sharpmap-93334\Trunk\SharpMap\Data\Providers\ShapeFile.cs

this makes SharpMap.dll to do not compile, the other projects cannot find SharpMap.dll reference and so on

Coordinator
Oct 10, 2011 at 9:33 AM

Sorry about that, I just fixed it.

Please use the xxxDSProjection configuration if you can.

Hth FObermaier

Oct 10, 2011 at 10:03 AM

nm that I found a temporary workaround. I'm trying that big shapefile and I noted the output of the stopwatch

Linear creation of QuadTree took 65656ms

that's so much better.

 

Now the next step: the rendering

I noted this - in VectorLayer.cs

protected void RenderInternal(Graphics g, Map map, BoundingBox envelope)

bool alreadyOpen = DataSource.IsOpen;

// If not open yet, open it                if (!alreadyOpen) { DataSource.Open(); }

// Read data                geoms = DataSource.GetGeometriesInView(envelope); <-- small delay for reading the file

// If was not open, close it                if (!alreadyOpen) { DataSource.Close(); }

....

 

For each zoom or pan I do on my Map, the steps are: 

- check if file is open

- read all the file

- close the file

 

but if I close the file after I read all the data, I'm forced to reopen and re-read all the data each time I want to change my view box. Is it right?

Oct 10, 2011 at 10:30 AM

The first method I call is VectorLayer.Envelope, through Map.GetExtensions(). Altough, I think .Envelope is the first method called by anything, or you are not allowed to view your file in the box...

in public override BoundingBox Envelope

bool wasOpen = DataSource.IsOpen;                   

if (!wasOpen)                     <-- file is not open yet, this var can't be true

    DataSource.Open();                  

box = DataSource.GetExtents();   <--first reading of file

if (!wasOpen) //Restore state                       

    DataSource.Close();

 

I tested the rendering with a small change on this function and it's much more fast! I think you guys have done a real good job with this tool

Coordinator
Oct 10, 2011 at 11:56 AM
steelraiden wrote:

nm that I found a temporary workaround. I'm trying that big shapefile and I noted the output of the stopwatch

Linear creation of QuadTree took 65656ms

that's so much better.

The creation of the ShapeFile index should be done only once. Instantiate your ShapeFile provider with

new ShapeFile(filenames[i], true);

if you have the SpatialIndex created.

Of course, if you limit your maximum tree depth to 1, thus stuffing all your nodes in the first level, your index will be created faster, but if you zoom into your map, you will be way slower.
The term "faster" refers to Linear vs Recursive with the same maxdepth parameter.

 

steelraiden wrote:

 

Now the next step: the rendering

I noted this - in VectorLayer.cs

protected void RenderInternal(Graphics g, Map map, BoundingBox envelope)

bool alreadyOpen = DataSource.IsOpen;

// If not open yet, open it                if (!alreadyOpen) { DataSource.Open(); }

// Read data                geoms = DataSource.GetGeometriesInView(envelope); <-- small delay for reading the file

// If was not open, close it                if (!alreadyOpen) { DataSource.Close(); }

....

 

For each zoom or pan I do on my Map, the steps are: 

- check if file is open

- read all the file

- close the file

 

but if I close the file after I read all the data, I'm forced to reopen and re-read all the data each time I want to change my view box. Is it right?

Nothing prevents you from leaving your provider open throughout the session.

var sf = new ShapeFile(filenames[i], true);
sf.Open();

Overall, you are probably better off using the database based providers for such huge amount of data. If you need the data to be easily portable, SpatiaLite is the way to go.

Hth FObermaier

Oct 10, 2011 at 12:26 PM

Actually the zoom and pan in linear mode are very fast, until I have something like 50000 geometry objects in my map. If I had more, the elapsed time is obviously due to the many objects in the box.

The term "faster" refers to Linear vs Recursive with the same maxdepth parameter. I didn't change the maxdepth:

var h = new Heuristic           

{

                maxdepth = (int) Math.Ceiling(Math.Log(GetFeatureCount(), 2)),                // These are not used for this approach

                minerror = 10,

                tartricnt = 5,

                mintricnt = 2

};

Recursive algorythm takes forever (and I get a stack overflow), linear algorythm takes about 1 minute

Coordinator
Oct 10, 2011 at 2:27 PM

I'm a bit confused. Are you satisfied with the SpatialIndex creation speed or not? I also don't understand the point you are trying to make in the post from CET 11:30 AM.

Maybe you should use useCache option as well, or go along with ericma62 for a caching proxy (see http://sharpmap.codeplex.com/discussions/275292)

cheers FObermaier

Oct 10, 2011 at 3:42 PM

I am satisfied with the SpatialIndex creation in linear mode. Nevermind that post, because you answered me later, telling me to keep open the DataSource, so the spatialindex won't reloading