r10 - 18 Jul 2011 - 16:53:26 - EnricoBoldrini

Howto: Harvesting

GI-cat 7 implements a persistent storage mechanism that maintains a local copy of the remote resources, in order to speed up subsequent queries.
Harvesting can be configured using the GI-conf tool, and it can be performed either automatically, at regular time intervals, or manually.

conf.png

As depicted in the image below, harvesting can be configured by means of the "Search strategy" pane in the "Resource editor" tool.

edit_resources_2.png

Choosing a search strategy

For each federated resource, the user can choose between two kinds of search strategy:

  • Distributed search strategy

This is the default search strategy, represented by the icon distributed.png. With this strategy, the resource will be searched at the time of a user query.
Results for this kind of search can be obtained only if, at the time of the user query, the given resource is online and is correctly reachable through the Internet.

dist_search.png

  • Harvesting strategy

With this search strategy, represented by the icon harvesting_icon.png, the resource will be harvested according to the given interval (at least 30 minutes). Once a resource has been harvested, its whole data content is persistently stored in a local database instance created by GI-cat.
Creation and management of this database instance are automatic and fully transparent to the user.

Warning! The database is initially empty. The first time you configure a harvester, start a harvesting run manually in order to populate GI-cat with resources!

This kind of search is faster than the distributed one and, since the data are stored locally, it can also be performed offline.

harvesting.png
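The difference between the two strategies can be illustrated with a toy model. This is only a sketch of the behaviour described above, not GI-cat code: the classes RemoteResource and HarvestedResource and the function distributed_search are hypothetical names introduced for illustration, and records are modelled as plain strings.

```python
class RemoteResource:
    """Hypothetical stand-in for a federated remote resource."""

    def __init__(self, records, online=True):
        self.records = list(records)
        self.online = online

    def fetch_all(self):
        # A remote resource can only answer while it is reachable.
        if not self.online:
            raise ConnectionError("remote resource is not reachable")
        return list(self.records)


def distributed_search(resource, keyword):
    """Distributed strategy: contact the resource at the time of the query."""
    return [r for r in resource.fetch_all() if keyword in r]


class HarvestedResource:
    """Harvesting strategy: queries are answered from a local copy."""

    def __init__(self, resource):
        self.resource = resource
        self.local_copy = []  # initially empty until the first harvest

    def harvest(self):
        # Copy the whole content of the remote resource into local storage.
        self.local_copy = self.resource.fetch_all()

    def search(self, keyword):
        # No network access is needed here, so this also works offline.
        return [r for r in self.local_copy if keyword in r]
```

In this model a search on a HarvestedResource returns nothing until harvest() has been called once, and it keeps working after the remote resource goes offline, whereas distributed_search fails in that case.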

Harvesting will be automatically performed according to the specified time interval, expressed in days, hours and minutes. In order to avoid performance issues on the machine which hosts the GI-cat instance,
it is recommended to carefully select an appropriate time interval.
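As a rough illustration of the interval rule, the sketch below turns a days/hours/minutes setting into a single duration and rejects values below the 30 minute minimum mentioned above. The function name harvest_interval and the constant MIN_INTERVAL are assumptions made for this example; how GI-cat itself validates the value is not documented here.

```python
from datetime import timedelta

# Minimum interval stated on this page; the constant name is hypothetical.
MIN_INTERVAL = timedelta(minutes=30)


def harvest_interval(days=0, hours=0, minutes=0):
    """Combine days, hours and minutes into one interval and validate it."""
    if min(days, hours, minutes) < 0:
        raise ValueError("days, hours and minutes must not be negative")
    interval = timedelta(days=days, hours=hours, minutes=minutes)
    if interval < MIN_INTERVAL:
        raise ValueError("harvesting interval must be at least 30 minutes")
    return interval
```

For example, harvest_interval(days=1, hours=2) yields a 26 hour interval, while harvest_interval(minutes=10) raises a ValueError.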

It is also possible to execute harvesting "on demand", by clicking the "Start harvesting now" button. As depicted in the image below, an information dialog is opened,
showing the elapsed time, the harvesting phase and a message.
During the "PROCESSING" phase, harvesting can be interrupted by clicking the button beside the progress bar.

harvesting_2.png

The search strategy of each federated resource is also reported on the GI-cat home page.

gi-cat_home.png

How to move a harvested set of resources to another GI-cat installation

A quick guide follows. Note that this procedure may soon be subject to changes (and improvements).

1: Stop the service and look into the local db directory you chose from the web configurator ($HOME/.exist by default).

2: You will find a subdirectory named after the unique id of your GI-cat (e.g. GI-cat-db-98b2a5b5-2ee8-44c0-a5f4-3b0a3c44b0ec).

Copy the contents of this directory to another folder for later use (the source folder can then be removed, for cleanup).

3: Start up the new GI-cat instance and re-import the same configuration.

4: Harvest something. This is needed to make GI-cat create its db directory for the first time (called XYZ here, as an example).

5: Stop the service and replace the contents of the XYZ folder with the saved contents.
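The copy steps of this procedure (2 and 5) could be scripted roughly as follows. This is a sketch under assumptions, not an official tool: the function names are hypothetical, the "GI-cat-db-" prefix is inferred from the single example directory name above, and the script assumes that exactly one such directory exists under the local db directory and that the GI-cat service is stopped while it runs.

```python
import shutil
from pathlib import Path

# Assumption: db directories start with this prefix, as in the example name.
DB_PREFIX = "GI-cat-db-"


def find_db_dir(db_root):
    """Return the single GI-cat db subdirectory found under db_root."""
    candidates = sorted(
        p for p in Path(db_root).iterdir()
        if p.is_dir() and p.name.startswith(DB_PREFIX)
    )
    if len(candidates) != 1:
        raise RuntimeError(
            f"expected exactly one {DB_PREFIX}* directory, found {len(candidates)}"
        )
    return candidates[0]


def backup_db(db_root, backup_dir):
    """Step 2: copy the contents of the db directory to backup_dir."""
    source = find_db_dir(db_root)
    # copytree creates backup_dir and fails if it already exists.
    shutil.copytree(source, backup_dir)
    return source


def restore_db(backup_dir, db_root):
    """Step 5: replace the contents of the new db directory with the backup."""
    target = find_db_dir(db_root)
    shutil.rmtree(target)                 # drop what the first harvest created
    shutil.copytree(backup_dir, target)   # recreate it from the saved copy
    return target
```

On the old installation one would call backup_db with the local db directory (by default $HOME/.exist) and a fresh backup folder; on the new installation, after steps 3 and 4 and with the service stopped again, restore_db with the same backup folder.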
