Howto: Harvesting
GI-cat 7 implements a persistent storage mechanism that maintains a local copy of the different remote resources, in order to speed up subsequent queries.
Harvesting can be configured using the GI-conf tool, and it can be performed either automatically, at regular time intervals, or manually.
As depicted in the image below, harvesting can be configured by means of the "Search strategy" pane in the "Resource editor" tool.
Choosing a search strategy
For each federated resource, the user can choose between two kinds of search strategy:
- Distributed search strategy
This is the default search strategy, represented by the icon [icon]. With this strategy, the resource is searched at the time of a user query.
Results for this kind of search can be obtained only if, at the time of the user query, the given resource is online and is correctly reachable through the Internet.
- Harvesting search strategy
With this search strategy, represented by the icon [icon], the resource is harvested according to the given interval (at least 30 minutes). Once a resource has been harvested, its whole data content is persistently stored in a local database instance created by GI-cat.
Creation and management of this database instance are automatic and fully transparent to the user.
Warning! The database is initially empty. The first time you configure a harvester, start a harvesting manually in order to populate GI-cat with resources!
This kind of search is faster than the distributed one and, since the data are stored locally, it can also be performed offline.
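The difference between the two strategies can be illustrated with a small conceptual sketch. It is not GI-cat code: RemoteResource, LocalStore and both search functions are hypothetical names, used only to show why a distributed search needs the resource to be online while a harvested search does not.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteResource:
    """Hypothetical stand-in for a federated resource."""
    records: list
    online: bool = True

    def fetch(self):
        # A remote resource can only answer while it is reachable.
        if not self.online:
            raise ConnectionError("resource is not reachable")
        return list(self.records)


@dataclass
class LocalStore:
    """Hypothetical stand-in for the local database filled by harvesting."""
    records: list = field(default_factory=list)

    def harvest(self, resource):
        # Copy the whole remote content into the local store.
        self.records = resource.fetch()


def distributed_search(resource, term):
    # Contacts the remote resource at query time; fails if it is offline.
    return [r for r in resource.fetch() if term in r]


def harvested_search(store, term):
    # Reads only the local copy; returns nothing until a harvest has run.
    return [r for r in store.records if term in r]
```

In this sketch an empty LocalStore yields no results, which mirrors the warning above about running a first harvesting manually.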
Harvesting is performed automatically according to the specified time interval, expressed in days, hours and minutes. To avoid performance issues on the machine that hosts the GI-cat instance, it is recommended to select the time interval carefully.
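As an illustration of the interval rule, the following minimal sketch combines days, hours and minutes into a single interval and rejects values below the 30-minute minimum mentioned in this guide. The function name harvesting_interval and the constant MIN_INTERVAL are assumptions made for this example; they are not part of GI-cat or GI-conf.

```python
from datetime import timedelta

# Minimum interval stated in this guide (assumed here to be enforced as a hard limit).
MIN_INTERVAL = timedelta(minutes=30)


def harvesting_interval(days=0, hours=0, minutes=0):
    """Return the interval as a timedelta, rejecting values under 30 minutes."""
    interval = timedelta(days=days, hours=hours, minutes=minutes)
    if interval < MIN_INTERVAL:
        raise ValueError("harvesting interval must be at least 30 minutes")
    return interval
```

For example, harvesting_interval(days=1) describes a daily harvesting, while harvesting_interval(minutes=10) raises a ValueError.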
It is also possible to execute harvesting "on demand" by clicking the "Start harvesting now" button. As depicted in the image below, an information dialog opens, showing the elapsed time, the harvesting phase and a message.
During the "PROCESSING" phase, harvesting can be interrupted by clicking the button beside the progress bar.
The search strategy of each federated resource is also reported on the GI-cat home page.
How to move a harvested set of resources to another GI-cat installation
A quick guide follows. Note that this operation may soon be subject to changes (and improvements).
1: Stop the service and look into the local db directory you chose from the web configurator ($HOME/.exist by default).
2: You will find a subdirectory named with the unique id of your GI-cat (e.g. GI-cat-db-98b2a5b5-2ee8-44c0-a5f4-3b0a3c44b0ec).
Copy the content of this directory to another folder for later use (the source folder can then be removed for cleanup).
3: Start up the new GI-cat instance and re-import the same configuration.
4: Harvest something. This is needed to make GI-cat create the db directory for the first time (referred to below as XYZ).
5: Stop the service and replace the content of the XYZ folder with the saved content.
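The copy and replace steps above can be sketched as a small script. This is only an illustration under stated assumptions: save_db and restore_db are hypothetical helper names, the directory names must be supplied by you, and both GI-cat services are assumed to be stopped while the files are copied.

```python
import shutil
from pathlib import Path


def save_db(db_root, instance_dir_name, backup_dir):
    """Step 2: copy the content of <db_root>/<instance_dir_name> to backup_dir.

    backup_dir must not exist yet; shutil.copytree creates it.
    """
    source = Path(db_root) / instance_dir_name
    shutil.copytree(source, backup_dir)


def restore_db(backup_dir, new_db_root, new_instance_dir_name):
    """Step 5: replace the content of the new db directory with the saved copy."""
    target = Path(new_db_root) / new_instance_dir_name
    if target.exists():
        # Drop what the first harvesting created on the new instance.
        shutil.rmtree(target)
    shutil.copytree(backup_dir, target)
```

Here db_root would be the local db directory chosen in the web configurator ($HOME/.exist by default), and the instance directory names are the GI-cat-db-... subdirectories of the old and new installations.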