The default location for the index data is the xapiandb subdirectory of the Recoll configuration directory, typically $HOME/.recoll/xapiandb/. This can be changed via two different methods (with different purposes):
You can specify a different configuration directory by setting the RECOLL_CONFDIR environment variable, or using the -c option to the Recoll commands. This method would typically be used to index different areas of the file system to different indexes. For example, if you were to issue the following commands:
export RECOLL_CONFDIR=~/.indexes-email
recoll
Then Recoll would use configuration files stored in ~/.indexes-email/ and (unless specified otherwise in recoll.conf) would look for the index in ~/.indexes-email/xapiandb/. Using multiple configuration directories and configuration options allows you to tailor multiple configurations and indexes to handle whatever subset of the available data you wish to make searchable.
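The default-location rule described above can be sketched as a small shell helper. The function name index_dir_for and the ~/.indexes-docs directory are hypothetical, introduced only for illustration, and the sketch assumes that dbdir is not set in recoll.conf.

```shell
#!/bin/sh
# index_dir_for: hypothetical helper, not part of Recoll.
# Prints the default index location for a configuration directory,
# assuming dbdir is not set in that directory's recoll.conf.
# With no argument, falls back to RECOLL_CONFDIR, then to ~/.recoll.
index_dir_for() {
    confdir="${1:-${RECOLL_CONFDIR:-$HOME/.recoll}}"
    # Strip one trailing slash so the result has no doubled separator.
    printf '%s/xapiandb\n' "${confdir%/}"
}

# Two separate configurations, each with its own index.
index_dir_for "$HOME/.indexes-email"
index_dir_for "$HOME/.indexes-docs"
```

Either directory can then be handed to the Recoll commands through RECOLL_CONFDIR or the -c option.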
You can also specify a different storage location for the index by setting the dbdir parameter in the configuration file (see the configuration section). This method would mainly be of use if you wanted to keep the configuration directory in its default location, but wanted another location for the index, typically because of disk space concerns.
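As a rough illustration, the sketch below writes a one-line recoll.conf that sets dbdir. It uses a scratch directory so that a real configuration is not modified, and the path /mnt/bigdisk/recoll-index is a hypothetical example of a location on a larger disk. In an actual setup, the dbdir line would go into the recoll.conf of the configuration directory in use.

```shell
#!/bin/sh
# Scratch configuration directory, so the real ~/.recoll is left untouched.
confdir="$(mktemp -d)"

# /mnt/bigdisk/recoll-index is a hypothetical location on a larger disk.
cat > "$confdir/recoll.conf" <<'EOF'
dbdir = /mnt/bigdisk/recoll-index
EOF

# Show the resulting configuration file.
cat "$confdir/recoll.conf"
```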
The size of the index is determined by the size of the set of documents, but the ratio can vary a lot. For a typical mixed set of documents, the index size will often be close to the data set size. In specific cases (a set of compressed mbox files for example), the index can become much bigger than the documents. It may also be much smaller if the documents contain a lot of images or other non-indexed data (an extreme example being a set of mp3 files where only the tags would be indexed).
Of course, images, sound and video do not increase the index size, so nowadays (2006) even a big index will typically be negligible compared to the total amount of data on the computer.
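To see where a given installation falls in this range, the index size can be compared with the size of the indexed data. The following is a minimal sketch using du. The helper name index_ratio and the ~/Documents directory are assumptions made for illustration, and the index path assumes that dbdir has not been changed.

```shell
#!/bin/sh
# index_ratio: hypothetical helper, not part of Recoll.
# Prints the size of the index directory as an integer percentage
# of the size of the data directory, using disk usage in KiB.
index_ratio() {
    index_kb=$(du -sk "$1" | cut -f1)
    data_kb=$(du -sk "$2" | cut -f1)
    # Avoid dividing by zero when the data directory uses no space.
    [ "$data_kb" -gt 0 ] || { echo "no data in $2" >&2; return 1; }
    echo "$(( 100 * index_kb / data_kb ))"
}

# Assumed locations: default index path, and ~/Documents as the indexed data.
index_dir="${RECOLL_CONFDIR:-$HOME/.recoll}/xapiandb"
if [ -d "$index_dir" ] && [ -d "$HOME/Documents" ]; then
    echo "index is $(index_ratio "$index_dir" "$HOME/Documents")% of the data size"
fi
```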
The index data directory (xapiandb) only contains data that can be completely rebuilt by an index run, and it can always be destroyed safely.
If your first installation of Recoll was 1.9.0 or more recent, you can skip this section.
Xapian has had two possible index formats for quite some time: the "old" one, named Quartz, and the new one, named Flint. Xapian 0.9 used Quartz by default, but could use Flint if a specific environment variable (XAPIAN_PREFER_FLINT) was set. Xapian 1.0 still supports Quartz but will use Flint by default for new index creations.
The number of disk accesses performed during indexing has been heavily optimized in the new Flint engine, and you may see indexing times improved by 50% in some cases (compared to Quartz), typically for big indexes where disk accesses dominate the indexing time. There is also a more modest improvement in index size.
Xapian will not automatically convert an existing index from the Quartz to the Flint format. If you have an older index and want to take advantage of the new format (which can be done without setting the environment variable as of Recoll 1.8.2 and Xapian 1.0.0), you will have to explicitly delete the old index, then run a normal indexing process.
Unfortunately, using the -z option to recollindex is not sufficient to change the format. You have to delete all files inside the index directory (typically ~/.recoll/xapiandb) before starting indexing.
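The deletion step can be sketched as follows. The helper name reset_index is hypothetical, and the sketch assumes a find that supports -mindepth and -delete (as the GNU and BSD versions do). The function empties the directory but keeps the directory itself.

```shell
#!/bin/sh
# reset_index: hypothetical helper, not part of Recoll.
# Removes everything inside the given index directory (hidden files
# included) while keeping the directory itself.
reset_index() {
    index_dir="$1"
    # Refuse to act on an empty argument or a non-existent directory.
    [ -n "$index_dir" ] && [ -d "$index_dir" ] || return 1
    find "$index_dir" -mindepth 1 -delete
}

# Typical use, assuming the default index location:
#   reset_index ~/.recoll/xapiandb && recollindex
```

The call is left commented out because it destroys the existing index, which then has to be rebuilt by a full indexing run.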
The Recoll index does not hold copies of the indexed documents, but it does hold enough data to allow for an almost complete reconstruction. If confidential data is indexed, access to the database directory should be restricted.
As of version 1.4, Recoll will create the configuration directory with a mode of 0700 (access by owner only). As the index data directory is by default a sub-directory of the configuration directory, this should result in appropriate protection.
If you use another setup, you should consider what kind of protection you need for your index, set the directory and file access modes appropriately, and possibly also adjust the umask used during index updates.
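One possible way to apply this, assuming owner-only access is the desired policy, is sketched below. The helper names secure_index and run_private are hypothetical and not part of Recoll.

```shell
#!/bin/sh
# secure_index: hypothetical helper, not part of Recoll.
# Restricts an index directory and its contents to the owner only.
# The capital X adds execute/search permission for the owner on
# directories (and on files that are already executable).
secure_index() {
    chmod -R u+rwX,go-rwx "$1"
}

# run_private: hypothetical helper, not part of Recoll.
# Runs a command with a umask of 077, so that files it creates are
# accessible by the owner only. The subshell body keeps the umask
# change from leaking into the calling shell.
run_private() (
    umask 077
    "$@"
)

# Typical use, with a hypothetical index location:
#   secure_index /mnt/bigdisk/recoll-index
#   run_private recollindex
```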