Indexing

Modified on Thu, 21 Mar at 8:07 PM

Indexing is the process of copying folio data from the primary data stores (the SQL database and file system for attachments) into the Elasticsearch index. 


The LawMaster Application Service runs indexing as a background job. You can see the status of the indexing job in Parameters > Records Management > Indexing.


The indexing job fetches folios from oldest to newest (according to a dedicated timestamp value) and sends their content to Elasticsearch. If a folio has an attachment on a DMS with full text searching enabled, LawMaster sends a request to the corresponding DMS to extract the text content of the folio attachment. The DMS responds by using the Apache Tika service to extract text from the document and returns it to the indexing job for sending to Elasticsearch.
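The flow described above can be sketched in a few lines of Python. This is an illustrative sketch only, not LawMaster's implementation: the `Folio` class, `run_indexing_pass`, `fake_dms_extract`, and the dictionary standing in for the Elasticsearch index are all hypothetical names chosen for the example, and the integer `row_ver` is a stand-in for the dedicated timestamp value.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Folio:
    folio_id: int
    row_ver: int                      # stand-in for the dedicated timestamp value
    title: str
    attachment: Optional[bytes] = None
    dms_full_text: bool = False       # True if the folio's DMS has full text searching enabled


def run_indexing_pass(folios, extract_text: Callable[[bytes], str], index: dict) -> int:
    """Index folios oldest to newest; return the number of documents sent."""
    sent = 0
    for folio in sorted(folios, key=lambda f: f.row_ver):
        doc = {"title": folio.title, "row_ver": folio.row_ver}
        if folio.attachment is not None and folio.dms_full_text:
            # In the real system the DMS performs this step using Apache Tika.
            doc["content"] = extract_text(folio.attachment)
        index[folio.folio_id] = doc   # stand-in for the request to Elasticsearch
        sent += 1
    return sent


def fake_dms_extract(data: bytes) -> str:
    # Hypothetical extractor: decodes the bytes instead of calling a DMS.
    return data.decode("utf-8")


folios = [
    Folio(2, row_ver=20, title="Letter", attachment=b"Dear client", dms_full_text=True),
    Folio(1, row_ver=10, title="File note"),
    Folio(3, row_ver=30, title="Scan", attachment=b"ignored", dms_full_text=False),
]
index = {}
count = run_indexing_pass(folios, fake_dms_extract, index)
```

In this sketch only folio 2 gets a `content` field, because it is the only folio that has an attachment on a DMS with full text searching enabled.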


Folio timestamps are stored in the mforv_row_ver column of the Matter_Folios_RowVer SQL table. Whenever a folio (Matter_Folio table) or data related to its attachment (Folios_Storage table) is modified, the mforv_row_ver value is updated so that the folio becomes the first row when the table is sorted in descending order. When folio data is sent to Elasticsearch, this timestamp is stored in Elasticsearch as well. It identifies which folios are indexed (all those with a lesser timestamp value) and which are not yet indexed (those with a greater timestamp value).
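A small sketch can make the timestamp comparison concrete. It rests on several assumptions that are not confirmed by this article: a simple increasing counter stands in for the mforv_row_ver value, the highest value already sent to Elasticsearch is treated as a watermark, and a folio whose value equals the watermark is counted as indexed. The names `RowVersionTable`, `touch`, and `split_by_watermark` are hypothetical.

```python
import itertools


class RowVersionTable:
    """In-memory stand-in for Matter_Folios_RowVer (not the real schema)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.row_ver = {}             # folio_id -> stand-in for mforv_row_ver

    def touch(self, folio_id: int) -> int:
        """Record a change to a folio or to its attachment data."""
        value = next(self._counter)   # always greater than every earlier value
        self.row_ver[folio_id] = value
        return value


def split_by_watermark(table: RowVersionTable, watermark: int):
    """Return (indexed, pending) folio ids relative to the stored watermark."""
    indexed = sorted(f for f, v in table.row_ver.items() if v <= watermark)
    pending = sorted(f for f, v in table.row_ver.items() if v > watermark)
    return indexed, pending


table = RowVersionTable()
table.touch(101)
table.touch(102)
watermark = max(table.row_ver.values())   # highest value sent to the index so far
table.touch(103)                          # a new folio added after the last pass
table.touch(101)                          # folio 101 modified again

indexed, pending = split_by_watermark(table, watermark)
newest_first = max(table.row_ver, key=table.row_ver.get)
```

Here folio 101 moves back into the pending group after it is modified, and it is also the first row when sorted in descending order, which is the behaviour the paragraph above describes.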


Eventual Consistency


Folio data is copied from the primary storage (the SQL database for data, and the file system for documents) to a secondary storage (Elasticsearch) using an eventual consistency pattern. This means that the data in Elasticsearch is not necessarily the same as in the primary storage, but given enough time, eventually will be. 


In the extreme case, there might be no data in Elasticsearch at all. This occurs when the LawMaster database is first brought online (for example, from migrated data). Routine folio maintenance (adds, modifies, and deletes) is never immediately represented in Elasticsearch, so there is a delay between a change being made and that change being reflected in the search results.


This contrasts with a transactional consistency pattern, where a folio maintenance task is not acknowledged in the primary storage until it is also updated in the secondary storage. Such an approach could have a significant impact on data maintenance performance, especially in distributed systems. Hence, eventual consistency is chosen over transactional consistency, as a way of prioritising primary data maintenance performance.
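The difference between the two patterns can be illustrated with a toy model. This is a sketch under stated assumptions rather than a description of LawMaster internals: the `FolioStores` class, its method names, and the two dictionaries standing in for the primary storage and Elasticsearch are hypothetical, and deletes are left out for brevity.

```python
class FolioStores:
    """Toy model of a primary store and a secondary search index."""

    def __init__(self):
        self.primary = {}        # stand-in for the SQL database and file system
        self.search = {}         # stand-in for Elasticsearch
        self.pending = []        # folio ids changed since the last indexing pass

    def save_eventual(self, folio_id, text):
        """Acknowledge after the primary write only; the index catches up later."""
        self.primary[folio_id] = text
        self.pending.append(folio_id)
        return "acknowledged"

    def save_transactional(self, folio_id, text):
        """Acknowledge only after both stores are updated (the pattern not chosen)."""
        self.primary[folio_id] = text
        self.search[folio_id] = text   # the save waits on the secondary store too
        return "acknowledged"

    def run_indexing_pass(self):
        """Background job: copy every pending change into the search index."""
        for folio_id in self.pending:
            self.search[folio_id] = self.primary[folio_id]
        self.pending.clear()


stores = FolioStores()
stores.save_eventual(1, "draft letter")
stale_before_pass = 1 not in stores.search    # the search index lags behind
stores.run_indexing_pass()
consistent_after_pass = stores.search == stores.primary
```

With `save_eventual`, the save returns as soon as the primary write completes and the search index is briefly out of date. With `save_transactional`, the index is never stale, but every save pays for the extra write before it is acknowledged.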
