pat2vec.pat2vec_search.search_multiprocess

Functions

`cohort_searcher_with_terms_and_search_multi`(...)	Searches a cohort in parallel using multiple processes.
`pull_and_write`(index_name, fields_list, ...)	Pull data from elasticsearch and write to a file.

pat2vec.pat2vec_search.search_multiprocess.pull_and_write(index_name, fields_list, term_name, entered_list, search_string)

Pull data from elasticsearch and write to a file.

Parameters:

index_name (str) – The name of the index to search.
fields_list (list) – The list of fields to retrieve.
term_name (str) – The name of the term to filter on.
entered_list (list) – The list of values to search for.
search_string (str) – The search string to use.

Notes

The output file is opened in append mode, so if it already exists, new data is added to the end rather than overwriting it. No header row is written; if you need one, add it manually.
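
The append-mode, no-header behaviour described in the note can be illustrated with a minimal standard-library sketch. It assumes a CSV output format, which this page does not confirm, and the helpers `append_rows` and `read_with_header`, the file name, and the column names are all hypothetical; none of them are part of pat2vec.

```python
import csv
import os
import tempfile


def append_rows(path, rows):
    """Append rows to a CSV file without writing a header (hypothetical helper)."""
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerows(rows)


def read_with_header(path, fields_list):
    """Read a headerless CSV, supplying the column names manually."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, fieldnames=fields_list))


# Assumed file location and columns, for illustration only.
out_path = os.path.join(tempfile.mkdtemp(), "results.csv")

append_rows(out_path, [["p1", "2020-01-01"]])
append_rows(out_path, [["p2", "2020-02-01"]])  # second call adds to the same file

records = read_with_header(out_path, ["patient_id", "date"])
```

Because the file only ever grows, a file left over from an earlier run would mix old and new rows; deleting it first, or writing to a fresh temporary path as above, avoids that.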

pat2vec.pat2vec_search.search_multiprocess.cohort_searcher_with_terms_and_search_multi(index_name, fields_list, term_name, entered_list, search_string)

Searches a cohort in parallel using multiple processes.

This function splits a large list of search terms (entered_list) into chunks and distributes the search queries across multiple processes. The results from each process are written to a temporary file and then read back into a single DataFrame.

Parameters:

index_name (str) – The name of the index to search.
fields_list (List[str]) – The list of fields to retrieve.
term_name (str) – The name of the term to filter on.
entered_list (List[str]) – The list of values to search for.
search_string (str) – The search string to use.

Return type:

DataFrame

Returns:

A DataFrame containing the combined results of the parallel search.
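
The chunk, distribute, write, and read-back pattern described above can be sketched roughly as follows. This is not the pat2vec implementation: `chunk_list`, `search_chunk`, and `parallel_search` are hypothetical names, the worker fabricates placeholder rows instead of querying Elasticsearch, each chunk writes to its own temporary CSV file (the real function may share one file), and the merged result is a list of dicts rather than a DataFrame so the sketch needs only the standard library.

```python
import csv
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def chunk_list(values, n_chunks):
    """Split values into at most n_chunks contiguous, near-equal chunks."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def search_chunk(job):
    """Hypothetical worker: stands in for one search query over a chunk."""
    chunk, out_path = job
    with open(out_path, "a", newline="") as handle:
        writer = csv.writer(handle)
        for value in chunk:
            # Placeholder row; a real worker would write search hits here.
            writer.writerow([value, f"result_for_{value}"])
    return out_path


def parallel_search(entered_list, fields_list, n_workers=2,
                    executor_cls=ProcessPoolExecutor):
    """Fan chunks out to workers, then merge the per-chunk files."""
    tmp_dir = tempfile.mkdtemp()
    chunks = chunk_list(entered_list, n_workers)
    jobs = [(chunk, os.path.join(tmp_dir, f"part_{i}.csv"))
            for i, chunk in enumerate(chunks)]
    with executor_cls(max_workers=n_workers) as pool:
        paths = list(pool.map(search_chunk, jobs))  # preserves chunk order
    rows = []
    for path in paths:
        with open(path, newline="") as handle:
            # The files have no header, so column names are supplied here.
            rows.extend(csv.DictReader(handle, fieldnames=fields_list))
    return rows


# Threads are used for this demo so it runs without a __main__ guard.
combined = parallel_search(
    ["a", "b", "c", "d", "e"],
    ["term", "hit"],
    n_workers=2,
    executor_cls=ThreadPoolExecutor,
)
```

To run the same sketch with real processes, keep the default `executor_cls` and place the call under `if __name__ == "__main__":`, since process-based executors may re-import the main module in each worker.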