pat2vec.pat2vec_search.cogstack_search_methods
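Functions
| cohort_searcher_no_terms | Searches an index using only a query string. |
| cohort_searcher_no_terms_fuzzy | Searches an index using different query string methods. |
| cohort_searcher_with_terms_and_search | Searches a cohort using a term filter and a query string. |
| cohort_searcher_with_terms_no_search | Searches a cohort using only a term-level filter. |
| create_credentials_file | Creates a template credentials.py file. |
| dataframe_generator | A generator that yields DataFrames from a list of DataFrames. |
| initialize_cogstack_client | Initializes the global CogStack client cs. |
| iterative_multi_term_cohort_searcher_no_terms_fuzzy | Iteratively searches for EPR documents matching multiple search terms. |
| iterative_multi_term_cohort_searcher_no_terms_fuzzy_mct | Iteratively searches for MCT documents matching multiple search terms. |
| iterative_multi_term_cohort_searcher_no_terms_fuzzy_textual_obs | Iteratively searches for textual observations matching multiple terms. |
| list_chunker | Splits a list into smaller chunks of up to 10,000 elements. |
| set_index_safe_wrapper | Safely sets the DataFrame index to 'id', ignoring errors. |
Classes
| CogStack |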
- pat2vec.pat2vec_search.cogstack_search_methods.create_credentials_file()
Creates a template credentials.py file.
This function creates a credentials.py file three levels up from the current file's directory. The file contains placeholder variables for the Elasticsearch connection details (hosts, username, password, api_key), which the user is expected to replace with their actual credentials.
- Return type:
None
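As a rough illustration of what such a template writer could look like, here is a minimal sketch. The helper name `write_credentials_template`, the placeholder values, and the choice not to overwrite an existing file are all assumptions made for this example; only the four variable names come from the description above.

```python
from pathlib import Path
import tempfile

# Hypothetical template. The four variable names follow the description
# above; the placeholder values and layout are assumptions.
TEMPLATE = '''\
# Fill in your Elasticsearch connection details.
hosts = ["https://your-host:9200"]
username = "your_username"
password = "your_password"
api_key = "your_api_key"
'''


def write_credentials_template(target_dir: Path) -> Path:
    """Write a placeholder credentials.py into target_dir unless one exists."""
    path = target_dir / "credentials.py"
    if not path.exists():  # assumption: never clobber a file the user filled in
        path.write_text(TEMPLATE, encoding="utf-8")
    return path


# A directory three levels above a module's own directory could be reached
# with something like Path(__file__).resolve().parents[3]. A temporary
# directory stands in here so the sketch has no side effects.
with tempfile.TemporaryDirectory() as tmp:
    created = write_credentials_template(Path(tmp))
    content = created.read_text(encoding="utf-8")
```

Writing to an explicit directory rather than a path derived from `__file__` keeps the sketch easy to try out without touching a real project tree.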
- class pat2vec.pat2vec_search.cogstack_search_methods.CogStack(hosts, username=None, password=None, api=True, api_key=None)
Bases:
object
- Parameters:
hosts (List[str])
username (str | None)
password (str | None)
api (bool)
api_key (str | None)
- __init__(hosts, username=None, password=None, api=True, api_key=None)
Initializes the CogStack client for Elasticsearch interaction.
- Parameters:
hosts (List[str]) – A list of CogStack host URLs.
username (Optional[str]) – The username for basic authentication.
password (Optional[str]) – The password for basic authentication.
api (bool) – If True, use API key authentication. Defaults to True.
api_key (Optional[str]) – The API key for authentication.
- get_docs_generator(index, query, es_gen_size=800, request_timeout=300)
Returns a generator that yields documents from an Elasticsearch search.
This method uses elasticsearch.helpers.scan to efficiently scroll through all results of a query.
- Parameters:
index (List[str]) – A list of Elasticsearch indices to search.
query (Dict[str, Any]) – The Elasticsearch query dictionary.
es_gen_size (int) – The number of documents to retrieve per shard in each scroll.
request_timeout (int) – The timeout in seconds for the request.
- Return type:
Generator[Dict[str, Any], None, None]
- Returns:
A generator object that yields search hits.
- cogstack2df(query, index, column_headers=None, es_gen_size=800, request_timeout=300)
Executes a search query and returns the results as a pandas DataFrame.
- Parameters:
query (Dict[str, Any]) – The Elasticsearch query dictionary.
index (str) – The name of the index or a list of indices to search.
column_headers (Optional[List[str]]) – A specific list of columns for the DataFrame.
es_gen_size (int) – The number of documents per scroll request.
request_timeout (int) – The timeout in seconds for the request.
- Return type:
DataFrame
- Returns:
A pandas DataFrame containing the search results.
- pat2vec.pat2vec_search.cogstack_search_methods.list_chunker(entered_list)
Splits a list into smaller chunks of up to 10,000 elements.
- Parameters:
entered_list (List[Any]) – The list to be split into chunks.
- Return type:
List[List[Any]]
- Returns:
A list of lists, where each sublist is a chunk of the original list.
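The chunking behaviour described above can be sketched in a few lines. This is an illustrative reimplementation, not the library's source; the name `chunk_list` and the `size` parameter are assumptions introduced for the example.

```python
from typing import Any, List

CHUNK_SIZE = 10_000  # upper bound per chunk, as stated in the description


def chunk_list(items: List[Any], size: int = CHUNK_SIZE) -> List[List[Any]]:
    """Split items into consecutive sublists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


chunks = chunk_list(list(range(25_000)))
sizes = [len(chunk) for chunk in chunks]  # [10000, 10000, 5000]
```

Chunking of this kind is typically useful when a long list of identifiers has to be sent to a search backend in several smaller requests.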
- pat2vec.pat2vec_search.cogstack_search_methods.dataframe_generator(list_of_dfs)
A generator that yields DataFrames from a list of DataFrames.
- Return type:
Generator[DataFrame, None, None]
- Parameters:
list_of_dfs (List[DataFrame])
- pat2vec.pat2vec_search.cogstack_search_methods.cohort_searcher_with_terms_and_search(index_name, fields_list, term_name, entered_list, search_string)
Searches a cohort using a term filter and a query string.
- Parameters:
index_name (str) – The name of the Elasticsearch index to search.
fields_list (List[str]) – The list of fields to return from each document.
term_name (str) – The name of the field to use for the term-level filter.
entered_list (List[str]) – The list of values to filter for in the term_name field.
search_string (str) – The query string to apply to the search.
- Return type:
DataFrame
- Returns:
A pandas DataFrame containing the search results.
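To make the combination of a term-level filter and a query string more concrete, the sketch below builds an Elasticsearch request body with a `bool` query. The helper `build_terms_and_search_query` is hypothetical, and the exact body the library sends is not documented here, so treat this as one plausible shape rather than the actual implementation. The field name and identifiers in the usage lines are also made up for illustration.

```python
from typing import Any, Dict, List


def build_terms_and_search_query(
    fields_list: List[str],
    term_name: str,
    entered_list: List[str],
    search_string: str,
) -> Dict[str, Any]:
    """Combine a terms filter with a query_string clause in one bool query."""
    return {
        "_source": fields_list,  # restrict the fields returned per document
        "query": {
            "bool": {
                # Term-level filter: the document's term_name value must be
                # one of the entries in entered_list.
                "filter": {"terms": {term_name: entered_list}},
                # Full-text part: Lucene query string syntax.
                "must": [{"query_string": {"query": search_string}}],
            }
        },
    }


# Hypothetical field name and identifiers.
query = build_terms_and_search_query(
    fields_list=["client_idcode", "body_analysed"],
    term_name="client_idcode.keyword",
    entered_list=["P001", "P002"],
    search_string="diabetes",
)
```

Placing the identifier list in `filter` rather than `must` means it restricts the result set without contributing to relevance scoring.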
- pat2vec.pat2vec_search.cogstack_search_methods.set_index_safe_wrapper(df)
Safely sets the DataFrame index to 'id', ignoring errors.
- Return type:
DataFrame
- Parameters:
df (DataFrame)
- pat2vec.pat2vec_search.cogstack_search_methods.cohort_searcher_with_terms_no_search(index_name, fields_list, term_name, entered_list)
Searches a cohort using only a term-level filter.
- Parameters:
index_name (str) – The name of the index to search.
fields_list (List[str]) – A list of fields to return.
term_name (str) – The field to filter on.
entered_list (List[str]) – The list of values to search for in the term_name field.
- Return type:
DataFrame
- Returns:
A pandas DataFrame containing the search results.
- pat2vec.pat2vec_search.cogstack_search_methods.cohort_searcher_no_terms(index_name, fields_list, search_string)
Searches an index using only a query string.
- Parameters:
index_name (str) – The name of the Elasticsearch index to search.
fields_list (List[str]) – A list of fields to return.
search_string (str) – The query string to use for the search.
- Return type:
DataFrame
- Returns:
A pandas DataFrame containing the search results.
- pat2vec.pat2vec_search.cogstack_search_methods.cohort_searcher_no_terms_fuzzy(index_name, fields_list, search_string, method='fuzzy', fuzzy=2, slop=1)
Searches an index using different query string methods.
- Parameters:
index_name (str) – The name of the Elasticsearch index.
fields_list (List[str]) – List of fields to retrieve.
search_string (str) – The search string to query.
method (str) – The search method ('fuzzy', 'exact', or 'phrase').
fuzzy (int) – The fuzziness level for fuzzy matching.
slop (int) – The slop value for phrase searches (word proximity).
- Return type:
DataFrame
- Returns:
A DataFrame containing the search results.
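The three methods map naturally onto Lucene query string syntax, where `term~N` requests fuzzy matching and `"a phrase"~N` allows up to N positions of slop. The sketch below shows one way the `method`, `fuzzy`, and `slop` arguments could be turned into such a string. The helper `build_query_string` is hypothetical, and the library may construct its queries differently.

```python
def build_query_string(
    search_string: str, method: str = "fuzzy", fuzzy: int = 2, slop: int = 1
) -> str:
    """Translate a search string into Lucene query string syntax."""
    if method == "fuzzy":
        # Apply the edit-distance operator to every word individually.
        return " ".join(f"{word}~{fuzzy}" for word in search_string.split())
    if method == "exact":
        # Quoting requires the words to appear adjacent and in order.
        return f'"{search_string}"'
    if method == "phrase":
        # A quoted phrase followed by ~N tolerates N positions of slop.
        return f'"{search_string}"~{slop}'
    raise ValueError(f"Unknown method: {method!r}")


fuzzy_query = build_query_string("heart failure")              # heart~2 failure~2
exact_query = build_query_string("heart failure", "exact")     # "heart failure"
phrase_query = build_query_string("heart failure", "phrase")   # "heart failure"~1
```

In Elasticsearch, the fuzziness operator accepts edit distances up to 2, which matches the default `fuzzy=2` in the signature above.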
- pat2vec.pat2vec_search.cogstack_search_methods.iterative_multi_term_cohort_searcher_no_terms_fuzzy(terms_list, treatment_doc_filename, start_year, start_month, start_day, end_year, end_month, end_day, overwrite=True, debug=False, uuid_column_name='client_idcode', additional_filters=None, all_fields=False, method='fuzzy', fuzzy=2, slop=1)
Iteratively searches for EPR documents matching multiple search terms.
- Parameters:
terms_list (List[str]) – The list of search terms to search for.
treatment_doc_filename (str) – The name of the file to store the results in.
start_year (str) – The start year of the date range.
start_month (str) – The start month of the date range.
start_day (str) – The start day of the date range.
end_year (str) – The end year of the date range.
end_month (str) – The end month of the date range.
end_day (str) – The end day of the date range.
overwrite (bool) – Whether to overwrite the existing file.
debug (bool) – Whether to print debug information.
uuid_column_name (str) – The name of the column containing the UUIDs.
additional_filters (Optional[List[str]]) – A list of additional filters to apply.
all_fields (bool) – Whether to retrieve all fields.
method (str) – The search method to use ('fuzzy', 'exact', or 'phrase').
fuzzy (int) – The fuzziness level for fuzzy matching.
slop (int) – The slop value for phrase searches.
- Return type:
DataFrame
- Returns:
A DataFrame containing the search results.
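The iterative pattern (one search per term, restricted to a date range, with results accumulated) can be outlined as below. Everything in this sketch is an assumption made for illustration: the helper names, the `updatetime` field, the way the date clause is joined to the term, and the stub searcher that stands in for a real Elasticsearch call.

```python
from datetime import date
from typing import Callable, Dict, List

Row = Dict[str, str]


def date_range_clause(
    field: str,
    start_year: str, start_month: str, start_day: str,
    end_year: str, end_month: str, end_day: str,
) -> str:
    """Build a Lucene range clause from string date components."""
    start = date(int(start_year), int(start_month), int(start_day))
    end = date(int(end_year), int(end_month), int(end_day))
    if start > end:
        raise ValueError("start date must not be after end date")
    return f"{field}:[{start.isoformat()} TO {end.isoformat()}]"


def search_terms(
    terms_list: List[str], clause: str, searcher: Callable[[str], List[Row]]
) -> List[Row]:
    """Run one search per term and collect all rows in order."""
    results: List[Row] = []
    for term in terms_list:
        results.extend(searcher(f"{term} AND {clause}"))
    return results


# 'updatetime' is a hypothetical timestamp field.
clause = date_range_clause("updatetime", "2020", "1", "5", "2021", "12", "31")


def stub_searcher(query: str) -> List[Row]:
    # Stand-in for a real search call: echoes the query it was given.
    return [{"query": query}]


rows = search_terms(["aspirin", "warfarin"], clause, stub_searcher)
```

Passing the searcher in as a callable keeps the loop testable without a live cluster, similar in spirit to the `testing` flag on the sibling functions.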
- pat2vec.pat2vec_search.cogstack_search_methods.iterative_multi_term_cohort_searcher_no_terms_fuzzy_mct(terms_list, treatment_doc_filename, start_year, start_month, start_day, end_year, end_month, end_day, append=True, debug=True, uuid_column_name='client_idcode', additional_filters=None, all_fields=False, method='fuzzy', fuzzy=2, slop=1, testing=False)
Iteratively searches for MCT documents matching multiple search terms.
This function searches the 'observations' index for documents of type 'AoMRC_ClinicalSummary_FT' that contain the specified terms.
- Parameters:
terms_list (List[str]) – A list of terms to search for.
treatment_doc_filename (str) – The filename to load or save the results.
start_year (str) – The start year of the date range.
start_month (str) – The start month of the date range.
start_day (str) – The start day of the date range.
end_year (str) – The end year of the date range.
end_month (str) – The end month of the date range.
end_day (str) – The end day of the date range.
append (bool) – Whether to append results to an existing file.
debug (bool) – Whether to print debug information.
uuid_column_name (str) – The name of the UUID column.
additional_filters (Optional[List[str]]) – Additional filters to apply to the search.
all_fields (bool) – Whether to retrieve all fields.
method (str) – The search method ('fuzzy', 'exact', or 'phrase').
fuzzy (int) – The fuzziness level for fuzzy search.
slop (int) – The slop value for phrase search.
testing (bool) – Whether to use a dummy searcher for testing.
- Return type:
DataFrame
- Returns:
A DataFrame containing the search results.
- pat2vec.pat2vec_search.cogstack_search_methods.initialize_cogstack_client(config_obj=None)
Initializes the global CogStack client cs.
This function sets up the connection to Elasticsearch. It can be configured to load credentials from a specific file path by passing a config object. If a client instance already exists, it will not re-initialize unless a config object with a new credentials path is provided.
The credential loading priority is:
1. credentials_path from the config_obj.
2. Default credentials.py in the project's root.
3. If not found, it creates a template credentials.py and tries again.
4. Falls back to dummy credentials if all else fails.
- Parameters:
config_obj – A configuration object that may have a credentials_path attribute.
- Returns:
The initialized CogStack client instance.
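The four-step credential priority described above can be sketched as a small resolver. This is not the library's code: the function `resolve_credentials_path`, its return convention, and the `create_template` callback are assumptions used to show the fallback order in isolation.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple
import tempfile


def resolve_credentials_path(
    configured: Optional[Path],
    default: Path,
    create_template: Callable[[Path], object],
) -> Tuple[Optional[Path], str]:
    """Return the credentials file to load and which step selected it."""
    # 1. An explicit path from the config object wins.
    if configured is not None and configured.is_file():
        return configured, "config"
    # 2. Otherwise use the default credentials.py if it exists.
    if default.is_file():
        return default, "default"
    # 3. Otherwise create a template at the default location and retry.
    create_template(default)
    if default.is_file():
        return default, "template"
    # 4. Nothing worked: the caller should fall back to dummy credentials.
    return None, "dummy"


with tempfile.TemporaryDirectory() as tmp:
    default_path = Path(tmp) / "credentials.py"
    # First call: no file yet, so the template step creates one.
    first = resolve_credentials_path(
        None, default_path, lambda p: p.write_text("hosts = []\n")
    )
    # Second call: the file now exists, so the default step applies.
    second = resolve_credentials_path(None, default_path, lambda p: None)
    # Third call: unreachable location and a no-op creator, so dummy fallback.
    missing = Path(tmp) / "nowhere" / "credentials.py"
    third = resolve_credentials_path(None, missing, lambda p: None)
```

Returning the step name alongside the path makes it easy to log which source was used, which helps when a client silently ends up on dummy credentials.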
- pat2vec.pat2vec_search.cogstack_search_methods.iterative_multi_term_cohort_searcher_no_terms_fuzzy_textual_obs(terms_list, treatment_doc_filename, start_year, start_month, start_day, end_year, end_month, end_day, append=True, debug=True, uuid_column_name='client_idcode', bloods_time_field='basicobs_entered', additional_filters=None, all_fields=False, method='fuzzy', fuzzy=2, slop=1, testing=False)
Iteratively searches for textual observations matching multiple terms.
This function searches the 'basic_observations' index for documents where the textualObs field contains the specified terms.
- Parameters:
terms_list (List[str]) – A list of terms to search for.
treatment_doc_filename (str) – The filename to load or save the results.
start_year (str) – The start year of the date range.
start_month (str) – The start month of the date range.
start_day (str) – The start day of the date range.
end_year (str) – The end year of the date range.
end_month (str) – The end month of the date range.
end_day (str) – The end day of the date range.
append (bool) – Whether to append results to an existing file.
debug (bool) – Whether to print debug information.
uuid_column_name (str) – The name of the UUID column.
bloods_time_field (str) – The timestamp field to use for date filtering.
additional_filters (Optional[List[str]]) – Additional filters to apply to the search.
all_fields (bool) – Whether to retrieve all fields.
method (str) – The search method ('fuzzy', 'exact', or 'phrase').
fuzzy (int) – The fuzziness level for fuzzy search.
slop (int) – The slop value for phrase search.
testing (bool) – Whether to use a dummy searcher for testing.
- Return type:
DataFrame
- Returns:
A DataFrame containing the search results.