pat2vec.pat2vec_get_methods.get_method_smoking

Functions

calculate_smoking_features(features_data, ...)

Generates binary smoking status features from observation values.

get_smoking(current_pat_client_id_code, ...)

Retrieves CORE_SmokingStatus features for a patient within a date range.

prepare_smoking_data(raw_data)

Keeps only valid CORE_SmokingStatus records and drops rows with missing values.

search_smoking([...])

Searches for CORE_SmokingStatus observations.

pat2vec.pat2vec_get_methods.get_method_smoking.search_smoking(cohort_searcher_with_terms_and_search=None, client_id_codes=None, observations_time_field='observationdocument_recordeddtm', start_year='1995', start_month='01', start_day='01', end_year='2025', end_month='12', end_day='12', additional_custom_search_string=None, client_idcode_term_name='client_idcode.keyword')

Searches for CORE_SmokingStatus observations.

Parameters:
  • cohort_searcher_with_terms_and_search (Optional[Callable]) – The function for cohort searching. Defaults to None.

  • client_id_codes (Optional[Union[str, List[str]]]) – The client ID code(s) of the patient(s). Defaults to None.

  • observations_time_field (str) – The timestamp field for filtering observations. Defaults to ‘observationdocument_recordeddtm’.

  • start_year (str) – Start year for the search. Defaults to ‘1995’.

  • start_month (str) – Start month for the search. Defaults to ‘01’.

  • start_day (str) – Start day for the search. Defaults to ‘01’.

  • end_year (str) – End year for the search. Defaults to ‘2025’.

  • end_month (str) – End month for the search. Defaults to ‘12’.

  • end_day (str) – End day for the search. Defaults to ‘12’.

  • additional_custom_search_string (Optional[str]) – An additional string to append to the search query. Defaults to None.

  • client_idcode_term_name (str) – The name of the client ID code field in the index. Defaults to ‘client_idcode.keyword’.

Returns:

A DataFrame containing the raw smoking status observation data.

Return type:

pd.DataFrame

Raises:

ValueError – If cohort_searcher_with_terms_and_search or client_id_codes is None.
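The argument contract described above can be sketched without the library. The helpers below are hypothetical (check_search_inputs and to_date are not part of pat2vec); they only illustrate the documented ValueError and how the string date parts map to a date window. Wrapping a single ID string in a list is an assumption about how the two accepted input types might be normalised.

```python
from datetime import date
from typing import Callable, List, Optional, Union


def check_search_inputs(
    cohort_searcher: Optional[Callable],
    client_id_codes: Optional[Union[str, List[str]]],
) -> List[str]:
    """Hypothetical helper: raise the documented ValueError, return IDs as a list."""
    if cohort_searcher is None or client_id_codes is None:
        raise ValueError(
            "cohort_searcher_with_terms_and_search and client_id_codes are required."
        )
    # Assumption: a single ID string is wrapped so later code always sees a list.
    if isinstance(client_id_codes, str):
        return [client_id_codes]
    return list(client_id_codes)


def to_date(year: str, month: str, day: str) -> date:
    """Convert the string date parts used in the signature into a date."""
    return date(int(year), int(month), int(day))


# Stub searcher and a made-up patient ID, for illustration only.
ids = check_search_inputs(lambda **kwargs: None, "P0001")
window_start = to_date("1995", "01", "01")  # documented start defaults
window_end = to_date("2025", "12", "12")    # documented end defaults
print(ids, window_start.isoformat(), window_end.isoformat())
```

Note that the documented end_day default is ‘12’, so a call that relies on the defaults stops at 12 December 2025 rather than at the end of the year.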

pat2vec.pat2vec_get_methods.get_method_smoking.prepare_smoking_data(raw_data)

Keeps only valid CORE_SmokingStatus records and drops rows with missing values.

Parameters:

raw_data (pd.DataFrame) – The raw observation data.

Returns:

A cleaned DataFrame containing only valid smoking status records.

Return type:

pd.DataFrame
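As a rough illustration of this filtering step, the sketch below uses plain dictionaries in place of DataFrame rows so that it runs without dependencies. It is not the library's implementation: the column names in NAME_FIELD and VALUE_FIELD are assumptions, the sample rows are made up, and prepare_rows is a hypothetical stand-in.

```python
import math
from typing import Any, Dict, List

# Assumed column names; check them against your own index mapping.
NAME_FIELD = "obscatalogmasteritem_displayname"
VALUE_FIELD = "observation_valuetext_analysed"


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prepare_rows(raw_rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep CORE_SmokingStatus rows that carry a usable value."""
    return [
        row
        for row in raw_rows
        if row.get(NAME_FIELD) == "CORE_SmokingStatus"
        and not is_missing(row.get(VALUE_FIELD))
    ]


# Made-up input: one valid row, two with missing values, one unrelated item.
raw_rows = [
    {NAME_FIELD: "CORE_SmokingStatus", VALUE_FIELD: "Current Smoker"},
    {NAME_FIELD: "CORE_SmokingStatus", VALUE_FIELD: None},
    {NAME_FIELD: "Some_Other_Observation", VALUE_FIELD: "24.1"},
    {NAME_FIELD: "CORE_SmokingStatus", VALUE_FIELD: float("nan")},
]
clean_rows = prepare_rows(raw_rows)
print(len(clean_rows))
```

With a real DataFrame, the equivalent would presumably be a boolean mask on the name column followed by dropna on the value column.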

pat2vec.pat2vec_get_methods.get_method_smoking.calculate_smoking_features(features_data, current_pat_client_id_code, negate_biochem=False)

Generates binary smoking status features from observation values.

Creates binary flags indicating whether the patient has any record with the value ‘Current Smoker’ or ‘Non-Smoker’.

Parameters:
  • features_data (pd.DataFrame) – The prepared smoking status data.

  • current_pat_client_id_code (str) – The patient’s client ID.

  • negate_biochem (bool) – If True, returns features with NaN values when no data is available. Defaults to False.

Returns:

A single-row DataFrame with binary features for smoking status.

Return type:

pd.DataFrame
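The flag logic can be sketched as follows, again with plain dictionaries standing in for DataFrame rows. The feature names (smoking_current, smoking_non), the client_idcode key, and the value column name are all hypothetical. The behaviour when no data is available and negate_biochem is False is also an assumption: here the flags are simply left out.

```python
import math
from typing import Any, Dict, List

VALUE_FIELD = "observation_valuetext_analysed"  # assumed column name


def smoking_flags(
    rows: List[Dict[str, Any]],
    client_id: str,
    negate_biochem: bool = False,
) -> Dict[str, Any]:
    """Build one feature row; the flag names here are hypothetical."""
    features: Dict[str, Any] = {"client_idcode": client_id}
    if rows:
        values = {row[VALUE_FIELD] for row in rows}
        # 1 if at least one record has the status, otherwise 0.
        features["smoking_current"] = int("Current Smoker" in values)
        features["smoking_non"] = int("Non-Smoker" in values)
    elif negate_biochem:
        # No observations: mark the flags as unknown rather than 0.
        features["smoking_current"] = math.nan
        features["smoking_non"] = math.nan
    # Assumption: with no data and negate_biochem=False, no flags are added.
    return features


with_data = smoking_flags([{VALUE_FIELD: "Current Smoker"}], "P0001")
no_data_nan = smoking_flags([], "P0001", negate_biochem=True)
no_data_plain = smoking_flags([], "P0001")
print(with_data, no_data_nan, no_data_plain)
```

Using NaN rather than 0 for a patient with no observations keeps "no record of smoking" distinct from "no smoking data at all", which matters if the features are later imputed or modelled.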

pat2vec.pat2vec_get_methods.get_method_smoking.get_smoking(current_pat_client_id_code, target_date_range, pat_batch, config_obj=None, cohort_searcher_with_terms_and_search=None)

Retrieves CORE_SmokingStatus features for a patient within a date range.

Fetches smoking status observations, either from a pre-loaded batch or by running a search, and then creates binary features indicating whether records exist for each smoking status.

Parameters:
  • current_pat_client_id_code (str) – The client ID code of the patient.

  • target_date_range (Tuple) – A tuple representing the target date range.

  • pat_batch (pd.DataFrame) – The DataFrame containing patient data for batch mode.

  • config_obj (Optional[object]) – Configuration object with settings like batch_mode and negate_biochem. Defaults to None.

  • cohort_searcher_with_terms_and_search (Optional[Callable]) – The function for cohort searching. Defaults to None.

Returns:

A DataFrame containing smoking status features for the patient.

Return type:

pd.DataFrame

Raises:

ValueError – If config_obj is None.
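The control flow described for get_smoking (validate the config, pick the data source, then derive flags) might look roughly like the sketch below. It is a dependency-free stand-in, not the library's code: get_smoking_sketch, the feature names, the value column name, the stub searcher's call signature, and the date-range placeholder are all assumptions, and SimpleNamespace stands in for the real configuration object.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional, Tuple

VALUE_FIELD = "observation_valuetext_analysed"  # assumed column name
Rows = List[Dict[str, Any]]


def get_smoking_sketch(
    client_id: str,
    target_date_range: Tuple,
    pat_batch: Rows,
    config_obj: Optional[Any] = None,
    searcher: Optional[Callable[[str, Tuple], Rows]] = None,
) -> Dict[str, Any]:
    """Hypothetical outline: choose a data source, then build binary flags."""
    if config_obj is None:
        raise ValueError("config_obj cannot be None.")
    if config_obj.batch_mode:
        # Batch mode: the caller has already loaded this patient's rows.
        rows = pat_batch
    else:
        if searcher is None:
            raise ValueError("A searcher is required when batch_mode is off.")
        rows = searcher(client_id, target_date_range)
    values = {row.get(VALUE_FIELD) for row in rows}
    return {
        "client_idcode": client_id,
        "smoking_current": int("Current Smoker" in values),
        "smoking_non": int("Non-Smoker" in values),
    }


def stub_searcher(client_id: str, target_date_range: Tuple) -> Rows:
    """Stand-in for a real cohort search; returns fixed, made-up rows."""
    return [{VALUE_FIELD: "Non-Smoker"}]


date_range = (2020, 1, 1)  # placeholder; the real tuple layout is not specified here
batch_rows = [{VALUE_FIELD: "Current Smoker"}]

from_batch = get_smoking_sketch(
    "P0001", date_range, batch_rows, SimpleNamespace(batch_mode=True)
)
from_search = get_smoking_sketch(
    "P0001", date_range, [], SimpleNamespace(batch_mode=False), stub_searcher
)
print(from_batch, from_search)
```

Passing the searcher in as an argument, as the documented signature does, keeps the feature logic testable with a stub like the one above.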