1. astrogen_utils module
This module contains useful functions to manipulate strings for name matching, pretty printing, gender detection and XLSX output.
The functions in this module comprise:
- string comparison:
ds
ds1
ds2
initials
getinitials
pickone
- input/output:
bcolors
append_df_to_excel
- astrogen_utils.append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, truncate_sheet=False, **to_excel_kwargs)[source]
Append a DataFrame [df] to an existing Excel file [filename], in the sheet [sheet_name]. If [filename] doesn’t exist, this function creates it.
@param filename: File path or existing ExcelWriter (Example: ‘/path/to/file.xlsx’)
@param df: DataFrame to save to workbook
@param sheet_name: Name of sheet which will contain DataFrame. (default: ‘Sheet1’)
@param startrow: upper left cell row to dump data frame. Per default (startrow=None) calculate the last row in the existing DF and write to the next row…
@param truncate_sheet: truncate (remove and recreate) [sheet_name] before writing DataFrame to Excel file
@param to_excel_kwargs: arguments which will be passed to DataFrame.to_excel() [can be a dictionary]
@return: None
Usage examples:
>>> append_df_to_excel('d:/temp/test.xlsx', df)
>>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)
>>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False)
>>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False, startrow=25)
- class astrogen_utils.bcolors[source]
Bases: object
Get color palette for pretty printing
This class simply contains a list of predefined colors to be used in the visual analysis of strings and publication data.
- BOLD = '\x1b[1m'
- ENDC = '\x1b[0m'
- FAIL = '\x1b[91m'
- HEADER = '\x1b[95m'
- OKBLUE = '\x1b[94m'
- OKCYAN = '\x1b[96m'
- OKGREEN = '\x1b[92m'
- TST = '\x1b[31;1m'
- UNDERLINE = '\x1b[4m'
- WARNING = '\x1b[93m'
- X = '\x1b[4;95;1m'
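The constants above are ANSI escape sequences: printing one switches the terminal style, and ENDC resets it. A minimal sketch of how they can be combined follows. The `Palette` class and the `colorize` helper are hypothetical names used only for illustration; `Palette` copies four of the values listed above so the snippet runs without the module installed.

```python
# Hypothetical stand-in holding a few of the values listed above;
# in real use they would be read from astrogen_utils.bcolors.
class Palette:
    BOLD = '\x1b[1m'
    ENDC = '\x1b[0m'
    FAIL = '\x1b[91m'
    OKGREEN = '\x1b[92m'


def colorize(text, *codes):
    """Prefix text with the given ANSI codes and reset the style afterwards."""
    return ''.join(codes) + text + Palette.ENDC


if __name__ == '__main__':
    print(colorize('match', Palette.OKGREEN))
    print(colorize('no match', Palette.FAIL, Palette.BOLD))
```

Always closing with ENDC keeps the style from leaking into later output.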
- astrogen_utils.df_to_dict(df, key_column, val_column)[source]
Convert two pandas Series into a dictionary.
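As an illustration of the idea, the sketch below builds a dictionary from two columns of a DataFrame. It is not the module's implementation: `df_to_dict_sketch`, the column names and the sample data are assumptions made for this example, and only the argument order follows the signature above.

```python
import pandas as pd


def df_to_dict_sketch(df, key_column, val_column):
    """Map each value in key_column to the val_column value of the same row."""
    # If a key occurs more than once, the last row with that key wins.
    return dict(zip(df[key_column], df[val_column]))


# Made-up sample data, for illustration only.
authors = pd.DataFrame({'surname': ['Perez', 'Gomez'],
                        'name': ['Ana', 'Luis']})
lookup = df_to_dict_sketch(authors, 'surname', 'name')
```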
- astrogen_utils.ds(a, b)[source]
Get distance between two words.
This function is used to obtain the distance between two names or surnames. It uses several distances in word space, namely the Damerau-Levenshtein distance, the Jaro distance, the Levenshtein distance and SequenceMatcher. The latter comes from the difflib package and the others from the Jellyfish package.
- Args:
a (string): one of the strings
b (string): the other string to compare
- Returns:
- res (array): Numpy array with the list of distances between the two words.
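To make the idea of several distances between two words concrete, here is a standard-library sketch. It is not the implementation of `ds`: the Levenshtein distance is hand-written here instead of taken from Jellyfish, the Damerau-Levenshtein and Jaro distances are left out, and the result is a plain tuple, not a Numpy array. The names `levenshtein` and `word_distances` are hypothetical.

```python
from difflib import SequenceMatcher


def levenshtein(a, b):
    """Count the single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def word_distances(a, b):
    """Return (Levenshtein distance, SequenceMatcher similarity ratio)."""
    return levenshtein(a, b), SequenceMatcher(None, a, b).ratio()
```

The two values point in opposite directions: the edit distance is 0 for identical words, while the SequenceMatcher ratio is a similarity that equals 1.0 for identical words.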
- astrogen_utils.ds1(s1, s2)[source]
Get distance between two words.
This function is used to obtain the distance between two names or surnames. It uses several distances in word space, namely the Damerau-Levenshtein distance, the Jaro distance, the Levenshtein distance and SequenceMatcher. The latter comes from the difflib package and the others from the Jellyfish package.
- Args:
a (string): one of the strings
b (string): the other string to compare
- Returns:
- res (array): Numpy array with the list of distances between the two words.
- astrogen_utils.ds2(ap1, ap2, nom1, nom2)[source]
Get distance between two words.
This function is used to obtain the distance between two names or surnames. It uses several distances in word space, namely the Damerau-Levenshtein distance, the Jaro distance, the Levenshtein distance and SequenceMatcher. The latter comes from the difflib package and the others from the Jellyfish package.
- Args:
a (string): one of the strings
b (string): the other string to compare
- Returns:
- res (array): Numpy array with the list of distances between the two words.
- astrogen_utils.getinitials(nombre)[source]
Get the initials of a full name
e.g.: ‘Jose Facundo’ –> ‘J. F.’
Args:
Returns:
- astrogen_utils.getinitialscompact(nombre)[source]
Get the initials of a full name
e.g.: ‘Jose Facundo’ –> ‘JF’
Args:
Returns:
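The two initials formats shown in the examples above can be reproduced with a short sketch. The function names below are hypothetical and this is not the module's code. It assumes that names are split on whitespace and that initials are upper-cased, neither of which is stated in the documentation.

```python
def initials_dotted(full_name):
    """'Jose Facundo' -> 'J. F.' (one dotted initial per word)."""
    return ' '.join(word[0].upper() + '.' for word in full_name.split())


def initials_compact(full_name):
    """'Jose Facundo' -> 'JF' (initials with no separators)."""
    return ''.join(word[0].upper() for word in full_name.split())
```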
- astrogen_utils.initials(initials, string)[source]
Check if the initials of two names coincide.
e.g.:
initials = ‘Juan Carlos’; string = ‘Juan’ –> True
initials = ‘Juan Carlos’; string = ‘Juan José’ –> False
initials = ‘Juan Carlos’; string = ‘Jacinto’ –> True
- Args:
initials (string): source string for the initials
string (string): full names
- Returns:
boo (bool): whether the initials are accepted
Notes:
The criteria for string matching are the following: