1. astrogen_utils module

This module contains useful functions to manipulate strings for name matching, pretty printing, gender detection and XLSX output.

The functions in this module comprise:

string comparison:
- ds
- ds1
- ds2
- initials
- getinitials
- pickone
input/output:
- bcolors
- append_df_to_excel

astrogen_utils.append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, truncate_sheet=False, **to_excel_kwargs)

Append a DataFrame [df] to an existing Excel file [filename], in the sheet [sheet_name]. If [filename] doesn't exist, this function creates it.

@param filename: File path or existing ExcelWriter (example: '/path/to/file.xlsx')
@param df: DataFrame to save to the workbook
@param sheet_name: Name of the sheet that will contain the DataFrame (default: 'Sheet1')
@param startrow: Upper-left cell row at which to write the DataFrame. By default (startrow=None), the function finds the last row of the existing data and writes to the next row.
@param truncate_sheet: Truncate (remove and recreate) [sheet_name] before writing the DataFrame to the Excel file
@param to_excel_kwargs: Arguments passed on to DataFrame.to_excel() (can be a dictionary)
@return: None

Usage examples:

>>> append_df_to_excel('d:/temp/test.xlsx', df)

>>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

>>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                       index=False)

>>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', 
                       index=False, startrow=25)

Credit: [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)

class astrogen_utils.bcolors

Bases: object

Color palette for pretty printing.

This class holds predefined ANSI escape codes for colors and text styles, used in the visual analysis of strings and publication data.

BOLD = '\x1b[1m'

ENDC = '\x1b[0m'

FAIL = '\x1b[91m'

HEADER = '\x1b[95m'

OKBLUE = '\x1b[94m'

OKCYAN = '\x1b[96m'

OKGREEN = '\x1b[92m'

TST = '\x1b[31;1m'

UNDERLINE = '\x1b[4m'

WARNING = '\x1b[93m'

X = '\x1b[4;95;1m'
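A minimal usage sketch: each attribute is an ANSI escape code, so a string is colored by prefixing it with a color code and closing it with ENDC to reset the terminal style. To stay self-contained, the sketch redefines a subset of bcolors with the values listed above instead of importing astrogen_utils; the colorize helper is hypothetical and not part of the module.

```python
class bcolors:
    # Subset of the escape codes listed above, copied verbatim
    OKGREEN = '\x1b[92m'
    WARNING = '\x1b[93m'
    FAIL = '\x1b[91m'
    BOLD = '\x1b[1m'
    ENDC = '\x1b[0m'


def colorize(text, color):
    """Hypothetical helper: wrap text in a color code and reset the style afterwards."""
    return f"{color}{text}{bcolors.ENDC}"


print(colorize("Match found", bcolors.OKGREEN))
print(colorize("No match", bcolors.FAIL))
```

Always appending ENDC keeps the color from leaking into whatever the terminal prints next.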

astrogen_utils.clean_text(txt)

astrogen_utils.df_to_dict(df, key_column, val_column): converts two pandas Series (the columns key_column and val_column of df) into a dictionary
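A minimal sketch of what df_to_dict describes, assuming each value in key_column maps to the value of val_column in the same row; the function name df_to_dict_sketch is hypothetical and this is not the module's actual implementation.

```python
import pandas as pd


def df_to_dict_sketch(df, key_column, val_column):
    """Map each entry of df[key_column] to the entry of df[val_column] in the same row.

    If a key appears more than once, the last row wins (plain dict semantics).
    """
    return dict(zip(df[key_column], df[val_column]))


df = pd.DataFrame({"name": ["Ana", "Luis"], "gender": ["F", "M"]})
print(df_to_dict_sketch(df, "name", "gender"))
```

Zipping the two columns avoids an explicit loop over rows; how the real function treats duplicate keys is not stated in the source.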

astrogen_utils.ds(a, b)

Get the distances between two words.

This function is used to obtain the distance between two names or surnames. It computes several string metrics: the Damerau-Levenshtein distance, the Jaro distance, the Levenshtein distance and SequenceMatcher. The latter comes from the difflib package and the others from the Jellyfish package.

Args:

a (string): one of the strings
b (string): the other string to compare

Returns:

res (array): NumPy array with the distances between the two words.
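A dependency-free sketch of the kind of computation ds performs, assuming the four values are returned in the order listed above. To avoid requiring Jellyfish, it reimplements the metrics in pure Python: the Damerau-Levenshtein part uses the restricted "optimal string alignment" variant, which may differ from Jellyfish's result on some inputs, and the Jaro value is a similarity (1.0 for identical strings). All function names here are hypothetical.

```python
from difflib import SequenceMatcher

import numpy as np


def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein: also counts adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def jaro(a, b):
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    la, lb = len(a), len(b)
    if la == 0 or lb == 0:
        return 0.0
    window = max(max(la, lb) // 2 - 1, 0)
    a_flags, b_flags = [False] * la, [False] * lb
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(lb, i + window + 1)):
            if not b_flags[j] and b[j] == ca:
                a_flags[i] = b_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order
    k = transpositions = 0
    for i in range(la):
        if a_flags[i]:
            while not b_flags[k]:
                k += 1
            if a[i] != b[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / la + matches / lb + (matches - t) / matches) / 3


def ds_sketch(a, b):
    """Return [Damerau-Levenshtein (OSA), Jaro, Levenshtein, SequenceMatcher ratio]."""
    return np.array([
        osa_distance(a, b),
        jaro(a, b),
        levenshtein(a, b),
        SequenceMatcher(None, a, b).ratio(),
    ])


print(ds_sketch("Martinez", "Martines"))
```

Returning several metrics side by side lets the caller weigh edit-based distances (sensitive to typos) against Jaro and SequenceMatcher (sensitive to shared character runs) when deciding whether two spellings refer to the same person.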

astrogen_utils.ds1(s1, s2)

Get the distances between two words.

This function is used to obtain the distance between two names or surnames. It computes several string metrics: the Damerau-Levenshtein distance, the Jaro distance, the Levenshtein distance and SequenceMatcher. The latter comes from the difflib package and the others from the Jellyfish package.

Args:

s1 (string): one of the strings
s2 (string): the other string to compare

Returns:

res (array): NumPy array with the distances between the two words.

astrogen_utils.ds2(ap1, ap2, nom1, nom2)

Get the distances between two words.

This function is used to obtain the distance between two names or surnames. It computes several string metrics: the Damerau-Levenshtein distance, the Jaro distance, the Levenshtein distance and SequenceMatcher. The latter comes from the difflib package and the others from the Jellyfish package.

Args:

ap1, ap2, nom1, nom2 (string): the strings to compare

Returns:

res (array): NumPy array with the distances between the two words.

astrogen_utils.fnames(auth, folder, extension, include_path=True): builds the file name

astrogen_utils.get_gender2(names)

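astrogen_utils.getinitials(nombre)

Get the initials of a full name.

e.g.: 'Jose Facundo' -> 'J. F.'

Args:

nombre (string): full name

Returns:

(string): the initial of each word, followed by a period and separated by spaces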

astrogen_utils.getinitialscompact(nombre)

Get the initials of a full name, without periods or spaces.

e.g.: 'Jose Facundo' -> 'JF'

Args:

nombre (string): full name

Returns:
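A minimal sketch of both initials formats, reproducing the 'Jose Facundo' examples above. It assumes words are separated by whitespace and that initials are uppercased; the *_sketch names are hypothetical, and the real functions may handle hyphens, accents or particles differently.

```python
def getinitials_sketch(nombre):
    """'Jose Facundo' -> 'J. F.' (one initial per word, period-terminated)."""
    return ' '.join(word[0].upper() + '.' for word in nombre.split())


def getinitialscompact_sketch(nombre):
    """'Jose Facundo' -> 'JF' (initials run together)."""
    return ''.join(word[0].upper() for word in nombre.split())


print(getinitials_sketch('Jose Facundo'))
print(getinitialscompact_sketch('Jose Facundo'))
```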

astrogen_utils.initials(initials, string)

Check whether the initials of two names coincide.

e.g.:

initials = 'Juan Carlos'; string = 'Juan' -> True

initials = 'Juan Carlos'; string = 'Juan José' -> False

initials = 'Juan Carlos'; string = 'Jacinto' -> True

Args:

initials (string): source string for the initials
string (string): full names

Returns:

boo (bool): whether the initials are accepted

Notes:

The criteria for the string matching are the following:
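One rule that reproduces the three examples above (an inference from the examples, not the module's documented criteria) is to compare the word initials position by position, only as far as the shorter name goes. The function name initials_sketch is hypothetical.

```python
def initials_sketch(initials, string):
    """Hypothetical rule: the leading initials must agree up to the shorter name's length."""
    a = [word[0].upper() for word in initials.split()]
    b = [word[0].upper() for word in string.split()]
    n = min(len(a), len(b))
    # An empty name never matches
    return n > 0 and a[:n] == b[:n]


print(initials_sketch('Juan Carlos', 'Juan'))       # J vs J
print(initials_sketch('Juan Carlos', 'Juan José'))  # J C vs J J
print(initials_sketch('Juan Carlos', 'Jacinto'))    # J vs J
```

Under this rule a single given name is compatible with any longer name that starts with the same initial, which is why 'Jacinto' is accepted while 'Juan José' is not.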

astrogen_utils.pickone(df, au, sift): from a list of authors in a DataFrame "df", picks the one closest to an author "au" and returns a Boolean array that is False everywhere except for one element (the chosen author).
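A minimal sketch of the selection step pickone describes: score every candidate against the target author and return a one-hot Boolean mask marking the best match. The closeness measure (difflib's SequenceMatcher ratio), the plain list of names in place of a DataFrame, and the name pick_closest are all assumptions; the role of the sift parameter is not documented and is not modeled here.

```python
from difflib import SequenceMatcher

import numpy as np


def pick_closest(names, au):
    """Hypothetical stand-in for pickone: True only at the name most similar to au.

    Assumes names is non-empty; ties go to the first best-scoring name.
    """
    scores = np.array([SequenceMatcher(None, name, au).ratio() for name in names])
    mask = np.zeros(len(names), dtype=bool)
    mask[np.argmax(scores)] = True
    return mask


candidates = ["Garcia, J.", "Gomez, A.", "Perez, M."]
print(pick_closest(candidates, "Gomez, A."))
```

A Boolean mask like this can index the original DataFrame directly (df[mask]) to keep only the chosen author's row.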

astrogen_utils.similar(a, b)