2. pipeline module
The data reduction pipeline is implemented as a bonobo pipeline following an ETL (extract-transform-load) model, with the following steps:
action | routine
---|---
read table from AAA | S01
merge with tables from institutes | S02_add_OAC_data(), S02_add_IATE_data(), S02_add_IALP_data(), S02_add_ICATE_data(), S02_add_GAE_data()
merge with data from CONICET | S02_add_CIC_data()
add gender | S03_add_gender()
add age | S03_add_age()
clean papers | S04_pub_clean_papers()
add journal index | S04_pub_journal_index()
add publication metrics | S04_pub_value_added()
visual check |
anonymize |
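The chain above can be sketched in plain Python. The actual pipeline is built with bonobo; the record fields and step bodies below are illustrative placeholders, not the package's real logic:

```python
# Plain-Python sketch of the extract-transform-load chain above.
# All field names and values are hypothetical.

def read_table_from_aaa():
    """Extract: yield one record per AAA member (placeholder data)."""
    yield {"name": "A. Author", "institute": None}

def merge_with_institutes(row):
    """Transform: attach institute affiliation (placeholder logic)."""
    row = dict(row)
    row["institute"] = row["institute"] or "IATE"
    return row

def add_gender(row):
    """Transform: add a gender column (placeholder logic)."""
    row = dict(row)
    row["gender"] = "unknown"
    return row

def load(rows):
    """Load: collect the reduced table."""
    return list(rows)

# Run the chain: extract -> transform -> transform -> load.
table = load(add_gender(merge_with_institutes(r)) for r in read_table_from_aaa())
```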
In what follows we describe each step separately.
The `pipeline` module contains the steps for the data reduction pipeline.
The steps are:
- S01: read base table (AAA)
- S02: add institutes and CIC data
In these steps the following columns are added: `cic`, `docencia`, `area`, `orcid` and `use_orcid`.
The steps are contained in the following functions:
- pipeline.S02_add_OAC_data()
- pipeline.S02_add_IATE_data()
- pipeline.S02_add_IALP_data()
- pipeline.S02_add_ICATE_data()
- pipeline.S02_add_GAE_data()
- pipeline.S02_add_CIC_data()
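A hypothetical sketch of one S02-style merge step, assuming the tables are handled as pandas DataFrames. The matching key, file contents and column values below are illustrative, not the package's actual data:

```python
import pandas as pd

# Base AAA member table and one institute table (placeholder rows).
base = pd.DataFrame({"name": ["A. Author", "B. Author"]})
iate = pd.DataFrame({
    "name": ["A. Author"],
    "cic": ["Investigador Adjunto"],
    "docencia": [True],
    "area": ["extragalactic"],
    "orcid": ["0000-0000-0000-0001"],
})

# Left merge keeps every AAA member and adds the institute columns
# (cic, docencia, area, orcid) where a match is found.
merged = base.merge(iate, on="name", how="left")

# use_orcid flags the rows for which an ORCID is available.
merged["use_orcid"] = merged["orcid"].notna()
```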
- S03: add metadata for authors
  - pipeline.S03_add_gender()
  - pipeline.S03_add_age()
  - pipeline.S03_clean_and_sort()
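As an illustration of the S03 metadata steps, a minimal sketch assuming gender is looked up from a first-name table and age is derived from a birth-year column. Both the lookup table and the column names are assumptions, not the package's actual implementation:

```python
from datetime import date

# Hypothetical first-name -> gender lookup table.
GENDER_BY_NAME = {"maria": "f", "juan": "m"}

def add_gender(row):
    """Infer gender from the author's first name (placeholder logic)."""
    first = row["name"].split()[0].lower()
    row["gender"] = GENDER_BY_NAME.get(first, "unknown")
    return row

def add_age(row, today=date(2020, 1, 1)):
    """Compute age from an assumed birth_year column."""
    row["age"] = today.year - row["birth_year"]
    return row

row = add_age(add_gender({"name": "Maria Perez", "birth_year": 1980}))
```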
- S04: add publications data
  - pipeline.S04_pub_get_ads_entries()
  - pipeline.S04_pub_get_orcids()
  - pipeline.S04_pub_journal_index()
  - pipeline.S04_pub_clean_papers()
  - pipeline.S04_make_pages()
  - pipeline.S04_pub_value_added()
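A hedged sketch of what a journal-index step in the spirit of S04 could look like: each paper is flagged according to whether its journal appears in a reference list. The journal set below is a placeholder, not the one used by the pipeline:

```python
# Placeholder set of refereed astronomy journals.
REFEREED = {"ApJ", "MNRAS", "A&A", "AJ"}

def journal_index(papers):
    """Flag each paper record with a boolean journal index."""
    for p in papers:
        p = dict(p)
        p["journal_index"] = p["journal"] in REFEREED
        yield p

papers = list(journal_index([
    {"title": "Paper 1", "journal": "MNRAS"},
    {"title": "Paper 2", "journal": "arXiv"},
]))
```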
API documentation for the code in pipeline.py: