Part-of-Speech and Named Entity Recognition annotations in MCSQ

annotation.ner_annotation_cze_rus.ner_annotation(df, ner)

Iterates through the preprocessed, POS-tagged RUS and CZE spreadsheets and adds the NER annotation. POS tagging is done in the mcsq_annotation script. The CZE and RUS languages use a multilingual pretrained model provided by DeepPavlov.

The Slavic-BERT-NER model from DeepPavlov depends on library versions that are incompatible with the ones used by the mcsq_annotation script, so this script should be run in a separate virtual environment.

Parameters:
  • df (param1) – the dataframe that holds the preprocessed, POS-tagged questionnaire.
  • ner (param2) – the pretrained NER model provided by DeepPavlov.
Returns:

df_tagged (pandas dataframe), the questionnaire with added NER annotations.
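
The iteration described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the column names `text` and `ner_tagged_text`, the `token<TAG>` output format, and the `fake_ner` stand-in are all assumptions made for the example. The stand-in only mimics a callable that takes a list of sentences and returns parallel lists of tokens and tags, so that the sketch runs without the pretrained model.

```python
import pandas as pd


def fake_ner(sentences):
    """Hypothetical stand-in for the pretrained model.

    Tags every token that starts with an uppercase letter as B-LOC and
    everything else as O. A real model would be passed in instead.
    """
    tokens = [s.split() for s in sentences]
    tags = [["B-LOC" if t[0].isupper() else "O" for t in sent] for sent in tokens]
    return tokens, tags


def ner_annotation_sketch(df, ner, text_col="text", out_col="ner_tagged_text"):
    """Return a copy of df with one added column of token<TAG> strings.

    text_col and out_col are assumed names, not the real spreadsheet columns.
    """
    df_tagged = df.copy()
    annotated = []
    for text in df_tagged[text_col]:
        # Skip empty or missing cells instead of sending them to the model.
        if not isinstance(text, str) or not text.strip():
            annotated.append(None)
            continue
        tokens, tags = ner([text])
        annotated.append(
            " ".join(f"{tok}<{tag}>" for tok, tag in zip(tokens[0], tags[0]))
        )
    df_tagged[out_col] = annotated
    return df_tagged


df = pd.DataFrame({"text": ["v Praze žiji", "kde žijete"]})
df_tagged = ner_annotation_sketch(df, fake_ner)
print(df_tagged["ner_tagged_text"].tolist())
```

Working on a copy keeps the input dataframe unchanged, which makes it easier to compare the spreadsheet before and after annotation.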

annotation.wis_annotated_text_to_alignment.add_annotation(df_source, df_target, df_alignment)

Adds NER/POS annotations to the alignment files by copying the annotations from the spreadsheets. Unlike the EVS, ESS and SHARE files, all the WIS files have 1-1 correspondences and come prealigned, so these files do not have to go through the Alignment algorithm.

Parameters:
  • df_source (param1) – the dataframe that holds the preprocessed annotated source questionnaire.
  • df_target (param2) – the dataframe that holds the preprocessed annotated target questionnaire.
  • df_alignment (param3) – the dataframe that holds the alignment questionnaire, without annotations.
Returns:

df_alignment (pandas dataframe) with added NER and POS annotations copied from df_source and df_target.
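
Because the WIS files are already aligned 1-1, copying the annotations amounts to a lookup from each alignment row into the source and target spreadsheets. The sketch below shows one way this could look. It assumes a shared, unique `item_id` key and annotation columns named `pos_tagged_text` and `ner_tagged_text`; these names are hypothetical, and the actual script may match rows differently.

```python
import pandas as pd

# Hypothetical names for the annotation columns in the spreadsheets.
ANNOTATION_COLS = ["pos_tagged_text", "ner_tagged_text"]


def add_annotation_sketch(df_source, df_target, df_alignment, key="item_id"):
    """Copy annotation columns from source/target into a copy of the alignment.

    Assumes `key` is unique in df_source and df_target (1-1 correspondence).
    """
    out = df_alignment.copy()
    for side, df_side in (("source", df_source), ("target", df_target)):
        lookup = df_side.set_index(key)
        for col in ANNOTATION_COLS:
            # Map each alignment row's key to the matching annotation.
            out[f"{side}_{col}"] = out[key].map(lookup[col])
    return out


df_source = pd.DataFrame({
    "item_id": ["Q1", "Q2"],
    "pos_tagged_text": ["Yes<INTJ>", "No<INTJ>"],
    "ner_tagged_text": ["Yes<O>", "No<O>"],
})
df_target = pd.DataFrame({
    "item_id": ["Q1", "Q2"],
    "pos_tagged_text": ["Ano<INTJ>", "Ne<INTJ>"],
    "ner_tagged_text": ["Ano<O>", "Ne<O>"],
})
df_alignment = pd.DataFrame({
    "item_id": ["Q1", "Q2"],
    "source_text": ["Yes", "No"],
    "target_text": ["Ano", "Ne"],
})

result = add_annotation_sketch(df_source, df_target, df_alignment)
print(result.columns.tolist())
```

A key-based lookup fails loudly if the key is duplicated in a spreadsheet, which is a useful check on the 1-1 assumption.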