Preprocess input files
Data files sometimes require preprocessing (converting to CSV, adding a column header, splitting). Python can be quite slow for some of these tasks, so Bash can be a good alternative.
Convert TSV to CSV
Converting TSV to CSV can be helpful, especially for processing RML mappings.
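As a minimal sketch of such a conversion (not necessarily the command used in the original scripts), the function below reads TSV on stdin and writes CSV on stdout with awk. It assumes that fields contain no embedded tabs or newlines. The function name tsv_to_csv and the file names in the comment are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a TSV stream on stdin to CSV on stdout.
# Fields that contain a comma or a double quote are wrapped in double quotes,
# and embedded double quotes are doubled, following the usual CSV convention.
# Assumption: no field contains an embedded tab or newline.
tsv_to_csv() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
  {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /[",]/) {
        gsub(/"/, "\"\"", $i)
        $i = "\"" $i "\""
      }
    }
    $1 = $1   # force awk to rebuild the line with OFS even if no field changed
    print
  }'
}

# Example (hypothetical file names):
# tsv_to_csv < data.tsv > data.csv
```

A plain replacement of every tab with a comma would be shorter, but it would corrupt rows whose values already contain commas, which is why the sketch quotes those fields.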
Add tabular file header label
RML uses the column headers of tabular files to map the data. If the tabular files to process don't have a header, one can easily be added by using the sed command in the download.sh script.
CSV
TSV
PSV
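For the three formats listed above, a header can be added with a single sed call. The sketch below assumes GNU sed (for the -i option and the one-line form of the i command) and a non-empty file. The helper name add_header, the column names and the file names are hypothetical placeholders, not taken from the actual download.sh script.

```shell
#!/usr/bin/env bash
# Insert a header line before the first line of a file, in place.
# Assumptions: GNU sed, a non-empty file, and a header without backslashes.
add_header() {
  local header="$1" file="$2"
  sed -i "1i ${header}" "$file"
}

# Hypothetical column and file names, one call per delimiter:
# add_header 'id,name,value' data.csv        # CSV (comma-separated)
# add_header $'id\tname\tvalue' data.tsv     # TSV (tab-separated)
# add_header 'id|name|value' data.psv        # PSV (pipe-separated)
```

The header must use the same delimiter as the rest of the file, otherwise the columns will not line up with the names used in the RML mappings.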
Split big files
Large input files may need to be split into smaller parts before processing.
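One possible approach, sketched here with the standard split utility, is to cut the file into chunks of a fixed number of rows and repeat the header line in every chunk so that each part can still be mapped on its own. The function name split_with_header, its arguments and the file names are hypothetical. The sketch assumes that the first line is the header, that no record spans several lines, and that no other files share the output prefix.

```shell
#!/usr/bin/env bash
# Split a delimited file into chunks of at most <rows> data rows each,
# repeating the header line at the top of every chunk.
# Usage: split_with_header <file> <rows> <prefix>
split_with_header() {
  local file="$1" rows="$2" prefix="$3"
  local header part
  header="$(head -n 1 "$file")"
  # Split everything after the header into <prefix>aa, <prefix>ab, ...
  tail -n +2 "$file" | split -l "$rows" - "$prefix"
  # Prepend the header to each generated part.
  for part in "$prefix"*; do
    { printf '%s\n' "$header"; cat "$part"; } > "$part.tmp" && mv "$part.tmp" "$part"
  done
}

# Example (hypothetical names): chunks of 100000 rows named part_aa, part_ab, ...
# split_with_header big_file.csv 100000 part_
```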
Processing large files on node2 can generate a large amount of logs, which overloads the memory. These logs are written to /var/lib/docker/overlay2.
To clear the memory, run docker system prune