Preprocess input files
Data files sometimes require preprocessing (converting to CSV, adding a column header, splitting). Python can be quite slow for some of these tasks, so Bash can be a good solution.
Convert TSV to CSV
Converting can be helpful, especially when the files are processed with RML mappings.
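A minimal sketch of such a conversion, assuming no field contains a comma, a double quote, or a newline (otherwise a CSV-aware tool is needed). The file names and sample data are hypothetical.

```shell
# Work in a throwaway directory with a tiny sample file (hypothetical data).
workdir="$(mktemp -d)"
printf 'id\tname\tscore\n1\talice\t90\n2\tbob\t85\n' > "$workdir/input.tsv"

# Replace every tab with a comma. Only safe when fields contain
# no commas, quotes, or embedded newlines.
tr '\t' ',' < "$workdir/input.tsv" > "$workdir/output.csv"

cat "$workdir/output.csv"
```

Using tr keeps the conversion streaming, so it works on files that do not fit in memory.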
Add tabular file header labels
RML uses the column headers of tabular files to map the data. If the tabular files to process don't have a header, one can easily be added using the sed command in the download.sh script.
CSV
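A possible way to do this for a comma-separated file, assuming GNU sed (the -i option behaves differently on BSD/macOS sed). The column names and file name are hypothetical.

```shell
# Sample headerless CSV in a throwaway directory (hypothetical data).
workdir="$(mktemp -d)"
printf '1,alice,90\n2,bob,85\n' > "$workdir/data.csv"

# GNU sed: insert a header line before line 1, editing the file in place.
sed -i '1i id,name,score' "$workdir/data.csv"

head -n 1 "$workdir/data.csv"
```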
TSV
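For tab-separated files the header needs real tab characters. This sketch relies on GNU sed expanding \t and \n in the replacement text, which is a GNU extension; the column names and file name are hypothetical.

```shell
# Sample headerless TSV in a throwaway directory (hypothetical data).
workdir="$(mktemp -d)"
printf '1\talice\t90\n2\tbob\t85\n' > "$workdir/data.tsv"

# GNU sed: prepend the header to line 1; \t becomes a tab and \n ends
# the header line, pushing the original first row onto the next line.
sed -i '1s/^/id\tname\tscore\n/' "$workdir/data.tsv"

head -n 1 "$workdir/data.tsv"
```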
PSV
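A sketch for pipe-separated files, again assuming GNU sed; the pipe character needs no escaping in the inserted text. Column names and file name are hypothetical.

```shell
# Sample headerless pipe-separated file (hypothetical data).
workdir="$(mktemp -d)"
printf '1|alice|90\n2|bob|85\n' > "$workdir/data.psv"

# GNU sed: insert the pipe-separated header before line 1, in place.
sed -i '1i id|name|score' "$workdir/data.psv"

head -n 1 "$workdir/data.psv"
```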
Split big files
In case you need to split large files:
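One way to do this is with GNU split, repeating the header in every chunk so each one remains a valid tabular file. This is a sketch under assumptions: the file names, the chunk prefix, and the chunk size are hypothetical, and the -d and --additional-suffix options require GNU coreutils.

```shell
# Build a small sample file: one header line plus five data rows.
workdir="$(mktemp -d)"
{ echo 'id,name'; seq 1 5 | sed 's/$/,row/'; } > "$workdir/big.csv"

lines_per_chunk=2   # hypothetical; use e.g. 1000000 for real data

# Split the data rows (everything after the header) into numbered chunks:
# chunk_00.csv, chunk_01.csv, ...
tail -n +2 "$workdir/big.csv" \
  | split -l "$lines_per_chunk" -d --additional-suffix=.csv - "$workdir/chunk_"

# Prepend the original header to every chunk.
header="$(head -n 1 "$workdir/big.csv")"
for chunk in "$workdir"/chunk_*.csv; do
  { printf '%s\n' "$header"; cat "$chunk"; } > "$chunk.tmp" && mv "$chunk.tmp" "$chunk"
done

ls "$workdir"
```

Splitting by line count rather than by byte size keeps rows intact, provided no field contains an embedded newline.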
Processing large files on node2 can generate a large volume of logs, which fills up storage on the node. The logs are written under /var/lib/docker/overlay2
To reclaim the space, run docker system prune. Note that this also removes stopped containers, unused networks, dangling images, and build cache.