Data files sometimes require preprocessing (converting to CSV, adding a column header, splitting). Python can be quite slow for some of these tasks, so Bash can be a good alternative.
This is especially helpful when preparing files for RML mappings.
RML uses the column headers of tabular files to map the data. If the tabular files to process don't have a header, one can easily be added with the
sed command in the download.sh script.
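A minimal sketch of adding a header with sed, assuming GNU sed (as found on most Linux hosts). The file name, sample rows, and column names below are hypothetical placeholders; use the column names your RML mapping references.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample file standing in for a downloaded, headerless CSV.
workdir="$(mktemp -d)"
data_file="$workdir/data.csv"
printf '1,aspirin,10\n2,ibuprofen,20\n' > "$data_file"

# Hypothetical column names; replace with the ones used in the RML mapping.
header='id,name,dose'

# Insert the header before line 1, editing the file in place.
# The one-line "1i text" form and "-i" without a suffix are GNU sed behavior.
sed -i "1i $header" "$data_file"

cat "$data_file"
```

Note that this has no effect on an empty file, since sed only runs the insert when line 1 exists.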
In case you need to split large files:
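One possible approach, sketched with the standard split utility. The file names, sample rows, and chunk size are hypothetical. Repeating the header in every chunk is an assumption made here so that each part can still be processed by an RML mapping on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample file: one header line followed by five rows.
workdir="$(mktemp -d)"
big_file="$workdir/big.csv"
printf 'id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n' > "$big_file"

# Hypothetical chunk size; a real file would use something much larger.
lines_per_chunk=2

# Keep the header aside so it can be repeated in every chunk.
header="$(head -n 1 "$big_file")"

# Split the data rows (everything after line 1) into chunk_aa, chunk_ab, ...
tail -n +2 "$big_file" | split -l "$lines_per_chunk" - "$workdir/chunk_"

# Prepend the header to each chunk and give it a .csv extension.
for chunk in "$workdir"/chunk_*; do
  { printf '%s\n' "$header"; cat "$chunk"; } > "$chunk.csv"
  rm "$chunk"
done

ls "$workdir"
```

If the chunks do not need their own header, the loop can be dropped and split can be run on the file directly.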
Processing large files on node2 can generate a large amount of logs, which overloads the memory. Logs generated in
To clear the memory, run:
docker system prune