Random .csv column order

Ciao @k-dominik (and Vale), hope you are both doing well!

Just a quick one! We are still using ilastik, now running it on a cluster, so we are able to get a lot of data quickly :wink:

I just noticed, though, that the column order of the Object Classification output is not consistent. We want to select the columns of interest in the terminal (dropping the default exported columns we don't need) so the files are easier to load in my downstream R analysis.
Below is the bash code we use; it selects the columns we want by index.
cat OC*csv | grep -e ,cell, -e object_id | cut -d , -f 2,3,5,18,22 > "OCsimple_${output_name}.csv"

Our output has not been consistent, though: the same index gives different columns.



Hello @alexjov!

Great that ilastik is still helping you :slight_smile:

To be honest, I’m a bit surprised that the order of columns changes, and I haven’t found the line of code that actually causes it.
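
To narrow down which exports differ, one could group the files by their header row. This is only a rough sketch using Python's standard library: `group_by_header` is a made-up helper, and the two in-memory tables (with invented column names apart from `object_id`) stand in for real exported files.

```python
import csv
import io
from collections import defaultdict


def group_by_header(named_streams):
    """Map each distinct header (as an ordered tuple) to the sources that use it."""
    groups = defaultdict(list)
    for name, stream in named_streams:
        # Only the first row (the header) is read from each source.
        header = next(csv.reader(stream), [])
        groups[tuple(header)].append(name)
    return dict(groups)


# Hypothetical stand-ins for two exported tables with a different column order.
demo = [
    ("a.csv", io.StringIO("object_id,size,class\n1,10,cell\n")),
    ("b.csv", io.StringIO("object_id,class,size\n1,cell,12\n")),
]

header_groups = group_by_header(demo)
for header, names in header_groups.items():
    print(names, header)
```

More than one group in the result means the column order really does vary between files. For files on disk, each stream would be an `open(path, newline="")` handle instead of a `StringIO`.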

I don’t think I could cook up a good shell script to extract columns by name; I’d probably venture into Python for that. As you are using R, why not import the CSV into a data frame there and extract the columns by name? That is for sure more robust. The column names are also what we try to keep stable across ilastik versions.
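
For the Python route, a minimal sketch could look like the following. It uses only the standard `csv` module; `select_columns` is a hypothetical helper, and the column names other than `object_id` are placeholders, not the names ilastik actually exports.

```python
import csv
import io


def select_columns(src, dst, wanted):
    """Copy only the `wanted` columns from src to dst, looked up by header name."""
    reader = csv.DictReader(src)
    missing = [c for c in wanted if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # extrasaction="ignore" drops every column that is not listed in `wanted`.
    writer = csv.DictWriter(
        dst, fieldnames=wanted, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Made-up table standing in for one exported file.
src = io.StringIO("object_id,size,class\n1,10,cell\n2,7,debris\n")
out = io.StringIO()
select_columns(src, out, ["class", "object_id"])
result = out.getvalue()
print(result)
```

Because the lookup goes through the header, the output columns come out in the order given in `wanted`, regardless of where they sit in the input. With real files, `src` and `dst` would be handles opened with `newline=""`.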

There is also csvkit, which I haven’t used :). It seems to allow selecting subsets of columns by name (which would also make your script a bit more verbose):

csvcut -c column_a,column_c data.csv > new.csv

I know that wasn’t particularly helpful…


Ciao Dominik,

thanks for your quick reply!

Yes, I’ve been using R and selecting columns by name. But to make loading the data less of a burden (locally, I have thousands of CSV files), we thought we could strip out the default exported columns that we don’t need in our analysis. It helps to develop the code with more realistic data before we go on the cluster.

But it’s fine, we will find a solution (trying csvkit now); it should work out.

Thanks anyway and have a good day :slight_smile:


