Thursday, November 17, 2011

Gnuplot: splot data rows conditionally depending upon a column value

Gnuplot can easily plot data from a file with many rows and columns, using only the columns you specify. A command like this:

splot "data.csv" using 2:3:4 with lines

...will plot the data using columns 2, 3 and 4 as the X, Y and Z values, ignoring all other columns. Sometimes, however, it's helpful to plot only the rows that meet a condition on one column's value. Simple numeric comparisons on column values are easy with the ternary "?:" operator:

splot "data.csv" using ($5==1?$2:NaN):3:4 with lines

Note: setting the X value (of the X,Y,Z coordinate) to NaN (not a number) causes gnuplot to ignore the row.

As a result, gnuplot plots only the rows where the numeric value in column 5 equals 1 (written as 1.00000 in the data I was using).
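If the same condition is needed in several plot commands, it can be wrapped in a user-defined function. This is a minimal sketch assuming gnuplot 5 or later (for the inline "$data" datablock). The name keep_if is a hypothetical helper, not a gnuplot built-in, and the sample rows are made up to stand in for data.csv.

```gnuplot
# Hypothetical helper (not a gnuplot built-in): return x when cond is true,
# otherwise NaN, which makes gnuplot skip the row.
keep_if(cond, x) = cond ? x : NaN

# Made-up sample rows standing in for data.csv: id, X, Y, Z, flag
$data << EOD
1,0,0,0,1
2,1,1,1,1
3,2,0,5,0
4,3,1,2,1
EOD

set datafile separator ","   # the sample rows are comma-separated
set terminal dumb            # text output, so the sketch runs without a display

# Same filter as above: only rows whose column 5 equals 1 are plotted,
# so the third sample row is skipped.
splot $data using (keep_if($5 == 1, $2)):3:4 with lines
```

Filtering on a different column or value only changes the first argument, for example keep_if($4 > 0, $2).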

Similarly, I recently needed to ignore rows whose first column contained a '?' character in the 16th position. The data looks like this:

291:20:04:59.410,-7872470.049126,-15084862.45306,12185551.690250
291:20:04:59.51?,-7872470.049126,-15084862.45306,12185551.690250

The '?' character means the data is "bad" and should be ignored (probably?), so that row should be left out of the plot. Gnuplot's "substr" function can extract a substring for comparison, but column values are read as numbers by default ($1 is shorthand for column(1)), so the column must first be fetched as a string with "strcol". Since the comparison is then between strings, the "eq" or "ne" operator must be used (instead of "==" or "!="). Also, because the file is comma-separated, gnuplot has to be told to split fields on commas (its default separator is whitespace). The commands are:

set datafile separator ","
splot "data.txt" using ((substr(strcol(1),16,16) ne "?") ? $2 : NaN):3:4 with lines

...which plots only the rows where the 16th character of column 1 is not a '?'.
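The string test can be tried against the two sample rows shown above. This is a minimal sketch, again assuming gnuplot 5 or later for the inline datablock; good_row is a hypothetical helper name, not a gnuplot built-in. It uses "with points" rather than "with lines" because only one of the two rows survives the filter, leaving no line segment to draw.

```gnuplot
# Hypothetical helper (not a gnuplot built-in): true when the 16th character
# of s is not "?". substr() is 1-based and inclusive, so (16,16) is one char.
good_row(s) = substr(s, 16, 16) ne "?"

# The two sample rows from the post: timestamp, X, Y, Z
$data << EOD
291:20:04:59.410,-7872470.049126,-15084862.45306,12185551.690250
291:20:04:59.51?,-7872470.049126,-15084862.45306,12185551.690250
EOD

set datafile separator ","   # fields are comma-separated
set terminal dumb            # text output, so the sketch runs without a display

# strcol(1) reads column 1 as text; rows failing good_row() get X = NaN
# and are skipped, so only the first row is plotted.
splot $data using (good_row(strcol(1)) ? $2 : NaN):3:4 with points
```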