Variable Length Records
Variable length records (rows) end in a newline. Most text-based files are of this type, and it is the default format for Funnel. A blank line is treated as an empty row, and every sort key on that row takes the lowest possible value for the key, so blank rows sort first in the output file.
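To illustrate the blank-line rule, here is a minimal Python sketch. It is not Funnel's own code, and the function name sort_rows is hypothetical. It assumes the whole row is used as a single text key: an empty row is then the lowest possible key, so it comes out first.

```python
def sort_rows(text, eol="\n"):
    """Split text into rows on eol and sort them as plain text keys."""
    rows = text.split(eol)
    # A trailing delimiter leaves one extra empty piece; drop it so it is
    # not mistaken for a blank row.
    if rows and rows[-1] == "":
        rows.pop()
    # An empty string is the lowest possible text key, so blank rows sort first.
    return sorted(rows)


print(sort_rows("pear\n\napple\n"))  # ['', 'apple', 'pear']
```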
The end-of-line character, or sequence of characters, differs between operating systems. On Windows it is a carriage-return (CR) followed by a line-feed (LF). On *nix systems it is simply the LF. By default, Funnel chooses the delimiter based on the system it is running on. If it runs on Linux, it assumes the rows of a variable length file end in an LF. If it runs on Windows, it assumes they end in a CR and an LF.
However, Funnel gives you complete control over the end-of-line delimiters with the --eol parameter. You can also change the end-of-line delimiters of the output file with the --eolOut parameter.
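The delimiter handling described above can be sketched in Python as follows. This is only an illustration of the idea, not Funnel's implementation. The helper names split_rows and join_rows are hypothetical, and the eol and eol_out arguments merely stand in for the roles of --eol and --eolOut.

```python
import os


def split_rows(data, eol=None):
    """Split raw bytes into rows; eol defaults to the running platform's delimiter."""
    if eol is None:
        # os.linesep is "\r\n" on Windows and "\n" on Linux and other *nix systems.
        eol = os.linesep.encode("ascii")
    rows = data.split(eol)
    if rows and rows[-1] == b"":
        rows.pop()  # drop the empty piece after the final delimiter
    return rows


def join_rows(rows, eol_out):
    """Write rows back out, ending every row with the output delimiter."""
    return b"".join(row + eol_out for row in rows)


windows_data = b"b\r\na\r\n"
rows = split_rows(windows_data, eol=b"\r\n")   # explicit input delimiter
print(join_rows(sorted(rows), eol_out=b"\n"))  # b'a\nb\n'
```

Splitting the same Windows data on the LF alone would leave a stray CR at the end of every row, which is why an explicit input delimiter matters when a file comes from a different system than the one doing the sort.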
One issue with variable length files is that the number of records cannot be known ahead of time. The file has to be read completely before the count is known, and by then it is too late to configure the Funnel algorithms efficiently. That is why the --maxrows # parameter exists. If you can provide it, Funnel can optimize its performance accordingly. Otherwise, Funnel makes worst-case assumptions so that things don’t fail miserably; they just run a little longer.
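The following Python sketch shows why the row count is not available up front: the only way to obtain it is to consume the entire input and count the delimiters. It is an illustration under simple assumptions (the input fits in memory, and count_rows is a hypothetical helper), not a description of how Funnel works internally.

```python
import io


def count_rows(stream, eol=b"\n"):
    """Return the number of rows in a binary stream."""
    data = stream.read()  # the whole input must be consumed first
    rows = data.split(eol)
    if rows[-1] == b"":
        rows.pop()  # nothing follows the final delimiter
    return len(rows)


sample = io.BytesIO(b"delta\nalpha\n\ncharlie\n")
print(count_rows(sample))  # 4
```

If the count is already known from another source, such as the job that produced the file, supplying it up front avoids this extra pass.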
