What is the maximum number of characters that the ksh variable accepts?

I am trying to load and parse a really large text file. Loading the file is not a problem, but some lines in it are 2908778 characters long.

This is causing an error in my script.

In the script below, I removed all the logic and left only the read loop. I also removed all the valid lines from the input and left just the really long line in the text file. When I run it, I get the following error:

$ dowhiledebug.sh dump.txt
dowhiledebug.sh[6]: no space
Script Ended dump.txt

The actual script:

 #!/bin/sh
 filename=$1
 count=1
 if [ -f "$filename" ]; then
     echo "after then"
     while read line
     do
         echo "$count"
         count=$((count+1))
     done < "$filename"
 else
     echo "Could not open file $filename"
 fi
 echo "Script Ended $filename"

Updated (2013-01-17)

Follow-up question: Is it possible to increase the maximum number of characters that a ksh variable accepts?

Dorcy answered 17/1, 2013 at 2:36 Comment(10)
Why not use wc -l to count the lines in the file? It won't have the limits of the shell. I guess the answer is "because I need to do other processing which I've removed for the reproduction".Mountainside
On my Mac with bash 3.2, I created a file of 4194304 characters with no newline at all, and bash ignored the line altogether. I appended a single newline, and bash was quite happy to read the whole lot into memory. So, your size is not a hard limit. You'll need to look at how much memory there is on your system (more than 3 MiB, I'm sure), and whether the shell has many huge variables using up memory.Mountainside
And, FWIW, the sysconf value for ARG_MAX on the machine is 256 KiB, just as in the answer. I don't think the limit is directly related to ARG_MAX (though, I confess, I'm mildly surprised that I was able to echo a 4 MiB string to wc). This is on the Mac, still.Mountainside
What OS and version of ksh are you using? Can you run echo ${.sh.version} and get a value? If so, please include it in your question above. Or could this be pdksh? Good luck to all.Ashwin
@shelter AIX $ Version M-11/16/88fDorcy
Well don't do that then! Find some solution for your problem that doesn't require you to load the entire line into a ksh variable.Vexation
@user1985725: Version 88f. That's part of your problem. See if your SysAdmin can point you to a newer version. If you're running on AIX, you're likely not in an environment where someone can build a one-off with a larger ARG_MAX. It's likely time to consider another language, or to analyse your processing to find another solution. Good luck.Ashwin
I would use python or some other scripting language for that purpose.Apotheosize
What is the maximum size of an environment variable value?Adamant
@JonathanLeffler: The reason you were able to echo such a large string is that echo is a builtin in Bash, in which case ARG_MAX doesn't apply. Compare echo "$(awk 'BEGIN { while (c++ < '"$(( $(getconf ARG_MAX) + 1 ))"') printf "=" }')" to /bin/echo "$(awk 'BEGIN { while (c++ < '"$(( $(getconf ARG_MAX) + 1 ))"') printf "=" }')". That said, I too think that ARG_MAX is not relevant to the OP's problem.Skill
A
1

The limit for any shell is the limit of the C command line maximum. Here's a little program that pulls the information out of /usr/include/limits.h for you:

cpp <<HERE | tail -1
#include <limits.h>
ARG_MAX
HERE

Mine gives me (256 * 1024) or 262144 characters.

This doesn't work if the C compiler isn't installed, but the limit on your system is probably similar.
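
If cpp isn't available, getconf reports the same value without a compiler, as one of the comments on this answer points out. A minimal sketch, assuming a POSIX getconf is on the PATH (the variable name arg_max is only for illustration):

```shell
# Ask the system for ARG_MAX directly; no C preprocessor needed.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is ${arg_max} bytes"
```

Keep in mind that this is the limit on the arguments and environment passed to an external program. Whether it also bounds what the read builtin can store in a variable is a separate question.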

Ambitious answered 17/1, 2013 at 3:7 Comment(1)
While it's good to know the max. length of a command line when invoking an external utility (which can more easily be obtained with getconf ARG_MAX), this limit does not apply here, because the data is read from a file, not passed on the command line; also, read is a shell builtin, not an external utility. The following ksh command reads a line that is 1 byte larger than getconf ARG_MAX, which succeeds: read line < <(awk 'BEGIN { while (c++ < '"$(( $(getconf ARG_MAX) + 1 ))"') printf "=" }'); echo "${#line}".Skill
A
6

What OS and version of ksh are you using? Can you run echo ${.sh.version} and get a value? If so, please include it in your question above. Or could this be pdksh?

Here's a test that will get you in the ballpark, assuming a modern ksh supporting (( i++ )) math evaluations:

#100 char var
var=1234578901234456789012345678901234567890123456789012345789012344567890123456789012345678901234567890

$ while (( i++ < 10000 )) ;do  var="$var$var" ; print "i=$i\t" ${#var} ; done
i=1      200
i=2      400
i=3      800
i=4      1600
i=5      3200
i=6      6400
i=7      12800
i=8      25600
i=9      51200
i=10     102400
i=11     204800
i=12     409600
i=13     819200
i=14     1638400
i=15     3276800
i=16     6553600
i=17     13107200
i=18     26214400
i=19     52428800
i=20     104857600
i=21     209715200
i=22     419430400
-ksh: out of memory

$ print -- ${.sh.version}
Version JM 93t+ 2010-05-24

And that is only the overall size of the environment that can be supported. For the command line and the "words" after the program name, there is also a limit on the number of words, regardless of their overall size.

Some shells' man pages have a LIMITS section that may show something like max-bytes 200MB, max-args 2048. This information may be in a different section, it will almost certainly use different labels and values from the ones I have made up here, or it may not be there at all, hence the code loop above. So look around carefully, and if you find a source for this information, either add an answer to this question or update this one.

The standard bash 4.4 man page doesn't seem to have this information, and it's getting harder to find ksh documentation all the time. Check your man ksh and hope that you can find a documented limit.
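
A more direct probe for the case in the question is to generate one line of a given length and see whether read can store it. This is only a sketch: probe_read_len is a made-up helper name, and it assumes an awk that can emit a line this long.

```shell
# probe_read_len N: build one line of N '=' characters with awk,
# read it into a shell variable, and print the length that arrived.
probe_read_len() {
    awk -v n="$1" 'BEGIN { while (c++ < n) printf "="; print "" }' |
    {
        # The braces keep read and echo in the same (sub)shell.
        read -r line
        echo "${#line}"
    }
}
```

Calling probe_read_len 2908778 tries the line length from the question. If the number printed matches the argument, the shell stored the whole line.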

IHTH

Ashwin answered 17/1, 2013 at 4:23 Comment(1)
Hi shelter, assuming that there is a limit, is there a way to extend it? I'm sure that the records I'm loading do not stop at 2M characters per line. Is there an alternative to "while read line"?Dorcy
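
One possible alternative to "while read line" is to keep the long lines out of shell variables entirely and let a stream tool do the per-line work. A hedged sketch using awk, where line_lengths is a hypothetical helper and printing the length merely stands in for the real processing. Some awk implementations have their own record-size limits, so this would need checking on the target system:

```shell
# line_lengths FILE: print "<line number> <length>" for each line.
# Each line is handled inside awk and never stored in a shell variable.
line_lengths() {
    awk '{ print NR, length($0) }' "$1"
}
```

Running line_lengths dump.txt prints one short line per input line, which a shell loop can then read without hitting the variable-size problem.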
