Is there a shlex alternative for Java? I'd like to be able to split quote-delimited strings the way the shell would process them. For example, if I send:
one two "three four"
and perform a split, I'd like to receive the tokens
one
two
three four
I had a similar problem today, and none of the standard options (StringTokenizer, StrTokenizer, Scanner) looked like a good fit. However, it's not hard to implement the basics.
This example handles all the edge cases currently raised in comments on the other answers. Be warned: I haven't checked it for full POSIX compliance yet. A Gist including unit tests is available on GitHub, released into the public domain via the Unlicense.
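    import java.util.ArrayList;
    import java.util.List;

    public List<String> shellSplit(CharSequence string) {
        List<String> tokens = new ArrayList<String>();
        boolean escaping = false;  // previous character was a backslash outside single quotes
        char quoteChar = ' ';      // quote character that opened the current quoted run
        boolean quoting = false;
        // Index of the most recent closing quote, so that '' or "" still yields an empty token.
        int lastCloseQuoteIndex = Integer.MIN_VALUE;
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (escaping) {
                // Escaped character: take it literally.
                current.append(c);
                escaping = false;
            } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
                // Backslash escapes the next character, except inside single quotes.
                escaping = true;
            } else if (quoting && c == quoteChar) {
                // Closing quote.
                quoting = false;
                lastCloseQuoteIndex = i;
            } else if (!quoting && (c == '\'' || c == '"')) {
                // Opening quote.
                quoting = true;
                quoteChar = c;
            } else if (!quoting && Character.isWhitespace(c)) {
                // Unquoted whitespace ends the current token, if there is one.
                if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                    tokens.add(current.toString());
                    current = new StringBuilder();
                }
            } else {
                current.append(c);
            }
        }
        // Flush the final token, including an empty quoted string at the very end.
        if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
            tokens.add(current.toString());
        }
        return tokens;
    }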
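As a quick check of the behaviour described above, here is a minimal, self-contained sketch that runs the method on the question's input and on a few quoting edge cases. The class name `ShellSplitDemo` and the choice of inputs are my own (hypothetical); the method body is a copy of the one above, made `static` only so that `main` can call it.

```java
import java.util.ArrayList;
import java.util.List;

public class ShellSplitDemo {
    // Copy of the shellSplit method from the answer, made static for the demo.
    public static List<String> shellSplit(CharSequence string) {
        List<String> tokens = new ArrayList<String>();
        boolean escaping = false;
        char quoteChar = ' ';
        boolean quoting = false;
        int lastCloseQuoteIndex = Integer.MIN_VALUE;
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (escaping) {
                current.append(c);
                escaping = false;
            } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
                escaping = true;
            } else if (quoting && c == quoteChar) {
                quoting = false;
                lastCloseQuoteIndex = i;
            } else if (!quoting && (c == '\'' || c == '"')) {
                quoting = true;
                quoteChar = c;
            } else if (!quoting && Character.isWhitespace(c)) {
                if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                    tokens.add(current.toString());
                    current = new StringBuilder();
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs: the question's example plus quoting edge cases.
        String[] inputs = {
            "one two \"three four\"",  // the example from the question
            "''",                      // empty quoted string
            "\"foo\"'bar'baz",         // adjacent quoted and bare segments
            "three\\ four",            // backslash-escaped space
        };
        for (String input : inputs) {
            List<String> tokens = shellSplit(input);
            // Print the count as well, because a list holding one empty string prints as [].
            System.out.println(input + " -> " + tokens.size() + " token(s): " + tokens);
        }
    }
}
```

With this method, the first input splits into three tokens and each of the other three comes back as a single token.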
"''"
will get parsed to an empty list rather than a list containing "". – Wes
Look at Apache Commons Lang:
org.apache.commons.lang.text.StrTokenizer should be able to do what you want:
new StrTokenizer("one two \"three four\"", ' ', '"').getTokenArray();
Unlike shlex, commons.lang is not POSIX compatible. (-> (StrTokenizer. "\"foo\"'bar'baz") (.getTokenList)) returns a single entry containing "foo"'bar'baz, as opposed to the (correct) foobarbaz. – Kagu
I had success with the following Scala code, which uses fastparse. I can't vouch for it being complete:
    val kvParser = {
      import fastparse._
      import NoWhitespace._

      // Quoted strings: "..." with \" allowed inside.
      def nonQuoteChar[_: P] = P(CharPred(_ != '"'))
      def quotedQuote[_: P] = P("\\\"")
      // Try the escaped quote first; otherwise nonQuoteChar consumes the backslash
      // and the following quote is taken as the end of the string.
      def quotedElement[_: P] = P(quotedQuote | nonQuoteChar)
      def quotedContent[_: P] = P(quotedElement.rep)
      def quotedString[_: P] = P("\"" ~/ quotedContent.! ~ "\"")

      // Bare strings: letters, digits, hyphens and underscores only.
      def alpha[_: P] = P(CharIn("a-zA-Z"))
      def digit[_: P] = P(CharIn("0-9"))
      def hyphen[_: P] = P("-")
      def underscore[_: P] = P("_")
      def bareStringChar[_: P] = P(alpha | digit | hyphen | underscore)
      def bareString[_: P] = P(bareStringChar.rep.!)

      // A comma-separated list of key=value pairs.
      def string[_: P] = P(quotedString | bareString)
      def kvPair[_: P] = P(string ~ "=" ~ string)
      def commaAndSpace[_: P] = P(CharIn(" \t\n\r").rep ~ "," ~ CharIn(" \t\n\r").rep)
      def kvPairList[_: P] = P(kvPair.rep(sep = commaAndSpace))
      def fullLang[_: P] = P(kvPairList ~ End)

      def res(str: String) = {
        parse(str, fullLang(_))
      }
      res _
    }
shlex does it well, but many naive algorithms won't. For instance, in shell, "three four" and "three"' 'four are exactly equivalent, as is three\ four. – Kagu
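To make the point about naive algorithms concrete, here is a small sketch of a deliberately naive splitter that breaks on whitespace and knows nothing about quotes or backslashes. The class and method names (`NaiveSplitDemo`, `naiveSplit`) are hypothetical and only serve the illustration; this is not taken from any of the answers.

```java
import java.util.Arrays;
import java.util.List;

public class NaiveSplitDemo {
    // Deliberately naive: split on runs of whitespace, ignoring quotes and escapes.
    public static List<String> naiveSplit(String input) {
        return Arrays.asList(input.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // Three spellings that a POSIX shell treats as the single word: three four
        String[] equivalentInShell = {
            "\"three four\"",
            "\"three\"' 'four",
            "three\\ four",
        };
        for (String s : equivalentInShell) {
            System.out.println(s + " -> " + naiveSplit(s));
        }
    }
}
```

Each input comes back as two fragments with the quote or backslash characters still attached, whereas a shell-style splitter should return the single token three four for all of them.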