Large substrings ~9000x faster in Firefox than Chrome: why?
Asked Answered
G

2

20

The Benchmark: http://jsperf.com/substringing

So, I'm starting up my very first HTML5 browser-based client-side project. It's going to have to parse very, very large text files into, essentially, an array or arrays of objects. I know how I'm going to go about coding it; my primary concern right now is getting the parser code as fast as I can get it, and my primary testbed is Chrome. However, while looking at the differences between substring methods (I haven't touched JavaScript in a long, long time), I noticed that this benchmark was incredibly slow in Chrome compared to FireFox. Why?

My first assumption is that it has to do with the way FireFox's JS engine would handle string objects, and that for FireFox this operation is simple pointer manipulation, while for Chrome it's actually doing hard copies. But, I'm not sure why Chrome wouldn't do pointer manipulation or why FireFox would. Anyone have some insight?

JSPerf appears to be throwing out my FireFox results, not displaying them on the BrowserScope. For me, I'm getting 9,568,203 ±1.44% Ops/sec on .substr() in FF4.

Edit: So I see a FF3.5 performance result down there actually below Chrome. So I decided to test my pointers hypothesis. This brought me to a 2nd revision of my Substrings test, which is doing 1,092,718±1.62% Ops/sec in FF4 versus 1,195±3.81% Ops/sec in Chrome, down to only 1000x faster, but still an inexplicable difference in performance.

A postscriptum: No, I'm not concerned one lick about Internet Explorer. I'm concerned about trying to improve my skills and getting to know this language on a deeper level.

Gambrinus answered 31/5, 2011 at 22:53 Comment(11)
Link is at the very top, in a header. Or...?Gambrinus
I have no idea how it is implemented, but if you are parsing could you replace substringing with iteration over strings? AFAIK many parsing algorithms are more stream oriented.Gestalt
I am iterating over the string with an index, but I still have to extract substrings to get my data out of the file.Gambrinus
I'm seeing ~1.1k/s in Chrome 11 and ~250/s in FF3.6.Intoxicant
@Gambrinus get a copy of chrome 13 and see if that bottleneck dissapears.Indicate
Just a thought, if you have to substring and the end-result of this program has to run on user pc's with an arbitrarily sized dataset, are there betters ways to represent your data ? Perhaps XML or JSON ?Bousquet
@Indicate Someone tested with 13 on the jsperf page, seems to be about +10% versus 11, not +1000%. @Intoxicant Yeah, this disparity seems to be specific to FF4. @Russ C: The data is produced by a third-party program and is therefore static; I'll be using HTMl5's file APIs to read it once the User gives it to me.Gambrinus
You could try to parse it as a stream, while not using any substring type functions... If there is a marker at the start of a field, simply append each character into a variable until the end of the filed is reached. This variable would contain the whole field with never having to use a substring method. The appending of a character to the variable should be insanely fast by comparison.Subside
@Subside Probably because of the formatting of the file, but my implementation of your description of stream parsing is 91% slower than substrings and 80% slower than using string.split()s.Gambrinus
@Gambrinus I'm very surprised at that. I'm using code to parse out ANSI codes as I had described. It seems fast for me but my data sizes are not as large as yours, not even close. It may be worth while breaking your large dataset into smaller ones with an initial substring and then parsing each of them in turn.Subside
@Subside My data is majorly CSV-style formatted, and it can be broken down. If it were anywhere close to binary it would probably perform better, but my data is very sparse in its plaintext form (only about 15% of the filesize should be needed in memory at the end).Gambrinus
A
17

In the case of Spidermonkey (the JS engine in Firefox), a substring() call just creates a new "dependent string": a string object that stores a pointer to the thing it's a substring off and the start and end offsets. This is precisely to make substring() fast, and is an obvious optimization given immutable strings.

As for why V8 does not do that... A possibility is that V8 is trying to save space: in the dependent string setup if you hold on to the substring but forget the original string, the original string can't get GCed because the substring is using part of its string data.

In any case, I just looked at the V8 source, ans it looks like they just don't do any sort of dependent strings at all; the comments don't explain why they don't, though.

[Update, 12/2013]: A few months after I gave the above answer V8 added support for dependent strings, as Paul Draper points out.

Arleen answered 1/6, 2011 at 1:43 Comment(2)
This answer is incorrect that V8 does not have dependent strings. There is even a V8 bug about the fact that this can cause small strings to hold onto lots of memory.Sneer
V8 added dependent strings in August 2011 or so. See <codereview.chromium.org/7477045>. So the answer was in fact correct at the time it was given. If you try current Chrome on the testcases in the original question, it has totally different behavior than Chrome did at the time.Arleen
G
1

Have you eliminated the reading of .length from your benchmark results?

I believe V8 has a few representations of a string:

1. a sequence of ASCII bytes
2. a sequence of UTF-16 code units.
3. a slice of a string (result of substring)
4. a concatenation of two strings.

Number 4 is what makes string += efficient.

I'm just guessing but if they're trying to pack two string pointers and a length into a small space, they may not be able to cache large lengths with the pointers, so may end up walking the joined link list in order to compute the length. This assumes of course that Array.prototype.join creates strings of form (4) from the array parts.

It does lead to a testable hypothesis which would explain the discrepancy even absent buffer copies.

EDIT:

I looked through the V8 source code and StringBuilderConcat is where I would start pulling, especially runtime.cc.

Grillwork answered 1/6, 2011 at 1:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.