Javascript and Scientific Processing? [closed]

Matlab, R, and Python are powerful, but either costly or slow for some of the data mining work I'd like to do. I'm considering JavaScript for its speed, its good visualization libraries, and the ability to use the browser as an interface.

The first question I faced is the obvious one for scientific programming: how do I do I/O to data files? The second: client-side or server-side? The last: can I make something that is truly portable, i.e. put it all on a USB stick and run it from there?

I've spent a couple of weeks looking for answers. Server2go seems to address the client/server needs, which I think means I can get data to and from the programs on the client side. Server2go can also run from a USB stick. The data files I work with are usually XML, and there seem to be several JavaScript converters to JSON.

However, after all the looking around, I'm not sure my approach makes sense. So before I commit further: any advice/thoughts/guidance on JavaScript as a portable tool for scientific data processing?

Mccluskey answered 25/7, 2012 at 13:40 Comment(6)
I warmly suggest not using JavaScript for scientific processing. It lacks math libraries, memory management, and strong typing.Markham
I agree, there's pretty bad support for doing maths to a scientific standard.Overcloud
@larsmans I viewed the benchmarks only from the POV of how well JavaScript did against C++. While Python is not as slow as suggested, it is still much slower.Mccluskey
@ADC I think the worries about memory management and typing are not show-stoppers; lots of languages have issues there, e.g. Matlab :) I find it perplexing that such an increasingly ubiquitous language hasn't had more scientific support. The more I study it, the more powerful it appears to be. That said, I have found libraries for matrix math, physics, statistics, signal processing, and machine learning, so it seems promising.Mccluskey
Are you sure that JavaScript has comparable speed? Especially when it comes to math and matrix operations, it is fairly hard to beat anything that can use Fortran libraries like R and numpy do.Eyewitness
@MikeB: I've got in contact with the Julia developers about the Python benchmark. Whether the comparison of JavaScript vs. C++ is fair depends on the algorithms you want to run; if they spend much time in matrix multiplication, you're screwed.Pessa

I have to agree with the comments that JavaScript is not a good fit for scientific processing. However, you know your requirements best; maybe you have already found useful libraries that do what you need. Just be aware that you'll have to implement all the logic yourself: there is no built-in handling of complex numbers, or matrices, or integrals, or ... Usually programmer time is far more valuable than machine time. Personally, I'd only look into compiled languages after I had created a first version, in whatever language I like most, that turned out not to be fast enough.
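
To make that concrete, even something as basic as complex multiplication has to be hand-rolled. A minimal sketch:

    // JavaScript has no complex number type, so even complex
    // multiplication is something you write and test yourself.
    function cmul(a, b) {
        return { re: a.re * b.re - a.im * b.im,
                 im: a.re * b.im + a.im * b.re };
    }
    // cmul({re: 1, im: 2}, {re: 3, im: 4})  ->  {re: -5, im: 10}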

Assuming that JavaScript is the way to go:

Data I/O

I can think of three options:

Sending and receiving data to/from a server with Ajax

This seems to be the solution you've found with Server2go. It requires you to write a server back end, but that can be kept quite simple: all it really needs to do is read and write files in response to requests from your client-side application.
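
As a rough sketch of the client side of that option (assuming the back end exposes a made-up /data/ endpoint that returns and accepts JSON):

    // Read a data file from, and write results back to, a hypothetical
    // /data/ endpoint provided by the server back end.
    function loadData(name, onDone) {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/data/' + encodeURIComponent(name));
        xhr.onload = function () { onDone(JSON.parse(xhr.responseText)); };
        xhr.send();
    }

    function saveResults(name, results) {
        var xhr = new XMLHttpRequest();
        xhr.open('POST', '/data/' + encodeURIComponent(name));
        xhr.setRequestHeader('Content-Type', 'application/json');
        xhr.send(JSON.stringify(results));
    }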

Using a non-browser implementation of V8 that includes file I/O

For instance, Node.js. You could then avoid the need for a server and simply use a command-line interface, and all your code will be JavaScript. Other than that, it is roughly equivalent to the first option.
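
For illustration, reading and writing a JSON data file with Node's built-in fs module looks roughly like this (the file names and the processing step are made up):

    // Rough sketch of file I/O in Node.js using the built-in fs module.
    var fs = require('fs');

    // Assumes the file holds a JSON array; doubling each value
    // stands in for the real processing step.
    var input = JSON.parse(fs.readFileSync('measurements.json', 'utf8'));
    var results = input.map(function (x) { return x * 2; });
    fs.writeFileSync('results.json', JSON.stringify(results));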

Creating a file object using the File API, which you ask the user to save or load

This is the worst option in my opinion, as user interaction is required. It would avoid the need for a server; your application could be a simple HTML file that loads all data files with Ajax requests. You'd have to start Chrome with a special switch to allow Ajax requests over the file:// protocol, as described here
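
A rough sketch of that flow, reading a user-chosen file with the File API and offering the results back as a downloadable Blob (the element IDs and the processData() function are made up):

    // Let the user pick a file, read it with FileReader, then offer the
    // processed result back as a download via a Blob URL.
    document.getElementById('fileInput').addEventListener('change', function (e) {
        var reader = new FileReader();
        reader.onload = function () {
            var data = JSON.parse(reader.result);
            var output = JSON.stringify(processData(data));  // processData() is your own code
            var link = document.getElementById('downloadLink');
            link.href = URL.createObjectURL(new Blob([output], { type: 'application/json' }));
            link.download = 'results.json';
        };
        reader.readAsText(e.target.files[0]);
    });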

These options are only concerned with file I/O, and you can't do file I/O in JavaScript itself: browsers cannot allow arbitrary web code to do arbitrary file I/O, because the security implications would be horrendous. Each option describes one way of working around that.

The first communicates with a server that does the file I/O for the client.

The second uses "special" versions of JavaScript that run outside the browser, so the browser's security constraints do not apply. But that means you'll have to look up how file I/O is done in the particular implementation you use; it is not part of JavaScript itself.

The third requires the user to control the file I/O.

Interface

Even if you don't use JavaScript to do the actual processing, which so far is the consensus, there is nothing stopping you from using a browser as the interface or JavaScript libraries for visualisation. That is something JavaScript is good at.

If you want to interactively control your data mining tool, you will need a server that can control the tool. Server2go should work, or the built-in HTTP module in Node.js if you use that, or... If you don't need interactive control of the tool (that is, you first generate the processed data and only then look at it), a server can be avoided by using the file:// protocol and JSONP; a sketch follows below. But really, avoiding a server shouldn't be a goal.
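
The JSONP-style trick, sketched: the processing step writes its output as a .js file that calls a known function, and the page pulls it in with a script tag, which works over file:// where Ajax does not (results.js and showResults are made-up names):

    // The processing step writes results.js containing a call like:
    //   showResults({"series": [1, 2, 3]});
    function showResults(data) {
        console.log(data.series);  // hand the data to your visualisation code here
    }
    var s = document.createElement('script');
    s.src = 'results.js';
    document.head.appendChild(s);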

I won't go into detail about interface issues, as there is nothing specific to say, and very nearly everything that has been written about JavaScript is about interfaces.

One thing: do use a declarative data-binding library like Angular.js or Knockout.js.
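
For instance, a tiny Knockout.js sketch (the observable name is made up): the bound span updates itself whenever the observable changes, e.g. after a new batch of results arrives.

    // <span data-bind="text: meanValue"></span>
    var viewModel = { meanValue: ko.observable(0) };
    ko.applyBindings(viewModel);
    viewModel.meanValue(42.7);  // the bound span updates automatically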

Ulund answered 25/7, 2012 at 14:57 Comment(7)
You should add a note that Node.js also requires a client-side front end. Of course, processing the data server-side is an advantage.Numeral
@Numeral I haven't actually used Node.js. I assumed it would provide some kind of command-line interface, which would mean that there is no need for a client-side interface. Unless you actually want to see your output...Ulund
Yes, as far as I know (never used it myself) it does, but the OP talked about visualisation libraries which I'm quite sure are for client-side usage...Numeral
@Bergi: Node does provide a built-in REPL module, and you can use optimist for getopt-style command-line arguments. As for visualization, you don't have to use a browser. node-canvas lets you use canvas drawing methods in Node and output directly to a PNG. There are lots of visualization/charting libraries that you can plug in to canvas.Urtication
@Ulund Thank you for the response. I think the file I/O is the item that gives me the most confusion. I have gotten server2go working but that isn't the same as understanding what is actually happening. Node.js remains a bit of a mystery to me and I don't think it can be run from a USB key.Mccluskey
@Mccluskey I updated the answer with a bit more about why file I/O is the way it is. As for Node.js on a stick: I don't know, but the installation page describes a manual install on Windows as putting an exe somewhere, and then you're done. No reason why "somewhere" couldn't be a USB stick. For *nix and Mac you'll probably have to compile it, but after that the same "one executable and you're done" should apply.Ulund
Definitely run some experiments to compare the performance of your JavaScript app to some other programs. It's easy to gain or lose a few orders of magnitude in performance!Eyewitness

JavaScript speed is heavily overrated. This is a Web 2.0 myth.

Let me explain this claim a bit (and don't just downvote me for saying something you do not want to hear!)

Sure, V8 is a highly optimized JavaScript VM. It does beat many other scripting languages in naive benchmarks.

However, JavaScript is a language with a very limited scope. It is meant for the "ADHD world" of the web: it is best-effort, it may simply fail, and you get few guarantees that things complete at all, let alone complete on time.

Consider MongoDB, for example. At first it seems good and fast and to offer a lot, until you notice, for example, that its MapReduce is single-threaded only and thus really slow. All that glitters is not gold!

Now look at libraries relevant to data mining, such as BLAS: basic linear algebra, math operations and such. CPU manufacturers like Intel and AMD offer versions optimized for their own CPUs. This optimization requires a detailed understanding of the individual CPUs, way beyond the capabilities of current compilers; the libraries contain optimized code paths for various CPUs, all essentially doing the same thing. For these operations, using an optimized library such as BLAS can easily yield a 5-20x speedup, and at the same time matrix operations, which are often O(n^2) or O(n^3), will dominate your overall runtime.

So a good language for data mining will let you go all the way to machine code!
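
To make that concrete, this is the kind of naive O(n^3) triple loop you end up writing in plain JavaScript when no BLAS binding is available; an optimized BLAS routine (blocked, vectorised, multi-threaded) will beat it by a large factor on big matrices:

    // Naive O(n^3) matrix multiplication over flat row-major arrays.
    function matmul(a, b, n) {
        var c = new Float64Array(n * n);
        for (var i = 0; i < n; i++)
            for (var k = 0; k < n; k++) {
                var aik = a[i * n + k];
                for (var j = 0; j < n; j++)
                    c[i * n + j] += aik * b[k * n + j];
            }
        return c;
    }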

Python's SciPy and R are good choices here. They have the optimized libraries built in and easily accessible, while letting you write the glue code in a simpler language.

Have a look at this programming language benchmark:

http://benchmarksgame.alioth.debian.org/u32/which-programs-are-fastest.html

Pure JavaScript has a high variance, indicating that it can do some things fast (mostly regular expressions!) and others much more slowly. It can clearly beat PHP, but it is just as clearly beaten by C and Java.

Multithreading is also important for modern data mining. Few large systems today have just a single core, and you do want to make use of all of them. So you need a programming language and libraries with a powerful set of multithreading operations. This is actually why Fortran and C are losing popularity here; other languages such as Java are much better at it.
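
For comparison, the closest thing browser JavaScript has to threads is Web Workers, which share nothing and communicate only by message passing; a sketch (worker.js is a made-up file name):

    // worker.js would contain something like:
    //   onmessage = function (e) { postMessage(heavyWork(e.data)); };
    var worker = new Worker('worker.js');
    worker.onmessage = function (e) { console.log('result:', e.data); };
    worker.postMessage([1, 2, 3]);  // the data is copied, not shared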

Eyewitness answered 26/7, 2012 at 5:54 Comment(2)
Excellent post. I note JavaScript has variance comparable to or less than PHP, Python, and Ruby. Platform-specific libraries are a liability for a portable app, which is what I'm looking at. When I chart Fortran, C++, Java 7 server, Lua, Ruby, PHP, and Python, JavaScript kind of bridges the performance of the slow and the stinkin' fast. An appealing niche for something so portable. You're right though, any big-data number crunching is best done on a tailored hardware/software combination. But for portability, JavaScript still seems to rule the roost.Mccluskey
Don't take the benchmark too literally though. If you look closely, there are situations where the Java code essentially consists of calling a C library via JNI (no wonder it doesn't beat C). And in fact, many of the C programs will in turn call Fortran subroutines.Eyewitness

Although this discussion is a bit old and I am not a JavaScript guru by any stretch of the imagination, I find the above arguments that it lacks the processing speed or the capabilities for advanced math operations doubtful. WebGL is a JavaScript API for rendering advanced 2D and 3D graphics that relies heavily on advanced math operations. I believe the capabilities are there from a technical point of view; what is lacking is good libraries for statistical analysis, natural language processing, and the other predictive analytics involved in data mining.

WebGL is based on OpenGL, which in turn uses libraries like BLAS (library info here).

Advances like Node.js and V8 make it technically possible. What is lacking are libraries like those we find in R and Scilab for the same operations.
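
math.js is one emerging initiative in that direction; as a rough sketch, and treating the exact API as an assumption, basic linear algebra with it looks like this:

    // Assumes the math.js bundle is loaded and exposes a global `math` object.
    var A = math.matrix([[1, 2], [3, 4]]);
    var b = math.matrix([5, 6]);
    var x = math.lusolve(A, b);        // solve A x = b
    console.log(math.det(A));          // -2
    console.log(math.multiply(A, x));  // approximately b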

Anachronistic answered 18/12, 2013 at 13:56 Comment(1)
You may find math.js an interesting initiative in this regard.Motto
