Load a DOM and execute JavaScript, server side, with .NET

I would like to load a DOM from a document (in string form) or a URL, and then execute JavaScript functions (including jQuery selectors) against it. This would be entirely server side and in process, with no client or browser.

Basically, I need to load the DOM and then use jQuery selectors and functions such as text() and val() to extract strings from it. I don't really need to manipulate the DOM.

I have looked at .NET JavaScript engines such as Jurassic and Jint, but neither supports loading a DOM, so neither can do what I need.

I would be willing to consider non-.NET solutions (Node.js, Ruby, etc.) if they exist, but would really prefer .NET.

Edit: The answer below is a good one, but I'm currently trying a different route: I'm attempting to port envjs to Jurassic. If I can get that working, I think it will do what I want. Stay tuned...

Dramshop answered 4/6, 2012 at 18:20

How is it coming? I would love to benefit from, or contribute to, such a project, since I made my own attempt but have stalled for the time being. If you want, just add @gmail.com to my SO name and you can contact me there. I have a JavaScript project that adds ActiveX to Jurassic here: jurascript.codeplex.com - Erigeron

The answer depends on what you are trying to do. If your goal is basically a complete web browser simulation, or a "headless browser," there are a number of solutions, but none of them (that I know of) exist cleanly in .NET. To mimic a browser, you need a JavaScript engine and a DOM. You've identified a few engines; I've found Jurassic to be both the most robust and the fastest. Google Chrome's V8 engine is also very popular; the Noesis Javascript.NET project provides a .NET wrapper for it. It's not quite pure .NET, since you have a non-.NET dependency, but it integrates cleanly and is not much trouble to use.

But as you've noted, you still need a DOM. In pure C# there is XBrowser, but it looks a bit stale. There are also JavaScript-based representations of the entire browser DOM, such as jsdom. You could probably run jsdom in Jurassic, giving you a DOM simulation without a browser, all in C# (though likely very slowly!). It would definitely run just fine in V8. If you go outside the .NET realm, there are other, better-supported solutions. This question discusses HtmlUnit. Then there's Selenium for automating actual web browsers.

Also, bear in mind that a lot of the work done around these tools is for testing. While that doesn't mean you couldn't use them for something else, they may not perform or integrate well enough for stable use in inline production code. If you are basically trying to do real-time HTML manipulation, then a solution mixing many technologies that aren't widely used except for testing might be a poor choice.

If your need is actually HTML manipulation, and it doesn't really need to use JavaScript (you are thinking more about the wealth of such tools available in JS), then I would look at C# tools designed for this purpose, for example HTML Agility Pack, or my own project CsQuery, which is a C# jQuery port.
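
To illustrate the non-JavaScript route for the asker's stated need (reading text() and val() by selector from an HTML string), here is a minimal sketch. It uses Python's standard-library html.parser purely for illustration; it is not the HTML Agility Pack or CsQuery API, and the IdTextExtractor class and select() helper are hypothetical names. Assumptions: only "#id" selectors are understood, ids are unique, and val() is simplified to the element's value attribute (so <textarea> and <select> are not handled).

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the open-element stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class IdTextExtractor(HTMLParser):
    """Hypothetical helper: records jQuery-like text() and val() results keyed by element id."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.texts = {}     # id -> concatenated text of the element and its descendants
        self.values = {}    # id -> "value" attribute, a simplified stand-in for val()
        self._open = []     # stack of (tag, id or None) for currently open elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        elem_id = attrs.get("id")
        if elem_id is not None and "value" in attrs:
            self.values[elem_id] = attrs["value"] or ""  # boolean attrs arrive as None
        if tag in VOID_ELEMENTS:
            return
        if elem_id is not None:
            self.texts.setdefault(elem_id, "")
        self._open.append((tag, elem_id))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching start tag; ignore stray end tags.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                return

    def handle_data(self, data):
        # Like jQuery's text(), text counts toward every open ancestor that has an id.
        for _, elem_id in self._open:
            if elem_id is not None:
                self.texts[elem_id] += data


def select(html, selector):
    """Return (text, value) for a '#id' selector; None where nothing was found."""
    if not selector.startswith("#"):
        raise ValueError("this sketch only supports '#id' selectors")
    parser = IdTextExtractor()
    parser.feed(html)
    parser.close()
    elem_id = selector[1:]
    return parser.texts.get(elem_id), parser.values.get(elem_id)


page = """
<html><body>
  <div id="price">Price: <b>42</b> USD</div>
  <form><input id="qty" type="text" value="3"><br></form>
</body></html>
"""

print(select(page, "#price"))  # ('Price: 42 USD', None)
print(select(page, "#qty"))    # (None, '3')
```

Each call re-parses the whole document; for many queries against one page you would keep a single parser's texts and values dictionaries instead. Anything beyond id lookups (class selectors, traversal) is where a real parser with selector support, such as the tools named above, earns its keep.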

If you are basically trying to take some code that was written for the client and run it on a server (e.g. for sophisticated/accelerated web scraping), I'd search around using those terms. For example, this question discusses it, with answers including PhantomJS, a headless WebKit browser stack, as well as some of the testing tools I have already mentioned. For web scraping, I would imagine you can live without it all being in .NET, and that may be the only reasonable answer anyway.

Lindane answered 4/6, 2012 at 19:0
Could CsQuery act as a DOM for Jurassic (with a little wrapper layer)? - Torras
CsQuery's DOM implementation is very different from the browser one because C# is a strongly typed language and JavaScript isn't. It would be a lot easier to just use jsdom, which is already written in JavaScript for this purpose; in theory it should run as-is in Jurassic, though I don't know if anyone's tried it before. (I actually started borrowing unit tests from jsdom for CsQuery.) - Lindane
I've been looking into this. There are a couple of things that you need that aren't in Jurassic, but there is another project called jurascript that has the needed bits and bobs: jurassic.codeplex.com/discussions/360450 - Torras
I really hope someone has the time and energy to see this through. I love Jurassic and would use it in all sorts of projects if it just had the rough edges removed; it should be at the heart of a .NET headless browser! But I haven't quite been able to get it working in my situations, and the inability to save and load the compiled DLLs is frustrating and makes it too slow for a lot of uses. (I know someone else has been working on that, but I was never able to make it work in my situation.) I just have too many other projects going on to work on this one, and V8 plus a wrapper works (if ugly). - Lindane
We should set up a Kickstarter :) - Torras
Hah, yeah. I am amazed nobody's grabbed onto it; the author did so much amazing work getting it this far, and JavaScript is so important these days that seamless .NET integration would open many doors for cool projects. It seems like not much would be needed to push it over the edge; it just needs a good owner/cheerleader. - Lindane
The mentioned XBrowser is obsolete; its GitHub page refers to SimpleBrowser. Unfortunately, this "browser automation engine" does not support JavaScript, as the OP wants. - Nickname