Home

Christian

Welcome to PgNetted!

Some days ago I wanted a completely database-driven web scraping tool, so I started developing OraNetted. I was happy with my solution until it turned out that a fully featured web browser and HTML DOM would be very handy. So I decided to use HtmlUnit as the browsing base. At that time, sadly (today I am happy with my decision), I was not able to get HtmlUnit loaded into my Oracle database. It turned out that PostgreSQL's PL/Java takes a different approach by using the system's JVM. After a quick attempt I had HtmlUnit running through an SQL interface.

To do what a scraper should do, a browser alone is not enough. Something has to control the browser too :-) I thought the base for all the browsing, form filling, downloading, ... should be scriptable. This will simplify maintenance a lot!

PgBsh

PgBsh stands for Postgres BeanShell, which will be the very base of PgNetted. With BeanShell I have all the capabilities I need to control HtmlUnit by script.

More to Come

....

