This is a straightforward tutorial on how to build a Node.js web crawler with a realtime web interface.
At one of my previous jobs we had a project with a large number of web pages. The website was cached in memcache. After a while we realized we needed to warm up the cache for a couple of depth levels of pages. I ended up creating a small Node.js app that visited each web page, so we could generate the cache for a whole batch of pages whenever we needed to.
Once we had the daemon up and running, I thought it would be great to have a web interface with some fancy realtime updates on it. Ultimately I built a UI on top of my crawling daemon.
You can browse the source, contribute, or use the application for your own needs. It's hosted on GitHub.
This crawler is NOT a Node.js module; it is a standalone web application. Technologies used: Node.js, Express, socket.io, MongoDB, Twitter Bootstrap, Highcharts.
My name is Eugene. I am a passionate web developer, in love with Node.js, with over six years of experience.
My main area of expertise is full-cycle application development. I love building robust, scalable Node.js applications, and I create all kinds of them: websites, API servers, and realtime applications.