Node.js Web Crawler

This is a straightforward tutorial on how to build a Node.js web crawler with a realtime web interface.

At one of my previous jobs we had a project with a large number of web pages. The website was cached in memcache. After a while we realised that we needed to warm up the cache for pages a couple of levels deep. I ended up creating a small Node app that visited each web page, so we could generate the cache for a bunch of pages whenever needed.
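The cache-warming idea described above can be sketched roughly as follows. This is not the application's actual code, just a minimal illustration of a depth-limited crawl. It assumes a hypothetical `fetchPage(url)` helper that resolves to the page's HTML; the names `warmCache` and `extractLinks` are invented for this example, and the regex-based link extraction is a simplification.

```javascript
// Depth-limited, breadth-first cache warmer (illustrative sketch only).
// `fetchPage` is a hypothetical helper supplied by the caller: it takes a
// URL string and returns a promise that resolves to the page's HTML.
const { URL } = require('url');

// Collect same-origin links from an HTML string.
// A simple regex is used here for brevity; a real crawler would
// probably rely on a proper HTML parser.
function extractLinks(html, baseUrl) {
  const links = [];
  const origin = new URL(baseUrl).origin;
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      const resolved = new URL(match[1], baseUrl);
      resolved.hash = ''; // treat /page and /page#section as the same page
      if (resolved.origin === origin) {
        links.push(resolved.href);
      }
    } catch (err) {
      // Ignore hrefs that cannot be parsed as URLs.
    }
  }
  return links;
}

// Visit startUrl (depth 0) and every same-origin page reachable within
// maxDepth link hops. Requesting a page is what warms its cache entry.
// Resolves to the list of URLs that were fetched successfully.
async function warmCache(startUrl, maxDepth, fetchPage) {
  const seen = new Set([startUrl]);
  const fetched = [];
  let frontier = [startUrl];

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      let html;
      try {
        html = await fetchPage(url);
      } catch (err) {
        continue; // skip pages that fail to load
      }
      fetched.push(url);
      for (const link of extractLinks(html, url)) {
        if (!seen.has(link)) {
          seen.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return fetched;
}
```

On Node.js 18 or later, where `fetch` is available globally, `fetchPage` could be as simple as `(url) => fetch(url).then((res) => res.text())`. Passing the fetcher in as an argument keeps the crawl logic easy to test without touching the network.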


Once we had the daemon up and running, I thought it would be great to have a web interface with some fancy realtime updates. Ultimately I built a UI on top of my crawling daemon.

You can browse the source, contribute, or use the application for your own needs. It's hosted on GitHub.

This crawler is NOT a Node.js module; it is a standalone web application. Technologies used: Node.js, Express, Socket.io, MongoDB, Twitter Bootstrap, Highcharts.

About

My name is Eugene. I am a passionate web developer, in love with Node.js, with over 6 years of experience.

My main domain of expertise is full cycle application development. I love building robust scalable Node.js applications.

I create all kinds of applications: websites, API servers, and realtime applications.

Skills / Technologies
Node.js, MongoDB, Mongoose, Socket.io, Mocha, Grunt, Git, LAMP, C/C++, Nginx
Contact