Posts Tagged ‘readburner’

RSSmeme Is 100% Democratic; Is Your Aggregator?

Tuesday, May 6th, 2008

There’s been a lot of talk about RSSmeme and ReadBurner being democratic, in stark contrast to sites like Techmeme.  RSSmeme makes absolutely no modifications to what you are sharing (other than URL canonicalization to get rid of duplicates).  I will remove a story that has only 1 share if someone emails me complaining that it defames them; if the story had multiple shares I wouldn’t remove it (but that hasn’t happened yet).  I think I’ve removed 2 stories so far this way.  I also make sure to tell the person that if someone shares it again then it will show up again; I am not trying to control what makes its way onto RSSmeme.  I will also remove someone’s feed from RSSmeme if they request it (since RSSmeme uses the FriendFeed API to find new feeds).

During my interview for Profy we discussed removing things like LOL cats and comics from RSSmeme; I am absolutely opposed to it.  Who’s to say what you start removing next?  This is a very slippery slope.  During the interview we found out that RSSmeme had a LOL cat story on its homepage that ReadBurner didn’t.  I didn’t claim that they were actively filtering these stories out with some sort of blacklist.  I just assumed that since RSSmeme and ReadBurner operate on different sets of feeds, the story had not bubbled up on their end.

Then this Twitter post showed up in FriendFeed:

How can ReadBurner be democratic if they have some sort of blacklisting going on?  Am I getting too riled up over a bunch of cats with captions?  What do you think?

Update

Drew says they don’t filter.  His Twitter post was in reference to some users complaining that LOL cats were ruining their ReadBurner experience (see here).  I’m happy again :)

URL Canonicalization: Stop the Dupes!

Tuesday, April 15th, 2008

Aggregation is the name of the game these days and a big problem for sites like RSSmeme and ReadBurner is dealing with duplicates.  How do you know for sure that you have all the shares for a given URL?  What about services like FeedBurner or TinyURL which use redirects to get you where you want to go?  Enter URL canonicalization.

Canonicalizing something means finding its “standard state”.  So when you canonicalize a URL you want to find the URL that you finally end up on.  If you don’t canonicalize URLs before aggregating you end up with duplicates; maybe some users shared through a blog’s old RSS feed while others are using FeedBurner.  When you have duplicates you can’t reliably get a count of how many times a story has been shared.  This skews your data and makes adding features like RSSmeme’s widget absolutely impossible.

How do you canonicalize?  Well, the easiest way is to send a HEAD request to any URL that looks fishy.  On RSSmeme, if a URL starts with http://feeds. or http://rss. then I send a HEAD request to that URL, follow the redirect, and take the URL I end up on as the canonical one.  If you do this for every URL then you are going to have performance issues, so just choose the usual suspects.
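Here is a minimal Python sketch of that idea, using only the standard library.  It is not the actual RSSmeme code: the function names, the fallback on network errors, and the `resolver` parameter (which lets you swap in a fake for testing) are all my own assumptions.  Only the two prefixes come from the post.

```python
from collections import Counter
from urllib.request import Request, urlopen

# The two prefixes named in the post; add other "usual suspects" as needed.
SUSPECT_PREFIXES = ("http://feeds.", "http://rss.")


def looks_fishy(url):
    """Return True if the URL starts with one of the suspect prefixes."""
    return url.startswith(SUSPECT_PREFIXES)


def resolve_with_head(url, timeout=10):
    """Send a HEAD request and return the URL we finally end up on."""
    request = Request(url, method="HEAD")
    # urlopen follows redirects; geturl() reports the final URL.
    with urlopen(request, timeout=timeout) as response:
        return response.geturl()


def canonicalize(url, resolver=resolve_with_head):
    """Resolve only suspect URLs; return every other URL unchanged."""
    if not looks_fishy(url):
        return url
    try:
        return resolver(url)
    except OSError:
        # Assumption: on a network failure, keep the URL as it was shared.
        return url


def count_shares(shared_urls, resolver=resolve_with_head):
    """Count shares per canonical URL so duplicates collapse together."""
    return Counter(canonicalize(url, resolver) for url in shared_urls)
```

Because only the suspect URLs trigger a network round trip, most shares are counted without any extra request.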

Sites like FriendFeed don’t need to do this.  But RSSmeme and ReadBurner live and die by the counter.  RSSmeme currently canonicalizes URLs; it doesn’t look like ReadBurner does right now, but maybe this post will enlighten them and anyone considering entering this area.

Congratulations on the launch, ReadBurner!

Which is Faster? RSSmeme or ReadBurner?

Thursday, February 7th, 2008

Compare the current top story at RSSmeme and ReadBurner: J-Walking with Reader.  At the moment RSSmeme has 20 shares while ReadBurner has only 12.  I can’t say how fast ReadBurner is, but I can go through 50 feeds per minute with my current script.  As of writing this post at 6:48pm, the feed with the oldest update time was last updated at 5:46pm; so it currently takes me about an hour to go through all 600 feeds.

When I first launched RSSmeme I went twice as fast, parsing 100 feeds every 5 minutes, but now I’ve toned it down to 50 feeds every 5 minutes to keep the load down on the server.  So if it takes me 1 minute to parse 50 feeds and I only parse every 5 minutes, then my server is grabbing new data 20% of the time; I think that’s a happy medium.
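The numbers above can be checked with a few lines of Python.  The constants come straight from this post; the function names are just mine for illustration.

```python
FEEDS_TOTAL = 600       # feeds being tracked
BATCH_SIZE = 50         # feeds parsed per run
RUN_INTERVAL_MIN = 5    # a run starts every 5 minutes
RUN_DURATION_MIN = 1    # one run of 50 feeds takes about 1 minute


def full_pass_minutes(total, batch, interval):
    """Minutes needed to touch every feed once."""
    runs = -(-total // batch)  # ceiling division
    return runs * interval


def duty_cycle(duration, interval):
    """Fraction of the time the server spends grabbing new data."""
    return duration / interval


print(full_pass_minutes(FEEDS_TOTAL, BATCH_SIZE, RUN_INTERVAL_MIN))  # 60
print(duty_cycle(RUN_DURATION_MIN, RUN_INTERVAL_MIN))                # 0.2
```

At the launch rate of 100 feeds every 5 minutes, the same 600 feeds would take 30 minutes per pass.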

Pages (without GET parameters like the search or pagination) are cached for 5 minutes.  If you want fresher data I could decrease the cache and parse more feeds at a time; just let me know!
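To make the caching rule concrete, here is a rough standard-library sketch.  It is not how the site does it (Django has its own cache framework); `PageCache`, `is_cacheable`, and the injectable `clock` are hypothetical names I made up for illustration.

```python
import time
from urllib.parse import urlsplit

CACHE_SECONDS = 5 * 60  # pages are cached for 5 minutes


def is_cacheable(url):
    """Only pages without GET parameters are cached."""
    return urlsplit(url).query == ""


class PageCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=CACHE_SECONDS, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # url -> (time stored, rendered page)

    def get_or_render(self, url, render):
        # Search and pagination URLs carry GET parameters: always render.
        if not is_cacheable(url):
            return render(url)
        now = self.clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        page = render(url)
        self._store[url] = (now, page)
        return page
```

Lowering `CACHE_SECONDS` gives fresher pages at the cost of rendering them more often.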

RSSmeme – Now With Search Feeds

Wednesday, February 6th, 2008

Ask and ye shall receive.  David Rothman asked for search and RSS feeds for searches for ReadBurner; so I added them to RSSmeme.  Searches are basic: I split up your terms, search for them case insensitively in story titles, and “or” the results all together sorted by post date.  Your browser should auto-detect the feed for that search, which will be formatted in a URL like this:

http://www.rssmeme.com/feeds/search/yahoo/google/microsoft/

Tada!
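For the curious, the search described above could be sketched in plain Python like this.  It is only an illustration, not the Django code behind the site: the story dictionaries, the function names, the newest-first ordering, and the URL quoting are my assumptions.

```python
from datetime import date
from urllib.parse import quote

FEED_BASE = "http://www.rssmeme.com/feeds/search/"


def search_stories(stories, query):
    """Return stories whose title contains ANY term, ignoring case."""
    terms = [term.lower() for term in query.split()]
    matches = [
        story for story in stories
        if any(term in story["title"].lower() for term in terms)
    ]
    # Assumption: "sorted by post date" means newest first.
    return sorted(matches, key=lambda story: story["posted"], reverse=True)


def search_feed_url(query):
    """Build the feed URL for a search: one path segment per term."""
    return FEED_BASE + "".join(quote(term, safe="") + "/" for term in query.split())


stories = [
    {"title": "Yahoo rejects Microsoft", "posted": date(2008, 2, 4)},
    {"title": "Google Reader tips", "posted": date(2008, 2, 6)},
    {"title": "LOL cats", "posted": date(2008, 2, 5)},
]
for story in search_stories(stories, "yahoo GOOGLE"):
    print(story["title"])
print(search_feed_url("yahoo google microsoft"))
```

The last line prints the same feed URL shown above.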

RSSmeme – Look out ReadBurner!

Wednesday, February 6th, 2008

Look out ReadBurner; RSSmeme is coming for you!  I registered rssmeme.com yesterday and launched the site today.  It was a test for me to see how good/fast of a Django developer I am.  It isn’t 100% done yet; but most of the functionality is there.

Why another one of these sites?  Because it was fun to make and competition is good.  I have things that ReadBurner doesn’t and ReadBurner has things that I don’t.  I think that eventually both of us will be better because of this.

How did I get so much data in less than 24 hours?  I got all the feeds by screen scraping ReadBurner (sorry if this offends you).  My update script parses through 50 feeds per minute; I’m not sure how fast ReadBurner is.

It isn’t the fastest site I’ve ever made; so please be gentle!  The caching is not so great yet.

If you have any requests please let me know.  I’ve kept things very simple so it’s easy for me to add new features at any time.  I’m not sold on the color palette yet either.