Try it now! Note: it's still buggy.
I'll write about the changes to the original ExtJS FeedReader as soon as I can. Stay tuned! In the meantime you can still grab the plain (not minified) JS from the server, but note that this is pre-alpha quality code. Also, this installation uses my home PC as the MongoDB host.
What shall we do?
In this part we will develop the schema for our MongoDB-based data store and a small console program that will download and update feeds for us.
Database schema
MongoDB specific thoughts
While building the schema we need to keep in mind that we use MongoDB, so that we can properly utilize its features.
1) It's schema-free: we can add new keys to a document whenever we want.
2) Every document has its own unique id.
3) Every document stores all of its key names, so the length of the keys matters: shorter keys mean smaller documents.
4) We can store collections within documents. But retrieving a document is a kind of atomic operation: you can't get just a part of the document.
5) Indexes make retrieval fast, but under some conditions they slow down writes. And creating one ALWAYS locks the entire database.
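The point about key length can be made concrete with a little arithmetic. BSON stores each key as a null-terminated UTF-8 string inside every document, so a key costs its byte length plus one. The sketch below is only an illustration; the KeyOverhead class and its method are hypothetical names, not part of FeedDownloader.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class KeyOverhead
{
    // BSON writes every key as a null-terminated UTF-8 string,
    // so each key costs its UTF-8 byte count plus one terminator byte
    // in every single document that contains it.
    public static long BytesPerDocument(IEnumerable<string> keys)
    {
        long total = 0;
        foreach (string key in keys)
        {
            total += Encoding.UTF8.GetByteCount(key) + 1;
        }
        return total;
    }
}
```

For example, BytesPerDocument(new[] { "version" }) minus BytesPerDocument(new[] { "v" }) is 6, so across a million posts the shorter key saves roughly 6 MB.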
Application specific thoughts
Here I just give a list of application-specific thoughts that may affect the database design.
1) We need a list of all feeds.
2) A feed consists of feed items, but feeds can share some feed items.
3) In general a feed item id is a string: a GUID or a URL. We will do many lookups by this id (to find items that were already added), so we need something that reduces the length of a feed item id.
4) We need to reflect the fact that every item will have its own set of extensions.
5) It should be easy to build the presentation layer on top of the schema.
6) Not only the set of keys can change, but also the meaning of some of them, so we need some versioning.
7) The current version is a single-user application, without authentication for now.
The schema
In light of the previous two sections, the schema is the following set of collections:
1) feeds
2) posts
That's all. The structure of the documents in each collection:
1) feeds = [{"_id", "v", "title", "description", "total_items", "items_in_feed", "update_date", "link"}]
2) posts = [{"_id", "v", "title", "description", "id", "link", "extensions":[], "rate", "read"}]
A closer look at some special scenarios
Items identifiers
Here we'll take a closer look at some common operations and see how to optimize them slightly.
The first operation is looking up a feed by its URL. This happens when the user subscribes to a new feed.
The second is looking up an item by its id. I should note that in many feeds the item id is not a fixed-length GUID but just a permanent link to the original post.
So we have to deal with variable-length strings and, accordingly, variable resource costs. We should at least make those strings a fixed length. For this purpose we will use an MD5 hash. It produces 16-byte numbers, which in hex encoding become 32-character strings. But wait, did I say encoding? Does that mean that if we choose a different encoding we can make the ids shorter? The answer is, of course, yes. The encoding I chose is ASCII85: it maps every 4 bytes to 5 characters, so each id will be just 20 characters long.
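The id shortening described above could be sketched roughly as follows. This is not the code from FeedDownloader.zip: the FeedItemId class and the Shorten method are hypothetical names, and the sketch assumes the raw id is hashed as UTF-8. .NET has no built-in ASCII85 encoder, so the encoding step is written out by hand. It deliberately skips the "z" shortcut that some ASCII85 variants use for an all-zero group, because that shortcut would break the fixed 20-character length.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of the original FeedDownloader source.
public static class FeedItemId
{
    // Hashes the raw feed item id (a GUID or a permalink) with MD5 and
    // encodes the 16 hash bytes as exactly 20 ASCII85 characters.
    public static string Shorten(string rawId)
    {
        byte[] hash;
        using (MD5 md5 = MD5.Create())
        {
            // Assumption: the id is hashed as UTF-8 bytes.
            hash = md5.ComputeHash(Encoding.UTF8.GetBytes(rawId));
        }

        var sb = new StringBuilder(20);
        for (int offset = 0; offset < hash.Length; offset += 4)
        {
            // Read 4 bytes as one big-endian 32-bit number.
            uint group = ((uint)hash[offset] << 24)
                       | ((uint)hash[offset + 1] << 16)
                       | ((uint)hash[offset + 2] << 8)
                       | hash[offset + 3];

            // Write it as 5 base-85 digits, most significant first,
            // using the characters '!' (33) through 'u' (117).
            char[] digits = new char[5];
            for (int i = 4; i >= 0; i--)
            {
                digits[i] = (char)('!' + group % 85);
                group /= 85;
            }
            sb.Append(digits);
        }
        return sb.ToString();
    }
}
```

Calling FeedItemId.Shorten on a permalink of any length returns a 20-character string that can serve as a fixed-length lookup key. Note that the ASCII85 alphabet contains characters such as quotes and backslashes, so the result is fine as a stored value but would need escaping if it ever ends up in a URL or a JSON string.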
Multi-user consideration
Obviously, this simplest version is single-user, and we store the user's data together with the posts. In the next version of the feed reader we will split this collection and introduce two others: users and posts<user_id> for per-user post data (or maybe not, we'll see :-).
Code
Commands
Since all the work takes place in a separate thread, we can issue commands while FeedDownloader is running.
This version has only two commands: stats and help.
// Maps a command name typed in the console to the method that handles it.
private static Dictionary<string, Action> commands;

private static void AddCommands()
{
    commands = new Dictionary<string, Action>();
    commands.Add("stats", Stats);
    commands.Add("help", Help);
}

#region Commands
// Prints the download counters collected so far.
private static void Stats()
{
    Console.WriteLine("Statistics");
    Console.WriteLine("Total updated feeds count: " + totalUpdatedFeedsCount);
    Console.WriteLine("Total retrieved items count: " + totalRetrievedItemsCount);
    Console.WriteLine("Total new items count: " + totalNewItemsCount);
    Console.WriteLine();
    Console.WriteLine("Retrieved items count for the current feed: " + lastRetrievedItemsCount);
    Console.WriteLine("New items count in the current feed: " + lastNewItemsCount);
    Console.WriteLine();
}

// Lists the names of all registered commands.
private static void Help()
{
    Console.WriteLine("Available commands:");
    foreach (var item in commands)
    {
        Console.WriteLine(item.Key);
    }
    Console.WriteLine();
}
#endregion
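The listing above registers the commands but does not show how they get invoked. A minimal sketch of a dispatch loop might look like the following, assuming the dictionary maps command names to parameterless Action delegates. The CommandLoop class, its Run method, and the "exit" command are hypothetical and not taken from FeedDownloader.zip.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch, not part of the original FeedDownloader source.
public static class CommandLoop
{
    // Reads one command name per line and runs the matching handler.
    // Stops at end of input or when the (assumed) "exit" command is typed.
    public static void Run(IDictionary<string, Action> commands, TextReader input, TextWriter output)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            // Command names are matched case-insensitively, ignoring padding.
            string name = line.Trim().ToLowerInvariant();
            if (name.Length == 0) continue;
            if (name == "exit") break;

            Action handler;
            if (commands.TryGetValue(name, out handler))
            {
                handler();
            }
            else
            {
                output.WriteLine("Unknown command: " + name);
            }
        }
    }
}
```

In the console program this would be called on the main thread as CommandLoop.Run(commands, Console.In, Console.Out) while the downloading runs in its own thread. Passing the reader and writer in, rather than using Console directly, keeps the loop easy to exercise with a StringReader.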
Syndication related code
Creating MongoDB documents from feed items:
private static Document PrepareAtomEntry(AtomEntry atomEntry)
{
    Document doc = new Document()
        .Append("version", 0)
        .Append("type", "atom")
        .Append("title", atomEntry.Title)
        .Append("authors", atomEntry.Authors)
        .Append("date", atomEntry.PublishedOn)
        .Append("link", atomEntry.BaseUri.ToString())
        .Append("description", atomEntry.Summary);

    if (atomEntry.Content != null)
    {
        doc.Append("content", atomEntry.Content.Content);
    }

    var categories = atomEntry.Categories;
    if (categories.Count != 0)
    {
        Document[] cats = new Document[categories.Count];
        for (int i = 0; i < categories.Count; i++)
        {
            cats[i] = new Document()
                .Append("domain", categories[i].Scheme.ToString())
                .Append("value", categories[i].Term);
        }
        // The original listing built this array but never stored it;
        // the key name "categories" is an assumption.
        doc.Append("categories", cats);
    }

    EnrichDocumentWithExtensions(doc, atomEntry);
    return doc;
}
private static Document PrepareRssItem(RssItem rssItem)
{
    Document doc = new Document()
        .Append("version", 0)
        .Append("type", "rss")
        .Append("title", rssItem.Title)
        .Append("author", rssItem.Author)
        // Note: this stores the download time, not the item's publication date.
        .Append("date", DateTime.Now)
        .Append("link", rssItem.Link.ToString())
        .Append("description", rssItem.Description);

    if (rssItem.Comments != null)
    {
        doc.Append("comments_uri", rssItem.Comments.ToString());
    }

    var enclosures = rssItem.Enclosures;
    if (enclosures.Count != 0)
    {
        Document[] encs = new Document[enclosures.Count];
        for (int i = 0; i < enclosures.Count; i++)
        {
            RssEnclosure enclosure = enclosures[i];
            encs[i] = new Document()
                .Append("url", enclosure.Url.ToString())
                .Append("length", enclosure.Length)
                .Append("type", enclosure.ContentType);
        }
        // The original listing built this array but never stored it;
        // the key name "enclosures" is an assumption.
        doc.Append("enclosures", encs);
    }

    var categories = rssItem.Categories;
    if (categories.Count != 0)
    {
        Document[] cats = new Document[categories.Count];
        for (int i = 0; i < categories.Count; i++)
        {
            cats[i] = new Document()
                .Append("domain", categories[i].Domain)
                .Append("value", categories[i].Value);
        }
        // Same fix as above; the key name "categories" is an assumption.
        doc.Append("categories", cats);
    }

    EnrichDocumentWithExtensions(doc, rssItem);
    return doc;
}
Extensions (for now only content:encoded is supported, as it is the most widely used):
public static void EnrichDocumentWithExtensions(Document doc, IExtensibleSyndicationObject extObject)
{
    // Look for the content:encoded extension on the feed item.
    SiteSummaryContentSyndicationExtension ss =
        extObject.FindExtension(SiteSummaryContentSyndicationExtension.MatchByType)
            as SiteSummaryContentSyndicationExtension;
    if (ss != null)
    {
        doc.Add("content:encoded", ss.Context.Encoded);
    }
}
And finally

FeedDownloader.zip (2.28 MB)