Archive for October, 2007|Monthly archive page

How Much I Rely on Class Libraries

Ok, so I was stamping out a recent (quite simple) assignment for a class. The assignment required separating out each place value in an integer. Since the information came into the program as a string, I stamped out a simple solution using some easy loops and handy methods available in the String class (of the .Net 2.0 class libraries). However, a quick skim over the rubric showed that 5% of the grade was for correctly using div and mod to parse out the integer values. I immediately flipped back into my program, then froze for a second…

then another…

then 2 more…

I was completely blanking. I know it’s a simple task; it’s just that I have become so accustomed to the incredible abundance and availability of a dozen methods for every little task that I blanked for a good minute, just stonewalled. I knew I had written this code before (everyone has done it at least once in some ‘Intro to Programming Theory’ class), and the concept was easy enough, but it just wouldn’t come. Cursing the professor for such a trivial demand, I went and got a cup of coffee.

Upon returning I realized how stupid and petty I was being. While I do often rely on cool class libraries and the methods they provide, I really just have to stop being so self-righteous and realize how likely it was that much of the class would be completely stuck on such a task. We spend so much time today learning about existing technologies and APIs that we forget the core of programming: Problem Solving.

I quickly slapped together the following C# method:

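// Requires: using System; using System.Collections.Generic;
static int[] toIntArray(string input)
{
    int value = Convert.ToInt32(input); // parse once instead of on every pass
    List<int> ints = new List<int>();
    for (int numdigits = input.Length; numdigits > 0; numdigits--)
    {
        // div shifts the wanted place value down into the ones position...
        int digit = value / (int)Math.Pow(10.0, (double)(numdigits - 1));
        // ...and mod strips off everything above it
        digit = digit % 10;
        ints.Add(digit);
    }
    return ints.ToArray();
}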
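For comparison, here is a minimal sketch of the same idea that takes the integer directly and leans only on div and mod, with no string length and no Math.Pow. The class and method names (PlaceValues, ToDigits) are made up for illustration and are not part of the assignment, and I’m assuming non-negative input.

```csharp
using System;
using System.Collections.Generic;

static class PlaceValues
{
    // Splits a non-negative integer into its decimal digits,
    // most significant first, using only div and mod.
    public static int[] ToDigits(int value)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException("value", "expected a non-negative integer");

        List<int> digits = new List<int>();
        do
        {
            digits.Add(value % 10); // mod peels off the ones place
            value /= 10;            // div shifts the remaining digits right
        } while (value > 0);

        digits.Reverse();           // digits were collected least significant first
        return digits.ToArray();
    }

    static void Main()
    {
        foreach (int digit in ToDigits(4072))
            Console.WriteLine(digit);
    }
}
```

One difference worth noting: because this version never looks at the original string, leading zeros in the input (say "007") are lost, whereas the string-length loop above keeps them.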

In retrospect, it’s quite simple. I just hope that this was my ‘moment of realization’ and that I don’t get so inundated again that I can’t do the simple stuff on my own anymore.

Sqlite Linq Provider

Ok, so as many of you may have noticed in my last post I’ve taken a real interest in C# 3.0 and its new nifty features. Now, I’m mostly just excited for the simple collection manipulations, but the whole Linq to SQL thing is nagging in the back of my mind. Now while complete CRUD will no doubt take some serious code, simple query support is not that difficult to implement. After following mattwar’s blog series on a generic DB provider for Linq, I decided that I wanted that awesome glory against a lean and mean sqlite db.

So, after about an hour or so of messing and meddling, I have a few samples working (the attached zip). It’s a Visual Studio 2008 Beta 2 project (I’ll be writing autotools magic for mono later this week), so sorry for that; the code can still be imported into Monodevelop, just the solution/project files won’t work. Anyways, I have tested a smattering of JOINs and all sorts of simple selects without issue. However, the elements of sqlite that behave differently tend to do so silently and without complaint, making it harder to be certain that everything is working. I’m planning on fleshing out a set of test queries, but for now I could really just use the help with testing/checking the SQL (as I’m no sqlite guru).

Known Issues

  • DataType sloppiness – I hope to handle this better (so that a DateTime string stored in TEXT would extract to a DateTime successfully). Right now you pretty much need to just use strings or numeric values.
  • Inefficient Queries – Not being a Sql master, I can’t say that much of what’s generated is the best way to do things. Please, if you know, then share!
  • OrderBy issues – It’s just hard as heck to get working. It seems to work fine sometimes, but no promises.
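On the DataType point, here is a rough sketch of the kind of round trip I’d want for dates kept in a TEXT column. It doesn’t touch the provider or sqlite at all; SqliteDateText and its two methods are hypothetical helper names, and choosing the .Net round-trip ("o") format is my assumption, not something the current code does.

```csharp
using System;
using System.Globalization;

static class SqliteDateText
{
    // Write side: the round-trip ("o") format is culture-independent.
    public static string ToText(DateTime value)
    {
        return value.ToString("o", CultureInfo.InvariantCulture);
    }

    // Read side: returns false instead of throwing when the TEXT value is not a date.
    public static bool TryFromText(string text, out DateTime value)
    {
        return DateTime.TryParseExact(text, "o", CultureInfo.InvariantCulture,
                                      DateTimeStyles.RoundtripKind, out value);
    }

    static void Main()
    {
        DateTime original = new DateTime(2007, 10, 15, 2, 30, 0, DateTimeKind.Utc);
        string stored = ToText(original);

        DateTime restored;
        if (TryFromText(stored, out restored))
            Console.WriteLine(restored == original);
    }
}
```

A TEXT value that fails the parse can then simply stay a string, which is more or less what happens today.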

Anyways, play around, have fun, and note that you need the Sqlite provider for ADO.Net (duh).

Linq To Sqlite Download

I’m looking at db_linq, which is a full (bi-directional, change tracking, general awesome craziness) solution, whereas this is really just a way to query a sqlite db. I might try to add a sqlite provider to db_linq at some point; it’s just that their system is very different from my implementation, so there wouldn’t be too much shared code. :(

The Changing Face of High-Level Programming

Ok, so I’m sure most MS .Net devs have already seen these posts far too many times, but for the Mono users out there, I have a little treat. While Moonlight and WPF get tons of hype, I think the biggest and most exciting change coming soon to a C# compiler near you is support for lambda expressions, anonymous types, and extension methods.

Now on the whole this doesn’t sound all that exciting, I mean, before a few months ago, I had never really used lambda expressions to accomplish much beyond pass that unit in an intro to CS class. Individually, there’s nothing to jump for joy about, but when used in conjunction, we can produce startlingly clean and readable code.

To demonstrate this I’ve whipped up two examples that I was fiddling with as I read a million tutorials. They aren’t fancy XML or Database providers, just some simple (and quite common in my experience) text parsing tasks that have disproportionately complex code. We will use some of the new C# 3.0 features to make far cleaner and more readable code.

The first example is an exclusion string, or a set of characters that are not allowed in another string.

var illegalchars = "abcdefg"; 
string testString1 = "Kevin"; 
string testString2 = "hijkmlppp";

The ‘old’ way of checking both strings for one of the illegal chars:

foreach (char c in illegalchars) {
    // IndexOf(char) works on .Net 2.0, where String.Contains only accepts a string
    if (testString1.IndexOf(c) >= 0 || testString2.IndexOf(c) >= 0)
        Console.WriteLine("illegal char!");
}

Using awesome new stuff:

if (testString1.Intersect(illegalchars).Any()
    || testString2.Intersect(illegalchars).Any())
    Console.WriteLine("Linq found it too");

Our next example is ‘exploding’ or splitting a series of values out of a string (CSV and PSV are common examples of this) into an array:

 string pipeDelined = "Kevin | McCool | Kubasik";

An old solution might have been (I know we could optimize this, or clean it up, just making a point ;) ):

List<string> names = new List<string>();
foreach (string s in pipeDelined.Split('|')) {
    var ts = s.Trim();
    if (ts == "") continue;
    names.Add(ts);
}
var allNames = names.ToArray();

Using our cool new C# 3.0 tools, we can change this to the super-sexy:

var allLinqNames = pipeDelined.Split('|')
    .Select(s => s.Trim())
    .Where(s => s != "")
    .ToArray();

While a hardened child of OOP (via C# and Java) might baulk at the new syntax, I think that it can quickly start to grow on a developer. Moreover, it has the distinct advantage of being unambiguous, and makes reading someone else’s dense code much more fluid. 

I really can’t wait for C# 3.0, and not for those flashy APIs, just the simple syntactical sugar that is already making me lazier by the minute.

Waking Up In the Middle of the Night

It happens.. even running on so little sleep, I still find myself waking.

Fortunately, this time I awoke with an awesome realization. I’ve been pounding my brain against the wall for a week now on how to further refine/increase the accuracy of my original relation-based ranking system. My initial results had been less than stellar when unleashed upon the desktop as a whole. In controlled situations (where my defined relationship weights were proportionate and scaled) the results were excellent, but I was hoping this ‘lowest common denominator’ of sorts would be the answer. I was mistaken. After being more or less tossed back to square one, I was less than optimistic, to say the least.

However, at 2:30 this morning all that seems irrelevant, as I believe I have determined the key to blazingly accurate desktop search results (specifically over large search sets, on the order of shared drives with thousands of documents, images, e-mails and other media files without any real semantic system to start). In my original design I made the mistake of utilizing fixed-proportion weights for my relationships, a mistake also seen in many ObjectRank-based systems. By fixed proportion, I mean that an astronomical amount of time has gone into determining how important an ‘author’ relationship is when compared to a ‘creation date’ relationship. I (like many before me) was using a weight x termsimilarityindex type system for each relationship. As a result I was spending tons of time and effort trying to strike the proper balance, and in most cases when I got one situation to work, I completely destroyed another.
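To make the problem concrete, here is a stripped-down sketch of what a fixed-proportion scorer looks like. This is not my actual ranking code: the relationship names, the weight values, and the choice to simply sum the weight x termsimilarityindex products are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

static class FixedWeightRanking
{
    // Hand-tuned, global weights: the same numbers apply to every user and data set.
    // The values here are invented for the example.
    static readonly Dictionary<string, double> Weights = new Dictionary<string, double>
    {
        { "author", 0.6 },
        { "creation-date", 0.3 },
    };

    // Score = sum over relationships of (fixed weight x term similarity index).
    // Relationships without a configured weight contribute nothing.
    public static double Score(IDictionary<string, double> termSimilarityByRelationship)
    {
        double score = 0.0;
        foreach (KeyValuePair<string, double> entry in termSimilarityByRelationship)
        {
            double weight;
            if (Weights.TryGetValue(entry.Key, out weight))
                score += weight * entry.Value;
        }
        return score;
    }

    static void Main()
    {
        Dictionary<string, double> similarities = new Dictionary<string, double>();
        similarities["author"] = 0.9;
        similarities["creation-date"] = 0.2;
        Console.WriteLine(Score(similarities));
    }
}
```

Every number in that Weights table is a guess someone has to tune by hand, and tuning it for one kind of desktop is exactly what breaks another.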

I think my < sarcasm > brilliant </sarcasm > revelation is becoming obvious, but bear with me.

We cannot pretend that authorship means the same thing to all users. A simple example is the large number of users who still operate relatively isolated desktops, where they are the only author for most of the content. If someone emails them a document, it will have a hard time weighing up. However, creation date/modification date would probably serve as a solid indicator of relationship, as one person can really only work on one thing at a time.

I wish I had something better to show than just this (I’m mostly writing this down so I don’t forget it in the morning :) ) but I’ve determined that we need a deeper dimension to relationship weighting (when scoring). While one possibility is to just add another variable to our existing weight-determination system, I am leaning towards something more broad. What if the programmer only had to specify a relationship, and the system built an individualized weight for it from a combination of its occurrence, how closely it paralleled term-based similarity, and how often that relationship type was used to rank a selected result? (That last one would require gui integration, but for this proof of concept that’s ok in my head.)
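Nothing like this exists yet, but as a very rough sketch of the shape I have in mind: the three signals feed one per-user weight for each relationship type. Every name here (RelationshipStats and its fields) is hypothetical, and averaging the three signals is purely a placeholder assumption; working out the right combination is the hard part.

```csharp
using System;

// Hypothetical per-user statistics for one relationship type (e.g. "author").
class RelationshipStats
{
    public int Occurrences;      // indexed items that carry this relationship
    public int IndexedItems;     // total items in the index
    public double TermAgreement; // 0..1: how closely it parallels term-based similarity
    public int SelectedHits;     // selected results this relationship helped rank
    public int Selections;       // total results the user selected

    // Placeholder assumption: an unweighted average of the three signals.
    public double Weight()
    {
        double occurrence = IndexedItems == 0 ? 0.0 : (double)Occurrences / IndexedItems;
        double feedback = Selections == 0 ? 0.0 : (double)SelectedHits / Selections;
        return (occurrence + TermAgreement + feedback) / 3.0;
    }
}

static class AdaptiveWeightDemo
{
    static void Main()
    {
        RelationshipStats author = new RelationshipStats
        {
            Occurrences = 500, IndexedItems = 1000,
            TermAgreement = 0.25,
            SelectedHits = 3, Selections = 12
        };
        Console.WriteLine(author.Weight());
    }
}
```

The point is only that the programmer declares the relationship and the numbers come from the user’s own data and clicks, not from a hand-tuned table.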

All of a sudden, the massive programmer burden of a relational ranking system is removed! (It takes a lot of specific code to handle each relationship and its weights/different characteristics properly.) While there would be a massive front-end cost to tweaking and tuning the system which determines those individual relationship weights, it would be time well spent: as new data types/sources are added, there is no additional work beyond declaring/mapping the relevant relationships.

Once the sun has actually risen, I’ll try to start the process of actually codifying what I’m trying to say. If I’ve actually made enough sense that anyone understands what I’m getting at and has any thoughts/comments/criticisms, please share!

Banshee iPod Playlist Support

It looks like the monster might finally start to lay itself to rest. After almost 2 years, one of the most basic feature requests for Banshee looks like it will finally be fulfilled. I’m talking about playlist syncing to iPods. While there has always been a plethora of patches in varying states of readiness floating around, none of them ever got into trunk. I am very pleased to have checked in a working (and building at the moment) patch which enables the management of iPod playlists through Banshee.

I know that the patch has been in better shape, and there were a dozen different times that a commit might have made sense, but in the end, ipod-sharp is a moving target, and trying to hit it and Banshee with stable APIs at the same time (without a freeze ;) ) has proven to be quite difficult (no hard feelings to the Banshee devs; they keep new features coming, and fast). Anyways, there are a few known bugs with this patch, most of which (in my super-limited testing) stem from ipod-sharp being in the middle of an API shift, with trunk not working.

Anyways, I wanted to make a list of Features and Bugs, namely so the 2 don’t get confused. Since a big part of this patch was trying to determine exactly what ‘expected behavior’ was, there’s a lot of room to grow.

Known Bugs

  • Major Performance Issues – This just needed to eventually go in, and maybe the new ipod-sharp api will have a better solution, but ever since I started working on this, everything (meaning the entire music library) has had to be iterated over to find a corresponding track. Some preliminary work was done to get more content sorted/hashed, but there’s still a lot of work to do here.
  • Double Tracks on iPod – Depending on your version of ipod-sharp, and what random steps you take to get things building against your version, there is a common issue where a Playlist dragged from the Library onto an iPod will result in duplicates of every song in the playlist on the iPod. This should be easy enough to track down if someone just has the time and patience.
  • New ipod-sharp API – As there will eventually be a new ipod-sharp API, someone needs to migrate the current logic to the new API. It should be mostly the same except for the device detection logic.
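For the performance issue, the direction I mean by ‘hashed’ is roughly the following: build a lookup table once, then find each corresponding track by key. This is only a sketch. Track here is a hypothetical stand-in and not the Banshee or ipod-sharp type, and matching on artist/album/title is an assumption about what ‘corresponding’ should mean.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a track; not the Banshee or ipod-sharp type.
class Track
{
    public readonly string Artist, Album, Title;

    public Track(string artist, string album, string title)
    {
        Artist = artist; Album = album; Title = title;
    }

    // Assumption: two tracks correspond when artist, album and title all match.
    public string Key
    {
        get { return Artist + "\n" + Album + "\n" + Title; }
    }
}

static class TrackIndex
{
    // One pass over the library to build the table.
    public static Dictionary<string, Track> Build(IEnumerable<Track> library)
    {
        Dictionary<string, Track> index = new Dictionary<string, Track>();
        foreach (Track track in library)
            index[track.Key] = track; // on duplicate metadata the last track wins
        return index;
    }

    // Hash lookup instead of iterating the whole library; null when nothing matches.
    public static Track Find(Dictionary<string, Track> index, Track ipodTrack)
    {
        Track match;
        return index.TryGetValue(ipodTrack.Key, out match) ? match : null;
    }

    static void Main()
    {
        List<Track> library = new List<Track>();
        library.Add(new Track("Artist A", "Album 1", "Song X"));
        library.Add(new Track("Artist B", "Album 2", "Song Y"));

        Dictionary<string, Track> index = Build(library);
        Track found = Find(index, new Track("Artist B", "Album 2", "Song Y"));
        Console.WriteLine(found != null);
    }
}
```

The table costs one walk over the library up front, after which each playlist entry is a single lookup.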

Behavior Issues/Features

  • A Playlist from the Library to the iPod with the same name will result in the iPod version being overwritten.
  • Dragging a track from the library to an iPod playlist will result in that track being copied to the iPod again
  • Click and Drag support for playlists on the iPod; it’s recommended that you drag songs from the iPod’s library
  • Rename of iPod playlists
  • Does not synchronize all library playlists to iPod automatically, only those which are placed onto the iPod

I think that’s most of it. Once iPod support in Banshee has leveled out a little bit, I plan on adding support for On-The-Go playlists and Smart Playlists. Anyways, I know that it’s far from a perfect commit, but after porting this patch through so many API changes, design shifts, and general bitrot, I really just wanted to get it out of Bugzilla.

The obligatory screenshot:

Banshee With iPod Playlists

Note: I’ve tested this with the latest iPod Firmware, if you run the Hash tool as you normally would, it should work fine.

Building More Relationships in Beagle

Today I checked in a few fun changes to Beagle focused on the idea of emphasizing relationships between entities. It doesn’t sound like a whole lot of fun, but it’s kinda nifty.

New Query Context Options

  1. Find Documents by same author.
  2. Find E-mails from same contact.
  3. Find Pages from same site.

In addition (building upon Beagle’s new External Metadata system) I have added support for tracking Firefox downloads back to their files. A file downloaded with Firefox gets an extra property (beagle:Origin) which denotes the Url it was downloaded from. I haven’t started to integrate anything on the UI side with this new information, as I want to add support for Epiphany, Opera, and Konqueror first. Eventually, I would love to see this kind of mapping for downloaded mail attachments, but that’s a little more difficult.

Anyways, this is more work towards my eventual goal of a ranking system based upon relationships (among desktop data). I know that no feature-centric blog post is complete without screenshots, so I present:

Original Query

The Resulting Query

Beagle’s powerful and simple query language makes stuff like this really easy; it’s just a matter of knowing what properties warrant special treatment like this. I’m open to ideas, what