Archive for the ‘Beagle’ Tag

Tomboy Hackfest Tonight at the Novell OSTC

We'll be hacking it up tonight at 6:00 PM MST at the Novell Open Source Technology Center. The rough TODO for the night seems to be Tags, Tasks, and maybe even a backend to query Beagle. ;) Anyways, if you're in the greater Salt Lake City area, come on down! If you're a little further away but still want to join in, find us on #tomboy!

See you tonight!

Mono 1.2.6 Memory Usage

So, I’ve heard a lot of hype about the upcoming 1.2.6 release of Mono being faster, leaner, and more stable than ever before (due largely to Novell’s acquisition of a QA team dedicated to Mono). Beagle has always gotten flak over memory use, and as a result, we are relentless in our hunt for abused memory. And while it is wonderfully satisfying to reduce memory usage ourselves, it’s really hard to beat dropping megabytes of resident memory for free :) . I’m running Ubuntu Gutsy and its 1.2.4 release of Mono, but in my quest for some real numbers to back up all this talk, I built the current SVN trunk of Mono.

Even my most optimistic expectations put the potential benefit at maybe 2 or 3 MB less resident memory than Beagle running under Mono 1.2.4. On my test setup, Beagle 0.3pre (after my recent Opera backend fix) consumed around 110 MB of VM and 36 MB of RSS, averaged over a 2-hour run. After building and installing Mono 1.2.6, the same 2-hour run averaged 72 MB of VM and 27 MB of RSS! It’s still far from perfect, but free memory reduction is just plain cool :) .
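The post doesn't say how the VM and RSS figures were collected. Here is a minimal sketch, assuming a Linux `/proc` filesystem, of one way to average them over a run: poll the `VmSize` and `VmRSS` fields of `/proc/<pid>/status` at a fixed interval. The function names and the 60-second default interval are illustrative choices, not part of Beagle's tooling.

```python
import time
from statistics import mean


def parse_status(text):
    """Return (VmSize, VmRSS) in kB from the contents of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(value.split()[0])  # the kernel reports these in kB
    return fields["VmSize"], fields["VmRSS"]


def sample_memory(pid, duration_s, interval_s=60):
    """Poll a process until duration_s has elapsed; return average (VM, RSS) in MB."""
    samples = []
    deadline = time.monotonic() + duration_s
    while True:
        # Always take at least one sample, even for a zero-length run.
        with open(f"/proc/{pid}/status") as f:
            samples.append(parse_status(f.read()))
        if time.monotonic() >= deadline:
            break
        time.sleep(interval_s)
    vm_mb = mean(vm for vm, _ in samples) / 1024
    rss_mb = mean(rss for _, rss in samples) / 1024
    return vm_mb, rss_mb
```

A 2-hour run would be `sample_memory(pid, 2 * 60 * 60)`. For scale, the averages reported above work out to roughly a 35% drop in VM (110 to 72 MB) and a 25% drop in RSS (36 to 27 MB).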

One observation about the general pattern of allocation and collection under 1.2.6: it ‘idles’ much lower than 1.2.4. While some actions always push memory usage up, 1.2.6 *appeared* to return to its lower baseline much faster and more regularly.

Anyways, I just wanted to say, props to everyone on the Mono team for rocking my socks.

Google Docs Presentations: A Major Disappointment

Google Docs has revolutionized the office suite, particularly the word processor. Collaboration is easy, smooth, integrated, and automatic; what’s more, all your documents are accessible from anywhere, and all the common features I need are present. While I don’t use spreadsheets very often, my few simple experiments with Google Docs spreadsheets were easy enough. Needless to say, when I heard that a Presentation component was being added, I was excited.

Now, I’m a far cry from a PowerPoint guru (I’ve used it maybe twice), but with an upcoming presentation at the Ubuntu Utah user group, I figured I should probably slap a few slides together. Since I want a sane contingency for the exploding laptop, the forgotten laptop, or the limited presentation machine, I figured this would be a great chance to stretch my presenting legs and give it a try. Too bad I’m not that lucky: I was unable to save _any_ changes. I could create a new presentation and modify it as much as I wanted, but closing it lost all my work… lovely (this is in Firefox 2.0, Firefox 3.0 trunk, IE7, and Opera 8.24). I’m willing to give Google a few hours and then try again (it’s possible this is some brief downtime; it’s still beta, after all ;) ).

Another gripe (found as I read the Help looking for similar bug reports) is the very limited selection of templates. While I might be able to upload new templates authored in PowerPoint or OpenOffice, couldn’t they at least let me change the color schemes? (If I can and I’m just missing how to do it, please share!)

I’ll post an update soon and let you all know if I have any luck saving anything…

Update: This is fixed, and now it’s kinda cool! I’m going to try it this Saturday at the Ubuntu Utah users group and see how it goes.

Google: How do you do it?

So it’s no big surprise that an oft-requested feature for Beagle is the ability to index a user’s Gmail messages (like Google Desktop Search does). Today we (the Beagle developers) started to investigate just how this is done. POP3 (and now IMAP) access is available, but using it means downloading all of a user’s mail, indexing it, and then caching the text so we can display it. My initial investigation into GDS for Linux revealed that it was calling home via POP3S and downloading lots of data. I assumed it was simply iterating over all messages (via POP3), downloading them, indexing them, and caching the compressed content somewhere in Google’s custom indexes.

Now, I had originally planned for this post to be an open plea to anyone and everyone at Google to open up the Gmail access API, but seeing as it’s just plain old ugly POP3 (maybe with a cool extension), we’re stuck biting the bullet and implementing a remote mail access layer.

Anyways, given how incredible Google has been in a million other situations, I thought I would throw out 2 wildly out-of-this-world questions. I don’t expect a response, but before I spend the time figuring it all out, I felt I should at least ask.

  • Are there some special POP extensions available in Gmail? Is there some helper web API? Or does GDS really just have a POP3 crawler?
  • Is your compression/text storage library open source (or documented in a research paper at all)? Beagle has always struggled with how best to store copies of a document’s text so that it can be made available in interfaces. While we do have a new hybrid text cache (text over 4k goes on the filesystem, text under that goes in an SQLite database, all compressed), we are still nowhere near as small as the GDS indexes. A cursory examination reveals that the GDS indexes are some form of on-disk B-tree, but how are you compressing all that text so small? Is there some substitution/reconstruction algorithm? (It seems like that would be wildly expensive, but who knows.)
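To make the hybrid text cache concrete, here is a minimal Python sketch under stated assumptions: zlib for compression, a 4096-byte threshold standing in for "4k", hash-named files for the large entries, and a single SQLite table for the small ones. The class name, table layout, and file naming are hypothetical; this is not Beagle's actual (C#) implementation.

```python
import hashlib
import os
import sqlite3
import zlib

# Assumption: "text over 4k" is read here as more than 4096 bytes of UTF-8.
THRESHOLD = 4 * 1024


class HybridTextCache:
    """Compressed text cache: large entries as files, small ones in SQLite."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        self.db = sqlite3.connect(os.path.join(directory, "text-cache.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (uri TEXT PRIMARY KEY, data BLOB)"
        )

    def _path(self, uri):
        # Hypothetical layout: one file per URI, named by a hash of the URI.
        return os.path.join(self.directory, hashlib.sha1(uri.encode("utf-8")).hexdigest())

    def store(self, uri, text):
        raw = text.encode("utf-8")
        data = zlib.compress(raw)
        path = self._path(uri)
        if len(raw) > THRESHOLD:
            with open(path, "wb") as f:
                f.write(data)
            self.db.execute("DELETE FROM cache WHERE uri = ?", (uri,))
        else:
            self.db.execute(
                "INSERT OR REPLACE INTO cache (uri, data) VALUES (?, ?)", (uri, data)
            )
            if os.path.exists(path):
                os.remove(path)  # drop a stale large copy if the text shrank
        self.db.commit()

    def load(self, uri):
        row = self.db.execute("SELECT data FROM cache WHERE uri = ?", (uri,)).fetchone()
        if row is not None:
            return zlib.decompress(row[0]).decode("utf-8")
        path = self._path(uri)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return zlib.decompress(f.read()).decode("utf-8")
        return None
```

Splitting by size keeps the many small snippets out of the filesystem (one row each instead of one file each), while large texts avoid bloating the database.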

Anyways, it’s a long shot, and it’s pretty far out there, but for the sake of not passing up answers I can’t seem to find elsewhere on the net, I have asked.

Waking Up In the Middle of the Night

It happens… even running on so little sleep, I still find myself waking.

Fortunately, this time I awoke with an awesome realization. I’ve been pounding my head against the wall for a week now over how to further refine and increase the accuracy of my original relation-based ranking system. My initial results had been less than stellar when unleashed upon the desktop as a whole. In controlled situations (where my defined relationship weights were proportionate and scaled) the results were excellent, but I was hoping this ‘lowest common denominator’ of sorts would be the answer. I was mistaken. After being more or less tossed back to square one, I was less than optimistic, to say the least.

However, at 2:30 this morning all that seemed irrelevant, as I believe I have determined the key to blazingly accurate desktop search results (specifically over large search sets, on the order of shared drives with thousands of documents, images, e-mails, and other media files, without any real semantic system to start from). In my original design I made the mistake of using fixed-proportion weights for my relationships, the same mistake seen in many ObjectRank-based systems. By fixed proportion, I mean that an astronomical amount of time has gone into determining how important an ‘author’ relationship is compared to a ‘creation date’ relationship. I (like many before me) was using a weight × term-similarity-index type of system for each relationship. As a result, I was spending tons of time and effort trying to strike the proper balance, and in most cases, when I got one situation to work, I completely destroyed another.
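A minimal sketch of the fixed-proportion scheme being criticized: each relationship yields a similarity between 0 and 1, which is multiplied by a hand-picked constant weight and summed. The two relationships, the 0.6/0.4 weights, and the 1/(1 + days) date similarity are illustrative assumptions, not values from the original system.

```python
from dataclasses import dataclass
from datetime import datetime

# Hand-tuned constants: exactly the kind of fixed proportions the post argues against.
FIXED_WEIGHTS = {"author": 0.6, "created": 0.4}


@dataclass
class Doc:
    author: str
    created: datetime


def author_similarity(a, b):
    return 1.0 if a.author == b.author else 0.0


def date_similarity(a, b):
    # Assumed similarity: 1.0 for the same instant, falling off with days apart.
    days = abs((a.created - b.created).total_seconds()) / 86400
    return 1.0 / (1.0 + days)


SIMILARITIES = {"author": author_similarity, "created": date_similarity}


def relationship_score(a, b, weights=FIXED_WEIGHTS):
    """Sum of weight x similarity over every relationship."""
    return sum(weights[name] * sim(a, b) for name, sim in SIMILARITIES.items())
```

On an isolated desktop where the user authored nearly everything, the 0.6 author term adds the same constant to almost every pair, so only the date term actually discriminates; retuning the constants for that desktop skews them for a shared one.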

I think my <sarcasm>brilliant</sarcasm> revelation is becoming obvious, but bear with me.

We cannot pretend that authorship means the same thing to all users. A simple example is the large number of users who still operate relatively isolated desktops, where they are the only author of most of the content; if someone emails them a document, it will have a hard time ranking well on authorship. However, creation and modification dates would probably serve as a solid indicator of relationship, since one person can really only work on one thing at a time.

I wish I had something better to show than just this (I’m mostly writing it down so I don’t forget it by morning :) ), but I’ve determined that we need a deeper dimension of weight on relationship weighting (when scoring). While one possibility is to just add another variable to our existing weight-determination system, I am leaning towards something broader. What if the programmer only had to specify a relationship, and the system built an individualized weight for each relationship from a combination of how often it occurs, how closely it parallels term-based similarity, and how often that relationship type was used to rank a selected result? (That last part would require GUI integration, but for this proof of concept, that’s OK in my head.)
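One way the idea could be sketched, under explicit assumptions: collect the three signals named above for each relationship and combine them into normalized weights. How the signals are combined here (scaling down relationships that hold for almost every pair, then averaging term-similarity agreement with click share) is an illustrative guess at a starting point, not a worked-out design, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RelationshipStats:
    pairs_seen: int = 0          # hit pairs examined
    pairs_related: int = 0       # pairs where this relationship held
    agreement_sum: float = 0.0   # summed term similarity over related pairs
    clicks_credited: int = 0     # selected results this relationship helped rank
    clicks_total: int = 0        # all selected results observed


def learned_weights(stats):
    """Map relationship name -> weight (summing to 1) from observed usage."""
    if not stats:
        return {}
    raw = {}
    for name, s in stats.items():
        occurrence = s.pairs_related / s.pairs_seen if s.pairs_seen else 0.0
        # A relationship shared by nearly every pair (e.g. one author on an
        # isolated desktop) says little, so it is scaled toward zero.
        informativeness = 1.0 - occurrence
        agreement = s.agreement_sum / s.pairs_related if s.pairs_related else 0.0
        click_share = s.clicks_credited / s.clicks_total if s.clicks_total else 0.0
        raw[name] = informativeness * (agreement + click_share) / 2
    total = sum(raw.values())
    if total == 0:
        return {name: 1.0 / len(raw) for name in raw}  # no evidence yet: uniform
    return {name: w / total for name, w in raw.items()}
```

The point of the design is that adding a new data source only means registering its relationships; their weights come from the collected statistics rather than from hand tuning.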

All of a sudden, the massive programmer burden of a relational ranking system is removed! (It takes a lot of specific code to handle each relationship and its weights and other characteristics properly.) While there would be a massive up-front cost to tweaking and tuning the system that determines those individual relationship weights, it would be time well spent: as new data types and sources are added, there is no additional work beyond declaring and mapping the relevant relationships.

Once the sun has actually risen, I’ll try to start the process of actually codifying what I’m trying to say. If I’ve actually made enough sense that anyone understands what I’m getting at and has any thoughts/comments/criticisms, please share!

Building More Relationships in Beagle

Today I checked in a few fun changes to Beagle focused on emphasizing relationships between entities. It doesn’t sound like a whole lot of fun, but it’s kinda nifty.

New Query Context Options

  1. Find Documents by same author.
  2. Find E-mails from same contact.
  3. Find Pages from same site.

In addition (building upon Beagle’s new External Metadata system), I have added support for tracking files downloaded with Firefox. A file downloaded with Firefox gets an extra property (beagle:Origin) that records the URL it was downloaded from. I haven’t started integrating this new information on the UI side, as I want to add support for Epiphany, Opera, and Konqueror first. Eventually, I would love to see this kind of mapping for downloaded mail attachments, but that’s a little more difficult.

Anyways, this is more work towards my eventual goal of a ranking system based upon relationships among desktop data. And since no feature-centric blog post is complete without screenshots, I present:

Original Query

The Resulting Query

Beagle’s powerful and simple query language makes stuff like this really easy; it’s just a matter of knowing which properties warrant special treatment like this. I’m open to ideas, what

Relationships in the Desktop – Relational Desktop Search and Beagle

I’ve been working on and off on a writeup about using Beagle to build an intelligent ‘rank’ for desktop entities. In short: a ranking system (not unlike PageRank or the like) to organize desktop search results by far more than just keyword/date. I know the writing sucks, and it’s not 100% complete yet. In addition, I don’t have much in terms of code to share (yet).

To summarize (for the lazybones out there): I’m thinking of using fairly universal and constant relationships (Creator, Creation Date, Modification Date(s), Parent/Source, and maybe others) to recurse deep into desktop relationships. By adding relevancy to the root hit for every child it has (decreased logarithmically with each level of recursion), we can get far more accurate desktop search results when querying a simple keyword or phrase. In addition, the children of a hit could often be considered hits themselves, if found under enough ‘root’ hits.
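The recursion in that summary can be sketched in a few lines. Assumptions: `children_of` is a hypothetical callback returning documents linked by one of those relationships (creator, dates, parent/source), each level of the traversal adds 1/log2(depth + 1) to the root's score (one reading of "decreased logarithmically"), and a document reached from at least `promote_threshold` distinct roots is reported as a hit in its own right. The traversal is breadth-first, level by level, which visits the same depths as the recursion described.

```python
import math
from collections import Counter


def propagate(root_hits, children_of, max_depth=3, promote_threshold=2):
    """Boost each keyword hit by its related documents; return (scores, promoted)."""
    scores = dict(root_hits)
    roots_reaching = Counter()  # number of distinct roots that reached each document
    for root in root_hits:
        visited = {root}
        frontier = [root]
        for depth in range(1, max_depth + 1):
            decay = 1.0 / math.log2(depth + 1)  # 1.0, 0.63, 0.5, ...
            next_frontier = []
            for node in frontier:
                for child in children_of(node):
                    if child in visited:
                        continue  # count each related document once per root
                    visited.add(child)
                    scores[root] += decay
                    roots_reaching[child] += 1
                    next_frontier.append(child)
            frontier = next_frontier
    promoted = sorted(
        doc for doc, count in roots_reaching.items()
        if doc not in root_hits and count >= promote_threshold
    )
    return scores, promoted
```

The `max_depth` cap matters in practice: relationships such as "same creation day" can connect a large fraction of a desktop within a few hops.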

It’s a loose and patchy idea, and miles from a realistic implementation; however, thanks to the awesomeness of Lucene, comparing 2 in-index documents for textual relevancy (based on term frequency) is not impossible. (I have not yet considered the performance of these comparisons; they may be too slow to be realistic without serious optimization.)
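As a rough stand-in for that Lucene comparison, here is term-frequency cosine similarity in plain Python. This is a simplification chosen for illustration: Lucene's own scoring also weighs terms by inverse document frequency and applies length normalization, and in practice the term frequencies would presumably come from the index rather than from re-tokenizing text as done here.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercased word counts; a crude tokenizer, not Lucene's analyzer."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Each comparison touches every term of both documents, which is why the performance concern above is real once this runs for every child of every hit.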

Anyways, I’m working on it in Google Docs, so you can check out the full document here. I’ll post once I’ve finished my research/planning etc.

Please share your thoughts! This is in the ‘major brainfart’ stage, so it’s open to anything from anyone. I want to hear ideas!

Updated Beagle Packages for Gutsy Available

Beagle support in Ubuntu has been less than stellar up until this point (across all releases), and unfortunately, the best we can really hope for in the immediate future is ‘acceptable’. This is mostly because only a few of Beagle’s developers run Ubuntu, so accurately reproducing common errors is difficult. To top it all off, the de facto Ubuntu contact at this point is me, and I haven’t had the time to really track down some of the more difficult bugs.

However, this problem hit an all-time low when the beagle source package stopped building in Gutsy. This spurred us into action (our urgency increasing as we realized how close Gutsy was to shipping), and as a result, updated Ubuntu Gutsy packages (based on the new 0.2.18 bugfix release of Beagle) are now available for testing. Thanks to Launchpad’s new super-awesome Personal Package Archive system, you only need to add the following sources, or download from the corresponding link. (NOTE! The versioning of these debs will not force an update if they are accepted into main; you will need to reinstall should they be accepted at their current version number!)

deb     http://ppa.launchpad.net/kkubasik/ubuntu gutsy main 
deb-src http://ppa.launchpad.net/kkubasik/ubuntu gutsy main 

Please report bugs with these packages either to Beagle in Launchpad or to the dashboard-hackers mailing list. The more feedback we get in the next few days, the better the chance that Ubuntu Gutsy will ship a solid Beagle.

 

Beagle Ubuntu Package Update

With everything that has been swarming all over my plate lately, I haven’t had a chance to really keep on top of the Beagle packages in Ubuntu, and as a result, they are currently pretty crappy. I have a branch (meant to be feisty-updates, but I was in a hurry, and didn’t feel like branching), with a building deb configuration for Gutsy. I hope to have binaries/sources available for testing later this week.

The branch is hosted here:

https://code.launchpad.net/~kkubasik/beagle/feisty-update

Just do the following to try and build:

 

bzr branch http://bazaar.launchpad.net/~kkubasik/beagle/feisty-update
cd feisty-update
sudo apt-get build-dep beagle
bzr builddeb -w --split

Beagle Search Support in GtkFileChooser!

Finally! Perhaps one of the last major features that Beagle needs to match Spotlight and Windows Live Search. Integration into every file choice made in the Gnome environment has long been a dream of mine, and a little less than a year ago, this bug was filed with hopes of making that dream a reality. After lots of hard work, a rough cut of the patch has been committed to the gtk+ trunk, with plans to add missing features in time.

While it will still be some time before this code reaches a Gtk+ release (development releases for the next major series start in late April), it’s great to feel some awesome love from our friends over at Gtk. A special thanks to Federico Mena Quintero for really being the driving force that finally got this done. What’s great is that plans for using Beagle’s index to help GtkFileChooser already seem to be underway.
For any budding Gtk+/Gnome/Beagle hackers out there, this feature still needs a lot of love. A list of bugs/todos exists here, and if someone were to get the ball rolling on anything on that list, it would be a huge help!

And last, the obligatory (if unexciting) Screenshots:

Here we’re searching for a mail attachment in Evolution.

FileChooser and Evolution Screenshot

And here we search for a file to upload in Epiphany:

FileChooser and Epiphany Screenshot

A Call for Help

OK, since my recent post on Dashboard there has been scattered interest in helping revive and work on Dashboard, so I wanted to post a list of things Dashboard needs before we can really look at making it usable at all.

  • libdashboard
    • In other words, a C library that generates valid, parseable clues. This is critical not only because most plugin authors aren’t going to want to spend time validating that Mono can deserialize all their XML, but also because once we have it in C, we can generate bindings for almost every other language.
  • Real Bug/Performance Testing/Fixing
    • This is a huge one. Dashboard is still pre-alpha, and mostly a proof of concept. While the code base could be brought up to production level, it needs tons of cleanup. This is where an army of open source devs can help the most: compile and install Dashboard from SVN, then just play with it, and every time it crashes, track down why and try to make it sane. This is long, slow, tedious, and thankless work, but it is an absolute necessity if people want to start using Dashboard, as even simple race conditions will crash it most of the time.
  • Mappings/Rules
    • You have to be a little more familiar with the Beagle/Dashboard code base to help out with this, but we need mappings and rules for a huge spread of plugins, and for almost everything Beagle can generate.
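To illustrate why generating clues from a library beats hand-writing XML: a serializer escapes special characters for you, so a clue containing `&` or `<` still parses on the receiving end. The `CluePacket`/`Frontend`/`Clue` element names and the `Type`/`Relevance` attributes below are hypothetical placeholders, not Dashboard's actual schema, and the sketch uses Python's standard `xml.etree.ElementTree` rather than the C library the list calls for.

```python
import xml.etree.ElementTree as ET


def build_clue_packet(frontend, clues):
    """Serialize (type, text, relevance) tuples into a hypothetical clue-packet XML string."""
    packet = ET.Element("CluePacket")
    ET.SubElement(packet, "Frontend").text = frontend
    for clue_type, text, relevance in clues:
        clue = ET.SubElement(packet, "Clue", {"Type": clue_type, "Relevance": str(relevance)})
        clue.text = text  # ElementTree escapes &, < and > when serializing
    return ET.tostring(packet, encoding="unicode")
```

A plugin would call something like `build_clue_packet("Banshee", [("artist", "Simon & Garfunkel", 10)])` and never touch raw markup; a C libdashboard would play the same role for plugins written in any language with bindings.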

Just a note: I posted this list on the Dashboard Wiki.

More Cool Dashboard Stuff

Alright, before I get to the cool stuff, first things first: with the current metadata system thriving, and at Beagle’s current speed, if someone really wants to start stabilizing the Dashboard API a little more and make this something less abstract, IM, call, mail (either kind), or even just show up in Cleveland, and I’d be more than willing to help get Dashboard moving again.

Anyways, now that I’ve had another 2-minute excursion into the realm of getting Dashboard and its Banshee plugin building and working again, I have a quick 10-second screencast of Dashboard being awesome.

http://qub333.googlepages.com/dashboard-banshee-plugin.ogg

The other cool technology that I really want to get integrated with Dashboard (or even Beagle) is the Open Natural Language Parser. In all honesty, there’s no way to describe how awesome it is until you see it in action.

http://qub333.googlepages.com/opennlp-sharp.ogg

Sorry, I’m too lazy to convert them and all that jazz; this is an impulse post that’s keeping work from getting done. If someone wants to convert them, just let me know, and I’ll host them if you need.

Big Board and Dashboard

So there’s been a lot of talk about a cool concept program coming out of the blur of Mugshot. While the exact goals of the project do seem a little more Web 2.0, the basic UI layout and goals seem to be remote services for something like Dashboard. Dashboard was the precursor to Beagle (Beagle was initially designed as an indexing and storage backend for Dashboard), and is basically a meta-clue processing center that brings up information relevant to whatever you are doing. The initial screenshots are extraordinarily old, but last summer during SoC, a complete redesign and recode took place, which is what can currently be found in the Google Code repo. While Dashboard is missing a lot of polish, I would recommend taking a look at its clue processing system, as it’s really quite ingenious, and it’s the dream of every Beagle developer to eventually have the time to make it a real working solution again.

Dashboard

Anyways, I know it’s not exactly the same, but it’s probably worth at least a few minutes of playing time. ;)

Planet Beagle Random Icon Link Thinger

Hey, so I felt like giving Planet Beagle some random-junk-on-the-side-of-my-blog love, so this little guy was created. Make of him what you will, but know I’m no artist. However, there’s a small problem: we have no SVGs or any other source-type format of the new Beagle logo (we still ship the old one in our tarball). If someone knows where one is, I’d like to do some general random art/junk creation. ;)

Planet Beagle

So yeah, I know it’s a little big. Feel free to do whatever with it, but I completely didn’t think about the whole saved-as-a-small-raster, can’t-edit-much thing… so just know I’m sorry, and feel free to improve or recreate it as you see fit.

Code to use on your own page:

<a href="http://planetbeagle.org" title="Planet Beagle" target="_blank"><img src="http://kubasik.net/photos/beagle-planet.png" title="Planet Beagle" alt="Planet Beagle" />
</a>

p.s. If anyone knows what these are actually called, I would totally benefit from knowing their real name.

General Beagle Busyness

Hey, so I’ve been pretty busy (again! *Shock*) with school and haven’t had a ton of hacking time as of late. I’ve learned this means I should definitely pick jobs that don’t require a lot of follow-up or that someone else is waiting on.

Anyways, I had some Beagle fun last night and was somewhat productive.

1. I started a Launchpad team and branch to maintain our Ubuntu Beagle packages. I think the trick here is going to be paying closer attention to Debian. As both an upstream developer and a second-tier packager, I always think I’m going to stay on top of the Debian packaging and patches, but rarely ever do. So anyways, I’m hoping to parallel/stalk what they do so we can avoid more nasty merges right after Upstream Version Freezes are instated.
2. Documentation! A long while back I started to fill out the automatically generated Monodocs for Beagle’s public API. While they’re still nowhere near done (API docs are a royal pain to write; you feel so redundant), I have started to pick that up again. So should you want to write some bindings for BeagleClient, start working on Dashboard again, or even start another search interface, at some point in time there should be some quasi-complete API coverage in addition to the Beagle Wiki.
  1. And just so everyone knows, it’s easy to help out! If you have monodoc installed, you have everything you need! There is another bzr branch just for the BeagleClient docs. If I made you a member of the “Beagle Packers” team, you can commit straight to the branch; just let me know you did and I’ll update it on my server. As for the actual editing, it couldn’t be easier:

monodoc --edit ./path/to/docs/dir/ and you’re editing!

p.s. More documentation news: I updated my libbeagle docs that are online too.
