Technology Musings

May 10, 2012

Platforms / iOS - using storyboard relationships for custom view controller containment

JB

One thing Apple forgot to do with the release of storyboards in iOS 5 is support for custom "relationship" segues.  In addition to normal segues, UINavigationController and UITabBarController have what are called "relationships", which allow you to define containment information for these controllers.  So, for the navigation controller, you define the root view controller, and for the tab bar controller, you define the list of view controllers.  This is all done in a neat interface in the storyboard editor (the replacement for Interface Builder).

Another cool feature in iOS 5 is "containment", which is basically a way of letting you write view controllers that contain other view controllers.  This was not really possible in iOS 4, but basic support for it has now been added.  Unfortunately, the natural way to use containment with storyboarding would be through relationships, but iOS doesn't provide a way to do custom relationships in storyboards - or even to use the default ones on custom view controllers.

My own task was to create a tab bar controller with the tabs at the *top*.  I had a few ideas on how I might approach this problem.  Subclassing UITabBarController didn't seem like a good idea, so I tried several other approaches.

One possible way was duck typing.  I thought that perhaps I could pull a fast one on the storyboard editor.  What I would do is define my own class that mimicked UITabBarController without actually inheriting from it.  Then, I could create my relationships with the storyboard editor, and substitute my own non-tab-bar controller at the last minute.  This failed miserably.

First of all, the storyboard editor doesn't like you to substitute non-subclasses for the class that it wants.  If you try to set the class to a non-subclass of UITabBarController, it won't give you an error, but it won't save the change, either.

I tried to work around this by making my class inherit from UITabBarController while I was editing the storyboard, and then changing it back for the compile.  This worked momentarily, but in the end it turns out that the "relationship" segues aren't set via getters and setters.  I imagine there is an internal API being used instead.

Therefore, I next tried overriding initWithCoder: and reading the nib values myself.  Unfortunately, I was not successful at guessing the keys for decoding.

Then it hit me.  What I needed to do was combine subclassing UITabBarController with not subclassing it.  So here's what I did:

class 1: MyTabBarController

This is just a shell class, but it inherits from UITabBarController.  

class 2: MyTabBarControllerImplementation

This is your real implementation class, and only needs to inherit from UIViewController.  Make it work like you want it to.

What you are going to do is designate your tab bar controllers to be of type MyTabBarController in the storyboard editor.  Then, when viewDidLoad is called on MyTabBarController, the view controller de-registers the view controllers from itself, creates an instance of MyTabBarControllerImplementation, moves the view controllers over there, and then replaces itself with MyTabBarControllerImplementation.

Anyway, here's the code.  It's really rough, but works at the moment.  I'm open to improvements:

MyTabBarController.h:

#import <UIKit/UIKit.h>
@interface MyTabBarController : UITabBarController
@end

MyTabBarController.m:

#import "MyTabBarController.h"
#import "MyTabBarControllerImplementation.h"
@implementation MyTabBarController
- (void)viewDidLoad
{
    [super viewDidLoad];
    MyTabBarControllerImplementation *vc = [[MyTabBarControllerImplementation alloc] init];
    NSArray *vclist = self.viewControllers;
    [self setViewControllers:[NSArray array] animated:NO];
    vc.viewControllers = vclist;
    NSMutableArray *navlist = [self.navigationController.viewControllers mutableCopy];
    [navlist replaceObjectAtIndex:(navlist.count - 1) withObject:vc];    
    [self.navigationController setViewControllers:navlist animated:NO];
}
@end

MyTabBarControllerImplementation.h:


#import <UIKit/UIKit.h>
@interface MyTabBarControllerImplementation : UIViewController<UITabBarDelegate>
@property (nonatomic) NSUInteger selectedIndex;
@property (strong, nonatomic) NSArray *viewControllers;
@property (strong, nonatomic) UITabBar *bar;
@end

MyTabBarControllerImplementation.m:


#import "MyTabBarControllerImplementation.h"
@implementation MyTabBarControllerImplementation
@synthesize viewControllers = _viewControllers;
@synthesize selectedIndex = _selectedIndex;
@synthesize bar = _bar;
-(CGRect) subviewFrame {
    CGRect f = self.view.frame;
    return CGRectMake(0, 30, f.size.width, f.size.height - 30);
}
-(void) reloadBarInfo {
    NSMutableArray *itms = [NSMutableArray arrayWithCapacity:_viewControllers.count];
    for(UIViewController *vc in _viewControllers) {
        UITabBarItem *itm = [[UITabBarItem alloc] initWithTitle:vc.title image:nil tag:0];
        [itms addObject:itm];
        [vc didMoveToParentViewController:self];
    }
    _bar.items = itms;    
    _bar.selectedItem = [itms objectAtIndex:0];
    _bar.autoresizingMask = UIViewAutoresizingFlexibleWidth;
    UIViewController *firstVC = [_viewControllers objectAtIndex:0];
    firstVC.view.frame = [self subviewFrame];
    firstVC.view.autoresizingMask = UIViewAutoresizingFlexibleWidth | UIViewAutoresizingFlexibleHeight;
    [self.view addSubview:firstVC.view];
}
-(void) viewDidLoad {
    [super viewDidLoad];
    self.bar = [[UITabBar alloc] initWithFrame:CGRectMake(0, 0, self.view.frame.size.width, 30)];
    for(UIViewController *vc in _viewControllers) {
        [self addChildViewController:vc];
    }
    _bar.delegate = self;
    [self.view addSubview:_bar];
    [self reloadBarInfo];
}
-(void) tabBar:(UITabBar *)tabBar didSelectItem:(UITabBarItem *)item {
    NSUInteger idx = [_bar.items indexOfObject:item];
    UIViewController *newVC = [_viewControllers objectAtIndex:idx];
    UIViewController *curVC = [_viewControllers objectAtIndex:_selectedIndex];
    if(newVC == curVC) {
        return; // Don't switch - same one selected!
    }
    [curVC.view removeFromSuperview];
    newVC.view.frame = [self subviewFrame];
    newVC.view.autoresizingMask = UIViewAutoresizingFlexibleWidth | UIViewAutoresizingFlexibleHeight;
    [self.view addSubview:newVC.view];
    [self transitionFromViewController:curVC toViewController:newVC duration:0 options:0 animations:nil completion:nil];
    self.selectedIndex = idx;
}
@end

And there you have it!


January 12, 2012

Features / Super Fuzzy Searching on PostgreSQL

JB

I have been working on doing some fuzzy searching on PostgreSQL for a while, and thought I'd share some of my experiences here.  Many people are not yet aware of the incredible fuzzy search capabilities of PostgreSQL, especially those added in the 9.1 release.

Most people use an external engine such as Lucene to do fuzzy searches.  As you will see, there are still reasons to do so, but they are far fewer than they used to be.

Fuzzy Search Using Trigrams

While PostgreSQL has lots of different fuzzy search options (soundex, levenshtein, etc.), the one which has the most support is trigram searching.  What is a trigram?  Trigrams break a word up into 3-letter sequences, which can then be used to do similarity matches.  Before extracting trigrams, PostgreSQL pads each word with two spaces at the beginning and one at the end.  Think of the word "hello".  It has the following trigrams: "  h", " he", "hel", "ell", "llo", and "lo ".  Now look at the word "hallo".  It has the following trigrams: "  h", " ha", "hal", "all", "llo", and "lo ".  You can see that there are several trigrams that match, and several that don't match.  Similarity is computed as the number of trigrams the two strings share divided by the total number of distinct trigrams between them - basically, the more trigrams you have in common, the closer the match.  In the present case, 'hallo' and 'hello' share 3 of 9 distinct trigrams, for a similarity of 0.333333.  You can see this in PostgreSQL by saying:

select similarity('hello', 'hallo')

PostgreSQL also has two other relevant operators that use similarity - the "<->" operator and the "%" operator.  "<->" is the "distance" operator, which is simply one minus the similarity (i.e. if the similarity is 0.2 the distance will be 0.8).  For text searching, "%" is the "similar" operator, which returns true if two strings are similar, or false if they are not.  Two strings are defined as "similar" if their similarity is 0.3 or greater.  This can be set with set_limit(), but it is not usually useful to do so (we will deal with custom similarities later).  The "%" operator is important, because of its heavy use in indexing.
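
To make the mechanics above concrete, here is a small Python sketch of how trigram similarity, distance, and the "%" test relate to each other.  This is an approximation of pg_trgm's behavior for illustration, not its actual implementation:

```python
def trigrams(word):
    # pg_trgm pads each word with two spaces in front and one behind,
    # then takes every 3-character window.
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    # Shared trigrams divided by total distinct trigrams.
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

def distance(a, b):
    # The "<->" operator: one minus the similarity.
    return 1.0 - similarity(a, b)

def is_similar(a, b, limit=0.3):
    # The "%" operator: true if the similarity meets the threshold.
    return similarity(a, b) >= limit

print(round(similarity("hello", "hallo"), 6))  # 0.333333
```

Running this reproduces the 0.333333 similarity between 'hello' and 'hallo'.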

Installing Trigrams

The trigram module (pg_trgm) is not installed by default in PostgreSQL, but it is one of the standard (and supported!) extensions.  If you are building from source, go into the contrib/pg_trgm directory of the source code and run "make; make install".  This will install the module, but will not activate it.  Then, in any database where you want to use it, put in the command:

CREATE EXTENSION pg_trgm;

This will load the necessary SQL to install the functions and operators needed to make trigrams work.  If you type that command in the template1 database, then all future databases created on that system will have the extension installed.

Using Trigrams Without an Index

The simplest use of trigrams would be to have a table with a name column, and do a search like this:

SELECT * FROM people WHERE lastname % 'Schmoe'

This will return a list of people where the trigram similarity is greater than the default threshold.  If you want to specify the similarity you are looking for, you can change that to:

SELECT * FROM people WHERE similarity(lastname, 'Schmoe') > 0.5

This will give a less-tolerant match than the default.

The problem with doing this is that it takes a lot of processing power.  The reason people usually use databases is because they have a huge number of records and need to be able to search them quickly.  If we have a few million records, manually doing trigram matches on every row will take quite a bit of time.

So, one of the best features of PostgreSQL 9.1 is the ability to incorporate trigrams into an index.  To make an index to speed up our search, do:

CREATE INDEX people_lastname_trigram_idx ON people USING gist(lastname gist_trgm_ops);

This will create a "gist" index, which is a special type of index often used in spatial and other vector-oriented matching.  The "gist_trgm_ops" is a little bit of magic that tells GiST how to treat a text field as a vector of trigrams.

Now, your query will use the index, provided you have enough rows to make it worthwhile.  HOWEVER, it is not yet using the index efficiently.  While PostgreSQL can (and will) use the index for this query, it doesn't actually check the condition until a later step (I believe, but am not sure, that it is using the index to get everything in the right order, and then at a later stage doing filtering).  For large tables, this can still be several orders of magnitude worse than you want.  To fix this, you MUST use the "%" operator.  This will cause the index to have an "index condition", which it will use to limit the results that it puts out.  So, if you have:

SELECT * FROM people WHERE lastname % 'Schmoe';

This will use the index AND the index condition, and will be super-fast (except in extremely large datasets - see next section).  However, this uses the built-in cutoff point of 0.3 similarity.  What if I want my results to be tighter than that?  The best way is to combine % with similarity(), like this:

SELECT * FROM people WHERE lastname % 'Schmoe' AND similarity(lastname, 'Schmoe') > 0.5;

In this case, the database will use the "%" condition for the index, which will be incredibly fast, and the "similarity()" call as a post-index filter, which, since there is a much smaller dataset, will still be pretty fast.

Working With Larger Datasets

I had trouble working with trigrams because my main dataset is about 14 million titles.  At this size, trigram searching is really slow.  With the whole database loaded in memory (either in PG's cache or in the filesystem cache - I have many, many gigs of memory), a trigram search of the database USING THE INDEX took between 3 and 12 seconds, depending on the string.  Poking around, I came up with the following basic results for a database on an Amazon EC2 4XL instance without any other load:

  • 100,000 records: 40 milliseconds
  • 1,000,000 records: 0.4 seconds
  • 14,000,000 records: 7 seconds

So, as you can see, the useful limit to these queries is about 1 million records.  Therefore, if you have more than a million rows and want to use trigrams, you need to find a way to boost performance.  I've found there are two basic methods you can use to speed up trigrams on large datasets:

  1. Create specialized indexes with subsets of your data
  2. Create a separate wordlist table with unique words

#1 is much preferred if you can do it, because it is an entirely database-oriented method.  Basically, I figured out that my queries only needed to trigram-search about 1 million of my 14 million records.  Therefore, I specialized the index by adding a "WHERE" clause to the index itself.  On my database, I have a "type" field, and I'm only searching certain types of records.  So I changed my index to the following:

CREATE INDEX people_lastname_restricted_trigram_idx ON people USING gist(lastname gist_trgm_ops) WHERE type IN ('my_type1', 'my_type2');

This meant that only about a million records were being stored in the index, which brought my search time down considerably.  Another search I was doing was based on the person's "popularity", so I added another index for popularity greater than 20.  This restricted the set to about 100,000 records, which made it blazingly fast.

The other alternative (#2) is to create a separate table of words to search.  The idea is that while you may have 14 million records, there probably aren't 14 million unique words.  Therefore, you create a table of the unique words in the database, and then add a trigram index on that.  The problem, though, is that this takes either (a) a lot of manual work, or (b) a lot of triggers, neither of which is the way I like to administer databases.  Nonetheless, it can work if you merely want to use trigrams to fuzzy-match individual words.  It can be especially helpful in conjunction with PostgreSQL's full-text search (previously known as tsearch2), which still operates blazingly fast on 14 million records.  Fuzzy string matching on words can be combined with full-text search on records to present a powerful and fast method of searching: the wordlist can be fuzzy-matched to provide better hints at what words are in the fulltext.

To create a wordlist, do something like this:

CREATE TABLE search_words (word text);
CREATE INDEX search_words_word_trigram_idx ON search_words USING gist(word gist_trgm_ops);

Then, every time you need to regenerate your wordlist, do the following:

DELETE FROM search_words;
INSERT INTO search_words (word) SELECT DISTINCT lower(word) FROM (SELECT unnest(regexp_split_to_array(lastname, E'\\W+')) AS word FROM people) words WHERE word IS NOT NULL AND length(word) > 3;

What that does is explode the lastname field out into words (in case a lastname is multi-word), where each word is its own record.  It also filters out excessively short words (optional).  It then inserts these into the search_words table, which has a trigram index.  So, now, if you want to fuzzy-match individual words, you can use the search_words table!
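
For clarity, here is a Python sketch of what that INSERT's inner query does to each lastname - split on non-word characters, lowercase, drop short words, and deduplicate.  The sample names are made up:

```python
import re

def extract_words(lastnames, min_len=4):
    # Mirror the SQL: split each lastname on non-word characters,
    # lowercase, skip NULLs and short words, and deduplicate.
    words = set()
    for name in lastnames:
        if name is None:
            continue
        for word in re.split(r"\W+", name):
            if len(word) >= min_len:  # same filter as length(word) > 3
                words.add(word.lower())
    return words

print(sorted(extract_words(["Van der Berg", "Schmoe", "O'Brien"])))
# ['berg', 'brien', 'schmoe']
```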

Anyway, additional fun can be had by using unaccent to remove accents before all of this processing, or by massaging your data however you feel is best.  The important thing is that, for all but the most strenuous datasets, we can now keep all of the searching located within the database itself, managed by database tools, without having to rely on external search engines such as Lucene, which take time, management, machine power, and their own sets of headaches.  Even Lucene often has to be hand-rigged for large datasets.  I am a big believer in pushing as much data work to the database as possible, and PostgreSQL keeps raising the bar!

December 22, 2011

General / Enjoy Skyrim's Alchemy? Check Out Nature's Own Alchemy System

JB

If you have played The Elder Scrolls V: Skyrim, you have probably made use of its "alchemy" system.  What you may not have known is that this is based, to some degree, on the real-life practices of both ancient and modern herbalists, who combine herbs and herbal actions to produce desired effects on the body and the body's health.

If you are interested in learning more, you should check out The Herbalist, a new iPhone/iPad/iPod/iOS app which describes a number of herbs and the actions that herbalists use them for.  In addition, you can even search the herbal database by the action you are looking for or by the specific malady for which it is often used.  It also includes information on different ways of preparing herbs for usage in tinctures, oils, infusions, lotions, and other preparations.

Anyway, if you enjoy Skyrim's alchemy, you'll love seeing how practicing herbalists make use of the same basic ideas.

June 26, 2011

Snippets / Misc stuff I found out

JB

Objective-C PostgreSQL stuff:


Misc:
Building a shared library on OSX: (see here):
gcc -dynamiclib -install_name /usr/local/lib/libfoo.2.dylib  -compatibility_version 2.4 -current_version 2.4.5 -o libfoo.2.4.5.dylib source.o code.o

I have also seen the options -undefined suppress -flat_namespace here

Another dylib link

Caution mixing shared and archive libs

Seth Godin - how to run a useless conference


June 21, 2011

Platforms / Introducing Newm: An Objective-C Framework for Web Applications

JB

I am working on starting a web application framework called Newm. It is kind of like an Objective-C-on-Rails.  Check it out!

June 18, 2011

Platforms / Objective-C Hackery

JB

Been working on several Objective-C projects.  I love to get into the fun tricks you can do with languages.  Here are some interesting tidbits I found:


June 07, 2011

Platforms / Miscellaneous RFID Links

JB

Learning RFID for a project I'm going to work on soon.  Here are some handy dandy links!

May 25, 2011

General / Ideas 2011-05-25

JB

Had some thoughts today, thought I'd write them down:

Been looking at Google Refine.  Some of this is pretty obvious, but I'm thinking it would be pretty awesome to port this to use a database backend.  Basically, do the transforms on the data in the database, and, if you want, serialize it back out to a new table or replace the existing one.

What would also be cool is a generalized report generator for Rails, possibly using AREL.  You could do security by defining mandatory filters for tables, limiting the tables/columns, and other things.  It could predict the time the report will take to run by using PostgreSQL's "explain" command.

May 24, 2011

Platforms / Database Scaling with PostgreSQL

JB

I've been digging around on various database scaling options for PostgreSQL.  Here are some interesting links I've stumbled upon:


May 23, 2011

Platforms / Getting EC2 Instances to Talk to Each Other

JB

I am getting started with EC2, and I noticed a problem - I was unable to get my instances to talk to each other - specifically, to mount an NFS drive.  After searching through the forums, I found my answer - I was using a custom security group.

In the default security group, instances within that security group can talk freely to each other.  What is surprising is that the default mode for a custom security group is to prevent all communication between the instances.  Therefore, you have to add permissions for the security group to be able to receive incoming traffic from the security group itself.  In the "host" line, rather than put an IP address, you can put in a security group ID (i.e. sg-abc123), and then specify 0-65535 for the port range.
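
For reference, a rule like that can also be added from the command line.  This is a sketch using Amazon's aws CLI (sg-abc123 is a placeholder group ID, and the exact flags may vary by tool version):

```shell
# Let instances in security group sg-abc123 (a placeholder) accept
# TCP traffic on all ports from other members of the same group.
aws ec2 authorize-security-group-ingress \
    --group-id sg-abc123 \
    --protocol tcp \
    --port 0-65535 \
    --source-group sg-abc123
```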

Then, voilà!  It works!