Google Did you Mean on Search Pages Using PHP 4

August 24th, 2007  |  Published in PHP, Web Development  |  1 Comment

What’s the best way to handle “did you mean”? It’s one of the greatest features of Google: I never worry about misspelling search terms because I know Google will always show me the error of my ways.

On Well.ca we had previously implemented this feature using Levenshtein distance, which measures the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The problem was that the search term “Obus” was being corrected to “Burt’s”, which is not exactly accurate. As a result, I’ve been working with Pspell/Aspell to handle “Did you mean”. Aspell is a widely used open-source spell checker, found in many applications that need spell checking. PHP interfaces with Aspell through Pspell.
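To see why edit distance alone can mislead, here is a minimal sketch using PHP’s built-in levenshtein() function. The helper name closest_by_levenshtein() and the sample word list are made up for illustration; this is not the code that ran on Well.ca.

```php
<?php
// Hypothetical helper: return the candidate with the smallest edit distance
// to $term, or null when the candidate list is empty.
function closest_by_levenshtein($term, $candidates) {
	$best = null;
	$best_distance = -1;
	foreach ($candidates as $candidate) {
		// Compare case-insensitively; levenshtein() itself is case-sensitive
		$distance = levenshtein(strtolower($term), strtolower($candidate));
		if ($best_distance < 0 || $distance < $best_distance) {
			$best = $candidate;
			$best_distance = $distance;
		}
	}
	return $best;
}

// Illustrative word list, not real catalogue data
echo closest_by_levenshtein('shampo', array('shampoo', 'soap', 'lotion'));
```

Because the function always returns the nearest candidate, a short term that is simply missing from the list still gets “corrected” to something, however unrelated.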

One of our requirements is that Pspell/Aspell use a custom dictionary. Being an e-commerce site, we want the dictionary to consist of brands, product names, categories, and product subtitles. A limitation of Pspell is that it does not allow you to use just a personal dictionary, so I went through the process of creating a custom language dictionary for Pspell/Aspell to use. I have described all the steps below, from the initial install to the actual checking.

Installing:

Note: These are the steps I used to install Aspell/Pspell on my MacBook Pro running the latest version of OS X. They should work fine on Linux/Unix systems. For Windows, there are other resources on the web for installing these tools.

  1. Install Aspell - Install Aspell using your package manager (I use MacPorts) or from the Aspell website.
  2. Compile PHP with Pspell support
    1. First check whether your PHP install is already built with Pspell (look at the output of phpinfo() for the configure flag --with-pspell).
    2. If not, compile PHP with all the options you want, including “--with-pspell={prefix}”, where prefix is the directory containing the lib and include directories that hold Pspell. (If you are using MacPorts, see the bullet below instead.)
      • If you are using MacPorts, you can add the flag to the Portfile. On my computer, the Portfile was located at “/opt/local/var/macports/sources/rsync.macports.org/release/ports/www/php4/Portfile”. Open that file and add “--with-pspell=${prefix}” to configure.args. You can then uninstall PHP4 and reinstall it; it should now use the modified Portfile to compile PHP.
    3. Restart Apache, and check your phpinfo() output for the proper flags.
  3. You should now be able to use Pspell in PHP.
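
A quick way to confirm this from code, rather than reading through the phpinfo() output, is to ask PHP whether the extension is loaded. This is only a small sketch; the function name pspell_available() is made up for this example.

```php
<?php
// Hypothetical helper: true when the Pspell extension is loaded and its
// functions are callable.
function pspell_available() {
	return extension_loaded('pspell') && function_exists('pspell_check');
}

if (pspell_available()) {
	echo "Pspell support is available\n";
} else {
	echo "Pspell support is missing; check the configure flags\n";
}
```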


Creating a Custom Language Dictionary

Note: If you don’t need a custom language dictionary, you can skip straight to the Creating “Did you mean” section.

  1. Download the aspell-lang package:
    [code]cvs -z3 -d:pserver:anonymous@cvs.savannah.gnu.org:/sources/aspell co aspell-lang[/code]
    The remaining steps are based on the README file.
  2. Head into the aspell-lang directory. Aspell-lang comes with a command for creating the files used to create a new custom dictionary. The command to create these files is:
    [code]./pre {lang_name} {char_set}[/code]

    • lang_name: must follow the ISO language code standard. In the simplest case, it is two alphabetical characters.
    • char_set: Aspell can use a variety of character sets. See the README for a list of them. For English, I used iso-8859-1.
    • The command I used was: ./pre we iso-8859-1
  3. You should now have a directory named after the language name you chose. cd into that directory.
  4. At this point, you can edit the info file or leave it at the default. It is fairly self-explanatory.
  5. You can now create the wordlist in the file {lang_name}.wl. The format is one word per line.
  6. By default, aspell does not allow any non-alphabetical characters in words. To allow other characters you need to modify {lang_name}.dat
    • The line you need to modify starts with special. You might need to uncomment it.
    • The format of the special line is: special {char} {start}{middle}{end}. char is the character to be included. start, middle, and end tell Aspell where in a word the character is allowed: ‘*’ means the character is allowed in that position, ‘-’ means it is not.
    • An example special line is: special ' -*-. The character is the apostrophe, which is allowed in the middle of a word but not as its first or last character.
  7. When the wordlist is finalized, you need to run the following commands:
    ./proc
    ./configure
    make
    make install
  8. Barring any problems, you should now have the custom dictionary installed.
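
Building the wordlist from step 5 is easy to script if your brands and product names are already available to PHP. The sketch below assumes they have been loaded into an array. The function name build_wordlist(), the sample names, and the choice to keep only ASCII letters and apostrophes inside a word (which Aspell only accepts if the special line allows them) are all assumptions for illustration.

```php
<?php
// Hypothetical helper: turn a list of product/brand names into wordlist
// content, one unique word per line. Accented characters are dropped here
// for simplicity.
function build_wordlist($names) {
	$words = array();
	foreach ($names as $name) {
		// Split on anything that is not a letter or an apostrophe
		$parts = preg_split("/[^A-Za-z']+/", $name);
		foreach ($parts as $part) {
			// Apostrophes are only kept in the middle of a word
			$part = trim($part, "'");
			if ($part !== '') {
				$words[$part] = true; // array keys de-duplicate the words
			}
		}
	}
	return implode("\n", array_keys($words)) . "\n";
}

// Sample names for illustration only
$wordlist = build_wordlist(array("Burt's Bees Lip Balm", "Obus Forme", "Lip Gloss"));
// Write the result to the wordlist file, e.g. we.wl for the "we" language.
// file_put_contents() needs PHP 5; use fopen()/fwrite() on PHP 4.
// file_put_contents('we.wl', $wordlist);
echo $wordlist;
```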


Creating “Did you mean”

<?php
//Split the search string by spaces ($keywords_string holds the user's search query)
$keywords_single = explode(' ', $keywords_string);
//Get the number of words they are searching by
$sizeof_keywords_single = sizeof($keywords_single);
$pspell_config = pspell_config_create("we");
$pspell_link = pspell_new_config($pspell_config);
$keywords_single_replacement = array(); //Create the replacement array
for($i = 0; $i < $sizeof_keywords_single; $i++) { //Loop through the words
	//Check if the word is correctly spelt
	if (!pspell_check($pspell_link, $keywords_single[$i])) {
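		//Get the suggestions from Pspell
		$suggestions = pspell_suggest($pspell_link, $keywords_single[$i]);
		if (!empty($suggestions)) {
			$keywords_misspell = true;
			//Take the first result (Pspell sorts the results)
			$keywords_single_replacement[$i] = $suggestions[0];
		} else {
			//No suggestion available, so keep the original word
			$keywords_single_replacement[$i] = $keywords_single[$i];
		}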
	} else {
		//Set the replacement word back to the original word
		$keywords_single_replacement[$i] = $keywords_single[$i];
	}
}
/*
 * The following code replaces the words with the corrected words above.
 */
if (!isset($keywords_phrase) && isset($keywords_misspell)) {
	$keywords_replacement = stripslashes($_GET['keyword']);
	for ($i = 0; $i < $sizeof_keywords_single; $i++) {
		$keywords_replacement = preg_replace(
					sprintf('#(?!<.*?)(%s)(?![^<>]*?>)#i',
					preg_quote($keywords_single[$i], '#')), //Escape the '#' delimiter too
					$keywords_single_replacement[$i],
					$keywords_replacement);
	}
}
//You can now print the Did You Mean $keywords_replacement...
?>
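
The same logic can be packaged as a function, which makes it easier to test without a dictionary installed. In this sketch, did_you_mean() and its two callback parameters are hypothetical names; in real use the callbacks would wrap pspell_check() and pspell_suggest() with your $pspell_link. It returns null when nothing needs correcting.

```php
<?php
// Hypothetical helper. $is_correct takes a word and returns true/false;
// $suggest takes a word and returns an array of suggestions, best first.
function did_you_mean($query, $is_correct, $suggest) {
	// Split on runs of whitespace and drop empty pieces
	$words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
	$changed = false;
	foreach ($words as $i => $word) {
		if (call_user_func($is_correct, $word)) {
			continue;
		}
		$suggestions = call_user_func($suggest, $word);
		// Only replace the word when there is a suggestion to use
		if (!empty($suggestions)) {
			$words[$i] = $suggestions[0];
			$changed = true;
		}
	}
	return $changed ? implode(' ', $words) : null;
}

// Stand-in dictionary for illustration; real code would call Pspell instead
function demo_is_correct($word) {
	return in_array(strtolower($word), array('lip', 'balm'));
}
function demo_suggest($word) {
	return strtolower($word) === 'blam' ? array('balm') : array();
}

echo did_you_mean('lip blam', 'demo_is_correct', 'demo_suggest');
```

Passing the checker and suggester in as callbacks is a design choice made for this example, so the word-by-word logic can be exercised with a stand-in dictionary before Pspell is wired in.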

Resources

  • Pspell Manual - http://www.php.net/manual/en/ref.pspell.php
  • Aspell Website - http://aspell.net/
