typeflow is a tiny Go package that provides a few tools for string-based searching. With typeflow you can search for substring matches and get string similarity information.
You can see it in action here: https://typeflow-web.herokuapp.com/ .
You'll likely want to use the WordSource type, which provides the highest-level interface in this package.
This project currently only depends on prefixmap, so make sure you go get it :)
go get github.com/alediaferia/prefixmap
If you just need to compute the Levenshtein distance between two words, this is what you need:
typeflow.LevenshteinDistance("alessandro", "alesasndro")
This will return 2.
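To see where that number comes from, here is a minimal, self-contained sketch of the classic dynamic-programming Levenshtein computation. It is not taken from typeflow's source; the `levenshtein` and `min3` helpers are hypothetical names used only for illustration.

```go
package main

import "fmt"

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
// Illustrative sketch only; not typeflow's implementation.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)

	// prev holds the distances for the previous row of the DP table,
	// curr the distances for the row being filled in.
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // turning "" into rb[:j] takes j insertions
	}

	for i := 1; i <= len(ra); i++ {
		curr[0] = i // turning ra[:i] into "" takes i deletions
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(
				prev[j]+1,      // deletion
				curr[j-1]+1,    // insertion
				prev[j-1]+cost, // substitution (or match)
			)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

func main() {
	fmt.Println(levenshtein("alessandro", "alesasndro")) // prints 2
}
```

The two strings differ only in the swapped "sa"/"as" pair, which plain Levenshtein counts as two substitutions.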
This package can be used to query a source of strings. WordSource has been designed specifically for this need.
ws := NewWordSource()
ws.SetSource(myListOfStrings)
matches, err := ws.FindMatch("query", 0.4)
0.4 represents the minimum similarity we are OK with. A value of 1.0 represents an exact match.
Suppose you have a list of words, say country names, and a partial string which may match one or more of them according to a similarity value that suits you. This may be the case when, for example, providing a typeahead API for populating a dropdown of suggestions (as Google does).
In the following example we will have a program that holds a hard-coded list of country names and accepts two arguments:
- the substring to look for
- the accepted minimum similarity for the matches
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"

	. "github.com/typeflow/typeflow-go"
)

// countryList is the hard-coded source of strings to search.
var countryList = []string{
	"mexico",
	"micronesia",
	"moldova",
	"monaco",
	"mongolia",
	"montenegro",
	"morocco",
	"mozambique",
	"myanmar",
	"namibia",
	"nauru",
	"nepal",
	"netherlands",
	"new zealand",
	"nicaragua",
	"niger",
	"nigeria",
	"norway",
}

// printHelpAndExit prints the usage string and exits with a
// non-zero status, since it is only called on invalid input.
func printHelpAndExit() {
	fmt.Printf("usage: %s substr similarity\n", filepath.Base(os.Args[0]))
	os.Exit(1)
}

func main() {
	args := os.Args[1:]
	if l := len(args); l != 2 {
		fmt.Printf("Unexpected number of arguments: got %d, expected 2\n", l)
		printHelpAndExit()
	}

	similarity, err := strconv.ParseFloat(args[1], 32)
	if err != nil {
		fmt.Printf("Please, specify similarity as a floating point number\n")
		printHelpAndExit()
	}
	substr := args[0]

	// let's set up the word source
	// we will use to search for matches
	ws := NewWordSource()
	ws.SetSource(countryList)

	matches, err := ws.FindMatch(substr, float32(similarity))
	if err != nil {
		panic(err)
	}

	if len(matches) > 0 {
		fmt.Println("Found the following matches:")
		for _, match := range matches {
			fmt.Printf("'%s', similarity: %f\n", match.String, match.Similarity)
		}
	} else {
		fmt.Printf("No match found for '%s'.\n", substr)
	}
}
Output:
$ go run <program name> nig 0.4
Found the following matches:
'niger', similarity: 0.600000
'nigeria', similarity: 0.428571
The similarity between the given substring and the found match is computed using the following formula:
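One formula that reproduces both values in the sample output above is a normalized Levenshtein similarity: 1 - distance(a, b) / max(len(a), len(b)). Treat this as an assumption rather than the package's confirmed definition. The sketch below uses hypothetical `levenshtein` and `similarity` helpers (not typeflow's API) to check that formula against the sample.

```go
package main

import "fmt"

// levenshtein returns the edit distance between a and b, counting
// single-rune insertions, deletions, and substitutions.
// Illustrative helper; not typeflow's implementation.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			best := prev[j] + 1 // deletion
			if v := curr[j-1] + 1; v < best {
				best = v // insertion
			}
			if v := prev[j-1] + cost; v < best {
				best = v // substitution (or match)
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// similarity computes 1 - distance/max(len(a), len(b)).
// ASSUMPTION: this formula is inferred from the sample output,
// not confirmed against the package source.
func similarity(a, b string) float64 {
	la, lb := len([]rune(a)), len([]rune(b))
	longest := la
	if lb > longest {
		longest = lb
	}
	if longest == 0 {
		return 1 // two empty strings are treated as identical
	}
	return 1 - float64(levenshtein(a, b))/float64(longest)
}

func main() {
	for _, w := range []string{"niger", "nigeria"} {
		fmt.Printf("'%s', similarity: %f\n", w, similarity("nig", w))
	}
}
```

For "nig" against "niger" the distance is 2 over a length of 5, giving 0.6; against "nigeria" it is 4 over 7, giving roughly 0.428571.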
I have tried, and will keep trying, to keep the sources well documented. You can help me improve the docs as well!
Docs can be found at: https://godoc.org/github.com/typeflow/typeflow-go
I love contributions! Please create your own branch and push a merge request.
Feel free to open issues for anything 😄
I'm releasing this project under the MIT license, which is included in this repository.
Copyright (c) Alessandro Diaferia alediaferia@gmail.com