typeflow-go

Introduction

typeflow is a tiny package that provides a few tools for string-based searching. With typeflow you can search for substring matches and get string similarity information.

You can see it in action here: https://typeflow-web.herokuapp.com/

Quick start

You will most likely want to use the WordSource type, which provides the highest-level interface in this package.

Dependencies

This project currently depends only on prefixmap (github.com/alediaferia/prefixmap), so make sure you go get it :)

go get github.com/alediaferia/prefixmap

Plain Levenshtein distance computation

If you just need to compute the Levenshtein distance between two words, this is all you need:

typeflow.LevenshteinDistance("alessandro", "alesasndro")

This will return

2

Querying for similar strings

This package can be used to query a source of strings. WordSource has been designed specifically for this need.

ws := NewWordSource()
ws.SetSource(myListOfStrings)
matches, err := ws.FindMatch("query", 0.4)

The similarity 0.4 represents the minimum similarity we are OK with. A value of 1.0 represents an exact match.

A straightforward example

Suppose you have a list of words, say country names, and a partial string which may match one or more of them at a similarity value that suits you. This may be the case when, for example, providing a typeahead API for populating a dropdown of suggestions (see Google).

In the following example we have a program that holds a hard-coded list of country names and accepts 2 args:

  • the substring to look for
  • the accepted minimum similarity for the matches
package main

import (
  "fmt"
  "os"
  "path/filepath"
  "strconv"

  . "github.com/typeflow/typeflow-go"
)

var country_list = []string{
"mexico",
"micronesia",
"moldova",
"monaco",
"mongolia",
"montenegro",
"morocco",
"mozambique",
"myanmar",
"namibia",
"nauru",
"nepal",
"netherlands",
"new zealand",
"nicaragua",
"niger",
"nigeria",
"norway",
}

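// printHelpAndExit prints the usage line and exits with a
// non-zero status, since it is only called on invalid input.
func printHelpAndExit() {
  fmt.Printf("usage: %s substr similarity\n", filepath.Base(os.Args[0]))
  os.Exit(1)
}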

func main() {
  args := os.Args[1:]
  
  if l := len(args); l != 2 {
    fmt.Printf("Unexpected number of arguments: got %d, expected 2\n", l)
    printHelpAndExit()
  }
  
  similarity, err := strconv.ParseFloat(args[1], 32)
  if err != nil {
    fmt.Printf("Please, specify similarity as a floating point number\n")
    printHelpAndExit()
  }
  
  substr := args[0]

  // let's setup our word source
  // we will use to search for matches
  ws := NewWordSource()
  
  ws.SetSource(country_list)
  
  matches, err := ws.FindMatch(substr, float32(similarity))
  if err != nil {
    panic(err)
  }
  
  if len(matches) > 0 {
    fmt.Println("Found the following matches:")
    fmt.Println()
    for _, match := range matches {
      fmt.Printf("'%s', similarity: %f\n", match.String, match.Similarity)
    }
  } else {
    fmt.Printf("No match found for '%s'.\n", substr)
  }
}

Output:

$ go run <program name> nig 0.4
Found the following matches:

'niger', similarity: 0.600000
'nigeria', similarity: 0.428571

A note on similarity

The similarity between the given substring and the found match is computed using the following formula:

$$1.0 - \frac{\mathrm{levenshtein}(match, substr)}{\max(|match|, |substr|)}$$
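
As a quick check, here is the formula above worked out by hand for the two matches in the sample output. Turning "niger" into "nig" takes 2 deletions and turning "nigeria" into "nig" takes 4, while the longer string has 5 and 7 characters respectively:

$$1.0 - \frac{\mathrm{levenshtein}(\text{niger}, \text{nig})}{\max(5, 3)} = 1.0 - \frac{2}{5} = 0.6$$

$$1.0 - \frac{\mathrm{levenshtein}(\text{nigeria}, \text{nig})}{\max(7, 3)} = 1.0 - \frac{4}{7} \approx 0.428571$$

Both values are at or above the requested minimum of 0.4, so both countries are reported.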

Docs

I have tried, and will keep trying, my best to keep the sources well documented. You can help me improve the docs as well!

Docs can be found at: https://godoc.org/github.com/typeflow/typeflow-go

Contribute

I love contributions! Please create your own branch and open a pull request.

Feel free to open issues for anything 😄

License

I'm releasing this project under the MIT license, which is included in this repository.

Copyright (c) Alessandro Diaferia alediaferia@gmail.com