How could I detect if a character is close to another character on a QWERTY keyboard?

I'm developing a spam detection system and have discovered that it can't detect strings like "asdfsdf".

My idea is to detect whether consecutive keys are near each other on the keyboard. I don't get the input (the text to check for spam) from the keyboard; I get it as a string.

All I want to know is whether a character is one key, two keys or more than two keys away from another character.

For example, on a modern QWERTY keyboard, the characters 'q' and 'w' would be 1 key apart, and so would 'q' and 's'. Humans can figure this out logically; how could I do this in code?

Anthelmintic answered 1/10, 2011 at 15:11 Comment(5)
What language are you developing in? – Fredericksburg
Sorry, I didn't tag it; it's PHP. Tagging it now. – Anthelmintic
What's the target user group? Not all keyboards are QWERTY; for instance, the common layout in Germany is QWERTZ. en.wikipedia.org/wiki/QWERTZ – Conceptualism
It's mainly for my own purposes: a small project I only started an hour ago, though it's growing quite fast. So I'm not going to support other layouts until I'm ready. – Anthelmintic
Then I wouldn't worry about making assumptions about the client keyboard right now, but do keep in mind that not all keyboards have the same layout. If you run into problems because of that down the road, you'll probably need to figure out the user's locale and apply a different key mapping based on it. – Conceptualism

You could simply create a two-dimensional map for the standard QWERTY keyboard. Basically, it could look something like this:

// $map[x][y]: x = column, y = row
$map[0][0] = 'q';
$map[0][1] = 'a';
$map[1][0] = 'w';
$map[1][1] = 's';

and so on.

When you get two characters, you simply need to find their x and y coordinates in the map above and calculate the distance using Pythagoras' theorem. This doesn't quite meet your requirement of 'q' and 's' being 1 key apart: their distance would be sqrt(1^2 + 1^2) ≈ 1.4.

The formula would be:

  • Characters are c1 and c2
  • Find coordinates for c1 and c2: (x1,y1) and (x2,y2)
  • Calculate the distance using Pythagoras: dist = sqrt((x2-x1)^2 + (y2-y1)^2).
  • If necessary, ceil or floor the result.

For example:

Say you get the characters c1='q', and c2='w'. Examine the map and find that 'q' has coordinates (x1,y1) = (0, 0) and 'w' has coordinates (x2,y2) = (1, 0). The distance is

sqrt((1-0)^2 + (0-0)^2) = sqrt(1) = 1
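
A minimal PHP sketch of the steps above, assuming an idealized, unstaggered grid built from the three letter rows. key_distance() is a hypothetical helper name; like the answer, it stores characters as $map[x][y] and searches the map for each character's coordinates.

```php
<?php
// Hypothetical helper: Euclidean distance between two letter keys on an
// idealized, unstaggered QWERTY grid (an assumption; real rows are staggered).
// $map[x][y] holds the character at column x, row y.
function key_distance(string $c1, string $c2): ?float
{
    $map = [];
    foreach (['qwertyuiop', 'asdfghjkl', 'zxcvbnm'] as $y => $row) {
        foreach (str_split($row) as $x => $char) {
            $map[$x][$y] = $char;
        }
    }

    // Scan the map for a character and return its [x, y] coordinates.
    $find = function (string $c) use ($map): ?array {
        foreach ($map as $x => $column) {
            foreach ($column as $y => $char) {
                if ($char === $c) {
                    return [$x, $y];
                }
            }
        }
        return null;
    };

    $p1 = $find(strtolower($c1));
    $p2 = $find(strtolower($c2));
    if ($p1 === null || $p2 === null) {
        return null; // at least one character is not on the grid
    }

    [$x1, $y1] = $p1;
    [$x2, $y2] = $p2;
    // Pythagoras: dist = sqrt((x2-x1)^2 + (y2-y1)^2)
    return sqrt(($x2 - $x1) ** 2 + ($y2 - $y1) ** 2);
}

echo key_distance('q', 'w'), "\n";           // 1
echo round(key_distance('q', 's'), 2), "\n"; // 1.41
```

Rebuilding $map on every call keeps the sketch short; for a spam filter you would build it once, or invert it into a character => [x, y] array so that no search is needed.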
Spicule answered 1/10, 2011 at 15:23 Comment(2)
Brilliant! This is exactly the solution, equation and all. I would never have thought of this. – Anthelmintic
Glad to help. Don't forget to accept the answer if it fully answers your question. – Spicule

Well, let's see. That's a tough one. I always take the brute-force method and I stay away from advanced concepts like that guy Pythagoras tried to foist on us, so how about a two-dimensional table? Something like this, maybe:

+---+---+---+---+---+---+---
|   | a | b | c | d | f | s ...
+---+---+---+---+---+---+---
| a | 0 | 5 | 4 | 2 | 3 | 1 ...
| b | 5 | 0 | 3 | 3 | 2 | 4 ...
| c | 4 | 3 | 0 | 1 | 2 | 2 ...
| d | 2 | 3 | 1 | 0 | 1 | 1 ...
| f | 3 | 2 | 2 | 1 | 0 | 2 ...
| s | 1 | 4 | 2 | 1 | 2 | 0 ...
+---+---+---+---+---+---+---

Could that work for ya'? You could even have negative numbers to show that one key is to the left of the other. PLUS you could put a 2-integer struct in each cell where the second int is positive or negative to show that the second letter is up or down from the first. Get my patent attorney on the phone, quick!
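
Filling that table in by hand is tedious, so here is a sketch that generates it from key coordinates instead. It assumes an unstaggered letter grid and stores the signed [dx, dy] pair in each cell (the "2-integer struct" idea) rather than a single move count; $coords and $table are illustrative names.

```php
<?php
// Coordinates of each letter on an idealized, unstaggered grid: [column, row].
$coords = [];
foreach (['qwertyuiop', 'asdfghjkl', 'zxcvbnm'] as $row => $letters) {
    foreach (str_split($letters) as $col => $letter) {
        $coords[$letter] = [$col, $row];
    }
}

// Precompute the 26 x 26 table once. Each cell is [dx, dy] from the first
// letter to the second: dx < 0 means the second key is to the left,
// dy < 0 means it is on a higher row.
$table = [];
foreach ($coords as $first => [$x1, $y1]) {
    foreach ($coords as $second => [$x2, $y2]) {
        $table[$first][$second] = [$x2 - $x1, $y2 - $y1];
    }
}

// Lookups are then plain array accesses.
[$dx, $dy] = $table['d']['a'];
echo "dx=$dx, dy=$dy\n"; // dx=-2, dy=0 ('a' is two keys left of 'd')
```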

Rusk answered 1/10, 2011 at 15:28 Comment(1)
Interesting solution. Not quite what I wanted, but another approach. I like it. – Anthelmintic

Build a map from keys to positions on an idealized keyboard. Something like:

$positions = array(
    'q' => array(0, 0), // [row, column]
    'w' => array(0, 1),
    'a' => array(1, 0),
    's' => array(1, 1),
    // ...
);

Then you can take the "distance" as the mathematical (Euclidean) distance between the two points.
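
Building on a [row, column] map like that, here is a minimal sketch that answers the question's "1 key, 2 keys, or more" directly. key_steps() is a hypothetical name, the grid is assumed to be unstaggered, and rounding the distance to the nearest integer is an assumption chosen so that diagonal neighbours such as 'q' and 's' (about 1.41 apart) count as one key.

```php
<?php
// Hypothetical helper: how many keys apart two letters are, using [row, column]
// positions and rounding the Euclidean distance (an assumption, see above).
function key_steps(string $a, string $b): ?int
{
    static $positions = null;
    if ($positions === null) {
        $positions = [];
        foreach (['qwertyuiop', 'asdfghjkl', 'zxcvbnm'] as $row => $letters) {
            foreach (str_split($letters) as $col => $letter) {
                $positions[$letter] = [$row, $col];
            }
        }
    }

    $a = strtolower($a);
    $b = strtolower($b);
    if (!isset($positions[$a], $positions[$b])) {
        return null; // not a letter key
    }

    [$r1, $c1] = $positions[$a];
    [$r2, $c2] = $positions[$b];
    return (int) round(sqrt(($r2 - $r1) ** 2 + ($c2 - $c1) ** 2));
}

foreach ([['q', 'w'], ['q', 's'], ['q', 'd'], ['q', 'r']] as [$x, $y]) {
    $steps = key_steps($x, $y);
    echo "$x-$y: ", $steps > 2 ? 'more than two keys' : "$steps key(s)", "\n";
}
```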

Trover answered 1/10, 2011 at 15:20 Comment(1)
You will need a different map for every common keyboard layout; French spam, for instance, will come from an AZERTY keyboard. – Godbey

The basic idea is to create a map of characters and their positions on the keyboard. You can then use a simple distance formula to determine how close they are together.

For example, consider the left side of the keyboard:

  1 2 3 4 5 6
  q w e r t
  a s d f g
  z x c v b

Character a has the position [2, 0] and character b has the position [3, 4], given as [row, column]. The formula for the distance between them is:

sqrt((x2-x1)^2 + (y2-y1)^2);

So the distance between a and b is sqrt((4 - 0)^2 + (3 - 2)^2) = sqrt(17) ≈ 4.12.

It'll take a little effort to map the keys onto a rectangular grid (my example isn't perfect, but it gives you the idea). After that, you can build a map (or dictionary), and lookups are simple and fast.
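
One way to handle the imperfection of a rectangular grid is to give each row a horizontal offset. The sketch below assumes approximate left-edge offsets for a US ANSI keyboard, in key widths: the number row starts at 1.0, the Q row at 1.5, the A row at 1.75 and the Z row at 2.25 (from the usual widths of the backtick, Tab, Caps Lock and left Shift keys). staggered_distance() is a hypothetical name, and positions are [row, x] to match the [row, column] convention above.

```php
<?php
// Hypothetical sketch: key positions with per-row stagger, as [row, x].
// Offsets are approximate ANSI values (an assumption); adjust for other layouts.
function staggered_distance(string $a, string $b): ?float
{
    static $positions = null;
    if ($positions === null) {
        $rows = [
            ['1234567890', 1.0],
            ['qwertyuiop', 1.5],
            ['asdfghjkl', 1.75],
            ['zxcvbnm', 2.25],
        ];
        $positions = [];
        foreach ($rows as $row => [$chars, $offset]) {
            foreach (str_split($chars) as $col => $char) {
                $positions[$char] = [$row, $offset + $col];
            }
        }
    }

    $a = strtolower($a);
    $b = strtolower($b);
    if (!isset($positions[$a], $positions[$b])) {
        return null; // key not in the map
    }
    [$r1, $x1] = $positions[$a];
    [$r2, $x2] = $positions[$b];
    return sqrt(($r2 - $r1) ** 2 + ($x2 - $x1) ** 2);
}

echo round(staggered_distance('a', 'b'), 2), "\n"; // 4.61
```

With the stagger, a and b come out about 4.61 apart instead of sqrt(17) ≈ 4.12, and q and a about 1.03 instead of exactly 1.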

Wardrobe answered 1/10, 2011 at 15:25 Comment(0)

I developed a function for the same purpose in PHP because I wanted to see whether I could use it to analyse strings and figure out whether they're likely to be spam.

This is for the QWERTZ keyboard, but it can easily be changed. The first number in the array $keys is the approximate distance from the left and the second is the row number from top.

function string_distance($string){
    // Keep only the letters a-z; nothing else has an entry in $keys.
    $string=preg_replace("/[^a-z]+/",'',mb_strtolower($string));
    if(mb_strlen($string)<2){
        return NULL;
    }
    // [approximate distance from the left, row number from the top]
    $keys=array(
        'q'=>array(1,1),
        'w'=>array(2,1),
        'e'=>array(3,1),
        'r'=>array(4,1),
        't'=>array(5,1),
        'z'=>array(6,1),
        'u'=>array(7,1),
        'i'=>array(8,1),
        'o'=>array(9,1),
        'p'=>array(10,1),
        'a'=>array(1.25,2),
        's'=>array(2.25,2),
        'd'=>array(3.25,2),
        'f'=>array(4.25,2),
        'g'=>array(5.25,2),
        'h'=>array(6.25,2),
        'j'=>array(7.25,2),
        'k'=>array(8.25,2),
        'l'=>array(9.25,2),
        'y'=>array(1.85,3),
        'x'=>array(2.85,3),
        'c'=>array(3.85,3),
        'v'=>array(4.85,3),
        'b'=>array(5.85,3),
        'n'=>array(6.85,3),
        'm'=>array(7.85,3)
    );
    $distances=array();
    for($i=0;$i+1<mb_strlen($string);$i++){
        $char_a=mb_substr($string,$i,1);
        $char_b=mb_substr($string,$i+1,1);
        $a=abs($keys[$char_a][0]-$keys[$char_b][0]);
        $b=abs($keys[$char_a][1]-$keys[$char_b][1]);
        // Pythagoras; ** is exponentiation (^ would be bitwise XOR in PHP)
        $distance=sqrt($a**2+$b**2);
        $distances[]=$distance;
    }
    // Average distance between consecutive letters
    return array_sum($distances)/count($distances);
}

You can use it as follows:

string_distance('Boat'); # output ≈ 5.1422
string_distance('HDxtaBQrGkjny'); # output ≈ 2.9605

I used multibyte functions because I was thinking about extending it for other characters. One could extend it by checking the case of characters as well.
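
As a variation on averaging, a filter could also count how many consecutive letter pairs are direct neighbours, since mashed strings like "asdfsdf" consist mostly of such pairs. This is a hypothetical sketch (neighbour_ratio() is an illustrative name) on an unstaggered QWERTY grid rather than the QWERTZ coordinates above; two keys count as neighbours when they differ by at most one row and one column, so diagonal neighbours and a repeated key both count. Any spam threshold for the ratio would have to be tuned on real data.

```php
<?php
// Hypothetical heuristic: fraction of consecutive letter pairs whose keys are
// neighbours (horizontal, vertical or diagonal) on an unstaggered QWERTY grid.
function neighbour_ratio(string $text): ?float
{
    $positions = [];
    foreach (['qwertyuiop', 'asdfghjkl', 'zxcvbnm'] as $row => $letters) {
        foreach (str_split($letters) as $col => $letter) {
            $positions[$letter] = [$row, $col];
        }
    }

    // Keep only a-z, as string_distance() does.
    $letters = str_split(preg_replace('/[^a-z]+/', '', strtolower($text)));
    $pairs = count($letters) - 1;
    if ($pairs < 1) {
        return null; // fewer than two letters
    }

    $neighbours = 0;
    for ($i = 0; $i < $pairs; $i++) {
        [$r1, $c1] = $positions[$letters[$i]];
        [$r2, $c2] = $positions[$letters[$i + 1]];
        // Neighbours differ by at most one row and one column.
        if (abs($r2 - $r1) <= 1 && abs($c2 - $c1) <= 1) {
            $neighbours++;
        }
    }
    return $neighbours / $pairs;
}

echo round(neighbour_ratio('asdfsdf'), 2), "\n"; // 0.83
echo round(neighbour_ratio('hello'), 2), "\n";   // 0.5
```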

Leisure answered 30/6, 2021 at 8:51 Comment(4)
From your code and logic, for 'Boat' I get an output of 5.142193373156675, and for the second string I get 2.9604941699753873. – Nitrosamine
@Nitrosamine I just copied the code from here and tried it again. The result is still the same. – Leisure
I calculated the square roots manually as well as with Python for "Boat". Try doing it by hand using the coordinates you have for "Boat". count($distances) = 3. Am I right? – Nitrosamine
@Nitrosamine $distances contains a list of distances (2.2360679774998, 2.4494897427832, 1.4142135623731). In the end the function calculates the average distance (2.0332570942187) between the letters. – Leisure
