Variable Type for searching both with id and string

Asked 26/7, 2020 at 17:14 Answered 26/7, 2020 at 19:35

I need to store in memory about 500-1000 entries of 3 fields with quick and effective search by both int and str values. Searching happens in quick bursts of about 300-500 requests. I'm not sure how to efficiently do it.

Stored data consists of 3 fields:

ID - Integer, which won't be sequential. I.e. it will be like (3, 5, 12, 55), but not (1, 2, 3, 4, 5).
Name - String.
Tags - String.

There are 3 possible scenarios:

Get ID by Name.
Get Name by ID.
Get Tags by ID.

Currently, I use two different Types:

THashedStringList with keypairs '%s=%i' to search by name.
Array of Records sorted by ID for other searches.

I find this highly inefficient and currently looking for new ideas. Any hints?

Ommatophore answered 26/7, 2020 at 17:14 Comment(6)

You could use 2 TDictionary. A TDictionary is actually a hash table, which means fast access. – Titanesque 26/7, 2020 at 17:27

I assume the IDs are unique. Are the names unique, too? – Denotative 26/7, 2020 at 17:48

Binary search of sorted arrays is very fast. – Jeter 26/7, 2020 at 17:49

In memory database is what you want. – Savagism 26/7, 2020 at 18:3

Yes, do what @DavidHeffernan suggests - you can use a FireDAC FDMemTable as your datastore and you can index it on various combinations of the fields. – Tupi 26/7, 2020 at 18:34

Andreas, yes, Names are unique too. – Ommatophore 26/7, 2020 at 19:22

As David Heffernan suggested, you might want to use a proper database for this.

But if you want a more lightweight solution, with excellent performance, you can use an object list to store all your items and two dictionaries that refer to these items by their IDs and names, respectively.

As an example, consider a frog:

type
  TFrog = class
    ID: Integer;
    Name: string;
    Address: string;
  end;

Like your example, this class has one integer and two string members. We assume that every frog has a unique ID and a unique name. (But two or more frogs may share the same address.)

Just so we will be able to test the performance, we create a primitive frog generation function:

function CreateRandomFrog: TFrog;
const
  FrogFirstNames: array[0..11] of string =
    ('Luke', 'Smith', 'John', 'Maggie', 'Rose', 'Bill', 'Edward', 'Harry',
     'Andrew', 'Michael', 'Molly', 'Arthur');
  FrogLastNames: array[0..7] of string =
    ('Jones', 'Stone', 'Rock', 'Hill', 'Waterfall', 'Sky', 'Flower', 'Rain');
  FrogInitials: array[0..25] of Char = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  FrogAddressesPrefixes: array[0..3] of string =
    ('Tree', 'Swamp', 'Lawn', 'Lake');
begin
  Result := TFrog.Create;
  try
    Result.ID := Random(10*N);
    Result.Name := FrogFirstNames[Random(Length(FrogFirstNames))] + #32 +
      FrogInitials[Random(Length(FrogInitials))] + '.' +
      FrogInitials[Random(Length(FrogInitials))] + '.' +
      FrogInitials[Random(Length(FrogInitials))] + '.' + #32 +
      FrogLastNames[Random(Length(FrogLastNames))];
    Result.Address := FrogAddressesPrefixes[Random(Length(FrogAddressesPrefixes))] +
      #32 + Random(Byte.MaxValue).ToString;
  except
    Result.Free;
    raise;
  end;
end;

This will create frogs like

ID: 123
Name: Bill D.H.H. Rock
Address: Tree 52

We also define a constant

const
  N = 1000000;

This is the number of frogs we will create at the same time.

Now, some action: Define a class

type
  TFrogFarm = class
    Frogs: TObjectList<TFrog>;
    FrogsByID: TDictionary<Integer, TFrog>;
    FrogsByName: TDictionary<string, TFrog>;
    constructor Create;
    destructor Destroy; override;
    procedure TrySearchFarm;
  end;

The idea is that the Frogs list owns the frog objects, while the FrogsByID and FrogsByName dictionaries only refer to the frog objects without owning them. These are dictionaries using the IDs and the names as their keys.

Implement it like so:

{ TFrogFarm }

constructor TFrogFarm.Create;
var
  Frog: TFrog;
begin

  // Create the list that owns the frog objects
  Frogs := TObjectList<TFrog>.Create;

  // Create the dictionaries that refer to the frog objects without owning them
  FrogsByID := TDictionary<Integer, TFrog>.Create;
  FrogsByName := TDictionary<string, TFrog>.Create;

  // Create N random frogs with unique IDs and names
  repeat
    Frog := CreateRandomFrog;
    if not FrogsByID.ContainsKey(Frog.ID) and not FrogsByName.ContainsKey(Frog.Name) then
    begin
      Frogs.Add(Frog); // transfer of ownership
      FrogsByID.Add(Frog.ID, Frog);
      FrogsByName.Add(Frog.Name, Frog);
    end
    else
      Frog.Free; // if this weren't a simple test project, we'd protect this better
  until Frogs.Count = N;

end;

destructor TFrogFarm.Destroy;
begin
  FreeAndNil(FrogsByName);
  FreeAndNil(FrogsByID);
  FreeAndNil(Frogs);
  inherited;
end;

procedure TFrogFarm.TrySearchFarm;
var
  Frog: TFrog;
  S1, S2: string;
  c1, c2, f: Int64;
begin

  QueryPerformanceFrequency(f);
  QueryPerformanceCounter(c1);

  if FrogsByID.TryGetValue(100, Frog) then
    S1 := 'There is a frog with ID 100.'#13#10'He or she lives at ' + Frog.Address + '.'
  else
    S1 := 'There is NO frog with ID 100.';

  if FrogsByName.TryGetValue('Maggie A.M.D. Flower', Frog) then
    S2 := 'There is a frog named "Maggie A.M.D. Flower".'#13#10'She lives at ' + Frog.Address + '.'
  else
    S2 := 'There is NO frog named "Maggie A.M.D. Flower".';

  QueryPerformanceCounter(c2);

  ShowMessage(S1 + sLineBreak + sLineBreak + S2 + sLineBreak + sLineBreak +
    'Execution time: ' + Round(1000000*(c2 - c1)/f).ToString + ' µs');

end;

To try this, do

begin
  Randomize;
  while True do
    with TFrogFarm.Create do
      try
        TrySearchFarm;
      finally
        Free;
      end;
end;

Finding an element in a dictionary is an O(1) operation, so it is fast even in very large collections. And, indeed, with one million frogs in the farm (N = 1000000), lookup takes about 2 microseconds on my system:

Denotative answered 26/7, 2020 at 18:31 Comment(9)

Actually, I think an in-memory dataset like a FDMemTable would provide a very concise & compact solution. It would only require a few lines of code to set up the table and indexes on it. – Tupi 26/7, 2020 at 18:38

@MartynA: Feel free to write that as an answer! – Denotative 26/7, 2020 at 18:42

(I do, however, feel that this approach also should be mentioned in this Q&A. But I very much expect the OP to mark the other solution as the accepted one, of course.) – Denotative 26/7, 2020 at 18:49

Of course. I'm just putting together an answer but its not without its snags - I'm not sure what the OP is thinking of in terms of the "tags". Anyway, I'll give it a shot. – Tupi 26/7, 2020 at 18:58

Both approaches are good to know. Also note that you can specify the initial capacity of the TDictionary when you create it, whch is useful when you have a large number of items. – Titanesque 26/7, 2020 at 19:9

@Olivier: Good point. I always use that myself, buy forgot it in this case. (And for pedagogical reasons I don't even know if it is a good idea adding it to this sample.) – Denotative 26/7, 2020 at 19:9

Also note that a hash table performs better than an index. See here for a discussion on that topic. – Titanesque 26/7, 2020 at 19:26

And, perhaps the most important observation: which solution is the most appropriate one depends on the precise context, which is not specified in the Q. And, to some extent, it also depends on the norms of the code base and the personal preferences of the developers. – Denotative 26/7, 2020 at 19:31

@Andreas Rejbrand, I really like your answer. It gives a clear idea of how the things work, while leaving creative work to be done by the OP. A nice to push to learn. It took me 4 hours to fully understand your answer and make things right, but now I can say that I learned something new. I've created .Add and .Clear procedures for object, as well as some shortcuts, by myself. And you get your well-desirved thumbs up. – Ommatophore 26/7, 2020 at 22:44

I've put together this answer at Andreas Rejbrand's suggestion, as a counterpoint to his TDictionary-based answer. It is unlikely to be able to perform as well as that, but is simpler in some respects to set up.

It shows up the limitations of a TDataSet-based approach in a number of respects, the principal one of which is the need to have maximum field sizes for the strings fields. FireDAC supports ftWideString fields, but that doesn't mean that you should use them to store Delphi "huge" strings.

For searching, I've used the standard dataset Locate function, but if you were after optimisation, it would probably be better to have indexes set for the different types of search and select the right one at run-time.

i was uncertain of how you intend to use the Tags field. If you wanted to have an arbitrary number of tags per record, it would be better to put these in a dataset on the detail side of a master-detail relation with FDMemTable1. Left as an exercise for the reader.

procedure TForm2.FormCreate(Sender: TObject);
var
  AField : TField;
  i : Integer;
begin
  AField := TIntegerField.Create(Self);
  AField.FieldName := 'ID';
  AField.DataSet := FDMemTable1;

  AField := TStringField.Create(Self);
  AField.FieldName := 'Name';
  AField.Size := 80;
  AField.DataSet := FDMemTable1;

  AField := TStringField.Create(Self);
  AField.FieldName := 'Tags';
  AField.Size := 80;
  AField.DataSet := FDMemTable1;

  // FDMemTable1.IndexFieldNames := 'Name;ID';
  FDMemTable1.CreateDataSet;

  FDMemTable1.DisableControls;
  try
    for i := 1 to 1000 do
      FDMemTable1.InsertRecord([i, 'Frog' + IntToStr(i), Chr(Ord('A') + Random(26))]);
  finally
    FDMemTable1.EnableControls;
  end;
end;

function TForm2.FindByName(const AName : String) : Boolean;
begin
  Result := FDMemTable1.Locate('Name', AName, []);
end;

function TForm2.FindByID(const AID: Integer) : Boolean;
begin
  Result := FDMemTable1.Locate('ID', ID, []);
end;

Tupi answered 26/7, 2020 at 19:35 Comment(0)

Recommended topics

Hot tags