How would you implement tail efficiently?
Asked Answered
S

5

20

What is the efficient way to implement tail in *NIX? I came up (wrote) with two simple solution, both using kind of circular buffer to load lines into circular structure (array | doubly linked circular list - for fun). I've seen part of older implementation in busybox and from what I understood, they used fseek to find EOF and then read stuff "backwards". Is there anything cleaner and faster out there? I got asked this on interview and asker did not look satisfied. Thank you in advance.

Saboteur answered 15/4, 2012 at 18:0 Comment(1)
I like this question because it's a really important lesson when learning programming (and systems stuff in general). Some operations are just inherently not possible to do efficiently, at least not given the standard representation of the data you're working with (in this case, a linear byte-stream file starting from the beginning). Learning to recognize this simply from the format of the data, and to avoid pairing data and operations that can't work together efficiently, is a key part of learning to write efficient software.Melanesian
L
18

I don't think there are solutions different than "keep the latest N lines while reading forward the data" or "start from the end and go backwards until you read the Nth line".

The point is that you'd use one or the another based on the context.

The "go to the end and go backwards" is better when tail accesses a random access file, or when the data is small enough to be put on memory. In this case the runtime is minimized, since you scan the data that has to be outputted (so, it's "optimal")

Your solution (keep the N latest lines) is better when tail is fed with a pipeline or when the data is huge. In this case, the other solution wastes too much memory, so it is not practical and, in the case the source is slower than tail (which is probable) scanning all the file doesn't matter that much.

Lupitalupo answered 15/4, 2012 at 18:10 Comment(1)
If you are curious why "keep the N latest lines" is better when dealing with pipes: "go to the end and go backward" needs the usage of seek, pipes and terminals are not seekableDenoting
C
9

Read backwards from the end of the file until N linebreaks are read or the beginning of the file is reached.

Then print what was just read.

I dont think any fancy datastructures are needed here.

Here is the source code of tail if you're interested.

Cloddish answered 15/4, 2012 at 18:3 Comment(0)
T
7

First use fseek to find the end-of-file then subtract 512 and fseek to that offset, then read forward from there to end. Count the number of line-breaks because if there are too few you will have to do the same with a subtracted offset of 1024 ... but in 99% of cases 512 will be enough.

This (1) avoids reading the whole file forward and (2) the reason why this is probably more efficient than reading backwards from the end is that reading forward is typically faster.

Taeniafuge answered 27/6, 2013 at 5:36 Comment(1)
and double the offset every time it fails.Aklog
M
2
//author: jvk
#include <stdio.h>
#include <string.h>
int main(int argc,char *argv[])
{
     FILE *fp;
     int beg,count=10,i=0;
     char ch,line[100];
     fp = fopen(argv[1], "r");
     //to hold the begining of the file
     beg = ftell(fp);
     //read file backwards
     fseek(fp,0,SEEK_END);
     // moving file pointer such that it points to at begining of last 
ten lines of file
     while(count && ftell(fp)!= beg)
     {
         ch = getc(fp);
         if(ch=='\n')
         {
             count--;
         }
         i--;
         //moving backwards.
         fseek(fp,i,SEEK_END);
     }
     //printing from current position of fp.
     while(fgets(line,sizeof(line),fp) != NULL)
     {
         printf("%s",line);
     }
     fclose(fp);
     return 0;
}
Mcintire answered 19/12, 2022 at 14:55 Comment(1)
You have posted a solution, but it may worth to explain how it is efficient or not.Barnaul
B
0

/*This example implements the option n of tail command.*/

#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <getopt.h>

#define BUFF_SIZE 4096

FILE *openFile(const char *filePath)
{
  FILE *file;
  file= fopen(filePath, "r");
  if(file == NULL)
  {
    fprintf(stderr,"Error opening file: %s\n",filePath);
    exit(errno);
  }
  return(file);
}

void printLine(FILE *file, off_t startline)
{
  int fd;
  fd= fileno(file);
  int nread;
  char buffer[BUFF_SIZE];
  lseek(fd,(startline + 1),SEEK_SET);
  while((nread= read(fd,buffer,BUFF_SIZE)) > 0)
  {
    write(STDOUT_FILENO, buffer, nread);
  }
}

void walkFile(FILE *file, long nlines)
{
  off_t fposition;
  fseek(file,0,SEEK_END);
  fposition= ftell(file);
  off_t index= fposition;
  off_t end= fposition;
  long countlines= 0;
  char cbyte;

  for(index; index >= 0; index --)
  {
    cbyte= fgetc(file);
    if (cbyte == '\n' && (end - index) > 1)
    {
      countlines ++;
      if(countlines == nlines)
      {
    break;
      }
     }
    fposition--;
    fseek(file,fposition,SEEK_SET);
  }
  printLine(file, fposition);
  fclose(file);
}

int main(int argc, char *argv[])
{
  FILE *file;
  file= openFile(argv[2]);
  walkFile(file, atol(argv[1]));
  return 0;
}

/*Note: take in mind that i not wrote code to parse input options and arguments, neither code to check if the lines number argument is really a number.*/
Brisance answered 27/6, 2013 at 5:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.