💡

Let's write Get_Next_Line(GNL) in 1 Hour

2024/05/02に公開

Assignment

Create a function that reads one line at a time from a file and returns it as a string, using static variables for buffering.

Strategy

Various implementation methods can be considered, but since it may appear on the Exam, we need to think of an implementation that can be written quickly in about an hour.
Implementing your own getc is recommended as it allows for easy implementation.

Other implementations using getc

Examples of implementing getline using getc:
https://opensource.apple.com/source/cvs/cvs-29/cvs/lib/getline.c.auto.html
https://github.com/lattera/freebsd/blob/master/contrib/file/getline.c

Let's implement getc() first

My recommendation is to first implement getc(3).
On the book of K&R, there is a very clean implementation of getchar (single buffer version) that we can refer to. (getchar is a function that reads one character from standard input and returns one character.)

B.W. Kernighan/D.M. Ritchie: The C Programming Language, 2nd Edition

/* getchar: single buffer version */
int getchar(void)
{
  static char buf[BUFSIZ];
  static char *bufp;
  static int n = 0;

  if(n == 0) { /* buffer is empty */
    n = read(0, buf, sizeof buf);
    bufp = buf;
  }
  return (--n >= 0) ? (unsigned char) *bufp++ : EOF;
}

Change "read(0,...)" to fd, and the file descriptor version of getc is complete.
Hooray!

For the bonus, you can combine the three static variables into a single structure.
If read fails, the return value will be negative, so error handling is necessary. For details, read man 2 read.

From here on, there are slight spoilers

Let's make putc()

Now that we have the ft_getc function that reads one character at a time while buffering, let's make the ft_putc function that writes one character at a time to memory. Even though we're writing one character at a time, calling malloc and copying each time would be slow and error handling would be troublesome, so we want to reduce the number of calls if possible.
Just as we reduced the number of read calls by buffering with getc, we also want to malloc a certain amount at a time with putc. However, since we can't know the length of a line in advance, we'll increase the capacity and reallocate when it becomes insufficient.

typedef struct s_string {
    char *str;  // string 
    size_t len; // length of string
    size_t capa;    // size of allocated area
} t_string;

int ft_putc(t_string *str, char c)
{
    if(str->len + 1 >= str->capa) {
        // If the area becomes insufficient, allocate a new one and copy  
    }
    str->str[str->len] = c;  // put in one character
    str->len++;
    return 0;
}

There are various ways to increase capa, but I think we often double capa and reallocate it. If the final string length is N, it only needs to be done O(log(N)) times.

The rest is just reading the string up to the newline

Now that we have a function that reads one character at a time while buffering, the rest is just puting one character at a time into a string until a newline or EOF comes, and returning it to complete get_next_line.

char *get_next_line(int fd) {
    // Using t_string str instead of t_string *str makes implementation easier
    // since there is no need to malloc and free the structure.
    t_string str;
    char c;

    // initialization
    str.str = NULL;
    str.len = 0;
    str.capa = 0;

    while(1) {  // infinite loop
        c = ft_getc(fd); // read one character

        if(c == EOF) {
            break;  // exit loop if end of file
        }
        ft_putc(&str, c);   // put in one character
        if(c == '\n') {
            break;  // exit loop if newline
        }
    }
    if(str.len > 0) {
        ft_putc(&str, '\0');    // put in NULL character at the end
    }
    return str.str;
}

This is roughly the flow, but ft_getc and ft_putc may return errors, so be sure to handle errors properly. (Don't forget to free str.str when handling errors).

↓ Press the ♡

Discussion