Let's write Get_Next_Line(GNL) in 1 Hour
And with only three functions (following Norminette, of course).
Assignment
Create a function that reads one line at a time from a file and returns it as a string, using static variables for buffering.
Strategy
Many implementations are possible, but since this may come up on the Exam, we want one that can be written in about an hour.
I recommend implementing your own getc first, because it makes the rest of the implementation easy.
Other implementations using getc
There is a function called getline, which get_next_line is modeled on. There are various implementations, including GNU, BSD, and musl.
Examples of implementing getline using getc:
- NetBSD
  https://github.com/NetBSD/src/blob/trunk/tools/compat/getline.c
- musl
  https://github.com/BlankOn/musl/blob/master/src/stdio/getline.c
  https://github.com/BlankOn/musl/blob/master/src/stdio/getdelim.c
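For comparison, here is a small sketch of how the POSIX getline(3) interface is used. The helper name count_lines is made up for this example. Unlike get_next_line, getline works on a FILE * stream and reuses a buffer owned by the caller.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline(3) in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* count_lines (hypothetical helper): counts the lines in fp
   by calling getline(3) until it returns -1. */
size_t count_lines(FILE *fp)
{
    char    *line = NULL;  /* getline allocates and grows this buffer */
    size_t  capa = 0;      /* current size of that buffer */
    size_t  count = 0;

    while (getline(&line, &capa, fp) != -1)
        count++;
    free(line);            /* the caller owns the buffer and must free it */
    return (count);
}
```

A last line without a trailing newline still counts as a line, which is the same behavior get_next_line is expected to have.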
Let's implement getc(3) first
My recommendation is to first implement getc(3).
K&R contains a very clean implementation of getchar (single buffer version)[1] that we can refer to. (getchar reads one character from standard input and returns it.)
B.W. Kernighan/D.M. Ritchie: The C Programming Language, 2nd Edition
/* getchar: single buffer version */
int getchar(void)
{
    static char buf[BUFSIZ];
    static char *bufp;
    static int n = 0;

    if (n == 0) { /* buffer is empty */
        n = read(0, buf, sizeof buf);
        bufp = buf;
    }
    return (--n >= 0) ? (unsigned char) *bufp++ : EOF;
}
Change read(0, ...) to read(fd, ...), take fd as a parameter, and the file descriptor version of getc (ft_getc) is complete.
Hooray!
- For the bonus, you can combine the three static variables into a single structure.
- If read fails, the return value will be negative, so error handling is necessary. For details, read "man 2 read".
- If a read error occurs or an empty file is read, n ends up negative. If you then go on to read the next file, the call starts with a negative n, skips the n == 0 check, and never calls read again, so be careful.
- Using read(fd, buf, sizeof(buf)) may result in the machine grading system producing an error such as "Read with a size different than BUFFER_SIZE". However, read(fd, buf, BUFFER_SIZE) works fine. Although the values should technically be the same when using an array, the machine grading system might not be checking this very rigorously.
- As per the C language specification, static variables are automatically initialized to 0 or NULL[2].
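The cautions above can be put together in a minimal sketch of ft_getc. This is one possible shape, not the only one. It assumes BUFFER_SIZE is normally supplied at compile time (the fallback value 42 is arbitrary), and for simplicity it reports a read error the same way as end of file.

```c
#include <stdio.h>   /* EOF */
#include <unistd.h>  /* read, ssize_t */

#ifndef BUFFER_SIZE
# define BUFFER_SIZE 42  /* arbitrary fallback, normally set with -D */
#endif

/* ft_getc: returns the next byte of fd as an unsigned char converted
   to int, or EOF at end of file or on a read error. */
int ft_getc(int fd)
{
    static char     buf[BUFFER_SIZE];
    static char     *bufp;
    static ssize_t  n;  /* static, so it starts at 0 */

    if (n <= 0) {       /* buffer is empty */
        n = read(fd, buf, BUFFER_SIZE);
        bufp = buf;
        if (n <= 0) {   /* 0 = end of file, -1 = error */
            n = 0;      /* reset so the next call reads again */
            return (EOF);
        }
    }
    n--;
    return ((unsigned char)*bufp++);
}
```

Resetting n to 0 before returning EOF is what keeps a later call, possibly on a different file, from starting with a negative count.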
Let's check if ft_getc works!
Now let's test to see if ft_getc works correctly.
#include "get_next_line.h"
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd;
    int c;

    fd = open("test.txt", O_RDONLY);
    if (fd < 0)
        return (1); /* could not open test.txt */
    while (1) {
        c = ft_getc(fd);
        if (c == EOF)
            break ;
        printf("%c", c);
    }
    close(fd);
    return (0);
}
If the contents of test.txt are printed unchanged, it works.
All that remains is to write a function that returns a string one line at a time, and get_next_line is complete.
That's easy.
Spoilers from here on in.
Let's make putc(3)
Now that we have the ft_getc function that reads one character at a time while buffering, let's make the ft_putc function that writes one character at a time to memory. Even though we're writing one character at a time, calling malloc and copying each time would be slow and error handling would be troublesome, so we want to reduce the number of calls if possible.
Just as we reduced the number of read calls by buffering with getc, we also want to malloc a certain amount at a time with putc. However, since we can't know the length of a line in advance, we'll increase the capacity and reallocate when it becomes insufficient.
typedef struct s_string {
    char *str;   // string
    size_t len;  // length of string
    size_t capa; // size of allocated area
} t_string;

int ft_putc(t_string *str, char c)
{
    if (str->len + 1 >= str->capa) {
        // If the area becomes insufficient, allocate a new one and copy
    }
    str->str[str->len] = c; // put in one character
    str->len++;
    return 0;
}
There are various ways to grow capa, but a common approach is to double it each time you reallocate. If the final string length is
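One way to fill in the "allocate a new one and copy" step with the doubling strategy is sketched below. The starting capacity of 16 and the -1 error return are assumptions made for this example. The copy is a hand-written loop because the assignment only allows read, malloc, and free.

```c
#include <stdlib.h>  /* malloc, free, size_t */

typedef struct s_string {
    char    *str;   /* string */
    size_t  len;    /* length of string */
    size_t  capa;   /* size of allocated area */
}   t_string;

/* Appends c to str. Returns 0 on success, -1 if malloc fails.
   On failure str is left unchanged, so the caller can still free it. */
int ft_putc(t_string *str, char c)
{
    char    *grown;
    size_t  new_capa;
    size_t  i;

    if (str->len + 1 >= str->capa) {
        new_capa = 16;                /* arbitrary starting capacity */
        if (str->capa > 0)
            new_capa = str->capa * 2; /* double on every later growth */
        grown = malloc(new_capa);
        if (grown == NULL)
            return (-1);
        i = 0;
        while (i < str->len) {        /* copy the old contents */
            grown[i] = str->str[i];
            i++;
        }
        free(str->str);               /* free(NULL) is harmless the first time */
        str->str = grown;
        str->capa = new_capa;
    }
    str->str[str->len] = c;
    str->len++;
    return (0);
}
```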
The rest is just reading the string up to the newline
Now that we have a function that reads one character at a time while buffering, the rest is just putting one character at a time into a string until a newline or EOF arrives, then returning that string, and get_next_line is complete.
char *get_next_line(int fd) {
    // Using t_string str instead of t_string *str makes implementation easier
    // since there is no need to malloc and free the structure.
    t_string str;
    int c; // int, not char, so that EOF can be told apart from a valid byte

    // initialization
    str.str = NULL;
    str.len = 0;
    str.capa = 0;
    while (1) { // infinite loop
        c = ft_getc(fd); // read one character
        if (c == EOF) {
            break; // exit loop if end of file
        }
        ft_putc(&str, (char)c); // put in one character
        if (c == '\n') {
            break; // exit loop if newline
        }
    }
    if (str.len > 0) {
        ft_putc(&str, '\0'); // put in the terminating NUL character
    }
    return str.str;
}
This is roughly the flow, but ft_getc and ft_putc may return errors, so be sure to handle errors properly. (Don't forget to free str.str when handling errors).
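As a rough illustration of that error handling, the sketch below frees str.str and returns NULL whenever ft_putc fails. The ft_getc and ft_putc here are compact stand-ins written only so the block compiles on its own. The fallback BUFFER_SIZE of 42, the starting capacity of 16, and the -1 error return are assumptions, and this ft_getc reports a read error as EOF.

```c
#include <stdio.h>   /* EOF */
#include <stdlib.h>  /* malloc, free */
#include <unistd.h>  /* read, ssize_t */

#ifndef BUFFER_SIZE
# define BUFFER_SIZE 42  /* arbitrary fallback */
#endif

typedef struct s_string {
    char    *str;
    size_t  len;
    size_t  capa;
}   t_string;

/* Stand-in: buffered single-byte read, EOF on end of file or read error. */
static int ft_getc(int fd)
{
    static char     buf[BUFFER_SIZE];
    static char     *bufp;
    static ssize_t  n;

    if (n <= 0) {
        n = read(fd, buf, BUFFER_SIZE);
        bufp = buf;
        if (n <= 0) {
            n = 0;
            return (EOF);
        }
    }
    n--;
    return ((unsigned char)*bufp++);
}

/* Stand-in: appends one byte, doubling the capacity when full. */
static int ft_putc(t_string *s, char c)
{
    char    *grown;
    size_t  new_capa;
    size_t  i;

    if (s->len + 1 >= s->capa) {
        new_capa = s->capa ? s->capa * 2 : 16;
        grown = malloc(new_capa);
        if (grown == NULL)
            return (-1);
        i = 0;
        while (i < s->len) {
            grown[i] = s->str[i];
            i++;
        }
        free(s->str);
        s->str = grown;
        s->capa = new_capa;
    }
    s->str[s->len++] = c;
    return (0);
}

char *get_next_line(int fd)
{
    t_string    str = {NULL, 0, 0};
    int         c;

    while ((c = ft_getc(fd)) != EOF) {
        if (ft_putc(&str, (char)c) < 0) {
            free(str.str);  /* allocation failed: drop the partial line */
            return (NULL);
        }
        if (c == '\n')
            break;
    }
    if (str.len > 0 && ft_putc(&str, '\0') < 0) {
        free(str.str);
        return (NULL);
    }
    return (str.str);       /* NULL when nothing was read */
}
```

Because a read error looks like EOF in this sketch, a failure on the very first read yields NULL, but a failure in the middle of a line would return the partial line. Telling the two apart would need an extra return code from ft_getc.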
My get_next_line
get_next_line.c
3 functions, only 95 lines!
get_next_line_utils.c
Empty!!
get_next_line.h
If you like, you can also read here
Fract'ol - How to zoom to the limit
[1] Brian W. Kernighan, Dennis M. Ritchie, The C Programming Language (2nd ed.), Prentice Hall, Inc. (1988)
[2] ISO/IEC 9899, 6.7.8 Initialization, https://port70.net/~nsz/c/c99/n1256.html#6.7.8p10