C++ - Does getline have a maximum string or character capacity?

My question is in reference to an assignment that I am working on. It seems like there are multiple ways to approach the assignment.

The program I am writing will be a filter for text files. The purpose of the assignment is to get experience with fstream and getline.

Requirements:

  1. Read one text file (any way possible, doesn't have to be all at once)
  2. Write to a separate text file (any way possible, can append or write character by character)
  3. It is assumed each sentence ends with a period.
  4. The first letter of each sentence must be uppercase.
  5. Everything besides the first letter of each sentence must be lowercase. (proper nouns too - it's a trivial example)

I have a working draft of the program, but getline is inconsistent in how it reads my text file. Basically, it reads the first line in as a string, which is what I want. When it reads the second line, however, the program throws a runtime error halfway through the line and Windows shuts it down.

Does getline have a buffer that fills up and needs to be cleaned after each line is read?

My pseudo code for the program:

  1. Use getline to read in string from line x stopping at a period (.).
  2. Iterate over the string's characters, uppercasing the first letter and lowercasing the rest.
  3. Read in another string continuing after the last period (.) in the text file.
  4. Repeat until text file read.
  5. Write to second text file.
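A minimal sketch of the steps above, assuming everything is read with std::getline(..., '.') and that "first letter" means the first alphabetic character after the previous period (the text after a period usually starts with a space or a newline). `fixCase` and `filterSentences` are hypothetical names, not part of any library. Unlike step 5, the sketch writes each sentence as soon as it is processed instead of collecting everything first.

```cpp
#include <cctype>
#include <fstream>   // std::ifstream / std::ofstream for real files
#include <istream>
#include <ostream>
#include <sstream>   // std::istringstream / std::ostringstream for quick tests
#include <string>

// Hypothetical helper: uppercase the first letter of a sentence and
// lowercase every later letter. Leading whitespace is left untouched.
std::string fixCase(const std::string& sentence) {
    std::string out = sentence;
    bool seenFirstLetter = false;
    for (char& c : out) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isalpha(uc)) continue;   // skip spaces, digits, newlines
        if (!seenFirstLetter) {
            c = static_cast<char>(std::toupper(uc));
            seenFirstLetter = true;
        } else {
            c = static_cast<char>(std::tolower(uc));
        }
    }
    return out;
}

// Hypothetical filter: reads period-terminated sentences from `in`,
// fixes their case and writes them to `out`.
void filterSentences(std::istream& in, std::ostream& out) {
    std::string sentence;
    while (std::getline(in, sentence, '.')) {
        out << fixCase(sentence);
        // getline consumed the '.', so write it back; if eofbit is set the
        // piece ended at end of file and there was no period to restore.
        if (!in.eof()) out << '.';
    }
}
```

For real files, open a std::ifstream for the input and a std::ofstream for the output and pass them to `filterSentences`. A final piece with no period is copied with its case fixed but without an added period.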

I am implementing getline this way:

getline(fileIN, str1, '.')

str1 is the string that is read from each line.

Am I using getline correctly? Am I thinking this problem through correctly and efficiently?

* As I was finishing this extended question, I realized that getline may be using extra memory for the '\r' or '\n' characters at the ends of lines, or, for reasons unrelated to memory, that getline is not handling sentences that wrap onto a new line the way I need it to. Does getline not handle sentence/word wrap well?
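With '.' as the delimiter, std::getline does not stop at line ends, so a sentence that wraps keeps its line break inside the string. A file with Windows line endings read on a system that does not translate them (a non-Windows system, or a stream opened in binary mode) also leaves a '\r' there. Neither is a memory problem. A small sketch, assuming you want each sentence on a single line; `joinWrappedLines` is a hypothetical helper:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: drop '\r' characters and turn each '\n' into a
// space, so a sentence that wrapped across lines becomes one line.
std::string joinWrappedLines(const std::string& piece) {
    std::string out;
    out.reserve(piece.size());
    for (char c : piece) {
        if (c == '\r') continue;          // leftover from "\r\n" endings
        out += (c == '\n') ? ' ' : c;     // line break becomes a space
    }
    return out;
}
```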

Also, is there a way to make getline read the first string up to a period (.) or a newline ('\n'), whichever comes first?
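std::getline accepts only a single delimiter character, so stopping at '.' or '\n', whichever comes first, needs a small hand-written loop. One possible sketch reads character by character with istream::get; `getUntilPeriodOrNewline` and its `stoppedAt` out-parameter are hypothetical names:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: like std::getline, but stops at whichever of '.'
// or '\n' comes first. The terminator is consumed (not stored) and is
// reported through `stoppedAt`, or '\0' if the input simply ran out.
// Returns false only when nothing at all could be read.
bool getUntilPeriodOrNewline(std::istream& in, std::string& out, char& stoppedAt) {
    out.clear();
    stoppedAt = '\0';
    bool readSomething = false;
    char c;
    while (in.get(c)) {
        readSomething = true;
        if (c == '.' || c == '\n') {
            stoppedAt = c;
            return true;
        }
        out += c;
    }
    return readSomething;   // end of input reached
}
```

Reporting which character ended the piece lets the caller tell a finished sentence from one that merely hit a line break. Another option is to read whole lines with std::getline(in, line) and split each line on '.' afterwards.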

Thank you for your time and consideration.

Flabbergast asked 10/12, 2011 at 23:13

Yes, you are using getline correctly. Make sure to use it as the condition for your while loop or other conditional:

while(std::getline(fileIN, str1, '.')) {
    // process str1
}

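Don't make the mistake that so many others have made of looping on fileIN.good() or !fileIN.eof() or something similar instead: those flags are only updated by a read, so the loop body runs once more after the final read has already failed (this will only lead to headache and heartache).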
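A small sketch contrasting the two loop styles on in-memory input; `readPieces` and `readPiecesEofLoop` are hypothetical names. eof() only turns true after a read has already run into the end of the input, so the `!eof()` loop stores one extra, empty result when the text ends with a period. If the file failed to open, eofbit is never set and that loop never ends at all.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Correct: the getline call is the loop condition, so the body only runs
// after a read that actually succeeded.
std::vector<std::string> readPieces(std::istream& in) {
    std::vector<std::string> pieces;
    std::string str1;
    while (std::getline(in, str1, '.')) {
        pieces.push_back(str1);
    }
    return pieces;
}

// Pitfall: the check happens before the read, so the result of the final,
// failed getline (an empty string here) is stored as well.
std::vector<std::string> readPiecesEofLoop(std::istream& in) {
    std::vector<std::string> pieces;
    std::string str1;
    while (!in.eof()) {
        std::getline(in, str1, '.');
        pieces.push_back(str1);
    }
    return pieces;
}
```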

The buffer of str1 does not need to be cleaned up by you since it is managed by the string class. It will expand as needed and deallocate itself when the variable goes out of scope. That is one of the reasons why we love the Standard Library classes and think twice before using raw arrays.
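A sketch showing that one std::string can be reused for every read with no manual cleanup (`pieceLengths` is a hypothetical name). The string grows to hold a 100,000-character piece, and the short piece read next comes out clean because std::getline erases the string's old contents before extracting into it.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every period-terminated piece into the same
// string and records each piece's length.
std::vector<std::size_t> pieceLengths(std::istream& in) {
    std::vector<std::size_t> lengths;
    std::string str1;   // reused for every read; memory is managed by std::string
    while (std::getline(in, str1, '.')) {
        lengths.push_back(str1.size());
    }
    return lengths;
}
```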

Also, there are no artificial limits placed on the capacity of a string. The only limiting factor is the available memory of the system, so with virtual memory it can potentially be a little less than 4 GB on a 32-bit system or a little less than 2^64 bytes on a 64-bit system.
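The library does publish an upper bound, std::string::max_size(), whose value is implementation-defined, and reserve() throws std::length_error for any request above it; a request below it that the machine cannot satisfy typically fails with std::bad_alloc instead. A sketch to probe this (`tryReserve` is a hypothetical helper):

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical helper: asks an empty std::string to reserve room for n
// characters and reports which outcome occurred.
std::string tryReserve(std::size_t n) {
    std::string s;
    try {
        s.reserve(n);
        return "ok";
    } catch (const std::length_error&) {
        // n > s.max_size(): the library refuses the request outright
        return "length_error";
    } catch (const std::bad_alloc&) {
        // n is allowed, but the memory could not be obtained
        return "bad_alloc";
    }
}
```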

Algol answered 10/12, 2011 at 23:18
Sorry for the nit-picking, but doesn't std::string have a max_size() member returning the maximum size it can grow to (I'm pretty sure std::vector has it)? So there is actually a specified artificial limit, though it is usually just (size_t)-1 or something similar and therefore equal to the actual hardware limit you mentioned. But +1 of course. - Truax
@ChristianRau yes, but I think that's just for convenience, and according to en.cppreference.com/w/cpp/string/basic_string/max_size, it is "the maximum number of elements the string is able to hold due to system or library implementation limitations, i.e. std::distance(begin(), end()) for the largest string". If you think my answer needs to be improved because it is incorrect or ambiguous then please let me know. I think it has this because of wstring or something where characters could take up more than one byte and max_size is not just ~(size_t)0 - 1. - Algol

The limit on getline (the POSIX function from the C library) is constrained only by resources (i.e. memory), so you should not worry about it much (unless you have a file with a single line of a billion bytes).

The C++ standard library std::getline has similar limitations.

So you should probably not care that much, unless you want to make a bullet-proof program (and then you might care about other things, like I/O errors, a full disk, ...).
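For comparison, a sketch calling the POSIX getline from C++. It assumes a POSIX system (Linux, macOS, ...), because standard C and C++ do not declare this function; `readLinesPosix` is a hypothetical name. getline allocates and enlarges the buffer itself, so the only limit is available memory, and the caller is responsible for freeing it.

```cpp
// Assumes a POSIX system: ::getline and ssize_t come from POSIX headers,
// not from standard C or C++.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of `stream` with POSIX getline.
std::vector<std::string> readLinesPosix(FILE* stream) {
    std::vector<std::string> lines;
    char* buf = nullptr;   // getline allocates and grows this buffer
    size_t cap = 0;        // current buffer capacity, updated by getline
    ssize_t len;
    while ((len = ::getline(&buf, &cap, stream)) != -1) {
        // len counts the bytes read, including the '\n' if there was one
        lines.emplace_back(buf, static_cast<size_t>(len));
    }
    free(buf);   // the caller owns the buffer, even after the final failed call
    return lines;
}
```

Unlike std::getline, the POSIX version keeps the trailing '\n' in what it returns.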

Algometer answered 10/12, 2011 at 23:18
The Linux getline is POSIX (I even believe it existed before the POSIX 2008 standard that defines it; GNU getline probably inspired POSIX 2008). And the original poster mentions C++ and seems to be using it (the iostreams getline, not the POSIX one). - Algometer
