Ruby: Titleize: How do I ignore smaller words like 'and', 'the', 'or', etc.
S

10

12
def titleize(string)
  string.split(" ").map {|word| word.capitalize}.join(" ")
end

This titleizes every single word, but how do I exclude certain words I don't want capitalized?

e.g. Jack and Jill

And please DO NOT USE Regex.

UPDATE:

I am having trouble making this code work: I got it to print an array of capitalized words, but I can't get it to leave the words in the list below uncapitalized.

words_no_cap = ["and", "or", "the", "over", "to", "the", "a", "but"]

def titleize(string)
cap_word = string.split(" ").map {|word| word.capitalize}

cap_word.include?(words_no_cap)

end
Sclerotic answered 26/2, 2013 at 0:4 Comment(3)
You create what is called a "stop-word" list, containing words you don't want to process. It can be an array, a set, a hash, or a regex. Regex is a common way to handle the problem as is a hash, as they're both very fast. Also, don't refuse answers saying "DO NOT USE...". We supply what we think are the best solutions to a problem, and you are free to not implement them.Lydialydian
possible duplicate of Using title case with Ruby 1.8.7Cavicorn
“DO NOT USE Regex” sounds contradictory given the example code, since string.split does in fact use a regex. You likely want something already prepared for use; in that case take a look at titleize. It can be installed as a gem.Edgewise
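The stop-word idea from the comments can be sketched without any regex. This is only a minimal illustration: the constant `STOP_WORDS` and the method name `titleize_skipping_stop_words` are made up for this example, and the word list is copied from the question. A `Set` is assumed here so that lookups stay fast as the list grows.

```ruby
require "set"

# Hypothetical stop-word list; a constant is visible inside the method,
# unlike a local variable defined outside `def`.
STOP_WORDS = Set.new(%w[and or the over to a but]).freeze

# Illustrative name, not from the question.
def titleize_skipping_stop_words(string)
  string.split(" ").map do |word|
    # Set#include? is a hash lookup, so it does not walk the list.
    STOP_WORDS.include?(word) ? word : word.capitalize
  end.join(" ")
end

puts titleize_skipping_stop_words("jack and jill went over the hill")
# prints "Jack and Jill Went over the Hill"
```

Note that this sketch leaves a stop word in first position lowercase too; whether that is wanted depends on the title style.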
P
9

You probably want to create an extension to the existing titleize function that Rails provides.

To do so, just include the following file in an initializer, and presto! Supply exceptions on the fly or optionally modify my example to add defaults into the initializer.

I realize that you didn't want to use Regex, but hey, the actual rails function uses Regex so you might as well keep it in sync.

Put this file in Rails.root/lib/string_extension.rb and load it in an initializer; or just throw it in the initializer itself.

UPDATE: modified the REGEX on this thanks to @svoop's suggestion for adding the ending word boundary.

# encoding: utf-8
class String
  def titleize(options = {})
    exclusions = options[:exclude]

    return ActiveSupport::Inflector.titleize(self) unless exclusions.present?
    self.underscore.humanize.gsub(/\b(?<!['’`])(?!(#{exclusions.join('|')})\b)[a-z]/) { $&.capitalize }
  end
end
Promenade answered 25/7, 2013 at 13:44 Comment(0)
T
5

If you don't want to capitalize "and" or "the", just do the following:

def titleize(string)
  nocaps = ["and", "the"]  # an Array, so include? matches whole words, not substrings
  string.split(" ").map { |word| nocaps.include?(word) ? word : word.capitalize }.join(" ")
end
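One thing to watch when writing this kind of check: `include?` on a String tests for a substring, while `include?` on an Array tests for an equal element. The following sketch (method names are invented for the comparison) shows how the two behave on the same input.

```ruby
# Exclusions held in a String: include? does substring matching.
def titleize_with_string_check(string)
  nocaps = "and"
  string.split(" ").map { |word| nocaps.include?(word) ? word : word.capitalize }.join(" ")
end

# Exclusions held in an Array: include? compares whole elements.
def titleize_with_array_check(string)
  nocaps = ["and"]
  string.split(" ").map { |word| nocaps.include?(word) ? word : word.capitalize }.join(" ")
end

puts titleize_with_string_check("jack and a dog")  # prints "Jack and a Dog" ("a" is a substring of "and")
puts titleize_with_array_check("jack and a dog")   # prints "Jack and A Dog"
```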
Tamalatamale answered 6/1, 2015 at 5:28 Comment(0)
W
3

Here is my little code. You can refactor it into a few lines.

def titleize(str)
    str = str.capitalize  # capitalize the first word in case it is in the no-cap list (without mutating the caller's string)
    words_no_cap = ["and", "or", "the", "over", "to", "the", "a", "but"]
    phrase = str.split(" ").map {|word| 
        if words_no_cap.include?(word) 
            word
        else
            word.capitalize
        end
    }.join(" ") # I replaced the "end" in "end.join(" ") with "}" because it wasn't working in Ruby 2.1.1
  phrase  # returns the phrase with the excluded words left uncapitalized
end
Wineskin answered 29/10, 2013 at 6:8 Comment(0)
Q
3

The answer of @codenamev is not quite doing the job:

EXCLUSIONS = %w(a the and or to)
"and the answer is all good".titleize(exclude: EXCLUSIONS)
# => "And the Answer Is all Good"
                        ^^^

Exclusions should match trailing word boundaries. Here's an improved version:

# encoding: utf-8
class String
  def titleize(options = {})
    exclusions = options[:exclude]

    return ActiveSupport::Inflector.titleize(self) unless exclusions.present?
    self.underscore.humanize.gsub(/\b(['’`]?(?!(#{exclusions.join('|')})\b)[a-z])/) { $&.capitalize }
  end
end

"and the answer is all good".titleize(exclude: EXCLUSIONS)
# => "And the Answer Is All Good"
                        ^^^
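To see the effect of the trailing word boundary in isolation, here is a plain-Ruby sketch without Rails. The method name `capitalize_except` and the constant `EXCLUDED` are invented for the illustration, and the patterns are simplified versions of the ones above (no apostrophe handling, no `humanize`, so the first word is not forced to upper case).

```ruby
EXCLUDED = %w[a the and or to].freeze

# Capitalizes the first letter of every word the negative lookahead does not reject.
def capitalize_except(text, trailing_boundary:)
  alternatives = EXCLUDED.join("|")
  pattern =
    if trailing_boundary
      /\b(?!(?:#{alternatives})\b)[a-z]/  # rejects only whole excluded words
    else
      /\b(?!(?:#{alternatives}))[a-z]/    # also rejects words that merely start with one
    end
  text.gsub(pattern) { |letter| letter.capitalize }
end

puts capitalize_except("and the answer is all good", trailing_boundary: false)
# prints "and the answer Is all Good"
puts capitalize_except("and the answer is all good", trailing_boundary: true)
# prints "and the Answer Is All Good"
```

Without the boundary, any word beginning with an excluded word ("answer" and "all" both start with "a") is skipped as well.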
Query answered 27/12, 2013 at 10:50 Comment(0)
S
3

If you put this in a new file under config/initializers (you can name it anything, such as string.rb), you can call your custom methods on any string. Make sure you restart the server; then you will be able to call it like this: "anystring".uncapitalize_puncs

This is easier than messing around trying to change the default code of titleize. So now, you can just call @something.title.titleize.uncapitalize_puncs

class String

    def uncapitalize_puncs
        puncs = ["and", "the", "to", "of", "by", "from", "or"]
        array = self.split(" ")
        array.map! do |x| 
            if puncs.include? x.downcase
                x.downcase
            else
                x
            end
        end
        return array.join(" ")
    end

end
Skippie answered 20/7, 2014 at 5:43 Comment(0)
L
2

Some titles have edge cases (pun intended) that you might need to consider.

For example, small words at the start of a title or after punctuation often should be capitalized (e.g. "The Chronicles of Narnia: The Lion, the Witch and the Wardrobe" which has both).

One may also want/need to force small words to lower-case, so that input like "Jack And Jill" gets rendered to "Jack and Jill".

Sometimes you may also need to detect when a word (typically brand names) must retain unusual capitalization e.g. "iPod", or acronyms e.g. "NATO", or domain names, "example.com".

To properly handle such cases, the titleize gem is your friend, or should at least supply the basis for a complete solution.
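For a rough idea of what handling some of these cases involves, here is a minimal hand-rolled sketch. It is not how the titleize gem works; `SMALL_WORDS` and `smart_titleize` are hypothetical names, the word list is arbitrary, and domain names and leading quotation marks are not handled.

```ruby
SMALL_WORDS = %w[a an and the or of to but].freeze

def smart_titleize(title)
  force_capital = true  # true for the first word and for the word after a colon
  title.split(" ").map do |word|
    bare = word.delete(",:;.!?")  # strip punctuation for the lookup only
    result =
      if word != word.downcase && word != word.capitalize
        word                      # unusual capitalization (iPod, NATO): keep as typed
      elsif !force_capital && SMALL_WORDS.include?(bare.downcase)
        word.downcase             # force small words to lower case
      else
        word.capitalize
      end
    force_capital = word.end_with?(":")
    result
  end.join(" ")
end

puts smart_titleize("the chronicles of narnia: the lion, the witch and the wardrobe")
# prints "The Chronicles of Narnia: The Lion, the Witch and the Wardrobe"
puts smart_titleize("Jack And Jill")     # prints "Jack and Jill"
puts smart_titleize("my iPod and NATO")  # prints "My iPod and NATO"
```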

Leftward answered 11/10, 2016 at 0:37 Comment(0)
J
0

This is pretty straightforward: just add a condition when you call capitalize:

$nocaps = ['Jack', 'Jill']

def titleize(string)
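  # Return the word unchanged when it is excluded; a bare `unless` would return nil and drop the word.
  string.split(" ").map {|word| $nocaps.include?(word) ? word : word.capitalize}.join(" ")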
end

The global variable is contrived for this example; it would probably be an instance variable in your real application.

Jamilla answered 26/2, 2013 at 0:9 Comment(2)
if not should be unless. But this will slow down the longer the nocaps list gets because include? has to walk the array sequentially.Lydialydian
@theTinMan A good point, I tend to not use unless though, even in this situation. Someone apparently doesn't like this solution anyway; but they didn't leave a comment.Jamilla
T
0
titleize("the matrix or titanic")

def titleize(string)
  no_cap = ["and", "or", "the", "over", "to", "the", "a", "but"]
  string.split(" ").map { |word| no_cap.include?(word) ? word : word.capitalize }.join(" ")
end

result:

"the Matrix or Titanic"
Truth answered 3/1, 2018 at 15:23 Comment(0)
Y
0

I think the first word of the phrase must stay capitalized:

class String

  def uncapitalize_puncs
    puncs = ["and", "the", "to", "of", "by", "from", "or"]
    array = self.split(" ")
    i = -1
    array.map! do |x|
      i += 1
      if puncs.include?(x.downcase) && i > 0
        x.downcase
      else
        x
      end
    end
    array.join(" ")
  end

end
Yenta answered 21/12, 2021 at 11:40 Comment(0)
R
0

Splitting on spaces is a lazy idea. Words can be separated by anything that is not itself a word character. Words shorter than 4 characters shouldn't be capitalised, and neither should some esoteric list of English words (this all depends on the language). You must also consider the first word, which always has to be capitalised. So my final try:

def CapitalizeEachWord(orig)
    # Still incomplete
    # “a,” “an,” and “the”
    # Any word with fewer than four letters should remain in lowercase
    # “and,” “but,” “for,” “at,” “by,” “to,” and “from,” except when first or last
    # https://www.grammarly.com/blog/capitalization-in-the-titles/
    $notToCap = [
        "either", "neither", "before", "above", "below",
        "down", "from", "into", "near", "onto", "over",
        "past", "upon", "with", "than", "that", "till",
        "when", "once",
    ]
    f = orig.dup
    pos = 0
    nwords = orig.scan(/(?u)(\w+)/).length
    f.gsub!(/(?u)(\w+)/) do |w|
        pos += 1
        if pos == 1 or pos == nwords or (
                w.length > 3 and not $notToCap.include? w.downcase
            )
            w.capitalize
        else
            w.downcase
        end
    end
    return f
end
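The difference between splitting on spaces and matching runs of word characters can be seen with a small comparison. The two method names below are invented for the sketch, and the input is an artificial string chosen to contain several kinds of separators.

```ruby
# Splits on spaces only, so punctuation stays glued to the words.
def capitalize_by_spaces(text)
  text.split(" ").map(&:capitalize).join(" ")
end

# Rewrites each run of word characters in place, leaving separators untouched.
def capitalize_by_word_chars(text)
  text.gsub(/\w+/) { |word| word.capitalize }
end

puts capitalize_by_spaces("rock,paper;scissors and-more")
# prints "Rock,paper;scissors And-more"
puts capitalize_by_word_chars("rock,paper;scissors and-more")
# prints "Rock,Paper;Scissors And-More"
```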
Rightminded answered 20/8 at 11:57 Comment(0)
