character-properties Questions

6

Solved

In different encodings of Unicode, for example UTF-16LE or UTF-8, a character may occupy 2 or 3 bytes. Many Unicode applications don't take care of the display width of Unicode chars just like they a...
Bayern asked 3/9, 2010 at 9:54
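
One way to approach the display-width problem raised above is sketched below in Python. It is only an approximation under stated assumptions: wide and fullwidth East Asian characters are counted as two columns, combining marks as zero, and everything else as one. The helper name `display_width` is hypothetical, and the sketch ignores control characters, ambiguous-width characters, and emoji sequences.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate how many terminal columns `text` occupies (rough sketch)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on top of the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and Fullwidth characters usually take two columns.
            width += 2
        else:
            width += 1
    return width
```

Note that the width is independent of the encoding: the byte count in UTF-8 or UTF-16 says nothing about how many columns a character takes on screen.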

5

Solved

Is there a way to get the Unicode Block of a character in Python? The unicodedata module doesn't seem to have what I need, and I couldn't find an external library for it. Basically, I need the sam...
Arber asked 28/10, 2008 at 15:56
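
Since the standard `unicodedata` module does not expose blocks, one possible approach is a range table searched with `bisect`. The sketch below is an assumption-laden illustration: the table `_BLOCKS` is a deliberately tiny, hand-copied excerpt, and the function name `block_of` is hypothetical. A complete lookup would need every range from the Unicode `Blocks.txt` data file.

```python
import bisect

# Tiny illustrative excerpt of the Unicode block ranges (not complete).
_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]
_STARTS = [start for start, _end, _name in _BLOCKS]


def block_of(ch: str):
    """Return the block name for a single character, or None if unknown."""
    cp = ord(ch)
    # Find the last block whose start is <= cp.
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0:
        _start, end, name = _BLOCKS[i]
        if cp <= end:
            return name
    return None
```

Characters that fall outside the listed ranges return `None` here, which makes gaps in the excerpt visible rather than silently misreporting a block.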

11

Solved

How do I match French and Russian Cyrillic alphabet characters with a regular expression? I only want to do the alpha characters, no numbers or special characters. Right now I have [A-Za-z]
Darvon asked 11/11, 2009 at 17:1
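
The question does not say which regex engine is in use, so the following is only a Python sketch of one option: spelling out the letters explicitly in a character class. The names `FRENCH`, `RUSSIAN`, and `is_alpha_word` are hypothetical, the French list covers the commonly used accented letters and ligatures, and the input is assumed to be in NFC (precomposed) form.

```python
import re

# Basic Latin letters plus the accented letters and ligatures used in French.
FRENCH = "A-Za-zÀÂÄÆÇÉÈÊËÎÏÔŒÙÛÜŸàâäæçéèêëîïôœùûüÿ"
# The Russian alphabet; Ё and ё sit outside the contiguous А-я range.
RUSSIAN = "А-Яа-яЁё"

ALPHA_ONLY = re.compile(f"[{FRENCH}{RUSSIAN}]+")


def is_alpha_word(text: str) -> bool:
    """True if `text` consists only of French or Russian letters."""
    return ALPHA_ONLY.fullmatch(text) is not None
```

An explicit class like this rejects digits, underscores, and punctuation, at the cost of having to be extended by hand for each additional language.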

9

Solved

I need to take a string, and shorten it to 140 characters. Currently I am doing: if len(tweet) > 140: tweet = re.sub(r"\s+", " ", tweet) #normalize space footer = "… " + utils.shorten_urls(p...
Stagnant asked 15/11, 2009 at 20:53

5

Solved

I'm trying to write a reasonably permissive validator for names in PHP, and my first attempt consists of the following pattern: // unicode letters, apostrophe, hyphen, space $namePattern = "/^([\\...
Dedrick asked 13/2, 2011 at 9:17

2

Solved

In .NET you can use \p{L} to match any letter. How can I do the same in Python? Namely, I want to match any uppercase, lowercase, and accented letters.
Everara asked 11/6, 2011 at 7:5
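
Python's built-in `re` module has no `\p{L}`, so a common workaround is the class `[^\W\d_]`: word characters minus digits and the underscore. The sketch below assumes Python 3 `str` patterns, where `\w` is Unicode-aware by default. It only approximates `\p{L}` (for example, it can also accept some numeric characters that are not decimal digits), and the name `find_words` is hypothetical. The third-party `regex` package does support `\p{L}` directly.

```python
import re

# Word characters, excluding digits and the underscore: roughly "any letter".
LETTERS = re.compile(r"[^\W\d_]+")


def find_words(text: str) -> list:
    """Return the runs of letters found in `text`."""
    return LETTERS.findall(text)
```

For example, `find_words("Élan naïve Привет 123 foo_bar")` yields the five letter runs and skips the digits and the underscore.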

11

There should be something akin to \w that can match any code-point in Letters or Marks category (not just the ASCII ones), and hopefully have filters like [[P*]] for punctuation, etc.
Slattern asked 11/11, 2008 at 12:0

5

Solved

I have a multilingual website (Chinese and English). I would like to validate a text field (name field) in JavaScript. I have the following code so far. var chkName = /^[characters]{1,20}$/; if( chkN...
Breechloader asked 16/6, 2011 at 19:25

7

Solved

Okay, I have read about regex all day now, and still don't understand it properly. What I'm trying to do is validate a name, but the functions I can find for this on the internet only use [a-zA-Z],...
Detour asked 11/5, 2011 at 11:8

3

Solved

According to the Oniguruma documentation, the \d character type matches: decimal digit char Unicode: General_Category -- Decimal_Number However, scanning for \d in a string with all the Decim...
Whichever asked 9/8, 2011 at 15:28

4

I'd like to match all strings containing a certain word, like: String regex = (?:\P{L}|\W|^)(ベスパ)(?:\b|$) However, the Pattern class doesn't compile it: java.util.regex.PatternSyntaxException:...
Dichromic asked 12/4, 2011 at 21:14

3

Solved

There are some similar questions out there, but none that are quite the same or that have an answer that works for me. I need a JavaScript function which validates whether a text field contains al...
Bikini asked 3/4, 2013 at 10:59

4

I need to split a string with "-" as delimiter in Java. Ex: "Single Room - Enjoy your stay" I have the same data coming in English and German depending on locale. Hence I cannot use the usual st...
German asked 8/3, 2012 at 4:25

1

Solved

Regular expression engines have a concept of "zero width" matches, some of which are useful for finding edges of words: \b - present in most engines to match any boundary between word and non-wor...

2

Solved

I am using listadmin to manage many mailman-based mailing lists. I have a long list of subjects and from addresses set up to block spam. Recently, I received smarter spam in the sense that it uses ...
Kelso asked 9/5, 2013 at 20:17

3

Solved

Does Perl's \w match all alphanumeric characters defined in the Unicode standard? For example, will \w match all (say) Chinese and Russian alphanumeric characters? I wrote a simple test script (s...

1

Often one wants to list all characters in a given Unicode category. For example: List all Unicode whitespace, How can I get all whitespaces in UTF-8 in Python? Characters with the property Alphab...
Northeaster asked 9/1, 2013 at 20:30
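
For the Python case mentioned above, a brute-force sketch is to scan every code point and filter by general category. The function name `chars_in_category` is hypothetical, and the result depends on the Unicode version bundled with the interpreter (see `unicodedata.unidata_version`). This covers general categories such as `Zs` only, not derived properties such as Alphabetic.

```python
import sys
import unicodedata


def chars_in_category(category: str) -> list:
    """Return every character whose general category equals `category`."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    ]
```

Calling `chars_in_category("Zs")` returns the space separators, including the ordinary space and the no-break space. The scan visits more than a million code points, so caching the result is worthwhile if it is needed repeatedly.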

3

Solved

I have read through the other questions on Stack Overflow, but I am still no closer. Sorry if this is already answered, but I didn't get anything proposed there to work. >>> import re >>...

3

Solved

What is the right way to match a C# identifier, specifically a property or field name, using .NET Regex patterns? Background: I used to use the ASCII-centric @"[_a-zA-Z][_a-zA-Z0-9]*" But now unic...
Beekeeping asked 9/12, 2010 at 16:8

3

I have a file, file1.txt, containing text in English, Chinese, Japanese, and Korean. For use in ConTeXt, I need to mark each region of text within the file according to language, except for English...
Townswoman asked 7/5, 2012 at 13:23

2

Solved

I feel lost with the Regex Unicode Properties presented by RegexBuddy. I cannot distinguish between any of the Number properties, and the Math symbol property only seems to match + but not -, *, /, ...
Middleclass asked 14/1, 2010 at 6:17

6

Solved

Perl and some other current regex engines support Unicode properties, such as the category, in a regex. E.g. in Perl you can use \p{Ll} to match an arbitrary lower-case letter, or \p{Zs} for any spa...
Sinclair asked 2/12, 2009 at 13:25

1

Solved

I have a string from which I want to extract 3 groups: '19 janvier 2012' -> '19', 'janvier', '2012' The month name could contain non-ASCII characters, so [A-Za-z] does not work for me: >>...
Stilbestrol asked 19/1, 2012 at 9:49
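
A minimal Python 3 sketch of the extraction described above, assuming the input always has the shape "day month year" separated by whitespace. The month is matched with `[^\W\d_]+`, which accepts accented letters where `[A-Za-z]` does not. The names `DATE` and `split_date` are hypothetical.

```python
import re

# Day (1-2 digits), month name (letters only, accents allowed), 4-digit year.
DATE = re.compile(r"(\d{1,2})\s+([^\W\d_]+)\s+(\d{4})")


def split_date(text: str):
    """Return (day, month, year) as strings, or None if `text` does not match."""
    match = DATE.fullmatch(text.strip())
    return match.groups() if match else None
```

With this pattern, `split_date("19 janvier 2012")` gives the three groups from the question, and a month such as "février" is handled the same way.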

2

Solved

I've got a series of Unicode codepoints. What I really need to do is iterate through these codepoints as a series of characters, not a series of codepoints, and determine properties of each individ...
Stile asked 26/11, 2011 at 22:5
