character-properties Questions
6
Solved
In different encodings of Unicode, for example UTF-16LE or UTF-8, a character may occupy 2 or 3 bytes. Many Unicode applications don't take care of the display width of Unicode chars just like they a...
Bayern asked 3/9, 2010 at 9:54
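The byte length of an encoded character says nothing about how many columns it occupies on screen. A minimal Python sketch of one way to estimate display width, assuming a terminal that draws East Asian wide and fullwidth characters in two columns and combining marks in zero (the helper name `display_width` is hypothetical, and control characters are not treated specially here):

```python
import unicodedata

def display_width(text: str) -> int:
    """Estimate how many terminal columns `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the previous character.
            continue
        # 'W' (wide) and 'F' (fullwidth) characters usually take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width
```

For example, `display_width("abc")` counts three columns while a string of three CJK ideographs counts six, regardless of how many bytes either string needs in UTF-8 or UTF-16.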
5
Solved
Is there a way to get the Unicode Block of a character in Python? The unicodedata module doesn't seem to have what I need, and I couldn't find an external library for it.
Basically, I need the sam...
Arber asked 28/10, 2008 at 15:56
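Since `unicodedata` exposes categories and names but not blocks, one possible approach is a range lookup over a block table. The sketch below is illustrative only: the table is a small hand-picked subset of the block ranges, and `unicode_block` is a hypothetical helper name. A complete version would load every range from the Unicode Character Database's Blocks.txt file.

```python
import bisect

# Partial, hand-picked block table (start, end, name), sorted by start.
# An assumption for illustration; the full list lives in Blocks.txt.
_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]
_STARTS = [start for start, _, _ in _BLOCKS]

def unicode_block(ch: str):
    """Return the block name for `ch`, or None if it is not in the table."""
    cp = ord(ch)
    # Find the last block whose start is <= cp, then check its end.
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0:
        start, end, name = _BLOCKS[i]
        if cp <= end:
            return name
    return None
```

The binary search keeps lookups fast even once the table holds all of the several hundred blocks.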
11
Solved
How do I match French and Russian Cyrillic alphabet characters with a regular expression? I only want to do the alpha characters, no numbers or special characters. Right now I have
[A-Za-z]
Darvon asked 11/11, 2009 at 17:1
9
Solved
I need to take a string, and shorten it to 140 characters.
Currently I am doing:
if len(tweet) > 140:
    tweet = re.sub(r"\s+", " ", tweet) #normalize space
    footer = "… " + utils.shorten_urls(p...
Stagnant asked 15/11, 2009 at 20:53
5
Solved
I'm trying to write a reasonably permissive validator for names in PHP, and my first attempt consists of the following pattern:
// unicode letters, apostrophe, hyphen, space
$namePattern = "/^([\\...
Dedrick asked 13/2, 2011 at 9:17
2
Solved
In .NET you can use \p{L} to match any letter. How can I do the same in Python? Namely, I want to match any uppercase, lowercase, and accented letters.
Everara asked 11/6, 2011 at 7:5
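Python's standard `re` module does not support `\p{L}`. A commonly used approximation, sketched below, is the class `[^\W\d_]`: start from the Unicode-aware `\w` and subtract digits and the underscore. This is an approximation rather than an exact equivalent, because `\w` also admits a few numeric characters that `\d` does not cover. The function name `letter_runs` is hypothetical. The third-party `regex` package does accept `\p{L}` directly.

```python
import re

# Word characters minus decimal digits and underscore: roughly "any letter".
LETTERS = re.compile(r"[^\W\d_]+")

def letter_runs(text: str) -> list:
    """Return each maximal run of letter-like characters in `text`."""
    return LETTERS.findall(text)
```

For a whole-string check that follows the Unicode letter categories, `str.isalpha()` is an alternative that needs no regular expression at all.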
11
There should be something akin to \w that can match any code-point in Letters or Marks category (not just the ASCII ones), and hopefully have filters like [[P*]] for punctuation, etc.
Slattern asked 11/11, 2008 at 12:0
11
There should be something akin to \w that can match any code-point in Letters or Marks category (not just the ASCII ones), and hopefully have filters like [[P*]] for punctuation, etc.
Survive asked 11/11, 2008 at 12:0
5
Solved
I have a multilingual website (Chinese and English).
I would like to validate a text field (name field) in JavaScript. I have the following code so far.
var chkName = /^[characters]{1,20}$/;
if( chkN...
Breechloader asked 16/6, 2011 at 19:25
7
Solved
Okay, I have read about regex all day now, and still don't understand it properly. What I'm trying to do is validate a name, but the functions I can find for this on the internet only use [a-zA-Z],...
Detour asked 11/5, 2011 at 11:8
3
Solved
According to the Oniguruma documentation, the \d character type matches:
decimal digit char
Unicode: General_Category -- Decimal_Number
However, scanning for \d in a string with all the Decim...
Whichever asked 9/8, 2011 at 15:28
4
I'd like to match all strings containing a certain word, like:
String regex = (?:\P{L}|\W|^)(ベスパ)(?:\b|$)
however, the Pattern class doesn't compile it:
java.util.regex.PatternSyntaxException:...
Dichromic asked 12/4, 2011 at 21:14
3
Solved
There are some similar questions out there, but none that are quite the same or that have an answer that works for me.
I need a JavaScript function which validates whether a text field contains al...
Bikini asked 3/4, 2013 at 10:59
4
I need to split a string with "-" as the delimiter in Java.
Ex: "Single Room - Enjoy your stay"
I have the same data coming in English and German depending on the locale. Hence I cannot use the usual st...
German asked 8/3, 2012 at 4:25
1
Solved
Regular expression engines have a concept of "zero width" matches, some of which are useful for finding edges of words:
\b - present in most engines to match any boundary between word and non-wor...
Grose asked 11/5, 2013 at 1:39
2
Solved
I am using listadmin to manage many mailman-based mailing lists. I have a long list of subjects and from addresses set up to block spam. Recently, I received smarter spam in the sense that it uses ...
Kelso asked 9/5, 2013 at 20:17
3
Solved
Does Perl's \w match all alphanumeric characters defined in the Unicode standard?
For example, will \w match all (say) Chinese and Russian alphanumeric characters?
I wrote a simple test script (s...
Volturno asked 5/4, 2011 at 17:4
1
Often one wants to list all characters in a given Unicode category. For example:
List all Unicode whitespace, How can I get all whitespaces in UTF-8 in Python?
Characters with the property Alphab...
Northeaster asked 9/1, 2013 at 20:30
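One way to enumerate a category in Python is a brute-force scan of every code point. This is a sketch, not a definitive method: the result depends on the Unicode version bundled with the running interpreter, and `chars_in_category` is a hypothetical helper name.

```python
import sys
import unicodedata

def chars_in_category(category: str) -> list:
    """Return every character whose General_Category equals `category`."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    ]

# Example: all space separators (category "Zs") known to this Python build.
space_separators = chars_in_category("Zs")
```

The scan visits about 1.1 million code points, so it is worth caching the result if it is needed more than once. Note that category `Zs` is narrower than the White_Space property, which also covers characters such as tab and line feed.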
3
Solved
I have read through the other questions at Stack Overflow, but I am still no closer. Sorry if this is already answered, but I didn't get anything proposed there to work.
>>> import re
>>>...
Bodine asked 17/2, 2011 at 12:8
3
Solved
What is the right way to match a C# identifier, specifically a property or field name, using .NET Regex patterns?
Background. I used to use the ASCII-centric @"[_a-zA-Z][_a-zA-Z0-9]*" But now unic...
Beekeeping asked 9/12, 2010 at 16:8
3
I have a file, file1.txt, containing text in English, Chinese, Japanese, and Korean. For use in ConTeXt, I need to mark each region of text within the file according to language, except for English...
Townswoman asked 7/5, 2012 at 13:23
2
Solved
I feel lost with the Regex Unicode Properties presented by RegexBuddy. I cannot distinguish between any of the Number properties, and the Math symbol property only seems to match + but not -, *, /, ...
Middleclass asked 14/1, 2010 at 6:17
6
Solved
Perl and some other current regex engines support Unicode properties, such as the category, in a regex. E.g. in Perl you can use \p{Ll} to match an arbitrary lower-case letter, or \p{Zs} for any spa...
Sinclair asked 2/12, 2009 at 13:25
1
Solved
I have a string from which I want to extract 3 groups:
'19 janvier 2012' -> '19', 'janvier', '2012'
Month name could contain non-ASCII characters, so [A-Za-z] does not work for me:
>>>...
Stilbestrol asked 19/1, 2012 at 9:49
2
Solved
I've got a series of Unicode codepoints. What I really need to do is iterate through these codepoints as a series of characters, not a series of codepoints, and determine properties of each individ...
Stile asked 26/11, 2011 at 22:5
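A simplified Python sketch of grouping code points into user-perceived characters, assuming it is enough to attach each combining mark to the base character before it. This is not full grapheme cluster segmentation as defined in Unicode Standard Annex #29 (it ignores Hangul jamo sequences and emoji joiner sequences, for instance), and `simple_clusters` is a hypothetical helper name.

```python
import unicodedata

def simple_clusters(text: str) -> list:
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            # Non-zero combining class: attach to the previous cluster.
            clusters[-1] += ch
        else:
            # A base character (or a leading mark) starts a new cluster.
            clusters.append(ch)
    return clusters
```

Properties can then be looked up per cluster, for example by inspecting the first code point of each group with `unicodedata.category`.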