text-processing Questions

5

Solved

Here's an example email header, header = """ From: Media Temple user ([email protected]) Subject: article: A sample header Date: January 25, 2011 3:30:58 PM PDT To: [email protected] Ret...
Boiling asked 14/5, 2015 at 14:38

4

Solved

Suppose we are doing a multiline regex pattern search on a bunch of files and we want to extract the matches from grep. By default, grep outputs matches separated by newlines, but since we are doin...
Vimen asked 17/3, 2016 at 16:33

9

In a Bash script, I want to pick out N random lines from input file and output to another file. How can this be done?
Seften asked 12/2, 2012 at 1:27
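
One possible approach to the question above, assuming GNU coreutils is installed (it provides `shuf`). The function name `pick_random_lines` and the file names in the usage line are hypothetical, chosen only for illustration:

```shell
# Pick N distinct random lines from an input file and write them to an output file.
# Assumes GNU shuf is available; the function name is illustrative, not a standard tool.
pick_random_lines() {
    # $1 = number of lines to pick, $2 = input file, $3 = output file
    shuf -n "$1" "$2" > "$3"
}

# Hypothetical usage: pick_random_lines 5 input.txt sample.txt
```

`shuf -n` samples without replacement, so no input line is picked twice; if N exceeds the number of lines in the file, the output is simply the whole file in shuffled order.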

9

The following command outputs the following lines of text on the console: git log --pretty=format:"%h;%ai;%s" --shortstat ed6e0ab;2014-01-07 16:32:39 +0530;Foo 3 files changed, 14 insertions(+), 13 deletions(...
Paralytic asked 15/1, 2014 at 12:28

6

Solved

Is there a Python library which takes wikitext (as used in MediaWiki) input and converts it to Markdown?
Alexina asked 12/2, 2011 at 22:02

4

Solved

I've tried various methods to strip the license from Project Gutenberg texts, for use as a corpus for a language learning project, but I can't seem to come up with an unsupervised, reliable approac...
Medardas asked 12/8, 2009 at 22:48

10

Solved

For example, we have a file like this: first line second line third line And as a result we have to get: first line second line third line Use ONLY Python
Ignescent asked 3/3, 2010 at 7:29

7

Solved

Here's a website I found that will produce upside-down versions of any English text. How does it work? Does Unicode have upside-down chars? Or what? How can I write my own text flipping function?...
Pacify asked 8/6, 2010 at 7:03

26

Solved

I am constantly learning new tools, even old fashioned ones, because I like to use the right solution for the problem. Nevertheless, I wonder if there is still any reason to learn some of them. aw...
Unveil asked 20/9, 2008 at 8:20

4

Solved

I am following this document clustering tutorial. As input I give a txt file which can be downloaded here. It's a combined file of 3 other txt files separated by \n. After creating a tf...
Behistun asked 3/8, 2019 at 16:23

10

Solved

I have raw html with some css classes inside for various tags. Example: Input: <p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque mole...
Nevus asked 8/1, 2014 at 18:12

7

I was wondering if anyone was familiar with any attempts at algorithmic sentence negation. For example, given a sentence like "This book is good" provide any number of alternative sentences meanin...
Zoroastrianism asked 13/4, 2010 at 21:27

28

Solved

I have a ~23000 line SQL dump containing several databases worth of data. I need to extract a certain section of this file (i.e. the data for a single database) and place it in a new file. I know b...
Labana asked 17/9, 2008 at 13:40

25

Solved

I would like to update a large number of C++ source files with an extra include directive before any existing #includes. For this sort of task, I normally use a small bash script with sed to re-wri...
Cristal asked 29/9, 2008 at 12:22

12

I'm trying to turn a txt file containing a generated key into one line. Example: <----- key start -----> lkdjasdjskdjaskdjasdkj skdhfjlkdfjlkdsfjsdlfk kldshfjlsdhjfksdhfksdj jdhsfkjsdhfksdjfhskdfh j...
Corybantic asked 18/5, 2011 at 20:42

19

Solved

I want to pipe the output of a "template" file into MySQL, the file having variables like ${dbName} interspersed. What is the command line utility to replace these instances and dump the ...
Monopolize asked 6/1, 2009 at 7:00

4

I want to thank you for helping me with my related issue. I know that if I do a cat /proc/meminfo it will only display values in kB. How can I display them in MB? I really want to use cat or awk for this, please.
Luxembourg asked 23/4, 2015 at 0:16
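
A minimal sketch for the question above, using awk as requested. It assumes the usual `/proc/meminfo` layout, where size lines have the form `Name: <number> kB`, and it divides by 1024 to get MB. The function name `meminfo_mb` is hypothetical:

```shell
# Rewrite kB figures in meminfo-style input as MB (value / 1024, one decimal place).
# Lines whose third field is not "kB" (plain counters, for example) pass through unchanged.
meminfo_mb() {
    awk '$3 == "kB" { printf "%s %.1f MB\n", $1, $2 / 1024; next } { print }' "$@"
}

# Usage on Linux: meminfo_mb /proc/meminfo
# The function also reads standard input, so this works too: cat /proc/meminfo | meminfo_mb
```

The column alignment of the original file is not preserved, since each converted line is rebuilt from its fields.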

18

Solved

I have a file as below: line1 line2 line3 And I want to get: prefixline1 prefixline2 prefixline3 I could write a Ruby script, but it is better if I do not need to. prefix will contain /. It ...
Saylor asked 20/1, 2010 at 6:36
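
One way this could be done without a Ruby script, sketched with awk. Because the prefix may contain `/`, this version avoids `sed` delimiters altogether and passes the prefix through the environment, so it is not interpreted as a pattern or escape sequence. The function name `add_prefix` and the variable `LINE_PREFIX` are hypothetical:

```shell
# Prepend a fixed prefix to every line of the given files (or of standard input).
# The prefix travels via the environment, so characters such as / & \ are taken literally.
add_prefix() {
    prefix=$1
    shift
    LINE_PREFIX=$prefix awk '{ print ENVIRON["LINE_PREFIX"] $0 }' "$@"
}

# Hypothetical usage: add_prefix '/opt/data/' file.txt > prefixed.txt
```

A `sed` one-liner with an alternative delimiter, such as `sed 's|^|/opt/data/|'`, is a common alternative when the prefix is known not to contain the chosen delimiter or `&`.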

7

Solved

I have the following list of words: name,id,3 I need to have it double quoted like this: "name,id,3" I have tried sed 's/.*/\"&\"/g' and got: "name,id,3 Which has only one double quote...
Herries asked 25/5, 2012 at 12:45

8

Solved

I have a file which contain following lines: /logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com /logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:ap...
Journeyman asked 30/8, 2012 at 19:35

6

Solved

I need to calculate the BLEU score to identify whether two sentences are similar or not. I have read some articles which are mostly about the BLEU score for measuring machine translation accuracy. But I'...
Heliotropin asked 22/3, 2011 at 11:22

5

Solved

I've recently been working on some database search functionality and wanted to get some information like the average words per document (e.g. text field in the database). The only thing I have foun...
Balfore asked 14/4, 2009 at 16:01

2

I am working on extracting names of people from various ads appearing in English newspapers. However, I have noticed that I need to identify the boundary of an ad before extracting the names ...
Humic asked 19/11, 2013 at 11:04

5

Solved

I have this script that does a word search in text. The search works pretty well and the results are as expected. What I'm trying to achieve is to extract n words close to the match. For example: The w...
Immaterialize asked 15/7, 2013 at 1:56

6

Solved

I need to remove one directory (the leftmost) from variables in Bash. I found ways to remove the whole path, or to use dirname and others, but they were removing all or one path component on the right...
Zwickau asked 15/3, 2011 at 12:50
