text-extraction

7

Solved

Get the text from all elements with a nominated class as a flat array

I know we can use PHP DOM to parse HTML using PHP, but I have a specific requirement. I have HTML content like the following: <p class="Heading1-P"> <span class="Heading1-H"...

php html dom html-parsing text-extraction

Indignity asked 21/8, 2013 at 4:55

3

Why is extracting tabular data from PDF files hard?

I have a general question regarding extracting text, precisely tabular data, from PDF files. How are PDF viewers able to read and display a table? And why can't we just get the necessary column inf...

parsing pdf text-extraction

Esposito asked 22/12, 2012 at 10:19

5

Is there a way to get all text from the rendered page with JS?

Is there an (unobtrusive, to the user) way to get all the text in a page with Javascript? I could get the HTML, parse it, remove all tags, etc, but I'm wondering if there's a way to get the text fr...

javascript text text-extraction

Hagai asked 7/6, 2010 at 3:57

8

Solved

Get numeric suffix from key starting with specific substring

I have an array, and in that array I have an array key that looks like show_me_160. This array key may change a little, so sometimes the page may load and the array key may be show_me_120. I want to ...

php arrays substring key text-extraction

Decury asked 14/10, 2010 at 9:59
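
A minimal sketch of one way to approach this, written in Python rather than the PHP the question asks about. It assumes the key always starts with the fixed prefix `show_me_` followed only by digits; the function name `numeric_suffix` and the sample dictionary are hypothetical.

```python
import re
from typing import Optional


def numeric_suffix(data: dict, prefix: str = "show_me_") -> Optional[int]:
    """Return the numeric suffix of the first key shaped like prefix + digits."""
    # Escape the prefix so any regex metacharacters in it are matched literally.
    pattern = re.compile(re.escape(prefix) + r"([0-9]+)")
    for key in data:
        match = pattern.fullmatch(str(key))
        if match:
            return int(match.group(1))
    # No key had the expected shape.
    return None


print(numeric_suffix({"id": 7, "show_me_160": "yes"}))
```

Returning `None` when no key matches keeps the "key not present" case distinct from a legitimate suffix of 0.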

6

Solved

Split string strictly formatted as uppercase letter then numbers into two halves [duplicate]

I have several strings of the format AA11 AAAAAA1111111 AA1111111 I need to separate the alphabetic and numeric components of the string.

php string split text-extraction text-parsing

Monopteros asked 13/7, 2012 at 19:47
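
A minimal sketch of the split described above, in Python rather than PHP. It assumes every string is strictly one or more uppercase ASCII letters followed by one or more digits, as in the title; the function name `split_alpha_numeric` is hypothetical.

```python
import re
from typing import Tuple


def split_alpha_numeric(code: str) -> Tuple[str, str]:
    """Split a string such as 'AA11' into its letter part and its digit part."""
    # fullmatch rejects anything that is not exactly letters followed by digits.
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", code)
    if match is None:
        raise ValueError(f"not in letters-then-digits format: {code!r}")
    return match.group(1), match.group(2)


for sample in ("AA11", "AAAAAA1111111", "AA1111111"):
    print(split_alpha_numeric(sample))
```

The digit part is kept as a string so that leading zeros, if any, are not lost.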

8

How to use the Amazon Textract with PDF files

I can already use Textract, but only with JPEG files. I would like to use it with PDF files. I have the code below: import boto3 # Document documentName = "Path to document in JPEG" # Read doc...

amazon-web-services ocr text-extraction amazon-textract

Jaclynjaco asked 25/11, 2019 at 18:46

3

Solved

Extract pdf text within bounding box directly into python

I'm trying to extract the text of a pdf within a given bounding rectangle. I understand there are tools for pdf scraping such as pdfminer, pypdf, and pdftotext. I've experimented with all 3, and so...

python pdf text-extraction pypdf pdfminer

Waltz asked 9/4, 2019 at 0:26

2

Solved

Extract Hindi text from a PDF file

I am working on a task to extract some information (in Hindi) from a PDF file and convert it into a data frame. I have tried many things and followed many articles and answers on Stack Overflow as...

python pdf tesseract python-tesseract text-extraction

Loveinidleness asked 31/3, 2023 at 7:58

10

Solved

How to extract text from reasonably sane HTML?

My question is sort of like this question but I have more constraints: I know the documents are reasonably sane; they are very regular (they all came from the same source); I want about 99% of the ...

c# html d text-extraction

Ancel asked 21/1, 2010 at 23:3

4

C# Extract text from PDF using PdfSharp

Is there a possibility to extract plain text from a PDF-File with PdfSharp? I don't want to use iTextSharp because of its license.

c# text text-extraction pdfsharp

Recall asked 13/4, 2012 at 12:48

2

Using Textract, how do you extract tables from a pdf file and output it into a csv file via .py script?

I want to use textract (via aws cli) to extract tables from a pdf file (located in an s3 location) and export it into a csv file. I have tried writing a .py script but am struggling to read from th...

python amazon-web-services text-extraction amazon-textract

Pocked asked 13/10, 2020 at 17:18

6

Solved

Extracting text from a PDF file using PDFMiner in python?

I am looking for documentation or examples on how to extract text from a PDF file using PDFMiner with Python. It looks like PDFMiner updated their API and all the relevant examples I have found co...

python python-3.x python-2.7 text-extraction pdfminer

Theotheobald asked 21/10, 2014 at 18:56

7

Solved

Get last whole number in a string

I need to isolate the latest occurring integer in a string containing multiple integers. How can I get 23 instead of 1 for $lastnum1? $text = "1 out of 23"; $lastnum1 = $this->getEval(...

php regex string integer text-extraction

Arabele asked 25/9, 2012 at 19:8
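
A minimal sketch of isolating the last integer, in Python rather than PHP. It assumes "integer" means a run of ASCII digits with no sign or thousands separators; the function name `last_integer` is hypothetical.

```python
import re
from typing import Optional


def last_integer(text: str) -> Optional[int]:
    """Return the last run of digits in text as an int, or None if there is none."""
    # findall returns the digit runs in order of appearance.
    numbers = re.findall(r"[0-9]+", text)
    return int(numbers[-1]) if numbers else None


print(last_integer("1 out of 23"))
```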

6

Solved

Body Text extraction from websites e.g. extract only article heading and text not all text in site

I am looking for algorithms that allow text extraction from websites. I do not mean "strip html", or any of the hundreds of libraries that allow this. So for example for a news article I would lik...

algorithm text web-scraping text-extraction

Sandpit asked 21/4, 2011 at 15:2

8

Solved

How to extract Heading tags in PHP from a string?

From a string that contains a lot of HTML, how can I extract all the text from <h1><h2>etc tags into a new variable? I would like to capture all of the text from these elements and sto...

php text-extraction domparser

Hatpin asked 14/1, 2010 at 14:31

6

Solved

Extract floating point numbers from a delimited string in PHP

I would like to convert a string of delimited dimension values into floating numbers. For example 152.15 x 12.34 x 11mm into 152.15, 12.34 and 11 and store in an array such that: $dim[0] = 152.15...

php regex floating-point text-parsing text-extraction

Pule asked 3/6, 2009 at 12:16
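
A minimal sketch of the conversion, in Python rather than PHP. It assumes the values are unsigned decimals with an optional fractional part, separated by arbitrary text such as " x " or a trailing unit; the function name `extract_dimensions` is hypothetical.

```python
import re
from typing import List


def extract_dimensions(text: str) -> List[float]:
    """Return every unsigned decimal number found in text as a float."""
    # Digits, optionally followed by a dot and more digits.
    return [float(n) for n in re.findall(r"[0-9]+(?:\.[0-9]+)?", text)]


dim = extract_dimensions("152.15 x 12.34 x 11mm")
print(dim)
```

The pattern deliberately ignores signs and exponents, since the example in the question contains neither.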

5

Solved

Extraction of text page by page from MS word docx file using python

I have an MS docx file and I need to extract text from it page by page. I have tried python-docx, but it can only extract the whole text, not page by page. I have also converted my docx to pdf and th...

python python-3.x document extract text-extraction

Reitareiter asked 18/12, 2019 at 4:53

1

Solved

pdfplumber | Extract text from dynamic column layouts

Attempted Solution at bottom of post. I have near-working code that extracts the sentence containing a phrase, across multiple lines. However, some pages have columns. So respective outputs are inc...

python if-statement text-extraction information-extraction pdfplumber

Stroboscope asked 30/11, 2021 at 13:56

4

Solved

How to extract a table as text from the PDF

I have a PDF which contains tables, text and some images. I want to extract the tables wherever they appear in the PDF. Right now I am finding the tables on each page manually. From there I...

python pdf text-extraction pdf-parsing

Unsatisfactory asked 28/11, 2017 at 14:23

7

PDFminer: extract text with its font information [duplicate]

I found this question, but it uses the command line, and I do not want to call a Python script in command line using subprocess and parse HTML files to get the font information. I want to use PDF...

python text-extraction pdfminer

Whatley asked 5/1, 2016 at 7:33

5

How to install textract in python3

sudo python3 -m pip install textract sudo apt-get install textract pip install textract sudo apt-get install swig I want to install textract in Python 3, but it does not install properly; it gives ...

python-3.5 text-extraction

Twickenham asked 25/11, 2017 at 6:30

4

Capture src value of an <img> tag with regex

I want to grab an img tag from text returned from JSON data like that. I want to grab this from a string: <img class="img" src="https://fbcdn-photos-c-a.akamaihd.net/hphotos-ak-frc3/1239478_598...

regex image html-parsing text-extraction

Warton asked 6/9, 2013 at 19:15

23

Solved

Extract a single (unsigned) integer from a string

I want to extract the digits from a string that contains numbers and letters like: "In My Cart : 11 items" I want to extract the number 11.

php string integer text-extraction

Herminahermine asked 8/6, 2011 at 11:53
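
A minimal sketch of pulling the number out of such a string, in Python rather than PHP. It assumes the string contains a single run of ASCII digits and returns the first one found; the function name `first_integer` is hypothetical.

```python
import re
from typing import Optional


def first_integer(text: str) -> Optional[int]:
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"[0-9]+", text)
    return int(match.group()) if match else None


print(first_integer("In My Cart : 11 items"))
```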

1

How to extract text from a two-column PDF using PDFPlumber

I am working on topic modeling tasks using python and I would like to extract texts from annual/sustainability reports. However my problem is, when I tried to extract the report, the extracted line...

python text-extraction topic-modeling information-extraction pdfplumber

Drove asked 25/8, 2021 at 8:4

8

Solved

Extract all email addresses from bulk text using jquery

I have this text below: [email protected], "assdsdf" <[email protected]>, "rodnsdfald ferdfnson" <[email protected]>, "Affdmdol Gondfgale" <[email protec...

javascript jquery regex text-extraction email-address

Buchalter asked 21/1, 2013 at 14:11
