Neo4j creating relationships using csv

I am trying to create relationships between two types of nodes from a loaded CSV file. I have already created all the Movie and Keyword nodes, and I have also created indexes on :Movie(title) and :Keyword(word).

My CSV file looks like this:

"title"|year|"word" //header

"Into the Wild"|2007|"1990s" //line with title, year and keyword

"Into the Wild"|2007|"abandoned-bus"

My query:

LOAD CSV WITH HEADERS FROM "file:/home/gondil/temp.csv" AS csv
FIELDTERMINATOR '|'
MATCH (m:Movie {title:csv.title,year: toInt(csv.year)}), (k:Keyword {word:csv.word})
MERGE (m)-[:Has {weight:1}]->(k);

The query runs for about an hour and then shows the error "Unknown error". What a redundant error description.

I thought it was due to the 160K keywords, over 1M movies, and over 4M lines in the CSV. So I shortened the CSV to just one line, and the query still runs for about 15 minutes without finishing.

Where is the problem? How do I write a query that creates relationships between two already created nodes?

I could also delete all the nodes and build my database another way, but I would prefer not to delete the nodes I have already created.

Note: I shouldn't have hardware problems, because I am using a Super PC from our faculty.

Paddock answered 28/10, 2014 at 9:43 Comment(1)
"redundant" I don't think it means what you think it means. Brindisi

Be sure to have schema indexes in place to speed up looking up start nodes. Before running the import, run:

CREATE INDEX ON :Movie(title);
CREATE INDEX ON :Keyword(word);

Make sure the indexes are populated and online (check with the :schema command).
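
A read-only dry run over a few rows is one way to check that the lookups are fast before writing anything. This is only a minimal sketch, assuming the same file path, labels, and property names as in the question; the LIMIT of 10 rows is an arbitrary choice for illustration.

```cypher
// Read-only check: nothing is created or changed here.
LOAD CSV WITH HEADERS FROM "file:/home/gondil/temp.csv" AS csv
FIELDTERMINATOR '|'
// Only look at the first few rows (10 is an arbitrary sample size).
WITH csv LIMIT 10
MATCH (m:Movie {title: csv.title})
MATCH (k:Keyword {word: csv.word})
RETURN m.title, m.year, k.word;
```

If this returns quickly, the node lookups themselves are probably not the bottleneck. If it is slow or returns no rows, it may be worth comparing the labels in the query with the labels listed by :schema.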

Refactor your Cypher command into two queries to make use of the indexes. For now, an index consists only of a label and one property:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/home/gondil/temp.csv" AS csv
FIELDTERMINATOR '|'
MERGE (m:Movie {title: csv.title})
ON CREATE SET m.year = toInt(csv.year)
MERGE (k:Keyword {word: csv.word});

Second pass over the file:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/home/gondil/temp.csv" AS csv
FIELDTERMINATOR '|'
MATCH (m:Movie {title:csv.title })
MATCH (k:Keyword {word:csv.word})
MERGE (m)-[:Has {weight:1}]->(k);
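
The second pass matches movies by title only. If the existing data contains several movies with the same title in different years, as the question's original query seems to allow for, the year can still be checked in a WHERE clause after the indexed lookup. The following is a hedged sketch rather than a tested import: the batch size of 1000 is an assumed value, and setting the weight in ON CREATE instead of inside the MERGE pattern is a design choice, not something the answer prescribes.

```cypher
// Commit every 1000 rows (assumed batch size; tune for your setup).
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/home/gondil/temp.csv" AS csv
FIELDTERMINATOR '|'
// Indexed lookup on title, then filter by year on the matched nodes.
MATCH (m:Movie {title: csv.title})
WHERE m.year = toInt(csv.year)
MATCH (k:Keyword {word: csv.word})
// Merge on the relationship type only; set the weight when it is first created.
MERGE (m)-[r:Has]->(k)
ON CREATE SET r.weight = 1;
```

Merging on the bare relationship means a later change to the weight does not cause a second :Has relationship to be created between the same pair of nodes.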
Lepanto answered 28/10, 2014 at 9:52 Comment(4)
I executed :schema. It returned indexes ON :Keyword(word) ONLINE and ON :Movies(title) ONLINE. Then I ran your suggested query on a CSV with just 2 lines, and it has been running for 15+ minutes so far. I can't figure out what is wrong. As a test, I just returned the nodes matched from the CSV file, and that takes about 118 ms. Paddock
Now I'm confused. The Neo4j browser interface says I have :Movies and :Keywords nodes, but strangely it shows that I also have :Has relationships, so some must have been created. I tried some very simple queries, such as returning a single node. It took about 30 seconds, but it should take a few ms. Some of those queries run for a really long time and then an Unknown error occurs. For example, I tried to create a single relationship, not from the CSV, just merging 2 nodes, and it was not done. I'm really desperate. Do you know what could be wrong? I would like to send you a message rather than an off-topic comment. Paddock
Hello, I returned to this after a long time. I deleted everything in my database and then ran the first USING PERIODIC COMMIT query you posted. I let it run and walked away from my laptop, but forgot to disable sleep. When I came back there was an Unknown error, but when I looked at the webadmin interface there were some nodes and, strangely, the count was increasing. It is still increasing: when I run a query to count the nodes and run it again a few minutes later, the count has gone up. My database is running on a remote PC. Why did it show me an error? How can I find out what's going on? Paddock
Now it has been running for about 21 hours and only 74K nodes have been created. There is no information about a running query, but in the webadmin interface the counts of properties and nodes are still increasing veeery slowly. I did a rough calculation, and it seems I'll need to wait 19 days for it to finish. And it's only 1.2M nodes. How can I stop this madness? How can I get it to finish successfully and relatively fast? Paddock
