Loading RDF triples into Virtuoso Open Source

I'm trying to create a local mirror of LinkedGeoData.org from this dump.

That's around 61,000,000 triples. Virtuoso is supposed to handle far more than that easily, but every single time it stops loading after around 40,000,000 triples. I'm using a double extra large instance from Amazon EC2, which has 30 GB of RAM and plenty of storage space left. Is there something wrong with my config file? I'm using Ubuntu Server 12.04, and I've tried both installing Virtuoso through apt-get (version 6.1.5) and compiling the latest stable source from GitHub (version 6.1.6) following Jörn Hees' instructions.

I've also tried splitting the dumpfile into smaller pieces and loading them one by one. This also breaks down after around 40,000,000 triples have been inserted.

The logfile doesn't show anything strange; virtuoso-t just stops working without actually crashing, and top shows the process using 0% of the CPU. I've left the process running for several days without any progress after the first half hour or so.

Here is my virtuoso.ini file:

[Database]
DatabaseFile            = /var/lib/virtuoso/db/virtuoso.db
ErrorLogFile            = /var/lib/virtuoso/db/virtuoso.log
LockFile            = /var/lib/virtuoso/db/virtuoso.lck
TransactionFile         = /var/lib/virtuoso/db/virtuoso.trx
xa_persistent_file      = /var/lib/virtuoso/db/virtuoso.pxa
ErrorLogLevel           = 7
FileExtend          = 200
MaxCheckpointRemap      = 625000
Striping            = 0
TempStorage         = TempDatabase


[TempDatabase]
DatabaseFile            = /var/lib/virtuoso/db/virtuoso-temp.db
TransactionFile         = /var/lib/virtuoso/db/virtuoso-temp.trx
MaxCheckpointRemap      = 2000
Striping            = 0


;
;  Server parameters
;
[Parameters]
ServerPort          = 1111
LiteMode            = 0
DisableUnixSocket       = 1
DisableTcpSocket        = 0
;SSLServerPort          = 2111
;SSLCertificate         = cert.pem
;SSLPrivateKey          = pk.pem
;X509ClientVerify       = 0
;X509ClientVerifyDepth      = 0
;X509ClientVerifyCAFile     = ca.pem
ServerThreads           = 20
CheckpointInterval      = 60
O_DIRECT            = 0
CaseMode            = 2
MaxStaticCursorRows     = 5000
CheckpointAuditTrail        = 0
AllowOSCalls            = 0
SchedulerInterval       = 10
DirsAllowed         = ., /usr/share/virtuoso/vad, /home/ubuntu/lgd
ThreadCleanupInterval       = 0
ThreadThreshold         = 10
ResourcesCleanupInterval    = 0
FreeTextBatchSize       = 100000
SingleCPU           = 0
VADInstallDir           = /usr/share/virtuoso/vad/
PrefixResultNames               = 0
RdfFreeTextRulesSize        = 100
IndexTreeMaps           = 256
MaxMemPoolSize                  = 200000000
PrefixResultNames               = 0
MacSpotlight                    = 0
IndexTreeMaps                   = 64
;;
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;;
;; Uncomment next two lines if there is 2 GB system memory free
;       NumberOfBuffers          = 170000
;       MaxDirtyBuffers          = 130000
;; Uncomment next two lines if there is 4 GB system memory free
;       NumberOfBuffers          = 340000
;       MaxDirtyBuffers          = 250000
;; Uncomment next two lines if there is 8 GB system memory free
;       NumberOfBuffers          = 680000
;       MaxDirtyBuffers          = 500000
;; Uncomment next two lines if there is 16 GB system memory free
;       NumberOfBuffers          = 1360000
;       MaxDirtyBuffers          = 1000000
;; Uncomment next two lines if there is 32 GB system memory free
       NumberOfBuffers          = 2720000
       MaxDirtyBuffers          = 2000000
;; Uncomment next two lines if there is 48 GB system memory free
;       NumberOfBuffers          = 4000000
;       MaxDirtyBuffers          = 3000000
;; Uncomment next two lines if there is 64 GB system memory free
;       NumberOfBuffers          = 5450000
;       MaxDirtyBuffers          = 4000000
;;
;; Note the default settings will take very little memory
;; but will not result in very good performance
;;


[HTTPServer]
ServerPort          = 8890
ServerRoot          = /var/lib/virtuoso/vsp
ServerThreads           = 20
DavRoot             = DAV
EnabledDavVSP           = 0
HTTPProxyEnabled        = 0
TempASPXDir         = 0
DefaultMailServer       = localhost:25
ServerThreads           = 10
MaxKeepAlives           = 10
KeepAliveTimeout        = 10
MaxCachedProxyConnections   = 10
ProxyConnectionCacheTimeout = 15
HTTPThreadSize          = 280000
HttpPrintWarningsInOutput   = 0
Charset             = UTF-8
;HTTPLogFile                = logs/http.log

[AutoRepair]
BadParentLinks          = 0

[Client]
SQL_PREFETCH_ROWS       = 100
SQL_PREFETCH_BYTES      = 16000
SQL_QUERY_TIMEOUT       = 0
SQL_TXN_TIMEOUT         = 0
;SQL_NO_CHAR_C_ESCAPE       = 1
;SQL_UTF8_EXECS         = 0
;SQL_NO_SYSTEM_TABLES       = 0
;SQL_BINARY_TIMESTAMP       = 1
;SQL_ENCRYPTION_ON_PASSWORD = -1

[VDB]
ArrayOptimization       = 0
NumArrayParameters      = 10
VDBDisconnectTimeout        = 1000
KeepConnectionOnFixedThread = 0

[Replication]
ServerName          = db-IP-10-252-61-61
ServerEnable            = 1
QueueMax            = 50000


;
;  Striping setup
;
;  These parameters have only effect when Striping is set to 1 in the
;  [Database] section, in which case the DatabaseFile parameter is ignored.
;
;  With striping, the database is spawned across multiple segments
;  where each segment can have multiple stripes.
;
;  Format of the lines below:
;    Segment<number> = <size>, <stripe file name> [, <stripe file name> .. ]
;
;  <number> must be ordered from 1 up.
;
;  The <size> is the total size of the segment which is equally divided
;  across all stripes forming  the segment. Its specification can be in
;  gigabytes (g), megabytes (m), kilobytes (k) or in database blocks
;  (b, the default)
;
;  Note that the segment size must be a multiple of the database page size
;  which is currently 8k. Also, the segment size must be divisible by the
;  number of stripe files forming  the segment.
;
;  The example below creates a 200 meg database striped on two segments
;  with two stripes of 50 meg and one of 100 meg.
;
;  You can always add more segments to the configuration, but once
;  added, do not change the setup.
;
[Striping]
Segment1            = 100M, db-seg1-1.db, db-seg1-2.db
Segment2            = 100M, db-seg2-1.db
;...

;[TempStriping]
;Segment1           = 100M, db-seg1-1.db, db-seg1-2.db
;Segment2           = 100M, db-seg2-1.db
;...

;[Ucms]
;UcmPath            = <path>
;Ucm1               = <file>
;Ucm2               = <file>
;...


[Zero Config]
ServerName          = virtuoso (IP-10-252-61-61)
;ServerDSN          = ZDSN
;SSLServerName          = 
;SSLServerDSN           = 


[Mono]
;MONO_TRACE         = Off
;MONO_PATH          = <path_here>
;MONO_ROOT          = <path_here>
;MONO_CFG_DIR           = <path_here>
;virtclr.dll            =


[URIQA]
DynamicLocal            = 0
DefaultHost         = localhost:8890


[SPARQL]
;ExternalQuerySource        = 1
;ExternalXsltSource         = 1
;DefaultGraph           = http://localhost:8890/dataspace
;ImmutableGraphs            = http://localhost:8890/dataspace
ResultSetMaxRows            = 10000
MaxQueryCostEstimationTime  = 4000  ; in seconds
MaxQueryExecutionTime       = 600   ; in seconds
DefaultQuery                = select distinct ?Concept where {[] a ?Concept} LIMIT 100
DeferInferenceRulesInit     = 0  ; controls inference rules loading
;PingService            = http://rpc.pingthesemanticweb.com/
ShortenLongURIs = 1

[Plugins]
LoadPath            = /usr/lib/virtuoso/hosting
Load1               = plain, wikiv
Load2               = plain, mediawiki
Load3               = plain, creolewiki
Load4           = plain, im

Any help is greatly appreciated.

Gamecock answered 14/8, 2012 at 4:59 Comment(2)
For benefit of future readers... Jörn has updated his guide a few times. Latest is dated 2015-11-23, and is based on Virtuoso 7.2.1 and DBpedia 2015. - Hu
Also note that Virtuoso-specific questions are often answered more quickly via product-specific resources such as the Virtuoso Users mailing list, the public OpenLink Support Forums, or a confidential OpenLink Support Case. ObDisclaimer: I work for OpenLink Software, producer of Virtuoso. - Hu

Answering my own question. The problem was the leading spaces in these lines:

   NumberOfBuffers          = 2720000
   MaxDirtyBuffers          = 2000000

Once I deleted those spaces, Virtuoso actually used the available memory instead of the default 16 MB.
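For anyone who wants to check their own virtuoso.ini for the same mistake, here is a minimal sketch in Python. It assumes, as described above, that a setting indented with spaces or tabs is not picked up by the server. The function name `find_indented_settings` is my own and not part of any Virtuoso tooling.

```python
import re

# A "key = value" line that begins with whitespace and is not a ';' or '#'
# comment and not a [Section] header.
INDENTED_SETTING = re.compile(r"^[ \t]+([^;#\s=\[][^=]*?)\s*=")


def find_indented_settings(ini_text):
    """Return (line_number, key) pairs for settings that start with whitespace."""
    hits = []
    for number, line in enumerate(ini_text.splitlines(), start=1):
        match = INDENTED_SETTING.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

Reading the file with `open(path).read()` and passing the text to `find_indented_settings` lists the lines to fix. Commented-out lines (those starting with `;`) are deliberately skipped, since they are inactive anyway.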

Gamecock answered 18/8, 2012 at 1:8 Comment(0)
