ORC, like AVRO and PARQUET, is a format specifically designed for massive storage. You can think of these formats "like a CSV": they are all files containing data, each with its own particular structure (different from a CSV or a JSON, of course!).
Reading an ORC file with PySpark should be easy, as long as your environment provides Hive support.
To answer your question: I'm not sure you will be able to read it in a local environment without Hive; I've never done it myself. The spark.read.orc documentation is explicit on this point (you can do a quick test with the code it shows):
Loads ORC files, returning the result as a DataFrame.
Note: Currently ORC support is only available together with Hive support.
>>> df = spark.read.orc('python/test_support/sql/orc_partitioned')
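If you want to try it anyway, here is a minimal sketch of such a test; the path is just a placeholder for your own .orc file, and I'm assuming a local Spark installation where enableHiveSupport() can actually be satisfied:

from pyspark.sql import SparkSession

# Build a session with Hive support enabled, as the documentation requires for ORC
spark = (SparkSession.builder
         .appName('OrcTest')
         .enableHiveSupport()
         .getOrCreate())

# Replace the path with the location of your ORC file (placeholder here)
df = spark.read.orc('/path/to/your/file.orc')
df.printSchema()
df.show(5)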
Hive is a data warehouse system that allows you to query data stored on HDFS (a distributed file system) through MapReduce, much like a traditional relational database: you write SQL-like queries, although it doesn't support 100% of the standard SQL features!
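Just to give you an idea of what "SQL-like" means in Spark, here is a sketch; it assumes you already have a DataFrame df loaded as above, and the column name is purely hypothetical:

# Expose the DataFrame as a temporary SQL table
df.createOrReplaceTempView('my_table')

# 'some_column' is a made-up name; use one that exists in your data
result = spark.sql('SELECT some_column, COUNT(*) AS cnt FROM my_table GROUP BY some_column')
result.show()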
Edit: Try the following to create a new Spark session. Not to be rude, but I suggest you follow one of the many PySpark tutorials out there in order to understand the basics of this "world". Everything will be much clearer.
import findspark
findspark.init()  # locate the local Spark installation and add it to sys.path
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Test').getOrCreate()  # create (or reuse) a Spark session
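Once the session is up, reading your file is a single line; again, the path below is only an example:

df = spark.read.orc('/path/to/your/file.orc')  # placeholder path; replace with your own
df.show()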
Comment: use with open(filename, 'rb') as file: to avoid the decoding error pyarrow.lib.ArrowIOError: Arrow error: IOError: 'utf-8' codec can't decode byte 0xfe in position 11: invalid start byte. – Yaakov
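For reference, a sketch of what that comment describes using pyarrow's orc module (the filename is a placeholder, and this assumes pyarrow with ORC support is installed):

import pyarrow.orc as orc

filename = 'your_file.orc'  # placeholder; point this at your ORC file

# Open in binary mode ('rb') so pyarrow receives raw bytes instead of decoded text
with open(filename, 'rb') as file:
    table = orc.ORCFile(file).read()  # returns a pyarrow.Table
df = table.to_pandas()  # optional: convert to a pandas DataFrame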