Importing a CSV file into a sqlite3 database table using Python


82

I have a CSV file and I want to bulk-import it into my sqlite3 database using Python. The command is `.import ...`, but it doesn't seem to work like this. Can anyone give me an example of how to do it in sqlite3? I am using Windows, in case that matters. Thanks

2010-05-22 11:25
by Hossein
Please provide the actual command that didn't work and the actual error message. "import...." could be anything. "cannot work" is too vague for us to guess at. Without details, we can't help - S.Lott 2010-05-22 11:30
the actual command as I said is `.import` and it says syntax error near ".import" - Hossein 2010-05-22 11:36
Please actually post the actual command in the question. Please actually post the actual error message in the question. Please do not add comments that simply repeat things. Please update the question with actual copy and paste of what you're actually doing - S.Lott 2010-05-22 11:46


111

import csv, sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (col1, col2);") # use your column names here

# Python 3: utf-8-sig strips any BOM that would otherwise corrupt the first
# header name (a common cause of KeyError); on Python 2, open with mode 'rb'
with open('data.csv', 'r', newline='', encoding='utf-8-sig') as fin:
    # csv.DictReader uses first line in file for column headings by default
    dr = csv.DictReader(fin) # comma is default delimiter
    to_db = [(i['col1'], i['col2']) for i in dr]

cur.executemany("INSERT INTO t (col1, col2) VALUES (?, ?);", to_db)
con.commit()
con.close()
2010-05-22 12:20
by bernie
In case you had the same problems I did: Make sure to change col1 and col2 to the column headers in the csv file. And close the connection to the database by calling con.close() at the end - Jonas 2016-08-08 21:31
Thanks, @Jonas. Updated post - bernie 2016-08-08 21:38
I keep getting not all arguments converted during string formatting when I attempt this method - Whitecat 2016-09-01 23:27
I tried this method, but it doesn't work for me. Could you check out my datasets here (they are very normal, except some columns have empty values) and try importing them with your code? https://stackoverflow.com/questions/46042623/two-questions-on-sqlite3-and-python-with-input-as-csv-files?noredirect=1#comment79045812_4604262 - user177196 2017-09-04 20:41
having 20MB csv file, script breaks halfway - holms 2018-04-30 14:51
I'm using your method to read the csv file (in python 3.6) but I keep getting a KeyError for the first column. Any ideas? - Isaac 2019-01-06 22:07
If you want to save the created db on disk, pass a file path in place of :memory:; sqlite3 creates the file if it doesn't already exist - LoMaPh 2019-02-07 22:59
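A minimal sketch of that on-disk variant of the answer above (the file name is just an example):

import sqlite3
con = sqlite3.connect("data.db")  # the file is created automatically if it doesn't exist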


61

Creating an sqlite connection to a file on disk is left as an exercise for the reader ... but there is now a two-liner made possible by the pandas library (a complete version is sketched after the snippet):

df = pandas.read_csv(csvfile)
df.to_sql(table_name, conn, if_exists='append', index=False)
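A full end-to-end sketch that fills in the connection left as an exercise (file, table, and variable names here are just examples):

import sqlite3
import pandas

conn = sqlite3.connect('data.db')  # on-disk database file
df = pandas.read_csv('data.csv')   # column names come from the CSV header row
df.to_sql('data', conn, if_exists='append', index=False)
conn.close()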
2015-03-02 04:14
by Tennessee Leeuwenburg
thank you. I got an issue with pandas. my csv is delimited by ';' and has ',' in entries. pandas gives an error on read_csv. any setting to read entries with commas w/out temporarily replacing them? - Alexei Martianov 2016-06-19 06:42
use sep=';'. The pandas documentation clearly outlines how to deal with this - Tennessee Leeuwenburg 2016-06-20 08:01
is there a way to use pandas but without using the RAM? I have a huge .csv (7 GB) that I can't import as a dataframe and then append to the DB - Pablo 2016-11-18 12:57
Yes, there's a method in pandas which will read in chunks rather than all at once. I'm afraid I can't recall exactly off the top of my head. I think you add chunksize=, and then you get back an iterator which you can then use to append to a database piecewise. Let me know if you have trouble finding it and I can dig out a recipe - Tennessee Leeuwenburg 2016-12-04 07:00
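That recipe, sketched out (the chunk size here is arbitrary; read_csv with chunksize= returns an iterator of DataFrames rather than loading the whole file):

import sqlite3
import pandas

conn = sqlite3.connect('data.db')
# Only one chunk is held in RAM at a time, so this also works for
# CSV files larger than memory (e.g. the 7 GB case above)
for chunk in pandas.read_csv('huge.csv', chunksize=100000):
    chunk.to_sql('data', conn, if_exists='append', index=False)
conn.close()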


11

The .import command is a feature of the sqlite3 command-line tool. To do it in Python, simply load the data using whatever facilities Python has, such as the csv module, and insert the data as usual.

This way, you also have control over what types are inserted, rather than relying on sqlite3's seemingly undocumented behaviour.
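If you really do want the command-line tool's .import, one option is to drive the sqlite3 binary from Python instead. A sketch, assuming the sqlite3 executable is installed and on PATH (file and table names are examples):

import subprocess

# Feed dot-commands to the sqlite3 CLI on stdin; .mode csv must come first
subprocess.run(
    ["sqlite3", "data.db"],
    input=".mode csv\n.import data.csv t\n",
    text=True,
    check=True,
)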

2010-05-22 11:34
by Marcelo Cantos
There is no need to prepare the insert. The source of SQL statements and compiled results are kept in a cache - John Machin 2010-06-17 04:13
@John Machin: Is there a link to how SQLite does this? - Marcelo Cantos 2010-06-17 07:34
@Marcelo: If you are interested in HOW it's done (why?), look in the sqlite source or ask on the sqlite mailing list - John Machin 2010-06-17 07:54
@John Machin: I'm interested because in all the SQLite documentation that I've come across, there is not a single word about automatic caching of unprepared statements. I don't think it is reasonable to have to read source code or probe mailing lists to discover something as basic as whether I should prepare my SQL statements or not. What is your source of information on this? - Marcelo Cantos 2010-06-17 08:40
@Marcelo: Actually it's done in the Python sqlite3 wrapper module. http://docs.python.org/library/sqlite3.html#module-functions-and-constants says """The sqlite3 module internally uses a statement cache to avoid SQL parsing overhead. If you want to explicitly set the number of statements that are cached for the connection, you can set the cached_statements parameter. The currently implemented default is to cache 100 statements.""" - John Machin 2010-06-17 10:25
@John Machin: Thank you; that's very useful to know. In fact, after a peruse through the module docs, it seems that there is no way to explicitly prepare a SQLite statement, so the cache is the only way statements can be prepared - Marcelo Cantos 2010-06-17 11:56
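For reference, the cache discussed above is tunable when connecting (the size here is arbitrary):

import sqlite3

# The sqlite3 module re-uses compiled statements from an internal cache
# (default 100 entries), so repeated execute() calls with identical SQL
# text skip the parse/prepare step
con = sqlite3.connect(":memory:", cached_statements=200)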


10

My 2 cents (more generic):

import csv, sqlite3
import logging

def _get_col_datatypes(fin):
    dr = csv.DictReader(fin) # comma is default delimiter
    fieldTypes = {}
    for entry in dr:
        fieldsLeft = [f for f in dr.fieldnames if f not in fieldTypes]
        if not fieldsLeft:
            break  # we're done
        for field in fieldsLeft:
            data = entry[field]

            # Need data to decide
            if len(data) == 0:
                continue

            if data.isdigit():
                fieldTypes[field] = "INTEGER"
            else:
                fieldTypes[field] = "TEXT"
        # NOTE: SQLite has no separate DATE storage class; dates are stored as TEXT, REAL or INTEGER

    # Recompute before checking: the list built inside the loop is stale
    # (it was taken before the last row assigned its types), which is the
    # "always true" bug pointed out in the comments
    fieldsLeft = [f for f in dr.fieldnames if f not in fieldTypes]
    if fieldsLeft:
        raise Exception("Failed to find all the columns data types - Maybe some are empty?")

    return fieldTypes


def escapingGenerator(f):
    for line in f:
        yield line.encode("ascii", "xmlcharrefreplace").decode("ascii")


def csvToDb(csvFile, outputToFile = False):
    # TODO: implement output to file

    with open(csvFile,mode='r', encoding="ISO-8859-1") as fin:
        dt = _get_col_datatypes(fin)

        fin.seek(0)

        reader = csv.DictReader(fin)

        # Keep the order of the column names just as in the CSV
        fields = reader.fieldnames
        cols = []

        # Set field and type
        for f in fields:
            cols.append("%s %s" % (f, dt[f]))

        # Generate create table statement:
        stmt = "CREATE TABLE ads (%s)" % ",".join(cols)

        con = sqlite3.connect(":memory:")
        cur = con.cursor()
        cur.execute(stmt)

        fin.seek(0)


        reader = csv.reader(escapingGenerator(fin))
        next(reader)  # skip the header row so the column names are not inserted as data

        # Generate insert statement:
        stmt = "INSERT INTO ads VALUES(%s);" % ','.join('?' * len(cols))

        cur.executemany(stmt, reader)
        con.commit()

    return con
2015-06-09 14:20
by Guy L
if len(feildslLeft) > 0: is always true, so it raises an exception. Please review and correct this - amu61 2016-01-09 22:34
Any way to do this without having to fseek(), so that it can be used on streams - mwag 2016-11-20 21:29
@mwag you can just skip the column type checking and import the columns all as text instead - user5359531 2018-04-17 15:49
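A sketch of that simplification (hypothetical helper; every column is typed TEXT, and the single pass means no seek() is needed, so it works on streams too):

import csv, sqlite3

def csv_to_db_all_text(csvFile, dbFile, tablename):
    with open(csvFile, mode='r', newline='', encoding="ISO-8859-1") as fin:
        reader = csv.reader(fin)
        header = next(reader)  # first row holds the column names
        cols = ", ".join('"%s" TEXT' % h for h in header)
        con = sqlite3.connect(dbFile)
        cur = con.cursor()
        cur.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (tablename, cols))
        stmt = 'INSERT INTO "%s" VALUES (%s)' % (tablename, ','.join('?' * len(header)))
        cur.executemany(stmt, reader)
        con.commit()
        con.close()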


9

Many thanks for bernie's answer! Had to tweak it a bit - here's what worked for me:

import csv, sqlite3
conn = sqlite3.connect("pcfc.sl3")
curs = conn.cursor()
curs.execute("CREATE TABLE PCFC (id INTEGER PRIMARY KEY, type INTEGER, term TEXT, definition TEXT);")
with open('PC.txt', 'r', encoding='utf-8') as f:  # Python 3; Python 2 needed unicode(row[n], "utf8")
    reader = csv.reader(f, delimiter='|')
    for row in reader:
        # strip the spaces padding the '|' separators in the source file
        to_db = [row[0].strip(), row[1].strip(), row[2].strip()]
        curs.execute("INSERT INTO PCFC (type, term, definition) VALUES (?, ?, ?);", to_db)
conn.commit()

My text file (PC.txt) looks like this:

1 | Term 1 | Definition 1
2 | Term 2 | Definition 2
3 | Term 3 | Definition 3
2012-09-14 21:37
by jiy


6

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys, csv, sqlite3

def main():
    con = sqlite3.connect(sys.argv[1]) # database file input
    cur = con.cursor()
    cur.executescript("""
        DROP TABLE IF EXISTS t;
        CREATE TABLE t (COL1 TEXT, COL2 TEXT);
        """) # drops the table if it exists and makes a fresh one

    with open(sys.argv[2], "r", newline='', encoding="utf-8") as f: # CSV file input
        reader = csv.reader(f, delimiter=',') # the file has no header row
        for row in reader:
            # Python 3 strings are already Unicode; earlier revisions used
            # Python 2's unicode(row[n], "utf8") here
            cur.execute("INSERT INTO t (COL1, COL2) VALUES (?, ?);", row[:2])
        con.commit() # commit once, after all rows are inserted
    con.close() # closes the connection to the database

if __name__=='__main__':
    main()
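Run it from a shell, passing the database file and then the CSV file (names here are just examples):

python import_csv.py mydb.db data.csv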
2013-11-01 15:43
by Christopher
Where is the unicode() function? - Love and peace - Joe Codeswell 2016-12-02 20:39


4

You can do this using blaze & odo efficiently

import blaze as bz
csv_path = 'data.csv'
bz.odo(csv_path, 'sqlite:///data.db::data')

Odo will store the csv file in data.db (an sqlite database) under the table named data

Or you can use odo directly, without blaze. Either way is fine. Read the odo documentation
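The direct odo call, sketched (this assumes the odo package is installed; note the comments below report breakage with newer dependency versions):

from odo import odo
odo('data.csv', 'sqlite:///data.db::data')  # table "data" inside data.db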

2017-01-11 07:16
by Kathirmani Sukumar
bz not defined - holms 2018-04-30 14:28
and it's probably a very old package, because of this inner error: AttributeError: 'SubDiGraph' object has no attribute 'edge' - holms 2018-04-30 14:52
Also getting the same attribute error: seems there are comments on GitHub for it, though - user791411 2018-06-05 15:56


1

Based on Guy L's solution (love it), but extended to handle escaped field names.

import csv, sqlite3

def _get_col_datatypes(fin):
    dr = csv.DictReader(fin) # comma is default delimiter
    fieldTypes = {}
    for entry in dr:
        fieldsLeft = [f for f in dr.fieldnames if f not in fieldTypes]
        if not fieldsLeft:
            break  # we're done
        for field in fieldsLeft:
            data = entry[field]

            # Need data to decide
            if len(data) == 0:
                continue

            if data.isdigit():
                fieldTypes[field] = "INTEGER"
            else:
                fieldTypes[field] = "TEXT"
        # NOTE: SQLite has no separate DATE storage class; dates are stored as TEXT, REAL or INTEGER

    # Recompute before checking: the list built inside the loop is stale
    fieldsLeft = [f for f in dr.fieldnames if f not in fieldTypes]
    if fieldsLeft:
        raise Exception("Failed to find all the columns data types - Maybe some are empty?")

    return fieldTypes


def escapingGenerator(f):
    for line in f:
        yield line.encode("ascii", "xmlcharrefreplace").decode("ascii")


def csvToDb(csvFile,dbFile,tablename, outputToFile = False):

    # TODO: implement output to file

    with open(csvFile,mode='r', encoding="ISO-8859-1") as fin:
        dt = _get_col_datatypes(fin)

        fin.seek(0)

        reader = csv.DictReader(fin)

        # Keep the order of the column names just as in the CSV
        fields = reader.fieldnames
        cols = []

        # Set field and type
        for f in fields:
            cols.append("\"%s\" %s" % (f, dt[f]))

        # Generate create table statement:
        stmt = "create table if not exists \"" + tablename + "\" (%s)" % ",".join(cols)
        print(stmt)
        con = sqlite3.connect(dbFile)
        cur = con.cursor()
        cur.execute(stmt)

        fin.seek(0)


        reader = csv.reader(escapingGenerator(fin))
        next(reader)  # skip the header row so the column names are not inserted as data

        # Generate insert statement:
        stmt = "INSERT INTO \"" + tablename + "\" VALUES(%s);" % ','.join('?' * len(cols))

        cur.executemany(stmt, reader)
        con.commit()
        con.close()
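Example call (file and table names are placeholders):

csvToDb('data.csv', 'data.db', 'mytable')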
2018-06-11 13:17
by Jace


0

import csv, sqlite3

def _get_col_datatypes(fin):
    dr = csv.DictReader(fin) # comma is default delimiter
    fieldTypes = {}
    for entry in dr:
        fieldsLeft = [f for f in dr.fieldnames if f not in fieldTypes]
        if not fieldsLeft:
            break  # we're done
        for field in fieldsLeft:
            data = entry[field]

            # Need data to decide
            if len(data) == 0:
                continue

            if data.isdigit():
                fieldTypes[field] = "INTEGER"
            else:
                fieldTypes[field] = "TEXT"
        # NOTE: SQLite has no separate DATE storage class; dates are stored as TEXT, REAL or INTEGER

    # Recompute before checking: the list built inside the loop is stale
    fieldsLeft = [f for f in dr.fieldnames if f not in fieldTypes]
    if fieldsLeft:
        raise Exception("Failed to find all the columns data types - Maybe some are empty?")

    return fieldTypes


def escapingGenerator(f):
    # Replace any non-ASCII characters with XML character references
    for line in f:
        yield line.encode("ascii", "xmlcharrefreplace").decode("ascii")


def csvToDb(csvFile, dbFile, tablename, outputToFile=False):
    # TODO: implement output to file

    with open(csvFile, mode='r', encoding="ISO-8859-1") as fin:
        dt = _get_col_datatypes(fin)

        fin.seek(0)

        reader = csv.DictReader(fin)

        # Keep the order of the column names just as in the CSV
        fields = reader.fieldnames
        cols = []

        # Set field and type
        for f in fields:
            cols.append("\"%s\" %s" % (f, dt[f]))

        # Generate the CREATE TABLE statement
        stmt = "create table if not exists \"" + tablename + "\" (%s)" % ",".join(cols)
        print(stmt)
        con = sqlite3.connect(dbFile)
        cur = con.cursor()
        cur.execute(stmt)

        fin.seek(0)

        reader = csv.reader(escapingGenerator(fin))
        next(reader)  # skip the header row so the column names are not inserted as data

        # Generate the INSERT statement
        stmt = "INSERT INTO \"" + tablename + "\" VALUES(%s);" % ','.join('?' * len(cols))

        cur.executemany(stmt, reader)
        con.commit()
        con.close()
2018-12-24 10:45
by Ramy Awad
Please format your code properly and add some explanation - executable 2018-12-24 11:06
Please comment what you've done - RussellB 2018-12-24 11:17