Sep 032014
 
Article PHP

CSV (“Comma Separated Values”) is a format commonly used to import and export tabular data from spreadsheets, databases and other applications.

This post gives some examples of how to handle this format in a PHP script

CSV format flavors

“standard” CSV

A CSV file is a text file that contains a set or records. Each record contains the values of a set of fields. The field values are written in sequence, separated by a delimiter character. The default delimiter character is a comma “,”, as implies the name of the CSV format.

The whole record is usually contained in a single line, excepting the case where there are newline characters inside the value of a field.

Besides, the first line of the CSV file is usually a header line that contains the field names, also separated by the delimiter character.

Example. The listing of employees in a company, exported from a database to a CSV file, could look like this:

Other delimiters

Although the name CSV means “comma separated values”, it is common to find CSV files that use a different delimiter, such as a tab (“\t”) or pipe (“|”) character. The format of a file that uses tabs delimiters might be referred as “TSV”, but it is also commonly referred by the generic name of “CSV”.

The string container character

There is a potential problem in that the delimiter character might appear inside the value of a field. For instance, in the employee listing example above, the field “name” might be filled with the full names of the employees as name, first name. To avoid the comma inside the value to be taken as a delimiter, the value of the field is enclosed in container characters. Usually, the double quote (“) character is used for this purpose.

Example:

As can be seen in the example, the container characters can be omitted if they are not needed.

The escape character

Again, if container characters are used to enclose the value of a field, there is the potential problem that the container character itself might appear inside the field value (i.e., one of the values of the “description” field in the example above might be “senior” programmer )

This case is handled prefixing the conflicting character with an “escape” character. Usually, a backslash (“\”) or else the double quote character itself is used as escape character.

Example:

Handling multi-line values

Sometimes, the value of a given field might span more than one line. This should not be an issue, if the value is enclosed in delimiters.

Example:

In the above example, the value of the description field spans three lines:

Reading a CSV file in PHP

PHP implements a fgetcsv() function that handles the different flavors of CSV formats described.

A “data.csv” file in “standard” CSV format can be read in a PHP script as follows:

The second argument passed to fgetcsv is the maximum number of characters in a record. A zero value can be passed to allow unlimited record size, but in that case the performance is lower.

fgetcsv can be passed additional arguments to specify the separator character, the string container character, and the escape character. For instance, the call:

can be used to read a TSV file that uses tabs (“\t”) as the separator character, single quotes (“‘”) as the string container character, and backslashes (“\”) as the escape character.

Reading a CSV file with heading line in PHP

If the first line in the file is a heading line with the field names, the sample PHP code snippet above can be improved to read the file into an associative array, where the values of the fields can be referenced by field name:

Index of posts related to PHP programming

 Posted by at 10:36 am

 Leave a Reply

(required)

(required)