WebDesires - Web Development Cannock, West Midlands

Cannock UK: T: 01543 547 899 Birmingham UK: T: 0121 318 6336 International: T: +44 121 318 6336

XLSX To CSV Conversion PHP (Fast & Efficient)

XLSX To CSV Conversion PHP (Fast & Efficient)

Need Some Help? We are here for you!

We have a very friendly service - Come and chat to us and let us know what you need, we work for an hourly fee and can also provide you a no obligation quote and begin work immediately in most cases. Click "Request Support" or use our Live Chat.

Request Support
There is a mobile optimized version of this page, view AMP Version.

When doing anything these days the very first thing I look for is a resource or library that can do what I need, and for the most part these are better maintained, more thought through and generally a safer bet than hacking / slashing things together yourself. Basically who wants to re-invent the wheel.

However recently I needed to convert an XLSX file to CSV in PHP, I thought this would be super easy since there are many XLSX readers and converter libraries around for many formats including XLSX. I tried first PHPEXCEL but it was crashing during the conversion, so I tried the newer library PHPSpreadsheet thinking this will surely do the trick.

The problem was, the XLSX I was converting was absolutely HUGE, 65,000 rows of data to be exact. And both libraries were using up huge amounts of ram and then failing. I even tried the reader options to read the file in chunks but the library just could not handle all the data and was falling over dramatically.

Eventually I realised that even if I could get it to somehow work without failing, it was taking an awfully long time to process the file, and perhaps I needed a more direct, simple and ultimately faster approach.

I knew that an XLSX was just a bunch of XML files inside a ZIP and so I unpacked a file, and started figuring out how it works, from that I realised there is a main sheet in XML and a variables sheet also in XML, the variables seem to be used for repetitive data, I guess to make the files smaller, which is pretty smart. So anyway I wrote a script that basically just reads these files and builds a CSV from all the data. It works fantastically and extremely fast for my insane file.

Here is the code below:

<?php 
/**
	* xlsx2csv.php converts .xlsx files to .csv format
	*  
	* You may freely use, modify or redistribute this script provided this header remains intact
	*    
	* @title      xlsx2csv.php 
	* @author     Dean Williams <https://webdesires.co.uk>
	* @version    1.0
	* @link       https://webdesires.co.uk
*/

/*
	Function info:
	xlsx_file = Full path to XLSX file
	xlsx_unzip_path = The XLSX file will be unzipped to this path (will overwrite existing files) make sure it ends in slash /
	csv_file = the resulting csv file
	(optional) throttle = a number of rows max to export
	(optional) cleanup = remove all files in xlsx_unzip_path once complete set to 1
*/

function xlsx2csv($xlsx_file, $xlsx_unzip_path, $csv_file, $throttle = '', $cleanup = 0) {
	@mkdir($xlsx_unzip_path);

	xlsxUnZIP($xlsx_file, $xlsx_unzip_path);
	
	$strings = array();  
	$dir = getcwd();
	$filename = $xlsx_unzip_path."xl/sharedStrings.xml";   
	/**
		* XMLReader node-by-node processing improves speed and memory in handling large XLSX files
		* Hybrid XMLReader/SimpleXml approach 
		* per http://stackoverflow.com/questions/1835177/how-to-use-xmlreader-in-php
		* Contributed by http://stackoverflow.com/users/74311/josh-davis
		* SimpleXML provides easier access to XML DOM as read node-by-node with XMLReader
		* XMLReader vs SimpleXml processing of nodes not benchmarked in this context, but...
		* published benchmarking at http://posterous.richardcunningham.co.uk/using-a-hybrid-of-xmlreader-and-simplexml
		* suggests SimpleXML is more than 2X faster in record sets ~<500K
	*/
	$z = new XMLReader;
	$z->open($filename);
	$doc = new DOMDocument;
	$csvfile = fopen($csv_file,"w");
	while ($z->read() && $z->name !== 'si');

	while ($z->name === 'si')
	{ 
		// either one should work
		$node = new SimpleXMLElement($z->readOuterXML());
		// $node = simplexml_import_dom($doc->importNode($z->expand(), true));
		
		$result = xmlObjToArr($node);   
		//$count = count($result['text']);
		
		//if(isset($result['children']['t'][0]['text'])){
			
			$strings[] = $result['children']['t'][0]['text'];
			
		//};    
		$z->next('si');
		$result=NULL;      
	}
	
	$z->close($filename);
	$dir = getcwd();
	$filename = $xlsx_unzip_path."xl/worksheets/sheet1.xml";    
	$z = new XMLReader;
	$z->open($filename);
	$doc = new DOMDocument;
	$rowCount="0";
	$doc = new DOMDocument; 
	$sheet = array();  
	$nums = array("0","1","2","3","4","5","6","7","8","9");
	while ($z->read() && $z->name !== 'row');

	$counter = 0;
	while ($z->name === 'row')
	{  
		$counter++;
		$thisrow=array();
		$node = new SimpleXMLElement($z->readOuterXML());
		$result = xmlObjToArr($node); 
		$cells = $result['children']['c'];
		$rowNo = $result['attributes']['r']; 
		$colAlpha = "A";
		foreach($cells as $cell){
			$cellcode = $cell['attributes']['r'];
			if(array_key_exists('v',$cell['children'])){
				$cellno = str_replace($nums,"",$cell['attributes']['r']);
				for($col = $colAlpha; $col != $cellno; $col++) {
					$thisrow[$cellcode]=" ";
					$colAlpha++; 
				};
				if(array_key_exists('t',$cell['attributes'])&&$cell['attributes']['t']='s'){
					$val = $cell['children']['v'][0]['text'];
					$string = $strings[$val] ;
					$thisrow[$cellcode]=$string;
				} 
				else {
					$thisrow[$cellcode]=$cell['children']['v'][0]['text'];
				}
			}
			else {$thisrow[$cellcode]="";};
			$colAlpha++;
		};
		$rowLength=count($thisrow);
		$rowCount++;
		$emptyRow=array();
		while($rowCount<$rowNo){
			for($c=0;$c<$rowLength;$c++) {
				$emptyRow[]=""; 
			}
			if(!empty($emptyRow)){
				my_fputcsv($csvfile,$emptyRow);
			};
			$rowCount++;
		};
		my_fputcsv($csvfile,$thisrow);      
		
		if($rowCount<$throttle||$throttle==""||$throttle=="0") {
			$z->next('row');
		} else {
			break;
		};
		$result=NULL; 
	};
	$z->close($filename);
	
	//cleanup
	if ($cleanup == 1) {
		foreach (glob(xlsx_unzip_path."*.*") as $filename) {
			if (is_file($filename)) {
				unlink($filename);
			}
		}
		recursiveRemoveDirectory(xlsx_unzip_path);
	}
}

function recursiveRemoveDirectory($directory) {
	foreach(glob("{$directory}/*") as $file)
	{
		if(is_dir($file)) { 
			recursiveRemoveDirectory($file);
		} else {
			unlink($file);
		}
	}
	rmdir($directory);
}


function xmlObjToArr($obj) {
	/**
		* convert xml objects to array
		* function from http://php.net/manual/pt_BR/book.simplexml.php
		* as posted by xaviered at gmail dot com 17-May-2012 07:00
		* NOTE: return array() ('name'=>$name) commented out; not needed to parse xlsx
	*/
	$namespace = $obj->getDocNamespaces(true);
	$namespace[NULL] = NULL;
	
	$children = array();
	$attributes = array();
	$name = strtolower((string)$obj->getName());
	
	$text = trim((string)$obj);
	if( strlen($text) <= 0 ) {
		$text = NULL;
	}
	
	// get info for all namespaces
	if(is_object($obj)) {
		foreach( $namespace as $ns=>$nsUrl ) {
			// atributes
			$objAttributes = $obj->attributes($ns, true);
			foreach( $objAttributes as $attributeName => $attributeValue ) {
				$attribName = strtolower(trim((string)$attributeName));
				$attribVal = trim((string)$attributeValue);
				if (!empty($ns)) {
					$attribName = $ns . ':' . $attribName;
				}
				$attributes[$attribName] = $attribVal;
			}
			
			// children
			$objChildren = $obj->children($ns, true);
			foreach( $objChildren as $childName=>$child ) {
				$childName = strtolower((string)$childName);
				if( !empty($ns) ) {
					$childName = $ns.':'.$childName;
				}
				$children[$childName][] = xmlObjToArr($child);
			}
		}
	}
	
	return array(
	// name not needed for xlsx
	// 'name'=>$name,
	'text'=>$text,
	'attributes'=>$attributes,
	'children'=>$children
	);
} 

function xlsxUnZIP($zipfile, $folder) {
	/* a quick wrapper function to handle unzip of a archive */
	$zip = new ZipArchive;
	$res = $zip->open($zipfile);
	
	if ($res === TRUE) {
		@mkdir($folder);
		$zip->extractTo($folder);
		$zip->close();
	} else {
		die('failed to unzip: '.$zipfile);
	}
}

function my_fputcsv($handle, $fields, $delimiter = ',', $enclosure = '"', $escape = '\\') {
	/**
		* write array to csv file
		* enhanced fputcsv found at http://php.net/manual/en/function.fputcsv.php
		* posted by Hiroto Kagotani 28-Apr-2012 03:13
		* used in lieu of native PHP fputcsv() resolves PHP backslash doublequote bug
		* !!!!!! To resolve issues with escaped characters breaking converted CSV, try this:
		* Kagotani: "It is compatible to fputcsv() except for the additional 5th argument $escape, 
		* which has the same meaning as that of fgetcsv().  
		* If you set it to '"' (double quote), every double quote is escaped by itself."
	*/
	$first = 1;
	foreach ($fields as $field) {
		if ($first == 0) fwrite($handle, ",");
		$f = str_replace($enclosure, $enclosure.$enclosure, $field);
		if ($enclosure != $escape) {
			$f = str_replace($escape.$enclosure, $escape, $f);
		}
		if (strpbrk($f, " \t\n\r".$delimiter.$enclosure.$escape) || strchr($f, "\000")) {
			fwrite($handle, $enclosure.$f.$enclosure);
			} else {
			fwrite($handle, $f);
		}
		$first = 0;
	}
	fwrite($handle, "\n");
}
?>

Need Some Help? We are here for you!

We have a very friendly service - Come and chat to us and let us know what you need, we work for an hourly fee and can also provide you a no obligation quote and begin work immediately in most cases. Click "Request Support" or use our Live Chat.

Request Support

Author: Dean Williams

Professional PHP Web Developer with expertise in OpenCart Web Development, WordPress Web Development, Bespoke Systems - also a seasoned Linux Server Administrator.