I did some work to aggregate some logs from a group of servers for the whole month of February. This took a while, but I ended up with a nice CSV file that I was ready to load into Excel to create some Pivot Tables.
However, when I tried to load it into Excel, I got one of the messages I hate the most: “File not loaded completely”. That message means the file had more rows than Excel's limit of 1,048,576 per worksheet, so it could not be loaded into a single spreadsheet. Bummer…
Now I had to split the log file into two files, but I wanted to do it in a way that made sense. The first column in the CSV file was actually the date (although the data was not sorted by date). So it occurred to me that it was simple enough to write a PowerShell script to do the job (instead of trying to process the data again in two batches).
In the end, since it was all February data and the date was in the mm/dd/yyyy format, I could just split each line by “/” and take the second item. I also needed to convert that item into an integer, since a string comparison would not work (as strings, “22” is less than “3”). I also had to pass an encoding option to the Out-File cmdlet to preserve the log's original format, avoid doubling the size of the resulting file and keep Excel happy.
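To see why the [int] cast matters, here is a quick sanity check you can run at a PowerShell prompt:

```powershell
# String comparison goes character by character, so "2" vs "3" decides it:
"22" -lt "3"              # True  (wrong for our purposes)

# Casting to [int] compares the numeric values instead:
[int]"22" -lt [int]"3"    # False (22 is not less than 3)
```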
Here are the two lines I used to split the log into two files (one with data up to 02/14/15 and the other with the rest of the month):
Type .\server.csv | ? { ([int] $_.Split("/")[1]) -lt 15 } | Out-File .\server1.csv -Encoding utf8
Type .\server.csv | ? { ([int] $_.Split("/")[1]) -ge 15 } | Out-File .\server2.csv -Encoding utf8
That worked well, but I lost the first line of the log, which held the column headers. It would be simple enough at this point to fix that by opening the files with Notepad (which is surprisingly capable of handling very large log files), but I wanted to find a way using just PowerShell. The solution was to introduce a line counter into the equation:
$l=0; type .\server.csv | ? { ($l++ -eq 0) -or ( ([int] $_.Split("/")[1]) -lt 15 ) } | Out-File .\server1.csv -Encoding utf8
$l=0; type .\server.csv | ? { ($l++ -eq 0) -or ( ([int] $_.Split("/")[1]) -ge 15 ) } | Out-File .\server2.csv -Encoding utf8
PowerShell was actually quick to process the large file, and the resulting files worked fine with Excel. In case you're wondering, you could easily adapt the filter to use full dates. You would split by the comma separator (instead of “/”) and use the datetime type instead of int. The filter would look like this:
$l=0; type .\server.csv | ? { ($l++ -eq 0) -or ([datetime] $_.Split(",")[0] -gt [datetime] "02/15/2016") } | Out-File .\server1.csv -Encoding utf8
Now let me get back to my Pivot Tables…