I recently had a project to create a comprehensive CSV file of all faculty, with some information. Same thing for staff. I opened the user AD group with powershell and collected all the ‘faculty’ OU entries. With a Directory Entry, i got the appropriate attributes, cleaned them up if necessary, and output the result to an array, and at the end to a comma delimited text file with a csv extention. Same thing for the staff.
However AD does really make it simple to get the file sorted, and the first entry was the userid. I wanted to provide the file sorted on last name (the second field).
I probably could have put the data into a datagrid, and either sorted that or tried to write the file out in a sorted fashion.
Instead, the FIRST line of the array to which i wrote the users data was a header line with the name of the column of data. That effectively said that the text file i was saving was REALLY a csv file.
One note is that the appropriate way to create the header is
$header = "field1, field2,field3"
and NOT $header = “field1″,”field2″,”field3” – this creates $header as an array of single values (three of them in this case) and what we want is a single array entry with three values – to match the user data.
Now, having saved the “csv” file, i import it with import-csv, and sort on “field2” which in the file is really “lastname”. The result is immediately exported back to disk with -notypeinformation. Incidentally, this file opens nicely in excel.
$facArray> c:\scripts\atfolder\faculty.csv $sortthis = import-CSV c:\scripts\atfolder\faculty.csv | Sort-Object lastname $sortthis |export-csv c:\scripts\atfolder\faculty.csv -force -NoTypeInformation
Oh – i forgot to mention that the “department” field has the annoying propensity to have commas in it – this field i had to manually wrap with ,`” data `”, so that the Excel view did not break. The ‘Ticks” allowed the quote output to be excaped. Before the import-export, it can in with comma separated fields that “text to columns” would deal with, but now Excel will put the data into each column.
But that is not the purpose for this post.
In the middle of the semester, the data will not change much if at all. Since my contact says to deliver the file, and since i do know that this will be imported into a local copy of (gasp, horror, gasp) Access, there is a need for me to do work but it would be nice not to have the client update the faculty or staff tables IF there is no change.
A way to determine that is to create a MD5 hash of the faculty and staff files and compare them to the previous version. I could output each line separately, with append on each line except for the header – that would overwrite the file, or output the entire array with -force to overwrite the file, but i decided to explicitly remove the file (if it existed) but FIRST i capture the signature of the old file – created the new file, and then could compare the old versus the new. If either file changed, a suitable message is constructed, alerting the recipient of the email that the attached files had changed so processing was needed, or that they had not changed – time to go home.
The code to create the MD5 file uses the DOTNET libraries and looks like this:
write-verbose "get a checksum for csv files and then remove them." $md5 = new-object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider $hashFacultyOld = [System.BitConverter]::ToString($md5.ComputeHash([System.IO.File]::ReadAllBytes("C:\scripts\atfolder\faculty.csv"))) $hashStaffOld = [System.BitConverter]::ToString($md5.ComputeHash([System.IO.File]::ReadAllBytes("C:\scripts\atfolder\staff.csv"))) Remove-Item c:\scripts\atfolder\*.csv
After the new file is created, the hash value is stored as $hashStaffNew and old and new compared. I got the code from StackOverFlow