Working with Files In Linux
-
I am working on document cleanup in an ancient custom (shitty) application we are trying to retire. Basically, there are files everywhere, and I need to locate on the filesystem the files that are referenced in the database. My plan is to dump the file references from the application's database into one table, and dump a listing of the filesystem into another. I will then match by filename and go from there.
However, I'm not sure how to approach capturing the files at the filesystem level. Say the files live under /this/directory. What would be the best way to capture the following data?
Filename | Absolute Path | Modified Date
Any advice would be appreciated. For what it's worth, this is on CentOS 7.
Thanks!!
-
No need to get the filename, the absolute path will include that already.
-
I'm not clear what you are asking. Do you want a list of ALL files under said /directory or are you looking for only certain ones?
-
@scottalanmiller said:
No need to get the filename, the absolute path will include that already.
I want the file name and the path to said file separate, but I suppose I could separate them in another step. I'm going to be matching by file name, basically table1.filename = table2.filename.
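For anyone who wants to try the match outside the database first, here is a rough sketch with awk. It assumes two hypothetical files that are not part of the thread: db_files.txt with one file name per line, and fs_files.tsv with tab-separated filename, directory, and modified date. The sample rows are made up for the demo.

```shell
# Hypothetical inputs, built in a scratch directory for the demo:
#   db_files.txt  one file name per line (dumped from the database)
#   fs_files.tsv  filename<TAB>directory<TAB>modified date
workdir=$(mktemp -d)
printf '%s\n' 'a.pdf' 'b.pdf' > "$workdir/db_files.txt"
printf 'a.pdf\t/this/directory/x\t2011-08-19 10:21\n' > "$workdir/fs_files.tsv"
printf 'c.pdf\t/this/directory/y\t2012-08-27 08:52\n' >> "$workdir/fs_files.tsv"

# First pass (NR==FNR) loads the database names into a set; the second pass
# prints only the filesystem rows whose first column is in that set.
matched=$(awk -F'\t' 'NR==FNR { db[$0] = 1; next } $1 in db' \
  "$workdir/db_files.txt" "$workdir/fs_files.tsv")
printf '%s\n' "$matched"

rm -rf "$workdir"
```

This is the same idea as table1.filename = table2.filename, just done on flat files. Doing the join in the database is probably still the better fit once both tables are loaded.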
-
@scottalanmiller said:
I'm not clear what you are asking. Do you want a list of ALL files under said /directory or are you looking for only certain ones?
Every single file under /this/directory.
-
@anthonyh said:
@scottalanmiller said:
No need to get the filename, the absolute path will include that already.
I want the file name and the path to said file separate, but I suppose I could separate them in another step. I'm going to be matching by file name, basically table1.filename = table2.filename.
Just use a filter on the existing file, no need to make a separate file for that.
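One possible way to do that kind of filtering, as a minimal sketch: plain shell parameter expansion can split each absolute path into file name and directory. The function name split_paths and the sample path are only illustrations.

```shell
# split_paths (hypothetical helper): reads absolute paths on stdin and
# writes "filename<TAB>directory" for each one.
split_paths() {
  while IFS= read -r path; do
    # ${path##*/} drops everything up to the last slash (the file name);
    # ${path%/*} drops the last slash and everything after it (the directory).
    printf '%s\t%s\n' "${path##*/}" "${path%/*}"
  done
}

# Sample input line, shaped like the output of: find /this/directory -type f -print
result=$(printf '%s\n' '/this/directory/data/EFile/DOC/227349_FS86478.pdf' | split_paths)
printf '%s\n' "$result"
```

In practice the input would come straight from find, for example find /this/directory -type f -print | split_paths. File names containing newlines would break this line-based approach.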
-
@anthonyh said:
@scottalanmiller said:
I'm not clear what you are asking. Do you want a list of ALL files under said /directory or are you looking for only certain ones?
Every single file under /this/directory.
Oh okay.
find /dir -type f -print
Where /dir is the directory name in question. See if that gives you what you want.
-
This is super easy to do in Linux.... If you know all the commands like @scottalanmiller!
-
@scottalanmiller said:
@anthonyh said:
@scottalanmiller said:
I'm not clear what you are asking. Do you want a list of ALL files under said /directory or are you looking for only certain ones?
Every single file under /this/directory.
Oh okay.
find /dir -type f -print
Where /dir is the directory name in question. See if that gives you what you want.
That gives me the absolute path, but no date. I found this command that gets me a little closer:
find /this/directory -type f -exec stat -c "%n %y" {} \;
Gives me this:
/this/directory/data/EFile/DOC/227349_FS86478.pdf 2011-08-19 10:21:22.000000000 -0700
But it's not ideal, yet. I'd need to delimit the file and timestamp with something other than a space. I would love to eliminate the decimal on the seconds as well as the timezone, but I can work around those.
-
Ooh, I'm very close!
find /this/directory -type f -printf "%f\t" -printf "%h\t" -printf "%Tc\n"
Gets me this:
254405_FS85691.pdf /this/directory/data/EFile/CASEDOC Mon 27 Aug 2012 08:52:15 AM PDT
If I can get the timestamp formatted as YYYY-MM-DD HH:MM:SS (24h time) I will be golden! I don't care about PDT vs PST.
-
I think I've got it close enough!
find /this/directory -type f -printf "%f\t" -printf "%h\t" -printf "%TY-%Tm-%Td %TH:%TM\n"
Result:
101581_PR78450.pdf /this/directory/data/EFile/MO 2007-10-30 11:16
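If the seconds are wanted too, a sketch along the same lines: GNU find's %TS prints seconds with a fractional part, so this pipes the output through sed to trim it. The scratch directory and the timestamp set with touch are assumptions for the demo only.

```shell
# Scratch directory with one file whose mtime is set to a known value.
demo=$(mktemp -d)
touch -d '2007-10-30 11:16:42' "$demo/101581_PR78450.pdf"

# Same -printf layout as above plus %TS for seconds; sed removes the
# trailing fractional seconds (e.g. ".0000000000") at the end of each line.
listing=$(find "$demo" -type f \
  -printf '%f\t%h\t%TY-%Tm-%Td %TH:%TM:%TS\n' | sed 's/\.[0-9]*$//')
printf '%s\n' "$listing"

rm -rf "$demo"
```

The sed expression only touches the end of the line, so dots inside file names are left alone.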
-
@anthonyh said:
@scottalanmiller said:
@anthonyh said:
@scottalanmiller said:
I'm not clear what you are asking. Do you want a list of ALL files under said /directory or are you looking for only certain ones?
Every single file under /this/directory.
Oh okay.
find /dir -type f -print
Where /dir is the directory name in question. See if that gives you what you want.
That gives me the absolute path, but no date. I found this command that gets me a little closer:
find /this/directory -type f -exec stat -c "%n %y" {} \;
Gives me this:
/this/directory/data/EFile/DOC/227349_FS86478.pdf 2011-08-19 10:21:22.000000000 -0700
But it's not ideal, yet. I'd need to delimit the file and timestamp with something other than a space. I would love to eliminate the decimal on the seconds as well as the timezone, but I can work around those.
Easier to work with the date if you use UNIX time instead of a human readable format. And you can use the cut command to trim off anything trailing that you don't want.
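A rough sketch of the UNIX time idea, assuming GNU find and GNU date: %T@ prints the modification time as seconds since the epoch, with a fractional part. This uses sed rather than cut for the trimming, because file names like 227349_FS86478.pdf contain dots and cut -d. would chop the line at the first one. The scratch directory and the epoch value are made up for the demo.

```shell
# Scratch directory with one file whose mtime is a known epoch value.
demo=$(mktemp -d)
touch -d @1313774482 "$demo/227349_FS86478.pdf"

# %T@ is seconds since the epoch with a fractional part; sed strips the
# fraction at the end of each line, leaving whole seconds.
epoch_listing=$(find "$demo" -type f -printf '%f\t%h\t%T@\n' | sed 's/\.[0-9]*$//')
printf '%s\n' "$epoch_listing"

# Converting back to a readable form later (shown in UTC so the result
# does not depend on the local timezone).
readable=$(TZ=UTC date -d @1313774482 '+%Y-%m-%d %H:%M:%S')
printf '%s\n' "$readable"

rm -rf "$demo"
```

Integer epoch seconds sort and compare cleanly once loaded into a table, and most databases can convert them to a timestamp type on import.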