I want to download a large amount of satellite data (organized by day of each year) in .hdf
format from NASA's LAADS DAAC archive. They have a helpful guide that provides a code sample for GNU Wget,
and it works very well:
wget -e robots=off -m -np -R .html,.tmp -nH --cut-dirs=3 "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/PATH_TO_DATA_DIRECTORY" --header "Authorization: Bearer MY_TOKEN" -P TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM
The issue is that this approach is very heavy-handed: I only want to download files from the data directory whose names include the string h19v05. After reading the GNU Wget manual and other questions on this site, I edited the recommended code sample to add pattern matching:
wget -e robots=off -m -np -A "*h19v05*" -R .html,.tmp -nH --cut-dirs=3 "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/PATH_TO_DATA_DIRECTORY" --header "Authorization: Bearer MY_TOKEN" -P TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM
However, this only downloads an index.html.tmp file and then promptly removes it:
Length: 188259 (184K) [text/html]
Saving to: 'C:/Users/seyon/Desktop/thesis/tilesets/years/2007./MCD19A2/2007/index.html.tmp'
MCD19A2/2007/index.html.tmp 100%
[=================================================>] 183.85K 1.16MB/s in 0.2s
Last-modified header missing -- time-stamps turned off.
2024-07-04 05:30:49 (1.16 MB/s) -
'C:/Users/seyon/Desktop/thesis/tilesets/years/2007./MCD19A2/2007/index.html.tmp'
saved [188259/188259]
Removing C:/Users/seyon/Desktop/thesis/tilesets/years/2007./MCD19A2/2007/index.html.tmp since it should be rejected.
FINISHED --2024-07-04 05:30:49--
Total wall clock time: 23s
Downloaded: 1 files, 184K in 0.2s (1.16 MB/s)
I then reran the originally recommended code sample, and it started working again. But again, that approach recursively downloads every file in the data directory, which means I have to download hundreds of unnecessary files for each day of each year. Is my added syntax or format incorrect?
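For reference, my understanding from the Wget manual is that -A/-R patterns are shell globs matched against each filename along the way, including the index page itself, which would explain the "should be rejected" message in the log. Here is a minimal local sketch of that matching (the filenames are hypothetical examples, not real archive entries):

```shell
# Mimic wget's -A accept-pattern check with a shell glob via `case`.
# The pattern "*h19v05*" is the one from my command above.
match() { case "$1" in $2) echo accept ;; *) echo reject ;; esac; }

match "MCD19A2.A2007001.h19v05.006.hdf" "*h19v05*"   # prints "accept"
match "MCD19A2.A2007001.h20v04.006.hdf" "*h19v05*"   # prints "reject"
match "index.html" "*h19v05*"                        # prints "reject"
```

If this is indeed how the pattern is applied, the index page itself fails "*h19v05*", which matches what the log shows; what I don't understand is why recursion stops there instead of continuing into the links it parsed.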