I want to download a large amount of satellite data (organized by day of each year) in .hdf
format from NASA's LAADS DAAC archive. They have a helpful guide that provides a code sample for GNU Wget,
and it works very well:
wget -e robots=off -m -np -R .html,.tmp -nH --cut-dirs=3 "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/PATH_TO_DATA_DIRECTORY" --header "Authorization: Bearer MY_TOKEN" -P TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM
The issue is that this approach is very heavy-handed: I only want to download files from the data directory whose names include the string h19v05. After reading the GNU Wget manual and other questions on this site, I edited the recommended code sample to add pattern matching:
wget -e robots=off -m -np -A "*h19v05*" -R .html,.tmp -nH --cut-dirs=3 "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/PATH_TO_DATA_DIRECTORY" --header "Authorization: Bearer MY_TOKEN" -P TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM
However, this only downloads an index.html.tmp file and then promptly removes it:
Length: 188259 (184K) [text/html]
Saving to: 'C:/Users/seyon/Desktop/thesis/tilesets/years/2007./MCD19A2/2007/index.html.tmp'
MCD19A2/2007/index.html.tmp 100%
[=================================================>] 183.85K 1.16MB/s in 0.2s
Last-modified header missing -- time-stamps turned off.
2024-07-04 05:30:49 (1.16 MB/s) -
'C:/Users/seyon/Desktop/thesis/tilesets/years/2007./MCD19A2/2007/index.html.tmp'
saved [188259/188259]
Removing C:/Users/seyon/Desktop/thesis/tilesets/years/2007./MCD19A2/2007/index.html.tmp since it should be rejected.
FINISHED --2024-07-04 05:30:49--
Total wall clock time: 23s
Downloaded: 1 files, 184K in 0.2s (1.16 MB/s)
I then reran the originally recommended code sample, and it started working again. But again, that approach recursively downloads every file in the data directory, which means I have to download hundreds of unnecessary files for each day of each year. Is my added syntax or format incorrect?
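For reference, my understanding from the Wget manual is that -A/-R patterns are shell globs matched against each filename along the way, including the index page itself, which would explain the "should be rejected" message in the log. Here is a minimal local sketch of that matching (the filenames are hypothetical examples, not real archive entries):

```shell
# Mimic wget's -A accept-pattern check with a shell glob via `case`.
# The pattern "*h19v05*" is the one from my command above.
match() { case "$1" in $2) echo accept ;; *) echo reject ;; esac; }

match "MCD19A2.A2007001.h19v05.006.hdf" "*h19v05*"   # prints "accept"
match "MCD19A2.A2007001.h20v04.006.hdf" "*h19v05*"   # prints "reject"
match "index.html" "*h19v05*"                        # prints "reject"
```

If this is indeed how the pattern is applied, the index page itself fails "*h19v05*", which matches what the log shows; what I don't understand is why recursion stops there instead of continuing into the links it parsed.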