Xidel / Discussion / Help: Trying to download apk updates from apkmirror.com

Virgus - 2023-07-25

Hello, I'm new to Xidel and I'm trying to learn how to use it through a lot of trial and error.

So far I have managed to script downloading updates from SourceForge, GitHub and "normal" websites, but now I'm facing a new challenge and I could use some advice from expert users.

As an example, I took the latest TeamViewer Host apk.
Package base url is: https://www.apkmirror.com/apk/teamviewer/teamviewer-host/

To get to the download link I needed to call xidel three times:
To get latest release url: https://www.apkmirror.com/apk/teamviewer/teamviewer-host/teamviewer-host-15-43-203-release/
To get download url: https://www.apkmirror.com/apk/teamviewer/teamviewer-host/teamviewer-host-15-43-203-release/teamviewer-host-15-43-203-android-apk-download/
To get download link: https://www.apkmirror.com/apk/teamviewer/teamviewer-host/teamviewer-host-15-43-203-release/teamviewer-host-15-43-203-android-apk-download/download/?key=1d371836842c115a39fe79d1413e29a5ea3619bb

I managed to get the download link, but when I try to fetch it with wget (as I usually do) the answer is "ERROR 403: Forbidden". I know that there's a way to download a file from within xidel, but I've always had a hard time using it.

Could anybody please point me to the correct way of doing all this? Besides the download issue, are there shorter/cleaner ways to get to the result? I'd also like to know if there are tutorials for inexperienced users like me; I couldn't find any, except for some examples here and there...

Thanks and have a nice day,
V.

Here is the script I'm currently using:

@ECHO OFF
SETLOCAL EnableDelayedExpansion

SET "DLDIR=%~dp0"
SET "XIDEL=C:\CMDs\Scripts\Networking\Xidel\xidel.exe"
SET "SRCURL=https://www.apkmirror.com/apk/teamviewer/teamviewer-host/"
SET "DLHOST=https://www.apkmirror.com"

FOR /F "delims=" %%A IN ('%XIDEL% -s %SRCURL% -e "//a[@class='downloadLink']/@href"') DO SET "DLHREF=%%A" && GOTO :NEXT
:NEXT
SET DLURL1=%DLHOST%%DLHREF%
ECHO "%DLURL1%" & PAUSE

FOR /F "delims=" %%A IN ('%XIDEL% -s %DLURL1% -e "//a[@class='accent_color']/@href"') DO SET "DLHREF2=%%A"
SET DLURL2=%DLHOST%%DLHREF2%
ECHO "%DLURL2%" & PAUSE

FOR /F "delims=" %%A IN ('%XIDEL% -s %DLURL2% -e "//a/@href" ^| findstr "?key="') DO SET "DLHREF3=%%A"
SET DLURL3=%DLHOST%%DLHREF3%
ECHO "%DLURL3%" & PAUSE

WGET -nc -L "%DLURL3%"
PAUSE

Last edit: Virgus 2023-07-25

I finally found a way: wget was just missing the --user-agent parameter and further steps were necessary to get both the final url and a meaningful file name.
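As a rough illustration of the same idea outside the batch script, here is a minimal Python sketch of attaching a browser-like User-Agent header to a request. The helper name build_request is hypothetical, and the request is only constructed, not sent, so nothing here shows how apkmirror.com actually responds.

```python
import urllib.request


def build_request(url: str, user_agent: str = "Mozilla") -> urllib.request.Request:
    """Return a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# The URL is the package base url from the first post; the request is not sent.
req = build_request("https://www.apkmirror.com/apk/teamviewer/teamviewer-host/")

# urllib stores header names capitalized, so the key is "User-agent".
print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would send it with that header, which is roughly what --user-agent="Mozilla" does for wget.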

I didn't manage to avoid the first wget call, which fetches the file download "proxy" page. I would like to do this via Xidel and avoid creating a temporary file, but I failed to get xidel to retrieve the content of "https://www.apkmirror.com/apk/teamviewer/teamviewer-host/teamviewer-host-15-43-203-release/teamviewer-host-15-43-203-android-apk-download/download/?key=1d371836842c115a39fe79d1413e29a5ea3619bb"
What would be the correct syntax for xidel to parse this page?

Here is the corrected and completed script; please advise if it could be simplified in any way.
Thanks.

@ECHO OFF
SETLOCAL EnableDelayedExpansion
PUSHD "%~dp0"

SET "DLDIR=%~dp0"
SET "XIDEL=C:\CMDs\Scripts\Networking\Xidel\xidel.exe"

SET "DLHOST=https://www.apkmirror.com"

:: PACKAGE BASE URL
SET "SRCURL=https://www.apkmirror.com/apk/teamviewer/teamviewer-host/"

:: GET APK RELEASE URL
FOR /F "delims=" %%A IN ('%XIDEL% -s %SRCURL% -e "//a[@class='downloadLink']/@href"') DO SET "DLHREF=%%A" && GOTO :NEXT
:NEXT
SET "DLURL1=%DLHOST%%DLHREF%"
ECHO "%DLURL1%" & TIMEOUT 1 > nul

:: GET APK NAME
FOR /F "delims=" %%A IN ('%XIDEL% -s %DLURL1% -e "//div[@class='f-grow']/h1"') DO SET "DLNAME=%%A"
SET "DLNAME=%DLNAME: =_%_ApkMirror.apk"
ECHO "%DLNAME%" & TIMEOUT 1 > nul

:: GET APK DL URL
FOR /F "delims=" %%A IN ('%XIDEL% -s %DLURL1% -e "//a[@class='accent_color']/@href"') DO SET "DLHREF2=%%A"
SET "DLURL2=%DLHOST%%DLHREF2%"
ECHO "%DLURL2%" & TIMEOUT 1 > nul

:: GET APK DL PAGE URL
FOR /F "delims=" %%A IN ('%XIDEL% -s %DLURL2% -e "//a/@href" ^| findstr "?key="') DO SET "DLHREF3=%%A"
SET "DLURL3=%DLHOST%%DLHREF3%"
ECHO "%DLURL3%" & TIMEOUT 1 > nul

:: GET APK DL PAGE AND PARSE IT FOR APK DL URL
WGET -nc -L "%DLURL3%" --user-agent="Mozilla" --content-disposition -O clickhere.html 2>nul
REM TYPE clickhere.html | findstr /r /i /c:"?id=.*key=.*here"
FOR /F "delims=" %%A IN ('%XIDEL% -s clickhere.html -e "//a/@href" ^| findstr "APKMirror/download.php?"') DO SET "DLHREF4=%%A"
SET "DLURL4=%DLHOST%%DLHREF4%"
ECHO "%DLURL4%" & TIMEOUT 1 > nul

:: GET APK FILE
WGET -nc -L "%DLURL4%" --user-agent="Mozilla" --content-disposition -O "%DLNAME%"
IF EXIST "%DLNAME%" DEL /F "clickhere.html"

PAUSE
EXIT

Hello Virgus,

Have you actually removed (or commented out) @ECHO OFF to debug your script? The first extraction...

%XIDEL% -s %SRCURL% -e "//a[@class='downloadLink']/@href"

...doesn't only extract the first "apk-release-url". It extracts...

xidel -s "https://www.apkmirror.com/apk/teamviewer/teamviewer-host/"^
      -e "count(//a[@class='downloadLink']/@href)"
50

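...of them. A FOR loop that runs to completion keeps only the last one in its variable; your && GOTO :NEXT breaks out after the first iteration, so %DLHREF% does end up with the first link, but it is cleaner to select it in the query itself.
If you only want the first one, use -e "(//a[@class='downloadLink'])[1]/@href".
The same goes for the "apk-dl-url", where the loop has no GOTO and the last match wins.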
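To make the first-versus-last point concrete, here is a small Python sketch, offered only as an analogy. The markup and hrefs are a made-up stand-in, not the real apkmirror.com page, and ElementTree's limited path syntax is used instead of xidel's XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a page with several download links.
PAGE = """
<html><body>
  <a class="downloadLink" href="/apk/example/release-3/">3</a>
  <a class="downloadLink" href="/apk/example/release-2/">2</a>
  <a class="downloadLink" href="/apk/example/release-1/">1</a>
</body></html>
""".strip()

root = ET.fromstring(PAGE)
hrefs = [a.get("href") for a in root.findall(".//a[@class='downloadLink']")]

# Overwriting a variable on every iteration leaves the LAST match in it.
for href in hrefs:
    last = href

# Selecting by position keeps the FIRST match, like (...)[1] in XPath.
first = hrefs[0]

print(len(hrefs), first, last)
```

Note that XPath positions start at 1 while Python indexes start at 0.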

Could anybody please point me to the correct way of doing all this? Besides the download issue, are there shorter/cleaner ways to get to the result?

Yes, there are. In fact, all this can be done with just 1 xidel call:

xidel "https://www.apkmirror.com/apk/teamviewer/teamviewer-host/"^
      -e "dlname:=replace((//h5)[1]/a,'\s','_')||'_ApkMirror.apk'"^
      -f "(//h5)[1]/a/@href"^
      -f "//div[@class='table-row headerFont']/div[1]/a/@href"^
      -f "//a[contains(@class,'downloadButton')]/@href"^
      -f "//span/a[@data-google-vignette]/@href"^
      --download "{$dlname}"
Retrieving (GET): https://www.apkmirror.com/apk/teamviewer/teamviewer-host/
Processing: https://www.apkmirror.com/apk/teamviewer/teamviewer-host/
Assigned variable log:
dlname := TeamViewer_Host_15.43.203_ApkMirror.apk
Retrieving (): https://www.apkmirror.com/apk/teamviewer/teamviewer-host/teamviewer-host-15-43-203-release/
Processing: https://www.apkmirror.com/apk/teamviewer/teamviewer-host/teamviewer-host-15-43-203-release/
Retrieving (): https://www.apkmirror.com/apk/teamviewer/teamviewer-host/teamviewer-host-15-43-203-release/teamviewer-host-15-43-203-android-apk-download/
Processing: https://www.apkmirror.com/apk/teamviewer/teamviewer-host/teamviewer-host-15-43-203-release/teamviewer-host-15-43-203-android-apk-download/
Retrieving (): https://www.apkmirror.com/apk/teamviewer/teamviewer-host/teamviewer-host-15-43-203-release/teamviewer-host-15-43-203-android-apk-download/download/?key=21f920436b5dd9e025a736073397fe21eee819dd
Processing: https://www.apkmirror.com/apk/teamviewer/teamviewer-host/teamviewer-host-15-43-203-release/teamviewer-host-15-43-203-android-apk-download/download/?key=21f920436b5dd9e025a736073397fe21eee819dd
Retrieving (): https://www.apkmirror.com/wp-content/themes/APKMirror/download.php?id=5074549&key=40f76c5c75d2bb2a9be2c6c92da4f9a6a958a191
Save as: TeamViewer_Host_15.43.203_ApkMirror.apk

There's no need for multiple xidel calls, because with -f/--follow you can open, download, or "follow", other urls.
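The dlname expression in the command above replaces every whitespace character in the link text with an underscore and appends a suffix. A minimal Python sketch of that same string transformation follows; the function name build_apk_name is hypothetical, and the input "TeamViewer Host 15.43.203" is an assumption inferred from the logged dlname value, not copied from the page.

```python
import re


def build_apk_name(title: str, suffix: str = "_ApkMirror.apk") -> str:
    """Replace each whitespace character with "_" and append the suffix."""
    return re.sub(r"\s", "_", title) + suffix


# Assumed link text; the real page may differ.
print(build_apk_name("TeamViewer Host 15.43.203"))
```

Building the name from the page itself means the saved file follows the release version without any hard-coded string in the script.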

%DLHOST% isn't necessary either, because when "following" relative urls (with -f/--follow)...

xidel "https://www.apkmirror.com/apk/teamviewer/teamviewer-host/"^
      -e "$host,$path"
www.apkmirror.com
/apk/teamviewer/teamviewer-host/

...xidel automatically puts $host in front.
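The effect is ordinary relative-URL resolution against the page that was just opened. As an illustration only, here is how the same resolution looks with Python's standard library; the href value is an assumption about what the page's links look like (root-relative paths), based on the script prepending %DLHOST% to them.

```python
from urllib.parse import urljoin, urlsplit

base = "https://www.apkmirror.com/apk/teamviewer/teamviewer-host/"
# Assumed root-relative href as it would appear in an @href attribute.
href = "/apk/teamviewer/teamviewer-host/teamviewer-host-15-43-203-release/"

parts = urlsplit(base)
print(parts.netloc)  # the host part of the base url
print(parts.path)    # the path part of the base url

# Resolving the relative href against the base yields an absolute url.
resolved = urljoin(base, href)
print(resolved)
```

So a separate host variable is only needed when the tool doing the fetching does not resolve relative links itself.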

With xidel <url> -f ... -e ... you're opening/downloading/"following" a url in memory and extracting something to stdout. With xidel <url> -f ... --download . you're actually downloading/writing it to disk (to the current dir in this case). So there's no need for wget.

And without -s, as you can see above, you can see some interesting information about what's happening (status information).

One final piece of advice: prettify the HTML source before examining it, so it is easier to come up with a suitable XPath query:

xidel "https://www.apkmirror.com/apk/teamviewer/teamviewer-host/"^
      -e . --output-node-format=xml --output-node-indent

xidel "https://www.apkmirror.com/apk/teamviewer/teamviewer-host/"^
      -f "(//h5)[1]/a/@href"^
      -e . --output-node-format=xml --output-node-indent

xidel "https://www.apkmirror.com/apk/teamviewer/teamviewer-host/"^
      -f "(//h5)[1]/a/@href"^
      -f "//div[@class='table-row headerFont']/div[1]/a/@href"^
      -e . --output-node-format=xml --output-node-indent

etc...
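For readers who want to see what such indentation buys, here is a tiny Python sketch using the standard library's minidom. The markup is a made-up fragment, not taken from apkmirror.com, and minidom only accepts well-formed XML, which real HTML pages often are not, so this is an illustration of the idea rather than a replacement for the xidel commands above.

```python
from xml.dom import minidom

# Hypothetical one-line fragment, standing in for unformatted page source.
RAW = (
    '<div class="table-row headerFont">'
    '<div><a href="/apk/example/">Example</a></div>'
    "<div>size</div>"
    "</div>"
)

# Re-serialize with one element per line and two-space indentation.
pretty = minidom.parseString(RAW).toprettyxml(indent="  ")
print(pretty)
```

With the nesting visible, it is much easier to see which element is the first child of which, and therefore which path and position to use in a query.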

Reino - 2023-07-27

I'd also like to know if there are tutorials for inexperienced users like me; I couldn't find any, except for some examples here and there...

For Xidel stuff, see https://github.com/benibela/xidel/issues/67#issuecomment-770084663.
For XPath/XQuery stuff, see https://github.com/benibela/xidel/issues/106#issuecomment-1627386429.

An extensive wiki is still on my to-do list.

Last edit: Reino 2023-07-27

Virgus - 2023-07-29

Thank you so much for your detailed replies!
I was just passing by, and I'm looking forward to reading all your info attentively as soon as possible.
Thanks for taking the time to reply to me, and talk to you soon, V.
  
Xidel is a cli webpage scraping tool supporting XPath/XQuery 3 and CSS
