Download HTML source code from link not working
Hi, I've got a website that uses a PHP file manager called filegator.
I've been trying to use the WWW class, passing it the link (http://www.file-gator.com/gator/m/?cd=Documents), to display the HTML code of that page, but the text keeps coming back empty (except for the response headers, which I do get).
In Chrome I can inspect the HTML code, but I'm unsure how to download it. I think it may have something to do with the "cd=" parameter.
Doing this with other websites usually works. Any help would be greatly appreciated.
Code:
    WWW downloadDailynoticeWWW = new WWW("http://www.file-gator.com/gator/m/?cd=Documents");
    while (!downloadDailynoticeWWW.isDone)
    {
        contentText.text = "Downloading Daily Notices";
        yield return null;
    }
    Debug.Log(downloadDailynoticeWWW.text);
Answer by Bunny83 · Aug 02, 2017 at 04:51 AM
This website uses session cookies. If you request a page without an active session, you seem to get redirected to the main page. This might just be a result of the site's complex session management, or it may be deliberate, to fight bots.
You need client-side cookie management and at least two requests. The WWW class is very limited in this regard: it can only perform single, unrelated requests. The main problem is that the WWW class stores the response headers in a Dictionary / Hashtable. Most of these complex sites send multiple Set-Cookie headers in a single response, but a dictionary can hold only one value per key, so all but one of those cookies are lost.
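To make the header problem concrete, here is a plain C# sketch (no Unity types; `HeaderDemo` and the header lines are made-up examples, not what this server actually sends) that parses the same raw response headers twice: once into a `Dictionary<string, string>`, which behaves like a single-value header map, and once into a list of name/value pairs. The dictionary ends up with a single `Set-Cookie` entry, while the list keeps all of them. Which duplicate a real dictionary-based client keeps depends on its implementation; in this sketch the last one wins.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderDemo
{
    // Hypothetical raw response headers, for illustration only.
    public static readonly string[] RawHeaders =
    {
        "Content-Type: text/html",
        "Set-Cookie: PHPSESSID=abc123; path=/",
        "Set-Cookie: lang=en; path=/",
    };

    // Single-value map: a repeated header name overwrites the earlier value.
    public static Dictionary<string, string> ToDictionary(string[] lines)
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon < 0) continue;
            map[line.Substring(0, colon).Trim()] = line.Substring(colon + 1).Trim();
        }
        return map;
    }

    // Multi-value list: every header line survives, including repeats.
    public static List<KeyValuePair<string, string>> ToList(string[] lines)
    {
        var list = new List<KeyValuePair<string, string>>();
        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon < 0) continue;
            list.Add(new KeyValuePair<string, string>(
                line.Substring(0, colon).Trim(), line.Substring(colon + 1).Trim()));
        }
        return list;
    }

    public static void Main()
    {
        // Only the last Set-Cookie is left; the PHPSESSID one was overwritten.
        Console.WriteLine(ToDictionary(RawHeaders)["Set-Cookie"]); // lang=en; path=/

        // The list still has both Set-Cookie headers.
        foreach (var h in ToList(RawHeaders))
            if (h.Key.Equals("Set-Cookie", StringComparison.OrdinalIgnoreCase))
                Console.WriteLine(h.Value);
    }
}
```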
I use a local proxy server as an additional content filter. It has a log window, and this is what it shows when you visit the link you posted for the first time (when you don't have any cookies yet).
Probably the only important cookie is "PHPSESSID". If that one gets through when using WWW, you can send it in the Cookie request header of the second request.
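Pulling the session cookie out of a `Set-Cookie` value only takes some string handling. The following plain C# helper is a sketch (the name `CookieUtil.ExtractCookie` is hypothetical) that returns the `name=value` pair for a given cookie, or `null` when the header or the cookie is missing, instead of throwing. Like a simple `IndexOf` search, it does not parse the full cookie syntax. Its result is what would go into the `Cookie` request header of the second request.

```csharp
using System;

public static class CookieUtil
{
    // Returns "name=value" for the named cookie inside a Set-Cookie header value,
    // or null if the header is missing or doesn't contain that cookie.
    public static string ExtractCookie(string setCookieHeader, string name)
    {
        if (string.IsNullOrEmpty(setCookieHeader)) return null;

        string key = name + "=";
        int start = setCookieHeader.IndexOf(key, StringComparison.Ordinal);
        if (start < 0) return null;

        // The value ends at the next ';' (attributes such as path follow it),
        // or at the end of the string when there are no attributes.
        int end = setCookieHeader.IndexOf(';', start);
        return end < 0
            ? setCookieHeader.Substring(start).Trim()
            : setCookieHeader.Substring(start, end - start).Trim();
    }

    public static void Main()
    {
        Console.WriteLine(ExtractCookie("PHPSESSID=abc123; path=/", "PHPSESSID")); // PHPSESSID=abc123
        Console.WriteLine(ExtractCookie("PHPSESSID=abc123", "PHPSESSID"));         // PHPSESSID=abc123
        Console.WriteLine(ExtractCookie("lang=en; path=/", "PHPSESSID") == null);  // True
    }
}
```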
Edit:
Here's a working example:
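    using System.Collections;
    using UnityEngine;
    using UnityEngine.Networking;

    public class CookieLoader : MonoBehaviour // class name is arbitrary
    {
        string cookie = null;

        void Start()
        {
            StartCoroutine(Load());
        }

        IEnumerator GetCookie()
        {
            UnityWebRequest tmp = new UnityWebRequest("http://www.file-gator.com/gator/m/?cd=");
            tmp.redirectLimit = 0; // important: don't follow the redirect, so the 302 and its Set-Cookie header are visible
            yield return tmp.SendWebRequest(); // tmp.Send() in Unity 2017.1 and earlier
            string s = tmp.GetResponseHeader("Set-Cookie");
            if (s == null) yield break; // no Set-Cookie header in the response
            int i = s.IndexOf("PHPSESSID=");
            if (i < 0) yield break; // no session cookie in the header
            s = s.Substring(i);
            i = s.IndexOf(";");
            cookie = i < 0 ? s : s.Substring(0, i); // keep only "PHPSESSID=..."
        }

        IEnumerator Load()
        {
            if (cookie == null)
            {
                yield return StartCoroutine(GetCookie());
            }
            UnityWebRequest req = new UnityWebRequest("http://www.file-gator.com/gator/m/?cd=Documents");
            if (cookie != null)
                req.SetRequestHeader("Cookie", cookie);
            req.downloadHandler = new DownloadHandlerBuffer();
            yield return req.SendWebRequest(); // req.Send() in Unity 2017.1 and earlier
            Debug.Log("res:" + req.downloadHandler.text);
        }
    }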
Hmm, OK, so would this work? I make a first WWW call to that website and extract the cookie from it (is that possible?).
Then, using the "PHPSESSID" cookie I extracted, I make a second request, again with WWW, like this. Where would the cookie go?
    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;

    public class ExampleClass : MonoBehaviour
    {
        IEnumerator Start()
        {
            WWWForm form = new WWWForm();
            form.AddField("cd", "Documents");
            Dictionary<string, string> headers = form.headers;
            byte[] rawData = form.data;
            string url = "http://www.file-gator.com/gator/m/?cd=Documents";
            // Add a custom header to the request.
            // Post a request to a URL with our custom headers
            WWW www = new WWW(url, rawData, headers);
            yield return www;
            //.. process results from WWW request here...
        }
    }
I copied this from here
EDIT:
OK, I think instead of using the WWW class, I should use UnityWebRequest. Is this a better option?
Yes, it's better to use UnityWebRequest. I'll update my answer with a working example.
It seems that both WWW and UnityWebRequest automatically follow a redirect response. However, since the cookie is never set, the server keeps returning a redirect, and WWW and UnityWebRequest get stuck following redirects.
UnityWebRequest, however, lets you disable automatic redirection by setting "redirectLimit" to 0. That way you actually get the 302 response and can extract the session cookie from it.