Retrieving data from a web page
Hi all, I am having trouble retrieving some data from this website: http://ricercaweb.unibocconi.it/criospatstatdb/rep06a.php. I am not very experienced in programming, and this is the first time I have tried to get data from the web this way. I have read quite a few posts, but I don't understand what an API is or the issues related to web page formats.
To put it simply, I'd like MATLAB to go to that web page, enter a company name (e.g. Novartis) into the search bar, retrieve the resulting table and save it in a usable format (CSV, for example).
I ran the following code, but it doesn't work:
url = 'http://ricercaweb.unibocconi.it/criospatstatdb/rep06a.php';
options = weboptions('Keyname','Novartis','Keyvalue','text','ContentType','auto');
data = webread(url,options);
Could anyone help me, please? Thanks.
Accepted Answer
Paolo
on 13 May 2018
Hi Enrico, try the following:
%Build the address of the CSV file for the chosen company.
url = 'http://ricercaweb.unibocconi.it/criospatstatdb/csv/rep06a_';
prompt = 'Enter company of interest:';
val = input(prompt,'s');
url = [url val '.csv'];
%ContentType 'text' makes webread return the CSV file as a character vector.
options = weboptions('RequestMethod','get','ContentType','text');
try
    data = webread(url,options);
    disp('CSV formatted data:');
    disp(data);
catch
    disp('No information found.');
end
If by 'putting the name of the company into the search bar' you meant just changing the URL, then the code above should do the trick: enter the term of interest when prompted and it will retrieve the data for you.
On the other hand, if you were asking for code that fills in the HTML form with a POST request and then retrieves the resulting data with a GET request, some changes would be needed. Let me know.
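The question also asked for the table to be saved in a usable format. A minimal sketch, assuming data holds the character vector that webread returns when ContentType is 'text': write the text unchanged to a .csv file, then read it back with readtable. The two-row sample and the file name below are hypothetical placeholders, not output from the site.

```matlab
% Hypothetical stand-in for the char vector returned by webread(...,'ContentType','text').
data = sprintf('ColA,ColB\n1,2\n3,4\n');

% Write the CSV text unchanged to a file (the file name is only an example).
filename = fullfile(tempdir, 'rep06a_Novartis.csv');
fid = fopen(filename, 'w');
assert(fid ~= -1, 'Could not open file for writing.');
fprintf(fid, '%s', data);
fclose(fid);

% Load the file as a table; the header row supplies the variable names.
T = readtable(filename);
disp(T);
```

If you do not need the text in MATLAB first, websave(filename,url) downloads the file straight to disk instead.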
Paolo
on 16 May 2018
Edited: Paolo
on 16 May 2018
Hi Enrico,
I am not entirely sure whether the web page supports POST requests, but you can achieve exactly the same functionality without them anyway. When you submit the form, a GET request for the term of interest (e.g. Novartis) is sent to a specific address, and we can use this address to implement what you need. You can find the address yourself: in Chrome, right-click the form, choose Inspect, and check the Network tab of the developer tools when you submit the search.
You can copy the GET request from there, and the easiest format to copy it in is a curl command. It will look something like this:
curl "http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php?q=novartis"
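Rather than appending the term to the URL by hand, you can let MATLAB assemble and percent-encode the query string, which matters for company names containing spaces or other special characters. A sketch using matlab.net.URI and matlab.net.QueryParameter, assuming (as the curl command suggests) that the endpoint reads the term from the q parameter; webread(baseUrl,'q',val,options) is an equivalent shorthand. The request is wrapped in try/catch so the sketch still runs without network access.

```matlab
% Base address of the search endpoint, taken from the curl command above.
baseUrl = 'http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php';
val = 'novartis';   % example search term

% Let MATLAB add and encode ?q=<val>.
uri = matlab.net.URI(baseUrl);
uri.Query = matlab.net.QueryParameter('q', val);
requestUrl = char(uri);
disp(requestUrl);

options = weboptions('ContentType', 'text', 'Timeout', 10);
try
    data = webread(requestUrl, options);
catch err
    % Keep going with an empty result if the site cannot be reached.
    data = '';
    disp(['Request failed: ' err.message]);
end
```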
This is easy to implement in MATLAB, and there are two ways I can think of to do it.
The first is as follows:
%Perform the GET request that the form submits.
first_url = 'http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php?q=';
prompt = 'Enter company of interest:';
val = input(prompt,'s');
first_url = [first_url val];
options = weboptions('RequestMethod','get','ContentType','text');
%Read response. data is initialised so the check below also works if the request fails.
data = '';
try
    data = webread(first_url,options);
catch
    disp('No information found.');
end
%Search the response for the .csv link, then perform a second GET request at that address.
if ~isempty(data)
    second_url = regexp(data,'http://[^"''<>\s]+?\.csv','match','once');
    if isempty(second_url)
        disp('No .csv link found in the response.');
    else
        try
            data = webread(second_url,options);
            disp('CSV formatted data:');
            disp(data);
        catch
            disp('No information found.');
        end
    end
end
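A note on the link-extraction step: MATLAB's regexp lets '.' match newlines by default, so a greedy pattern such as '(http://).*(.csv)' can run from the first link to the last '.csv' anywhere in the response. The sketch below compares it with a stricter pattern that stops at quotes, angle brackets and whitespace. The response snippet and the example.org links are hypothetical, not the site's real output.

```matlab
% Hypothetical response containing two links on separate lines.
response = sprintf(['<a href="http://example.org/a.csv">A</a>\n' ...
                    '<a href="http://example.org/b.csv">B</a>']);

% Greedy pattern: a single match spanning from the first 'http://' to the last '.csv'.
greedy = regexp(response, '(http://).*(.csv)', 'match');

% Stricter pattern: one match per link, each ending at its own '.csv'.
strict = regexp(response, 'http://[^"''<>\s]+?\.csv', 'match');

fprintf('Greedy matches: %d\n', numel(greedy));
fprintf('Strict matches: %d\n', numel(strict));
fprintf('%s\n', strict{:});
```

Adding 'once' to the strict call returns just the first link as a character vector, or '' when there is no link, which makes the "nothing found" case easy to test with isempty.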
Alternatively, you can use system to run curl from MATLAB. Make sure that curl is available on your system path.
prompt = 'Enter company of interest:';
val = input(prompt,'s');
%Quote the URL so the shell leaves '?' alone; -s hides curl's progress meter.
command = sprintf('curl -s "http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php?q=%s"',val);
[~,cmdout] = system(command);
%Extract the .csv link from the first response and request it.
url = regexp(cmdout,'http://[^"''<>\s]+?\.csv','match','once');
if ~isempty(url)
    [~,cmdout] = system(sprintf('curl -s "%s"',url));
end
cmdout will then contain the CSV response of the second GET request, which is the data you are interested in. For multiple company names, you would simply repeat the two curl requests for each name. Hope this helps.
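The same idea extends to a list of companies. A sketch that loops over several names with webread instead of curl, reusing the csv/rep06a_<name>.csv address pattern from the accepted answer. The company list is only an example, the names are assumed to need no URL encoding, and a failed request simply leaves an empty result for that company.

```matlab
% Example list of search terms.
companies = {'Novartis', 'Bayer'};
baseUrl = 'http://ricercaweb.unibocconi.it/criospatstatdb/csv/rep06a_';
options = weboptions('ContentType', 'text', 'Timeout', 10);

% One struct element per company; csv stays '' if the request fails.
results = struct('company', companies, 'csv', '');

for k = 1:numel(companies)
    url = [baseUrl companies{k} '.csv'];
    try
        results(k).csv = webread(url, options);
    catch err
        fprintf('%s: no data (%s)\n', companies{k}, err.message);
    end
end
```

Keeping the results in a struct array ties each CSV text to its company name, so each entry can later be written to its own file.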