Retrieving data from a web page

Enrico Scupola on 13 May 2018
Commented: Enrico Scupola on 20 May 2018
Hi all, I am having trouble retrieving some data from this website: http://ricercaweb.unibocconi.it/criospatstatdb/rep06a.php. I am not very experienced in programming, and this is the first time I have tried to get data from the web this way. I have read quite a few posts, but I don't understand what an API is or the issues related to web page formats.
To put it simply, I'd like to tell MATLAB to go to the stated web page, enter a company name (e.g. Novartis) into the search bar, retrieve the resulting table and save it in a usable format (CSV, for example).
I ran this code, but it doesn't work.
url = 'http://ricercaweb.unibocconi.it/criospatstatdb/rep06a.php';
options = weboptions('Keyname','Novartis','Keyvalue','text','ContentType','auto');
data = webread(url,options);
Could anyone help me, please? Thanks.

Accepted Answer

Paolo on 13 May 2018
Hi Enrico, try the following:
url = 'http://ricercaweb.unibocconi.it/criospatstatdb/csv/rep06a_';
prompt = 'Enter company of interest:';
val = input(prompt,'s');
url = strcat(url,val,'.csv');
options = weboptions('RequestMethod','get','ArrayFormat','csv','ContentType','text');
try
    data = webread(url,options);
    disp('CSV formatted data:');
    data
catch
    disp('No information found.');
end
If by 'inserting the name of the company in the search bar' you meant just changing the URL, then the code above should do the trick. Enter the company name at the prompt and it will retrieve the corresponding .csv file for you.
On the other hand, if you were asking for code that fills in the HTML form with a POST request and then retrieves the resulting data with a GET request, some changes would need to be made. Let me know.
  3 Comments
Paolo on 16 May 2018
Edited: Paolo on 16 May 2018
Hi Enrico,
I am not entirely sure whether the web page you are using supports POST requests. You can achieve the same functionality without them anyway. When you submit the form, a GET request for the term of interest (e.g. Novartis) is sent to a certain address, and we can use this address to implement what you require. You can see this address yourself if you inspect the form (right click, Inspect) and then check the Network tab of the developer tools (if you are using Chrome).
You can copy the GET request from there; the easiest option is to copy it as a 'curl' command. The curl command will look something like this:
curl "http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php?q=novartis"
You can easily implement this in MATLAB. There are two ways I can think of for doing it.
The first is as follows:
% Perform the first GET request; the response text contains the link to the .csv file.
first_url = 'http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php?q=';
prompt = 'Enter company of interest:';
val = input(prompt,'s');
% urlencode keeps names with spaces or special characters valid in the query string.
first_url = [first_url, urlencode(val)];
options = weboptions('RequestMethod','auto','ContentType','text');
% Read the response. data is initialised so that the check below still works if the request fails.
data = '';
try
    data = webread(first_url,options);
catch
    disp('No information found.');
end
% Search for the http link, then perform a second GET request at the .csv address.
if ~isempty(data)
    % Escaped dot and lazy match: stop at the first '.csv' after 'http://'.
    expression = 'http://\S*?\.csv';
    second_url = regexp(data,expression,'match','once');
    options = weboptions('RequestMethod','get','ArrayFormat','csv','ContentType','text');
    try
        data = webread(second_url,options);
        disp('CSV formatted data:');
        disp(data);
    catch
        disp('No information found.');
    end
end
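The link-extraction step can be tried offline before touching the server. The following is only a minimal sketch: the sample response text and the example.com address are made up for illustration, and the helper name extractCsvUrl is hypothetical. It assumes the response contains an absolute http:// link ending in .csv with no spaces in it.

```matlab
% Pattern: literal "http://", then as few non-space characters as possible,
% up to a literal ".csv" (the dot is escaped so it does not match any character).
extractCsvUrl = @(txt) regexp(txt, 'http://\S*?\.csv', 'match', 'once');

% Hypothetical response text, invented for this example (not real server output).
sampleResponse = '<a href="http://example.com/csv/rep06a_novartis.csv">Download</a>';

csvUrl = extractCsvUrl(sampleResponse);
disp(csvUrl);
```

With the 'once' option, regexp returns an empty char array when nothing matches, so isempty(csvUrl) is a simple way to detect a response that has no .csv link.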
Alternatively, you can use system to execute 'curl' from MATLAB. Make sure that 'curl' is available on the system path.
prompt = 'Enter company of interest:';
val = input(prompt,'s');
% -s suppresses curl's progress output so that cmdout holds only the response.
command = sprintf('curl -s "%s%s"','http://ricercaweb.unibocconi.it/criospatstatdb/get_rep06a.php?q=',urlencode(val));
[~,cmdout] = system(command);
% Escaped dot and lazy match: stop at the first '.csv' after 'http://'.
expression = 'http://\S*?\.csv';
url = regexp(cmdout,expression,'match','once');
command = sprintf('curl -s "%s"',url);
[~,cmdout] = system(command);
After the second call, 'cmdout' contains the .csv response of the second GET request, which is the data you are interested in. For multiple company names you would simply repeat the two curl requests for each name. Hope this helps.
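Since the original goal was to save the result in a usable format, here is a minimal sketch of writing the retrieved CSV text to a file and reading it back as a table. The sample text, its column names and the output file name are assumptions made up for illustration; in practice csvText would be the text returned by webread or curl above.

```matlab
% Hypothetical CSV text standing in for the server response (columns are invented).
csvText = sprintf('name,count\nNovartis,3\nRoche,5\n');

% Write the text to a .csv file in the temporary folder (file name is an assumption).
outFile = fullfile(tempdir, 'company_data.csv');
fid = fopen(outFile, 'w');
fprintf(fid, '%s', csvText);
fclose(fid);

% Read the file back as a table for further processing.
T = readtable(outFile);
disp(T);
```

If only the file is needed, websave(filename, url) can download the .csv address directly without going through a text variable.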
Enrico Scupola on 20 May 2018
Thanks a lot Paolo, that really helped me!

Release

R2018a
