
All Questions

0 votes, 1 answer, 711 views

Extract year from timestamp in Hive

I am writing a query to show the data entries for a specific year. The date is stored as dd/mm/yyyy hh:mm:ss (Date TIMESTAMP, e.g. 12/2/2014 0:00:00). I am trying to display the two columns (name, ...
Asked by NoobCoder123
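Here is a minimal Python sketch of the parsing step. It assumes the strings really are day-first (dd/mm/yyyy), as the question states. The `(name, date_string)` row shape and the helper names are hypothetical. In Hive, if the column is actually a string in this format, a common route is `year(from_unixtime(unix_timestamp(col, 'dd/MM/yyyy H:mm:ss')))`. Check that expression against the Hive date-function documentation before relying on it.

```python
from datetime import datetime

# Day/month/year, then hour:minute:second, as described in the question.
# strptime accepts non-zero-padded values such as "2" for the month or "0" for the hour.
DATE_FORMAT = "%d/%m/%Y %H:%M:%S"


def extract_year(timestamp: str) -> int:
    """Return the year from a 'dd/mm/yyyy hh:mm:ss' string."""
    return datetime.strptime(timestamp, DATE_FORMAT).year


def rows_for_year(rows, year):
    """Keep (name, date_string) rows whose date falls in `year` (hypothetical row shape)."""
    return [row for row in rows if extract_year(row[1]) == year]


if __name__ == "__main__":
    sample = [("alice", "12/2/2014 0:00:00"), ("bob", "3/11/2015 13:45:00")]
    print(extract_year("12/2/2014 0:00:00"))  # 2014
    print(rows_for_year(sample, 2014))        # [('alice', '12/2/2014 0:00:00')]
```

The format string is the important part. If the data were actually month-first, `12/2/2014` would silently parse to a different date, so confirm the convention before filtering.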
0 votes, 1 answer, 419 views

How can I get the actual data size per row in Hive SQL?

Is it possible to calculate the actual data size per row in Hive SQL? I have found this DBA question for MS SQL Server, but I am not able to translate the accepted answer to Hive SQL. I'm ...
Asked by Ashkan (reputation 1,673)
1 vote, 2 answers, 133 views

How can I store, retrieve (and perform munging on) large CSV files with Python?

I have a large CSV file of about 5-6 GB (millions of rows), so pandas cannot handle it (it raises a memory error, as my RAM is only 2 GB). I want to use Hadoop on it (i.e., store blocks of each file on ...
Asked by Vipul Singh
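Before moving to Hadoop, a file larger than RAM can often be handled on one machine by streaming it row by row. pandas offers the same idea through the `chunksize` parameter of `read_csv`. The sketch below uses only the standard `csv` module. The aggregation (a per-key sum) and the column names are hypothetical stand-ins for whatever munging is actually needed.

```python
import csv
from collections import defaultdict


def aggregate_csv(lines, key_column, value_column):
    """Sum `value_column` per distinct `key_column`, reading one row at a time.

    `lines` is any iterable of CSV text lines (an open file works). Memory use
    grows with the number of distinct keys, not with the size of the file.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key_column]] += float(row[value_column])
    return dict(totals)


def aggregate_csv_file(path, key_column, value_column):
    """Stream a CSV file from disk instead of loading it all into memory."""
    with open(path, newline="") as f:
        return aggregate_csv(f, key_column, value_column)
```

For example, `aggregate_csv_file("data.csv", "city", "amount")` scans the whole file in a single pass and returns one total per city (`data.csv`, `city` and `amount` are hypothetical names).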
0 votes, 1 answer, 373 views

How to perform denormalization in HBase?

We are trying to migrate our existing RDBMS (SQL database) system to Hadoop, and we plan to use HBase for it. However, we do not understand how to denormalize the SQL data to store it in HBase column ...
Asked by Bunny (reputation 439)
4 votes, 2 answers, 17k views

Compare two tables in Hive

I have 3 tables in Hive: Control_table, with known data; New_table, with data to check; and Result_table, into which records from New_table whose values differ from Control_table are inserted. All ...
Asked by Jakub Zak (reputation 1,232)
5 votes, 1 answer, 12k views

How to compare two tables and return the rows that differ in Hive

Let's say I have a table with about 180 columns and 100 records. This table is backed up into a temporary table and the original one is removed. After this, a migration (change) is run on a pipeline which ...
Asked by Jakub Zak (reputation 1,232)
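Both comparison questions come down to the same logic: match rows on a key, then report rows missing on either side and columns whose values changed. In Hive this is commonly written as a FULL OUTER JOIN on the key with column-by-column comparisons. The Python sketch below shows the same logic on in-memory rows, assuming each table has a unique key column (named `id` here, which is hypothetical).

```python
def diff_tables(control, new, key="id"):
    """Compare two tables given as lists of dicts that share a unique key column.

    Returns (missing_in_new, extra_in_new, changed). `changed` maps each key to
    {column: (control_value, new_value)} for the columns whose values differ.
    """
    control_by_key = {row[key]: row for row in control}
    new_by_key = {row[key]: row for row in new}

    # Rows present on only one side (the NULL side of a full outer join).
    missing_in_new = sorted(control_by_key.keys() - new_by_key.keys())
    extra_in_new = sorted(new_by_key.keys() - control_by_key.keys())

    # Rows present on both sides: compare every column, including ones only one side has.
    changed = {}
    for k in control_by_key.keys() & new_by_key.keys():
        old_row, new_row = control_by_key[k], new_by_key[k]
        columns = old_row.keys() | new_row.keys()
        diffs = {c: (old_row.get(c), new_row.get(c))
                 for c in columns if old_row.get(c) != new_row.get(c)}
        if diffs:
            changed[k] = diffs
    return missing_in_new, extra_in_new, changed
```

With 180 columns, looping over the column names is what keeps this manageable. In HiveQL the equivalent comparison list is usually generated from the table schema rather than typed by hand.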
0 votes, 1 answer, 650 views

Getting a probability density graph and k-means clustering with 300 million rows

The DBMS I use is MySQL (MariaDB). The table schema is as follows:
CREATE TABLE MyTable (
  ID INT PRIMARY KEY,
  TEXT VARCHAR(200),
  VALUE DECIMAL(15,2)
)
The table has 300 million rows or more....
Asked by Keith Park
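For the density-graph half of the question, a fixed-width histogram can be built in a single pass that keeps only the bin counts, never the rows. The same binning can be pushed into the database with `GROUP BY FLOOR(VALUE / width)`. This is a minimal Python sketch of the idea. The bin width is a hypothetical parameter, and k-means is not covered here.

```python
import math
from collections import Counter


def histogram_density(values, bin_width):
    """Single-pass histogram density estimate.

    Returns {bin_start: density}, where density = count / (n * bin_width), so
    the bars integrate to 1. Only one counter per non-empty bin is kept in memory.
    """
    counts = Counter()
    n = 0
    for v in values:
        counts[math.floor(v / bin_width)] += 1
        n += 1
    if n == 0:
        return {}
    return {b * bin_width: c / (n * bin_width) for b, c in sorted(counts.items())}
```

Because `values` can be any iterator, such as rows fetched from a server-side cursor, 300 million rows never need to be in memory at once.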
0 votes, 2 answers, 297 views

Real time queries in MongoDB for different criteria and processing the result

I'm new to MongoDB. Is MongoDB efficient for real-time queries where the criteria values change every time I run my query? There will also be some aggregation of the result set before sending the ...
Asked by user203617
2 votes, 1 answer, 220 views

Is there any abstraction layer to work with GFS or HDFS? [closed]

Facebook uses both SQL and NoSQL databases. 1. Does it use GFS, HDFS, both, or something else? 2. What abstraction or application layers are available for working with HDFS and GFS? 3. ...
Asked by JAVA Beginner
8 votes, 4 answers, 7k views

Advanced queries in HBase

Given the following HBase schema scenario (from the official FAQ)... How would you design an HBase table for a many-to-many association between two entities, for example Student and Course? ...
Asked by Teflon Ted (reputation 8,806)
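The usual denormalized answer to the Student/Course scenario, as commonly described for HBase, uses two tables. The Student table holds course ids as column qualifiers, and the Course table holds student ids as column qualifiers, so each direction of the lookup is a single-row read. The sketch below models that layout with in-memory dicts (row key, then column family, then qualifier, then value) rather than a real HBase client. The family names `courses` and `students` and the class name are hypothetical.

```python
from collections import defaultdict


class ManyToManyStore:
    """In-memory model of two HBase-style tables: row key -> family -> qualifier -> value."""

    def __init__(self):
        self.student = defaultdict(lambda: defaultdict(dict))
        self.course = defaultdict(lambda: defaultdict(dict))

    def enroll(self, student_id, course_id):
        # Denormalized write: the association is stored once in each table, using the
        # other entity's id as the column qualifier (the cell value itself is unused).
        self.student[student_id]["courses"][course_id] = ""
        self.course[course_id]["students"][student_id] = ""

    def courses_of(self, student_id):
        # One row read on the Student table; the qualifiers are the course ids.
        return sorted(self.student.get(student_id, {}).get("courses", {}))

    def students_in(self, course_id):
        # One row read on the Course table; the qualifiers are the student ids.
        return sorted(self.course.get(course_id, {}).get("students", {}))
```

The cost of this design is that every enrollment is two writes, which must be kept consistent by the application, since HBase does not provide transactions across rows in different tables.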